Version history is one of the most powerful—and most overlooked—sources of forensic evidence in Excel investigations. Every save, every edit, every collaborator leaves a trail that platforms like OneDrive, SharePoint, and Google Drive preserve automatically. Knowing how to extract and analyze this history can reconstruct exactly what happened to a document, when, and by whom.
When a forensic investigator receives an Excel file, the current state of the document tells only part of the story. The file you see today may have been edited dozens of times, by multiple people, across weeks or months. Rows may have been added and then removed. Formulas may have been replaced with static values. Entire sheets may have been deleted. Without version history, these changes are invisible.
Version history changes this equation entirely. Modern cloud platforms and operating systems automatically capture snapshots of files, typically on save or on a schedule, creating a chronological record of significant changes. For a forensic investigator, this means the ability to compare the document at different points in time, identify exactly what changed, determine who made each change, and establish whether the document was manipulated.
Version history is not stored in the Excel file itself. It is maintained by the platform that hosts the file—whether that is a cloud service, a file server, or the local operating system. Each platform has different retention policies, access methods, and levels of detail. Understanding where to look is the first step in any version-based forensic analysis.
Microsoft's cloud platforms provide the most detailed version history for Excel files, especially when co-authoring is enabled. SharePoint and OneDrive for Business can retain hundreds of versions spanning months or years.
Retention Policies
What Each Version Records
# Access SharePoint version history via REST API
GET https://{site}/_api/web/GetFileByServerRelativePath(decodedurl='/sites/team/Documents/report.xlsx')/Versions
# Download a specific version
GET https://{site}/_api/web/GetFileByServerRelativePath(decodedurl='/sites/team/Documents/report.xlsx')/Versions('512')/$value
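The 512 in the download request is a version ID, not the label shown in the SharePoint interface. The sketch below converts between the two, assuming the commonly observed convention that the ID equals the major version times 512 plus the minor version. The function names `label_from_id` and `id_from_label` are hypothetical, and the result should be checked against the version label returned by the Versions request.

```python
def label_from_id(version_id: int) -> str:
    """Convert a SharePoint version ID into a 'major.minor' label.

    Assumption: ID = major * 512 + minor (commonly observed, not guaranteed).
    """
    if version_id < 0:
        raise ValueError("version ID must be non-negative")
    major, minor = divmod(version_id, 512)
    return f"{major}.{minor}"


def id_from_label(label: str) -> int:
    """Inverse of label_from_id, e.g. '2.1' becomes 1025."""
    major, minor = (int(part) for part in label.split("."))
    return major * 512 + minor
```

Under that assumption, `label_from_id(512)` gives `"1.0"`, which is why a request for version 512 would return the first major version.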
Google Drive maintains version history for uploaded Excel files, while Google Sheets tracks a more granular edit history for files converted to native Google format. The two systems work differently and offer different forensic opportunities.
Google Drive (XLSX uploads)
Google Sheets (native format)
# List revisions via Google Drive API
GET https://www.googleapis.com/drive/v3/files/{fileId}/revisions?fields=*
# Download a specific revision
GET https://www.googleapis.com/drive/v3/files/{fileId}/revisions/{revisionId}?alt=media
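Once the revision list has been fetched, it helps to flatten it into rows sorted by time. The following is a minimal sketch that works on an already-decoded JSON response. It assumes the field names of the Drive v3 revision resource (`revisions`, `id`, `modifiedTime`, `lastModifyingUser`, `size`, `md5Checksum`); the function name `revisions_to_timeline` is hypothetical, and the fields should be confirmed against a real response.

```python
from datetime import datetime


def revisions_to_timeline(response: dict) -> list:
    """Flatten a decoded revisions.list response into time-ordered rows."""
    rows = []
    for rev in response.get("revisions", []):
        user = rev.get("lastModifyingUser", {})
        rows.append({
            "revision_id": rev["id"],
            # Timestamps arrive as RFC 3339 strings such as 2025-11-10T09:14:00.000Z
            "modified": datetime.fromisoformat(rev["modifiedTime"].replace("Z", "+00:00")),
            "user": user.get("emailAddress") or user.get("displayName", "unknown"),
            # The API returns size as a string; it may be absent for some revisions
            "size": int(rev["size"]) if "size" in rev else None,
            "md5": rev.get("md5Checksum"),
        })
    rows.sort(key=lambda row: row["modified"])
    return rows
```

The `md5` value, where present, can be compared with a hash of the downloaded revision as a basic integrity check.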
Even without cloud storage, operating systems provide version history mechanisms that capture file snapshots. These are often overlooked during forensic investigations but can contain critical evidence.
Windows
macOS
# Windows: List Volume Shadow Copies
vssadmin list shadows
# Windows: Access a shadow copy
mklink /d C:\ShadowAccess \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy1\
# macOS: List local APFS snapshots
tmutil listlocalsnapshots /
# macOS: Mount a specific snapshot (use the full snapshot name printed by listlocalsnapshots)
mkdir -p /tmp/snapshot_mount
mount_apfs -s {snapshot_name} / /tmp/snapshot_mount
Analyzing version history requires a systematic approach. You are not simply opening old files—you are reconstructing a timeline of changes, identifying anomalies, and building an evidence chain that can withstand scrutiny. The following methodology provides a structured framework for version-based forensic analysis.
Before any analysis begins, download and preserve every available version of the file. Cloud platforms may automatically delete older versions based on retention policies, so collection is time-sensitive. Each downloaded version should be hashed (SHA-256) immediately so that any later alteration is detectable and the chain of custody can be documented.
# Download all versions and hash them
mkdir -p evidence/versions
# After downloading each version file:
sha256sum evidence/versions/report_v1.xlsx >> evidence/hashes.txt
sha256sum evidence/versions/report_v2.xlsx >> evidence/hashes.txt
sha256sum evidence/versions/report_v3.xlsx >> evidence/hashes.txt
# Record the collection timestamp and source
echo "Collected from SharePoint on $(date -u)" >> evidence/collection_log.txt
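When there are many versions, hashing them one command at a time is error-prone. The sketch below hashes every `.xlsx` file in a folder and writes a single manifest with the source and collection time. The function names `sha256_file` and `write_manifest` and the manifest layout are illustrative assumptions, not a required format.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large workbooks are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(version_dir: str, manifest_path: str, source: str) -> list:
    """Hash every .xlsx file in version_dir and record the results in one manifest."""
    entries = [(sha256_file(p), p.name) for p in sorted(Path(version_dir).glob("*.xlsx"))]
    collected = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with open(manifest_path, "w", encoding="utf-8") as out:
        out.write(f"# source: {source}\n# collected (UTC): {collected}\n")
        for digest, name in entries:
            out.write(f"{digest}  {name}\n")
    return entries
```

A call such as `write_manifest("evidence/versions", "evidence/hashes.txt", "SharePoint")` would cover the same files as the commands above in one pass.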
Important: If the investigation may lead to legal proceedings, follow your organization's evidence preservation procedures. Use write-blockers where applicable, document every step, and maintain a clear chain of custody. A forensic image of the cloud storage account may be necessary.
Create a chronological timeline of all versions, including the timestamp, the user who saved the version, the file size, and any version comments. This timeline becomes the backbone of your analysis and helps identify periods of unusual activity.
| Version | Timestamp | User | Size | Notes |
|---|---|---|---|---|
| v1 | 2025-11-10 09:14 | j.smith | 245 KB | Initial creation |
| v2 | 2025-11-10 14:32 | j.smith | 312 KB | Size increase — data added |
| v3 | 2025-12-02 23:47 | m.jones | 198 KB | Size decrease — data removed? |
| v4 | 2025-12-03 08:15 | m.jones | 201 KB | Minor changes |
Red flags in the timeline include: significant file size decreases (data deletion), edits at unusual hours, edits by unexpected users, long gaps followed by sudden activity, and multiple rapid saves in succession (possible cover-up attempts).
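These red flags can be screened for mechanically before a manual review. The following is a minimal sketch, assuming each version is a dict with hypothetical keys `label`, `timestamp`, `user`, and `size_kb`, sorted by time. Every threshold (a 20% size drop, working hours of 07:00 to 20:00, a 14-day gap, a 5-minute rapid-save window) is an illustrative assumption to tune per case, and a change of user stands in here for "unexpected user".

```python
from datetime import datetime, timedelta


def flag_timeline(versions, shrink_ratio=0.8, work_hours=(7, 20),
                  long_gap=timedelta(days=14), rapid=timedelta(minutes=5)):
    """Return (label, reason) pairs for versions that match a red-flag heuristic."""
    flags = []
    for prev, cur in zip([None] + versions[:-1], versions):
        if not (work_hours[0] <= cur["timestamp"].hour < work_hours[1]):
            flags.append((cur["label"], "edit outside working hours"))
        if prev is None:
            continue  # the first version has nothing to compare against
        if cur["size_kb"] < prev["size_kb"] * shrink_ratio:
            flags.append((cur["label"], "significant size decrease"))
        if cur["user"] != prev["user"]:
            flags.append((cur["label"], "different user from previous version"))
        gap = cur["timestamp"] - prev["timestamp"]
        if gap >= long_gap:
            flags.append((cur["label"], "activity after long gap"))
        elif gap <= rapid:
            flags.append((cur["label"], "rapid successive save"))
    return flags
```

Run against the four rows in the table above, this flags only v3: it was saved at 23:47, shrank from 312 KB to 198 KB, came from a different user, and followed a gap of about three weeks. A flag is a prompt for closer inspection, not a finding.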
The core of version history forensics is comparing consecutive versions to identify exactly what changed. This involves comparing the raw XML contents of each XLSX file, not just opening them in Excel. Excel's visual interface can hide changes that are visible at the XML level.
# Extract both versions
mkdir v2_contents v3_contents
unzip report_v2.xlsx -d v2_contents/
unzip report_v3.xlsx -d v3_contents/
# Compare worksheet XML (shows cell-level changes)
diff v2_contents/xl/worksheets/sheet1.xml \
v3_contents/xl/worksheets/sheet1.xml
# Compare shared strings (reveals deleted text)
diff v2_contents/xl/sharedStrings.xml \
v3_contents/xl/sharedStrings.xml
# Compare document properties (metadata changes)
diff v2_contents/docProps/core.xml \
v3_contents/docProps/core.xml
# List all files that differ between versions
diff -rq v2_contents/ v3_contents/
Pay special attention to changes in sharedStrings.xml (text content), worksheet files (cell data and formulas), workbook.xml (sheet structure), and docProps/core.xml (metadata like author and dates).
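The same part-by-part comparison can be done without extracting anything to disk, which keeps the evidence files untouched. This is a sketch using only the standard library; `part_hashes` and `compare_parts` are hypothetical names, and the comparison is byte-level, so it reports that a part changed, not what changed inside it.

```python
import hashlib
import zipfile


def part_hashes(xlsx_path) -> dict:
    """Map each part inside the XLSX package to the SHA-256 of its contents."""
    with zipfile.ZipFile(xlsx_path) as archive:
        return {
            name: hashlib.sha256(archive.read(name)).hexdigest()
            for name in archive.namelist()
            if not name.endswith("/")  # skip directory entries
        }


def compare_parts(old_path, new_path) -> dict:
    """List parts that were added, removed, or changed between two versions."""
    old, new = part_hashes(old_path), part_hashes(new_path)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(n for n in old.keys() & new.keys() if old[n] != new[n]),
    }
```

A removed `xl/worksheets/sheetN.xml` entry points to a deleted sheet, and a changed `docProps/core.xml` points to a metadata change worth inspecting directly.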
Each version carries its own metadata—author, last modified by, creation date, modification date, and application version. Comparing metadata across versions can reveal attempts to disguise the document's history.
Metadata Red Flags
What This Indicates
Metadata inconsistencies between versions suggest the file was opened in a different application, transferred to a different machine, or deliberately tampered with. A creation date that differs between v1 and v3 is a strong indicator that someone recreated the file to alter its history.
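A check for fields that should stay constant across versions might look like the sketch below. It reads the raw bytes of `docProps/core.xml` from each version and reports any creator or creation date that differs from the earliest version. The namespaces are the standard ones for OOXML core properties; the names `core_properties` and `immutable_field_changes` are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def core_properties(core_xml: bytes) -> dict:
    """Extract the main metadata fields from docProps/core.xml (None if absent)."""
    root = ET.fromstring(core_xml)
    return {key: root.findtext(path, default=None, namespaces=NS)
            for key, path in FIELDS.items()}


def immutable_field_changes(versions: dict) -> list:
    """versions maps label -> core.xml bytes, in chronological order.

    Returns (label, field, original value, new value) for creator/created changes.
    """
    labels = list(versions)
    baseline = core_properties(versions[labels[0]])
    findings = []
    for label in labels[1:]:
        props = core_properties(versions[label])
        for field in ("creator", "created"):
            if props[field] != baseline[field]:
                findings.append((label, field, baseline[field], props[field]))
    return findings
```

A non-empty result is a lead rather than a conclusion, since saving through a different application can also rewrite these fields.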
Version history is particularly effective at detecting deliberate document manipulation. The following patterns are common in fraud, litigation, and compliance investigations.
One of the most common manipulation patterns in forensic cases involves altering financial figures. Version history can reveal when numbers were changed, what the original values were, and whether the changes follow a suspicious pattern.
Signs of Financial Manipulation
Investigation Approach
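One way to recover original figures is to list every numeric cell whose stored value differs between two versions of the same worksheet. The sketch below takes the raw bytes of a worksheet XML part from each version. The names `numeric_cells` and `numeric_changes` are hypothetical, and it treats dates as numbers because that is how they are stored.

```python
import xml.etree.ElementTree as ET

MAIN = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def numeric_cells(sheet_xml: bytes) -> dict:
    """Map cell reference -> numeric value for cells that store a number."""
    values = {}
    for cell in ET.fromstring(sheet_xml).iter(f"{MAIN}c"):
        # A missing t attribute (or t="n") means the <v> element holds a number.
        if cell.get("t", "n") != "n":
            continue
        text = cell.findtext(f"{MAIN}v")
        if text is not None:
            values[cell.get("r")] = float(text)
    return values


def numeric_changes(before: bytes, after: bytes) -> list:
    """Return (cell, old value, new value, delta) for numbers that changed."""
    old, new = numeric_cells(before), numeric_cells(after)
    return [
        (ref, old[ref], new[ref], new[ref] - old[ref])
        for ref in sorted(old.keys() & new.keys())  # sorted as text, so B10 precedes B2
        if old[ref] != new[ref]
    ]
```

The deltas can then be reviewed for patterns such as consistently favorable adjustments. Cells that exist in only one version are not reported here and need a separate check.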
Backdating involves creating or modifying a document to make it appear as if it was created or last edited at an earlier date. Version history is devastating evidence against backdating because the platform's timestamps are controlled by the server, not the user.
How Backdating Is Exposed
Application and AppVersion fields in docProps/app.xml record which application and version last saved the file. If the file claims to be from 2023 but was written by a version of Excel released in 2025, the timeline does not add up.
Key insight: Cloud platform timestamps are authoritative. A user can change the timestamps inside an Excel file (in docProps/core.xml), but they cannot change the timestamp recorded by SharePoint or OneDrive when the version was saved. A discrepancy between internal and external timestamps is a powerful indicator of manipulation.
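The comparison between internal and platform timestamps can be expressed as a small check. This is a sketch under stated assumptions: both values are ISO 8601 strings, values without a time zone are treated as UTC, and the 10-minute tolerance is arbitrary. The names `parse_timestamp` and `timestamp_discrepancy` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2025-12-02T23:47:00Z."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive values are UTC
    return parsed


def timestamp_discrepancy(internal_modified: str, platform_saved: str,
                          tolerance: timedelta = timedelta(minutes=10)) -> dict:
    """Compare the modified time inside the file with the platform's save time."""
    delta = parse_timestamp(platform_saved) - parse_timestamp(internal_modified)
    return {"delta": delta, "suspicious": abs(delta) > tolerance}
```

A large delta is not proof of backdating by itself: a file that was last edited locally and uploaded much later shows the same gap, so the result needs to be read alongside the rest of the timeline.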
A common pattern in litigation and regulatory investigations is the "cleanup" edit: a user removes sensitive data from a file shortly before it is shared or produced in discovery. Version history makes this cleanup visible.
Cleanup Patterns
Forensic Value
The previous version—before the cleanup—contains the data the user tried to hide. Comparing the pre-cleanup and post-cleanup versions reveals exactly what was removed, providing direct evidence of what the user considered sensitive enough to delete.
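For text content, a quick way to see what a cleanup removed is to compare the shared string tables of the two versions. The sketch below takes the raw bytes of `xl/sharedStrings.xml` from each version; `shared_strings` and `removed_strings` are hypothetical names. It ignores phonetic annotations and does not tell you which cell held the string.

```python
import xml.etree.ElementTree as ET

MAIN = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def shared_strings(xml_bytes: bytes) -> list:
    """Return the text of every entry in sharedStrings.xml, in table order."""
    root = ET.fromstring(xml_bytes)
    strings = []
    for item in root.findall(f"{MAIN}si"):
        # Plain strings sit in <si><t>; rich text is split across <si><r><t> runs.
        parts = [t.text or "" for t in item.findall(f"{MAIN}t")]
        parts += [t.text or "" for t in item.findall(f"{MAIN}r/{MAIN}t")]
        strings.append("".join(parts))
    return strings


def removed_strings(before: bytes, after: bytes) -> list:
    """Strings present in the earlier version but absent from the later one."""
    remaining = set(shared_strings(after))
    return [s for s in shared_strings(before) if s not in remaining]
```

Because a string table can retain entries that no cell references any more, an empty result does not prove that nothing was removed from the sheets themselves.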
Beyond basic version comparison, several advanced techniques can extract deeper forensic intelligence from version history data.
In multi-user environments, version history reveals the collaboration pattern: who worked on the file, in what sequence, and how their contributions relate to each other. This is critical for establishing responsibility and identifying who had access to specific data at specific times.
Questions Collaboration Analysis Can Answer
For SharePoint and OneDrive for Business, the Microsoft 365 unified audit log provides additional detail beyond version history, including file access events (not just saves), IP addresses, device information, and session details. Use the Search-UnifiedAuditLog PowerShell cmdlet to extract these records.
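Standard diff tools work on XLSX XML contents, but Excel typically writes each XML part as one long line, so a line-based diff reports the whole file as changed. Pretty-printing the XML first puts each element on its own line, which lets diff isolate structural changes it would otherwise bury. This is especially important for detecting subtle formula changes or style manipulations.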
# Pretty-print XML for better comparison
xmllint --format v2_contents/xl/worksheets/sheet1.xml \
> v2_sheet1_formatted.xml
xmllint --format v3_contents/xl/worksheets/sheet1.xml \
> v3_sheet1_formatted.xml
# Structural XML diff
diff --color v2_sheet1_formatted.xml v3_sheet1_formatted.xml
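# Count cells in each version
# (grep -c counts matching lines, and worksheet XML is often one line, so count matches instead)
grep -o "<c r=" v2_contents/xl/worksheets/sheet1.xml | wc -l
grep -o "<c r=" v3_contents/xl/worksheets/sheet1.xml | wc -l
# Extract all formulas from each version, including <f> elements that carry attributes
# (-P requires GNU grep; self-closing shared-formula references are skipped)
grep -oP '<f(>| [^>]*[^/]>).*?</f>' v2_contents/xl/worksheets/sheet1.xml \
> v2_formulas.txt
grep -oP '<f(>| [^>]*[^/]>).*?</f>' v3_contents/xl/worksheets/sheet1.xml \
> v3_formulas.txt
diff v2_formulas.txt v3_formulas.txt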
This technique is especially useful for detecting formula substitution—where a calculation like =SUM(B2:B50) is replaced with a hardcoded value—because the formula element disappears entirely from the cell XML.
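Formula substitution can also be detected cell by cell, which identifies exactly where a formula gave way to a static value. The sketch below compares the raw bytes of the same worksheet part from two versions; `cell_map` and `hardcoded_formulas` are hypothetical names.

```python
import xml.etree.ElementTree as ET

MAIN = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def cell_map(sheet_xml: bytes) -> dict:
    """Map cell reference -> (formula text or None, stored value or None)."""
    cells = {}
    for cell in ET.fromstring(sheet_xml).iter(f"{MAIN}c"):
        formula = cell.find(f"{MAIN}f")
        cells[cell.get("r")] = (
            None if formula is None else (formula.text or ""),
            cell.findtext(f"{MAIN}v"),
        )
    return cells


def hardcoded_formulas(before: bytes, after: bytes) -> list:
    """Cells that had a formula before and hold only a value afterwards.

    Returns (cell, old formula, new stored value) tuples.
    """
    old, new = cell_map(before), cell_map(after)
    findings = []
    for ref, (old_formula, _old_value) in old.items():
        if old_formula is None or ref not in new:
            continue
        new_formula, new_value = new[ref]
        if new_formula is None and new_value is not None:
            findings.append((ref, old_formula, new_value))
    return findings
```

Comparing the value the old formula last produced with the new static value shows whether the substitution also changed the number. For string cells the stored value is an index into the shared string table, not the text itself.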
Version history becomes significantly more powerful when cross-referenced with platform audit logs, email records, and access logs. This creates a comprehensive event timeline that connects document changes to user actions and communications.
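A simple form of this cross-referencing pairs each version save with the other events that occurred close to it. The sketch below assumes events from every source have already been normalized into dicts with a `time` key holding a `datetime`; the function name `correlate`, the dict layout, and the one-hour window are illustrative assumptions.

```python
from datetime import datetime, timedelta


def correlate(version_events, other_events, window=timedelta(hours=1)):
    """Pair each version save with other events within +/- window of it.

    Returns a list of (version event, [nearby events]) in chronological order.
    """
    result = []
    for version in sorted(version_events, key=lambda e: e["time"]):
        nearby = [e for e in other_events
                  if abs(e["time"] - version["time"]) <= window]
        result.append((version, sorted(nearby, key=lambda e: e["time"])))
    return result
```

All timestamps should be converted to one time zone before they are compared, since platform logs and email headers often differ in this respect.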
Useful Audit Log Sources
Correlation Questions
Version history evidence can be compelling in legal proceedings, but it must be collected and presented properly to be admissible. Understanding the legal framework around digital evidence is critical for forensic investigators.
Chain of Custody
Authentication
In litigation contexts, parties have a duty to preserve relevant evidence, including version history. Failure to preserve version history when litigation is anticipated can result in spoliation sanctions.
Bringing everything together, here is a practical end-to-end workflow for a version-history-based forensic investigation of an Excel file.
1. Identify all storage locations
   Determine where the file is stored: cloud platforms, local machines, file servers, email attachments, backup systems. Each location may have independent version history.
2. Preserve version history before it expires
   Download all available versions from every source. Hash each file. Document the collection process. Adjust retention policies if versions may expire soon.
3. Collect platform audit logs
   Export audit logs from SharePoint, OneDrive, Google Workspace, or other platforms. These logs provide access records that complement version history.
4. Build the version timeline
   Create a chronological record of all versions with timestamps, users, file sizes, and any available comments. Flag anomalies such as size changes, unusual timing, or unexpected users.
5. Extract and compare XML contents
   Unzip each XLSX version and perform structural XML comparisons. Focus on worksheet data, shared strings, styles, formulas, and document properties.
6. Document findings with evidence
   For each finding, record the specific versions compared, the exact changes identified, the user and timestamp, and the forensic significance. Include screenshots and XML excerpts.
7. Cross-reference with external evidence
   Correlate version changes with emails, calendar events, audit logs, and business events to establish context and motive.
Version history is a powerful forensic tool, but it has important limitations that investigators must understand to avoid overreaching conclusions.
Retention policies delete history
Version history is not permanent. Cloud platforms automatically delete older versions based on retention policies. If you do not collect versions before they expire, they are gone. The absence of older versions does not necessarily indicate tampering.
Local edits may not create versions
If a user downloads a file, edits it locally, and re-uploads it, the cloud platform records this as a new version but does not capture the intermediate edits. The gap between downloading and re-uploading is a blind spot.
Not all saves create versions
Some platforms batch rapid saves into a single version. If a user makes multiple changes in quick succession, they may all appear as one version, obscuring the sequence of individual changes.
Admin users can delete versions
SharePoint and OneDrive administrators can delete specific versions or entire version histories. If an administrator is a subject of the investigation, version history alone may not be reliable. Cross-reference with backup systems and audit logs.
Version history does not prove intent
Showing that data was deleted between versions proves the deletion occurred, but it does not prove intent. The deletion may have been accidental, routine cleanup, or legitimate editing. Additional evidence is needed to establish motive.
Excel version history transforms forensic investigations from a study of static documents into a reconstruction of dynamic events. Where a single file shows you the final state, version history shows you the journey: who changed what, when they changed it, and what the document looked like before and after each edit.
The key principles are straightforward. Collect all versions as early as possible before retention policies delete them. Build a timeline that maps versions to users and timestamps. Compare versions at the XML level to catch changes that are invisible in the Excel interface. Cross-reference with platform audit logs to establish the full context of each change.
Whether you are investigating potential fraud, responding to a regulatory inquiry, resolving an internal dispute, or conducting due diligence, version history analysis provides a level of documentary evidence that the current file alone cannot match. The trail is already there—recorded automatically by the platforms we use every day. The forensic skill lies in knowing where to find it, how to preserve it, and how to interpret it.