Using Excel Version History for Digital Forensics

Version history is one of the most powerful—and most overlooked—sources of forensic evidence in Excel investigations. Every save, every edit, every collaborator leaves a trail that platforms like OneDrive, SharePoint, and Google Drive preserve automatically. Knowing how to extract and analyze this history can reconstruct exactly what happened to a document, when, and by whom.

By Forensics Team | February 17, 2026 | 21 min read

Why Version History Matters in Forensic Investigations

When a forensic investigator receives an Excel file, the current state of the document tells only part of the story. The file you see today may have been edited dozens of times, by multiple people, across weeks or months. Rows may have been added and then removed. Formulas may have been replaced with static values. Entire sheets may have been deleted. Without version history, these changes are invisible.

Version history changes this equation entirely. Modern cloud platforms and operating systems automatically capture snapshots of files at regular intervals, creating a chronological record of every significant change. For a forensic investigator, this means the ability to compare the document at different points in time, identify exactly what changed, determine who made each change, and establish whether the document was manipulated.

What Version History Can Reveal

  • Data manipulation: Rows or columns that were added, modified, or deleted between versions
  • Formula tampering: Calculations that were changed to produce different results
  • Backdating attempts: Files edited long after the claimed creation date
  • Unauthorized access: Edits made by users who should not have had access
  • Evidence destruction: Deliberate deletion of incriminating data before a file was shared
  • Collaboration patterns: Who worked on the file, in what order, and for how long
  • Metadata evolution: Changes to document properties, author names, and titles over time

Where Version History Lives

Version history is not stored in the Excel file itself. It is maintained by the platform that hosts the file—whether that is a cloud service, a file server, or the local operating system. Each platform has different retention policies, access methods, and levels of detail. Understanding where to look is the first step in any version-based forensic analysis.

Microsoft OneDrive and SharePoint

Microsoft's cloud platforms provide the most detailed version history for Excel files, especially when co-authoring is enabled. SharePoint and OneDrive for Business can retain hundreds of versions spanning months or years.

Retention Policies

  • OneDrive Personal: 30 days or 25 versions
  • OneDrive for Business: 500 versions by default
  • SharePoint Online: up to 50,000 major versions
  • SharePoint can also retain minor (draft) versions
  • Admins can configure custom retention policies

What Each Version Records

  • Complete file snapshot at time of save
  • User who saved the version
  • Exact timestamp of the save
  • File size at that point in time
  • Optional version comment (if the user added one)

# Access SharePoint version history via REST API
GET https://{site}/_api/web/GetFileByServerRelativePath(decodedurl='/sites/team/Documents/report.xlsx')/Versions

# Download a specific version (the version ID is an integer key, so it is not quoted)
GET https://{site}/_api/web/GetFileByServerRelativePath(decodedurl='/sites/team/Documents/report.xlsx')/Versions(512)/$value

Google Drive and Google Sheets

Google Drive maintains version history for uploaded Excel files, while Google Sheets tracks a more granular edit history for files converted to native Google format. The two systems work differently and offer different forensic opportunities.

Google Drive (XLSX uploads)

  • Retains 100 versions or 30 days (whichever comes first)
  • Workspace accounts may have extended retention
  • Each version is a complete file download
  • "Keep forever" option prevents auto-deletion

Google Sheets (native format)

  • Cell-level edit history with user attribution
  • Named versions created manually by users
  • Revision history grouped by editing session
  • Workspace audit logs capture access events

# List revisions via Google Drive API
GET https://www.googleapis.com/drive/v3/files/{fileId}/revisions?fields=*

# Download a specific revision
GET https://www.googleapis.com/drive/v3/files/{fileId}/revisions/{revisionId}?alt=media
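The revision listing above returns JSON that has to be flattened before it is useful as a timeline. The following is a minimal sketch, assuming the response has already been fetched and parsed into a dict, and that the revision fields `id`, `modifiedTime`, `lastModifyingUser`, `size`, and `keepForever` were requested. The function name and the output keys are illustrative, not part of any API.

```python
from datetime import datetime


def revisions_to_timeline(response: dict) -> list[dict]:
    """Flatten a revisions.list-style response into rows sorted by save time."""
    rows = []
    for rev in response.get("revisions", []):
        user = rev.get("lastModifyingUser", {})
        rows.append({
            "revision_id": rev["id"],
            # Drive timestamps are RFC 3339 strings, e.g. 2025-11-10T09:14:00.000Z
            "modified": datetime.fromisoformat(rev["modifiedTime"].replace("Z", "+00:00")),
            "user": user.get("emailAddress") or user.get("displayName", "unknown"),
            # size is returned as a string and may be absent for some file types
            "size_bytes": int(rev["size"]) if "size" in rev else None,
            "keep_forever": rev.get("keepForever", False),
        })
    return sorted(rows, key=lambda row: row["modified"])
```

Sorting by the server-reported `modifiedTime` rather than by revision ID keeps the timeline chronological even if the API returns revisions in a different order.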

Local Operating System Version History

Even without cloud storage, operating systems provide version history mechanisms that capture file snapshots. These are often overlooked during forensic investigations but can contain critical evidence.

Windows

  • Volume Shadow Copy Service (VSS): Creates automatic snapshots during system restore points and backups
  • File History: Hourly backups to an external drive (if enabled)
  • Previous Versions tab: Access via right-click > Properties
  • Shadow copies can be accessed forensically even if the UI is disabled

macOS

  • Time Machine: Hourly backups for 24 hours, daily for a month, weekly beyond that
  • APFS snapshots: Automatic local snapshots even without Time Machine
  • Versions: macOS native versioning (File > Revert To > Browse All Versions)
  • Local snapshots persist until disk space is needed

# Windows: List Volume Shadow Copies
vssadmin list shadows

# Windows: Access a shadow copy
mklink /d C:\ShadowAccess \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy1\

# macOS: List local APFS snapshots
tmutil listlocalsnapshots /

# macOS: Mount a specific snapshot (use a snapshot name printed by the command above)
mkdir -p /tmp/snapshot
mount_apfs -s {snapshot_name} / /tmp/snapshot

Forensic Analysis Methodology

Analyzing version history requires a systematic approach. You are not simply opening old files—you are reconstructing a timeline of changes, identifying anomalies, and building an evidence chain that can withstand scrutiny. The following methodology provides a structured framework for version-based forensic analysis.

Step 1: Preserve and Collect All Versions

Before any analysis begins, download and preserve every available version of the file. Cloud platforms may automatically delete older versions based on retention policies, so collection is time-sensitive. Each downloaded version should be hashed (SHA-256) immediately to establish chain of custody.

# Download all versions and hash them
mkdir -p evidence/versions

# After downloading each version file:
sha256sum evidence/versions/report_v1.xlsx >> evidence/hashes.txt
sha256sum evidence/versions/report_v2.xlsx >> evidence/hashes.txt
sha256sum evidence/versions/report_v3.xlsx >> evidence/hashes.txt

# Record the collection timestamp and source
echo "Collected from SharePoint on $(date -u)" >> evidence/collection_log.txt

Important: If the investigation may lead to legal proceedings, follow your organization's evidence preservation procedures. Use write-blockers where applicable, document every step, and maintain a clear chain of custody. A forensic image of the cloud storage account may be necessary.
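When there are dozens of versions, hashing them one command at a time is error-prone. The sketch below hashes every collected `.xlsx` file in a folder and writes a manifest with the source and a UTC collection time. The function names and the manifest layout are assumptions for illustration, and the script does not replace your organization's evidence procedures.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large workbooks are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(version_dir: Path, manifest: Path, source: str) -> list[tuple[str, str]]:
    """Hash every .xlsx in version_dir and record the results with collection details."""
    entries = [(p.name, sha256_of(p)) for p in sorted(version_dir.glob("*.xlsx"))]
    collected = datetime.now(timezone.utc).isoformat()
    with manifest.open("w", encoding="utf-8") as out:
        out.write(f"# source: {source}\n# collected_utc: {collected}\n")
        for name, digest in entries:
            # Same "hash, two spaces, filename" layout that sha256sum prints
            out.write(f"{digest}  {name}\n")
    return entries
```

A call such as `write_manifest(Path("evidence/versions"), Path("evidence/hashes.txt"), "SharePoint")` would cover the folder used in the commands above.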

Step 2: Build a Version Timeline

Create a chronological timeline of all versions, including the timestamp, the user who saved the version, the file size, and any version comments. This timeline becomes the backbone of your analysis and helps identify periods of unusual activity.

Version | Timestamp        | User    | Size   | Notes
v1      | 2025-11-10 09:14 | j.smith | 245 KB | Initial creation
v2      | 2025-11-10 14:32 | j.smith | 312 KB | Size increase: data added
v3      | 2025-12-02 23:47 | m.jones | 198 KB | Size decrease: data removed?
v4      | 2025-12-03 08:15 | m.jones | 201 KB | Minor changes

Red flags in the timeline include: significant file size decreases (data deletion), edits at unusual hours, edits by unexpected users, long gaps followed by sudden activity, and multiple rapid saves in succession (possible cover-up attempts).
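Several of these red flags can be screened for automatically once the timeline is in a structured form. The following is a rough sketch: the row keys (`label`, `when`, `user`, `size_kb`) and every threshold are assumptions chosen for illustration, and the timestamps are assumed to already be in the editor's local time. A flag is a prompt for closer review, not a finding.

```python
from datetime import datetime, timedelta


def flag_versions(timeline, expected_users, shrink_ratio=0.8,
                  work_hours=(7, 19), long_gap=timedelta(days=14)):
    """Return (label, reasons) for each version that shows a timeline red flag."""
    flagged = []
    for index, version in enumerate(timeline):
        reasons = []
        if version["user"] not in expected_users:
            reasons.append("unexpected user")
        if not work_hours[0] <= version["when"].hour < work_hours[1]:
            reasons.append("outside working hours")
        if index > 0:
            previous = timeline[index - 1]
            # A marked drop in size can mean rows, sheets, or comments were removed
            if version["size_kb"] < previous["size_kb"] * shrink_ratio:
                reasons.append("size decrease")
            if version["when"] - previous["when"] > long_gap:
                reasons.append("activity after long gap")
        if reasons:
            flagged.append((version["label"], reasons))
    return flagged
```

Run against the four-row example timeline above with both users treated as expected, only v3 is flagged: it was saved at 23:47, shrank from 312 KB to 198 KB, and followed a gap of about three weeks.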

Step 3: Perform Version-to-Version Comparison

The core of version history forensics is comparing consecutive versions to identify exactly what changed. This involves comparing the raw XML contents of each XLSX file, not just opening them in Excel. Excel's visual interface can hide changes that are visible at the XML level.

# Extract both versions
mkdir v2_contents v3_contents
unzip report_v2.xlsx -d v2_contents/
unzip report_v3.xlsx -d v3_contents/

# Compare worksheet XML (shows cell-level changes)
diff v2_contents/xl/worksheets/sheet1.xml \
     v3_contents/xl/worksheets/sheet1.xml

# Compare shared strings (reveals deleted text)
diff v2_contents/xl/sharedStrings.xml \
     v3_contents/xl/sharedStrings.xml

# Compare document properties (metadata changes)
diff v2_contents/docProps/core.xml \
     v3_contents/docProps/core.xml

# List all files that differ between versions
diff -rq v2_contents/ v3_contents/

Pay special attention to changes in sharedStrings.xml (text content), worksheet files (cell data and formulas), workbook.xml (sheet structure), and docProps/core.xml (metadata like author and dates).
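The same first-pass triage can be done without extracting anything to disk, which avoids modifying the evidence folder. This sketch hashes every part inside two XLSX packages and reports which parts were added, removed, or changed. The function names are illustrative.

```python
import hashlib
import zipfile


def part_hashes(xlsx) -> dict[str, str]:
    """Map each part name inside an XLSX package to the SHA-256 of its content."""
    with zipfile.ZipFile(xlsx) as package:
        return {name: hashlib.sha256(package.read(name)).hexdigest()
                for name in package.namelist()}


def compare_parts(old_xlsx, new_xlsx) -> dict[str, list[str]]:
    """Report parts that were added, removed, or changed between two versions."""
    old, new = part_hashes(old_xlsx), part_hashes(new_xlsx)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(name for name in old.keys() & new.keys()
                          if old[name] != new[name]),
    }
```

A removed `xl/worksheets/sheetN.xml` part points to a deleted sheet, while a changed `docProps/core.xml` points to a metadata edit. Either result tells you which XML files deserve a detailed diff.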

Step 4: Analyze Metadata Across Versions

Each version carries its own metadata—author, last modified by, creation date, modification date, and application version. Comparing metadata across versions can reveal attempts to disguise the document's history.

Metadata Red Flags

  • Creator name changing between versions
  • Creation date being modified (backdating)
  • Application version changing (opened on different machine)
  • "Last modified by" not matching the version save user
  • Total editing time decreasing between versions

What This Indicates

Metadata inconsistencies between versions suggest the file was opened in a different application, transferred to a different machine, or deliberately tampered with. A creation date that differs between v1 and v3 is a strong indicator that someone recreated the file to alter its history.
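To check for this kind of drift across many versions, the core properties can be read directly from each package. The following is a small sketch that reads `docProps/core.xml` and reports changes in fields that would normally stay fixed for the life of a file. The helper names and the choice of which fields count as stable are assumptions.

```python
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def core_properties(xlsx) -> dict:
    """Read the core document properties from one XLSX version."""
    with zipfile.ZipFile(xlsx) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    return {key: root.findtext(path, default=None, namespaces=NS)
            for key, path in FIELDS.items()}


def metadata_drift(versions, stable_fields=("creator", "created")):
    """versions: ordered list of (label, properties). Report changes in stable fields."""
    findings = []
    for (old_label, old), (new_label, new) in zip(versions, versions[1:]):
        for field in stable_fields:
            if old[field] != new[field]:
                findings.append((old_label, new_label, field, old[field], new[field]))
    return findings
```

Each finding names the two versions between which the field changed, which narrows the question to a single save event and a single user in the timeline.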

Detecting Specific Types of Document Manipulation

Version history is particularly effective at detecting deliberate document manipulation. The following patterns are common in fraud, litigation, and compliance investigations.

Financial Data Manipulation

One of the most common manipulation patterns in forensic cases involves altering financial figures. Version history can reveal when numbers were changed, what the original values were, and whether the changes follow a suspicious pattern.

Signs of Financial Manipulation

  • Revenue figures that increase between versions without explanation
  • Expense line items that disappear between versions
  • Formulas replaced with hardcoded values in later versions
  • Rounding adjustments that consistently move totals in one direction
  • Summary totals that change without corresponding detail changes

Investigation Approach

  • Extract all numeric cell values from each version
  • Build a comparison matrix showing value changes
  • Flag cells where formulas were replaced with constants
  • Correlate changes with the user and timestamp
  • Check if changes align with financial reporting deadlines
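The first three steps of this approach can be sketched as a cell-level comparison of one worksheet across two versions. This is a minimal example that takes the raw bytes of the same `sheetN.xml` part from each version. The function names and change labels are illustrative. The stored value is the raw `<v>` text, so for text cells it is an index into `sharedStrings.xml` rather than the text itself.

```python
import xml.etree.ElementTree as ET

MAIN = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def read_cells(sheet_xml: bytes) -> dict:
    """Map each cell reference to a (formula, stored value) pair."""
    cells = {}
    for cell in ET.fromstring(sheet_xml).iter(MAIN + "c"):
        formula = cell.find(MAIN + "f")
        cells[cell.get("r")] = (
            # Shared-formula followers have an empty <f/>; keep "" so they still count
            None if formula is None else (formula.text or ""),
            cell.findtext(MAIN + "v"),
        )
    return cells


def cell_changes(old_xml: bytes, new_xml: bytes) -> list:
    """List every cell that differs between two versions of one worksheet."""
    old, new = read_cells(old_xml), read_cells(new_xml)
    changes = []
    for ref in sorted(old.keys() | new.keys()):
        before, after = old.get(ref), new.get(ref)
        if before == after:
            continue
        if before is None:
            kind = "cell added"
        elif after is None:
            kind = "cell removed"
        elif before[0] is not None and after[0] is None:
            kind = "formula replaced with constant"
        elif before[0] != after[0]:
            kind = "formula changed"
        else:
            kind = "value changed"
        changes.append((ref, kind, before, after))
    return changes
```

Joining the output to the version timeline by version label gives the user and timestamp for each change, which covers the correlation step.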

Backdating and Timeline Manipulation

Backdating involves creating or modifying a document to make it appear as if it was created or last edited at an earlier date. Version history is devastating evidence against backdating because the platform's timestamps are controlled by the server, not the user.

How Backdating Is Exposed

  • Server timestamps vs. file metadata: If the file's internal creation date says January 15 but the first version in SharePoint was uploaded on March 3, the file was likely created offline and uploaded later
  • Version gaps: A file that was supposedly in active use from January through March but only has versions from March 1–3 suggests it was created all at once and made to look historical
  • Application metadata: The Application and AppVersion fields in docProps/app.xml record which application and version last saved the file. If the file claims to be from 2023 but carries the version number of a 2025 release of Excel, the timeline does not add up
  • Content anachronisms: References to events, prices, or data that did not exist at the claimed creation date

Key insight: Cloud platform timestamps are authoritative. A user can change the timestamps inside an Excel file (in docProps/core.xml), but they cannot change the timestamp recorded by SharePoint or OneDrive when the version was saved. This discrepancy between internal and external timestamps is a powerful indicator of manipulation.
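The comparison between internal and server-side timestamps can be expressed as a simple check. The sketch below assumes all timestamps are ISO 8601 strings in UTC, that `claimed_start` is the date the file was supposedly first in use, and that a one-day tolerance is reasonable. All names and thresholds are illustrative. A gap on its own is not proof of backdating, because files are often created offline and uploaded later.

```python
from datetime import datetime, timedelta


def parse_utc(text: str) -> datetime:
    """Parse timestamps such as 2025-01-15T09:00:00Z."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def backdating_indicators(internal_created: str, server_times: list[str],
                          claimed_start: str, tolerance=timedelta(days=1)) -> list[str]:
    """Compare the file's own creation date with the platform's version timestamps."""
    first_server = min(parse_utc(stamp) for stamp in server_times)
    findings = []

    created_gap = first_server - parse_utc(internal_created)
    if created_gap > tolerance:
        findings.append(
            f"internal creation date precedes first server version by {created_gap.days} days")

    history_gap = first_server - parse_utc(claimed_start)
    if history_gap > tolerance:
        findings.append(
            f"no server versions for the first {history_gap.days} days of claimed use")
    return findings
```

For a file whose internal creation date is January 15 but whose earliest server version dates from March 1, both indicators fire with a gap of 45 days.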

Evidence Destruction Before Sharing

A common pattern in litigation and regulatory investigations is the "cleanup" edit: a user removes sensitive data from a file shortly before it is shared or produced in discovery. Version history makes this cleanup visible.

Cleanup Patterns

  • Large file size decrease in the version immediately before sharing
  • Entire sheets deleted in the final version
  • Comments and notes removed in a single edit
  • Document Inspector run just before the file was shared
  • Metadata scrubbed (author changed, properties cleared)

Forensic Value

The previous version—before the cleanup—contains the data the user tried to hide. Comparing the pre-cleanup and post-cleanup versions reveals exactly what was removed, providing direct evidence of what the user considered sensitive enough to delete.
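A quick way to see what a cleanup edit removed is to compare sheet names and shared strings between the two versions. This is a rough sketch with illustrative function names. It concatenates every `<t>` element inside a shared string item, so rich-text runs are joined and phonetic runs, if present, are included too.

```python
import xml.etree.ElementTree as ET
import zipfile

MAIN = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def sheet_names(package: zipfile.ZipFile) -> list[str]:
    """Sheet names in workbook order."""
    root = ET.fromstring(package.read("xl/workbook.xml"))
    return [sheet.get("name") for sheet in root.iter(MAIN + "sheet")]


def shared_strings(package: zipfile.ZipFile) -> set[str]:
    """Every distinct text string stored in the workbook."""
    if "xl/sharedStrings.xml" not in package.namelist():
        return set()
    root = ET.fromstring(package.read("xl/sharedStrings.xml"))
    return {"".join(t.text or "" for t in item.iter(MAIN + "t"))
            for item in root.iter(MAIN + "si")}


def removed_content(before_xlsx, after_xlsx) -> dict[str, list[str]]:
    """Sheets and text present before the edit but missing afterwards."""
    with zipfile.ZipFile(before_xlsx) as before, zipfile.ZipFile(after_xlsx) as after:
        kept_sheets = set(sheet_names(after))
        return {
            "sheets_removed": [n for n in sheet_names(before) if n not in kept_sheets],
            "strings_removed": sorted(shared_strings(before) - shared_strings(after)),
        }
```

The removed strings are often the most readable summary of a cleanup, since they list the labels, names, and notes that existed only in the earlier version.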

Advanced Forensic Techniques

Beyond basic version comparison, several advanced techniques can extract deeper forensic intelligence from version history data.

Collaboration Pattern Analysis

In multi-user environments, version history reveals the collaboration pattern: who worked on the file, in what sequence, and how their contributions relate to each other. This is critical for establishing responsibility and identifying who had access to specific data at specific times.

Questions Collaboration Analysis Can Answer

  • Who had access to the file before the data was deleted?
  • Did User A edit the file after User B added the sensitive data?
  • Is there a pattern of one user "fixing" another user's entries?
  • Were the edits made during normal business hours or outside them?
  • Did anyone access the file from an unusual location or device?

For SharePoint and OneDrive for Business, the Microsoft 365 unified audit log provides additional detail beyond version history, including file access events (not just saves), IP addresses, device information, and session details. Use the Search-UnifiedAuditLog PowerShell cmdlet to extract these records.

Differential XML Analysis

Standard diff tools work on XLSX XML contents, but purpose-built XML comparison reveals structural changes that line-based diff may miss. This is especially important for detecting subtle formula changes or style manipulations.

# Pretty-print XML for better comparison
xmllint --format v2_contents/xl/worksheets/sheet1.xml \
  > v2_sheet1_formatted.xml
xmllint --format v3_contents/xl/worksheets/sheet1.xml \
  > v3_sheet1_formatted.xml

# Diff the pretty-printed XML
diff --color v2_sheet1_formatted.xml v3_sheet1_formatted.xml

# Count cells in each version
# (grep -c counts matching lines, and sheet XML is often a single line, so count matches instead)
grep -o "<c r=" v2_contents/xl/worksheets/sheet1.xml | wc -l
grep -o "<c r=" v3_contents/xl/worksheets/sheet1.xml | wc -l

# Extract all formulas from each version, including <f> elements that carry attributes
grep -oE "<f[ >][^<]*</f>" v2_contents/xl/worksheets/sheet1.xml \
  > v2_formulas.txt
grep -oE "<f[ >][^<]*</f>" v3_contents/xl/worksheets/sheet1.xml \
  > v3_formulas.txt
diff v2_formulas.txt v3_formulas.txt

This technique is especially useful for detecting formula substitution—where a calculation like =SUM(B2:B50) is replaced with a hardcoded value—because the formula element disappears entirely from the cell XML.

Cross-Referencing with Audit Logs

Version history becomes significantly more powerful when cross-referenced with platform audit logs, email records, and access logs. This creates a comprehensive event timeline that connects document changes to user actions and communications.

Useful Audit Log Sources

  • Microsoft 365 Unified Audit Log
  • Google Workspace Admin audit logs
  • SharePoint access logs
  • VPN and network access logs
  • Email server logs (file sharing events)
  • DLP (Data Loss Prevention) alerts

Correlation Questions

  • Was the file accessed by anyone just before the edit?
  • Did the editor send any emails around the same time?
  • Were there failed access attempts before the edit?
  • Did the user download the file before making changes?
  • Was the file shared with new users after the edit?
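Most of these questions reduce to finding audit events that fall close in time to a version save. The following is a minimal sketch, assuming both sources have already been normalized into dicts with a `when` datetime in the same time zone. The key names, the event shape, and the 30-minute window are assumptions, since each audit log source exports a different schema.

```python
from datetime import datetime, timedelta


def correlate(versions, audit_events, window=timedelta(minutes=30)):
    """For each saved version, collect audit events within `window` of the save time."""
    result = {}
    for version in versions:
        nearby = [event for event in audit_events
                  if abs(event["when"] - version["when"]) <= window]
        result[version["label"]] = sorted(nearby, key=lambda event: event["when"])
    return result
```

A download shortly before a save, or a new sharing event shortly after one, then appears next to the version it relates to and can be reviewed in sequence.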

Legal and Evidentiary Considerations

Version history evidence can be compelling in legal proceedings, but it must be collected and presented properly to be admissible. Understanding the legal framework around digital evidence is critical for forensic investigators.

Admissibility Requirements

Chain of Custody

  • Document who collected the versions and when
  • Hash all files immediately upon collection
  • Store evidence on write-protected media
  • Maintain a detailed evidence log
  • Use forensically sound collection tools

Authentication

  • Platform server logs confirm version authenticity
  • User authentication records verify who made changes
  • Server-side timestamps cannot be altered by ordinary users
  • API-collected metadata provides additional verification
  • Expert testimony may be needed to explain the analysis

Preservation Obligations

In litigation contexts, parties have a duty to preserve relevant evidence, including version history. Failure to preserve version history when litigation is anticipated can result in spoliation sanctions.

  • Litigation hold: When litigation is anticipated, issue a hold that covers version history, not just current files. Cloud platform retention policies must be adjusted to prevent automatic deletion.
  • Cloud retention: Default retention policies may delete older versions before they can be collected. Work with IT to extend retention immediately.
  • Backup systems: Ensure backup systems are capturing version history, not just the latest version of each file.
  • Employee departures: When an employee under investigation leaves, their OneDrive and cloud storage must be preserved before the account is deleted.

Practical Investigation Workflow

Bringing everything together, here is a practical end-to-end workflow for a version-history-based forensic investigation of an Excel file.

Step-by-Step Investigation Checklist

Identify all storage locations

Determine where the file is stored: cloud platforms, local machines, file servers, email attachments, backup systems. Each location may have independent version history.

Preserve version history before it expires

Download all available versions from every source. Hash each file. Document the collection process. Adjust retention policies if versions may expire soon.

Collect platform audit logs

Export audit logs from SharePoint, OneDrive, Google Workspace, or other platforms. These logs provide access records that complement version history.

Build the version timeline

Create a chronological record of all versions with timestamps, users, file sizes, and any available comments. Flag anomalies such as size changes, unusual timing, or unexpected users.

Extract and compare XML contents

Unzip each XLSX version and perform structural XML comparisons. Focus on worksheet data, shared strings, styles, formulas, and document properties.

Document findings with evidence

For each finding, record the specific versions compared, the exact changes identified, the user and timestamp, and the forensic significance. Include screenshots and XML excerpts.

Cross-reference with external evidence

Correlate version changes with emails, calendar events, audit logs, and business events to establish context and motive.

Limitations and Caveats

Version history is a powerful forensic tool, but it has important limitations that investigators must understand to avoid overreaching conclusions.

Important Limitations

Retention policies delete history

Version history is not permanent. Cloud platforms automatically delete older versions based on retention policies. If you do not collect versions before they expire, they are gone. The absence of older versions does not necessarily indicate tampering.

Local edits may not create versions

If a user downloads a file, edits it locally, and re-uploads it, the cloud platform records this as a new version but does not capture the intermediate edits. The gap between downloading and re-uploading is a blind spot.

Not all saves create versions

Some platforms batch rapid saves into a single version. If a user makes multiple changes in quick succession, they may all appear as one version, obscuring the sequence of individual changes.

Admin users can delete versions

SharePoint and OneDrive administrators can delete specific versions or entire version histories. If an admin is involved in the investigation subject, version history alone may not be reliable. Cross-reference with backup systems and audit logs.

Version history does not prove intent

Showing that data was deleted between versions proves the deletion occurred, but it does not prove intent. The deletion may have been accidental, routine cleanup, or legitimate editing. Additional evidence is needed to establish motive.

Conclusion

Excel version history transforms forensic investigations from a study of static documents into a reconstruction of dynamic events. Where a single file shows you the final state, version history shows you the journey: who changed what, when they changed it, and what the document looked like before and after each edit.

The key principles are straightforward. Collect all versions as early as possible before retention policies delete them. Build a timeline that maps versions to users and timestamps. Compare versions at the XML level to catch changes that are invisible in the Excel interface. Cross-reference with platform audit logs to establish the full context of each change.

Whether you are investigating potential fraud, responding to a regulatory inquiry, resolving an internal dispute, or conducting due diligence, version history analysis provides a level of documentary evidence that the current file alone cannot match. The trail is already there—recorded automatically by the platforms we use every day. The forensic skill lies in knowing where to find it, how to preserve it, and how to interpret it.
