Fraudulent Excel documents are used in insurance claims, financial reporting, contract disputes, regulatory filings, and legal proceedings every day. Whether a spreadsheet has been fabricated from scratch, backdated to appear older than it is, or selectively altered to change key figures, the metadata layer almost always preserves evidence of the deception. This guide teaches you how to find it.
Every Excel file carries an invisible layer of information that its creator rarely thinks about. Document properties, timestamps, author records, editing history, application version strings, and internal XML structures all record the true story of how a file was created and modified—regardless of what the visible cell data claims.
Fraudsters focus on the content: the numbers, the dates displayed in cells, the formulas that produce the desired totals. They almost never clean every metadata artifact. The result is a gap between what the document claims to be and what its metadata reveals it actually is. That gap is where forensic investigators operate.
Timestamps are the most frequently exploited—and most frequently revealing—metadata in fraudulent documents. Excel files carry multiple independent timestamps that are difficult to manipulate consistently. When these timestamps contradict each other or contradict the claimed history of the document, fraud is likely.
Every Excel file in the .xlsx format has at least four independent sources of timestamp data. A legitimate document shows consistency across all four. A fraudulent document frequently shows discrepancies, either among these sources or between them and the claimed history.

1. File System Timestamps
2. OPC Core Properties (docProps/core.xml)
   - dcterms:created: document creation time
   - dcterms:modified: last modification time
3. Extended Properties (docProps/app.xml)
   - TotalTime: total editing time in minutes
   - Application: name of the application that saved the file
   - AppVersion: application version string (for example 16.0300)
4. Internal XML Artifacts
```bash
# Extract timestamps from an XLSX file (unzip creates the target directory)
unzip -o suspect.xlsx -d xlsx_extract/
# View core properties (created/modified dates, author)
cat xlsx_extract/docProps/core.xml
# View extended properties (editing time, app version)
cat xlsx_extract/docProps/app.xml
# Compare with file system timestamps; these describe this copy of the file,
# and some file systems do not report a creation (birth) time
stat suspect.xlsx
```
These timestamp anomalies are strong indicators of document fraud. Any single red flag warrants deeper investigation; multiple red flags together constitute compelling evidence.
Creation Date After Modification Date
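If dcterms:created is later than dcterms:modified, one of the two timestamps has almost certainly been edited by hand, for example a modification date pushed back to make the document appear older without adjusting the creation date to match. Excel does not produce this condition in normal use, although a system clock that was changed between saves can.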
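A minimal bash sketch of this check, assuming the file has already been unzipped and that both timestamps use the UTC form Excel normally writes (for example 2021-03-01T10:00:00Z). The function names xml_text and check_created_modified are hypothetical. The sketch compares the first 14 digits (YYYYMMDDhhmmss) as integers so that locale-dependent string collation cannot affect the result; it ignores fractional seconds and does not handle explicit UTC offsets.

```bash
# Print the text of the first <TAG ...>text</TAG> element in FILE
# (grep -o is used because core.xml is often stored on a single line).
xml_text() {
  grep -oE "<$2( [^>]*)?>[^<]*</$2>" "$1" 2>/dev/null | head -n 1 \
    | sed -E 's/^<[^>]*>//; s/<\/[^>]*>$//'
}

# Flag a core.xml whose dcterms:created is later than dcterms:modified.
check_created_modified() {
  local core="$1" created modified c m
  created=$(xml_text "$core" 'dcterms:created')
  modified=$(xml_text "$core" 'dcterms:modified')
  if [[ -z "$created" || -z "$modified" ]]; then
    echo "MISSING: created='$created' modified='$modified'"
    return 0
  fi
  c=${created//[^0-9]/}; c=${c:0:14}    # YYYYMMDDhhmmss
  m=${modified//[^0-9]/}; m=${m:0:14}
  if (( 10#$c > 10#$m )); then
    echo "FLAG: created $created is later than modified $modified"
  else
    echo "OK: created $created, modified $modified"
  fi
}
```

Usage: `check_created_modified xlsx_extract/docProps/core.xml`.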
TotalTime Inconsistent with Date Span
A document claiming to have been created six months ago but showing a TotalTime of 5 minutes has almost certainly been fabricated recently; a document with six months of genuine use would typically show hours of editing time. Not every application writes TotalTime, so when the element is absent this test cannot be applied.
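The comparison can be sketched in bash as follows. The function names and the thresholds MIN_SPAN_DAYS (30) and MIN_MINUTES (60) are hypothetical and should be set per case; the reference date is whatever date the document is claimed to have been in use until, such as the submission date. Day counts use the days-from-civil calendar algorithm in awk, which avoids relying on GNU-specific `date -d` and assumes years of 1970 or later.

```bash
# Hypothetical thresholds: a span of at least a month with under an hour of editing
MIN_SPAN_DAYS=30
MIN_MINUTES=60

# Days since 1970-01-01 for an ISO date (YYYY-MM-DD...), days-from-civil algorithm.
iso_to_days() {
  awk -v s="$1" 'BEGIN {
    y = substr(s, 1, 4) + 0; m = substr(s, 6, 2) + 0; d = substr(s, 9, 2) + 0
    if (m <= 2) y -= 1                      # treat Jan/Feb as months of the prior year
    era = int(y / 400); yoe = y - era * 400
    mp = (m > 2) ? m - 3 : m + 9            # March = 0 ... February = 11
    doy = int((153 * mp + 2) / 5) + d - 1
    print era * 146097 + yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy - 719468
  }'
}

# check_totaltime CREATED REFERENCE TOTALTIME_MINUTES
check_totaltime() {
  local created="$1" reference="$2" minutes="$3" span
  if [[ -z "$minutes" ]]; then
    echo "MISSING: no TotalTime value; this test cannot be applied"
    return 0
  fi
  span=$(( $(iso_to_days "$reference") - $(iso_to_days "$created") ))
  if (( span >= MIN_SPAN_DAYS && minutes < MIN_MINUTES )); then
    echo "FLAG: ${minutes} min of editing over ${span} days"
  else
    echo "OK: ${minutes} min of editing over ${span} days"
  fi
}
```

For example, `check_totaltime 2021-03-01T09:00:00Z 2021-09-01 5` prints `FLAG: 5 min of editing over 184 days`.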
Application Version Did Not Exist at Claimed Date
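If the document claims to have been created in 2014 and not modified since, but the AppVersion field reads 16.0300 (Excel 2016, released September 2015), the file cannot have been saved at the claimed date. Because Excel rewrites app.xml on every save, AppVersion identifies the application that last saved the file, so compare it with the date the document was supposedly last saved. The application version is written automatically and is rarely manipulated.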
Timezone and Working Hours Anomalies
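A document purportedly created by a London office but with creation timestamps at 3:00 AM GMT, or a document from a 9-to-5 organization with editing at 2:00 AM on a weekend, suggests the document was not created in the normal course of business. Timestamps in core.xml are normally recorded in UTC (marked by a trailing Z), so convert them to the office's local time, including daylight saving, before judging working hours.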
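A rough bash sketch of the working-hours check, assuming a fixed UTC offset in hours (for example 0 for London in winter, 1 in summer) and a hypothetical working window of 07:00 to 20:00 on weekdays; it does not apply daylight-saving rules itself. The function name local_time_check is illustrative.

```bash
# local_time_check UTC_TIMESTAMP OFFSET_HOURS
# Prints FLAG/OK, the local weekday, and the local time.
local_time_check() {
  awk -v s="$1" -v off="$2" 'BEGIN {
    y = substr(s, 1, 4) + 0; m = substr(s, 6, 2) + 0; d = substr(s, 9, 2) + 0
    hh = substr(s, 12, 2) + 0; mi = substr(s, 15, 2) + 0
    # Days since 1970-01-01 (days-from-civil algorithm)
    if (m <= 2) y -= 1
    era = int(y / 400); yoe = y - era * 400
    mp = (m > 2) ? m - 3 : m + 9
    days = era * 146097 + yoe * 365 + int(yoe / 4) - int(yoe / 100)
    days += int((153 * mp + 2) / 5) + d - 1 - 719468
    t = days * 1440 + hh * 60 + mi + off * 60       # local minutes since epoch
    ld = int(t / 1440); lh = int((t - ld * 1440) / 60)
    split("Sun Mon Tue Wed Thu Fri Sat", name, " ")
    wd = (ld + 4) % 7                                # 1970-01-01 was a Thursday
    flag = (wd == 0 || wd == 6 || lh < 7 || lh >= 20) ? "FLAG" : "OK"
    printf "%s %s %02d:%02d local\n", flag, name[wd + 1], lh, t % 60
  }'
}
```

For example, `local_time_check 2021-03-06T02:00:00Z 0` reports `FLAG Sat 02:00 local`.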
File System Date Precedes Document Date
If the file system creation date is significantly earlier than the document's internal dcterms:created timestamp, the internal timestamp may have been modified. Copying a file normally produces the reverse pattern (a file system date later than the internal one), so this pattern is suspicious.
Fraudulent documents often need to appear to come from a specific person or organization. Metadata analysis can verify or disprove claimed authorship by examining multiple independent author indicators that are difficult to fake consistently.
Excel records authorship in multiple locations. A legitimate document has consistent authorship data. A fabricated document frequently has mismatches that reveal the true creator.
| Metadata Field | Location | What It Reveals |
|---|---|---|
| dc:creator | docProps/core.xml | Original author's name or username |
| cp:lastModifiedBy | docProps/core.xml | Last person who edited the file |
| Manager | docProps/app.xml | Manager field (if populated) |
| Company | docProps/app.xml | Organization tied to the license |
| Printer paths | xl/printerSettings/ | Network printer names revealing office location |
| Comments/annotations | xl/comments*.xml | Comment author names from earlier edits |
| Custom properties | docProps/custom.xml | Organization-specific metadata tags |
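```bash
# Extract all author-related metadata
unzip -o suspect.xlsx -d suspect_contents/
# Check core properties for author fields (-o prints only the matching
# elements, since these XML parts are often stored on a single line)
grep -oE '<dc:creator>[^<]*</dc:creator>|<cp:lastModifiedBy>[^<]*</cp:lastModifiedBy>' \
  suspect_contents/docProps/core.xml
# Check company and manager fields
grep -oE '<(Company|Manager)>[^<]*</(Company|Manager)>' suspect_contents/docProps/app.xml
# Check comment authors
grep -oE '<author>[^<]*</author>' suspect_contents/xl/comments*.xml 2>/dev/null
# Check printer settings for network paths (device names are usually
# stored as UTF-16LE text, hence -e l; plain strings covers 8-bit text)
strings -e l suspect_contents/xl/printerSettings/*.bin 2>/dev/null
strings suspect_contents/xl/printerSettings/*.bin 2>/dev/null
```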
These patterns frequently appear in documents where authorship has been misrepresented. Each pattern tells a different story about how the fraud was constructed.
Creator and LastModifiedBy Are the Same
In a document that has purportedly been reviewed and approved by multiple people over months, finding that dc:creator and cp:lastModifiedBy are the same person—especially the person submitting the document—suggests it was created in a single session by one person.
Company Field Mismatch
A document claiming to originate from "Acme Corp" but with a Company field showing "Smith Consulting" was created on a machine licensed to a different organization. The Company field is set by the Office installation and not typically modified by users.
Printer Settings from Wrong Location
Binary printer settings embedded in the file can contain network printer names like \\OFFICE-NYC-3F\HP-LaserJet. If the document was purportedly created at a different location, the printer metadata contradicts the claimed origin.
Comment Authors from the Wrong Team
If a file purportedly from the finance department contains comments authored by people known to work in a completely different department or external organization, the file's provenance is suspect. Comment author names are taken from the Office user name in effect when each comment was written; changing them afterwards requires editing the XML.
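The author indicators in the table and the patterns above can be pulled together with a short bash sketch. It assumes an extracted package directory and a claimed company name supplied by the investigator; the function names are hypothetical. It reads only legacy comment authors (the author elements in xl/commentsN.xml), not newer threaded comments.

```bash
# Print the text of the first <TAG ...>text</TAG> element in FILE.
xml_text() {
  grep -oE "<$2( [^>]*)?>[^<]*</$2>" "$1" 2>/dev/null | head -n 1 \
    | sed -E 's/^<[^>]*>//; s/<\/[^>]*>$//'
}

# author_report DIR CLAIMED_COMPANY
author_report() {
  local dir="$1" claimed_company="$2" creator modifier company
  creator=$(xml_text "$dir/docProps/core.xml" 'dc:creator')
  modifier=$(xml_text "$dir/docProps/core.xml" 'cp:lastModifiedBy')
  company=$(xml_text "$dir/docProps/app.xml" 'Company')
  echo "creator=$creator lastModifiedBy=$modifier company=$company"
  # Distinct legacy comment authors
  cat "$dir"/xl/comments*.xml 2>/dev/null | grep -oE '<author>[^<]*</author>' \
    | sed -E 's/<\/?author>//g' | sort -u | sed 's/^/comment author: /'
  if [[ -n "$creator" && "$creator" == "$modifier" ]]; then
    echo "NOTE: creator and lastModifiedBy are identical"
  fi
  if [[ -n "$company" && "$company" != "$claimed_company" ]]; then
    echo "FLAG: Company '$company' differs from claimed '$claimed_company'"
  fi
}
```

Usage: `author_report suspect_contents "Acme Corp"`.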
Beyond timestamps and author fields, the internal structure of an XLSX file contains subtle artifacts that are extremely difficult to fake. These structural indicators can reveal whether a file was genuinely created incrementally over time or manufactured in a single session to appear authentic.
Excel stores the text typed into cells in a shared string table (xl/sharedStrings.xml). The order of strings in this table generally reflects the order in which that text was first entered into the workbook. This ordering is a powerful forensic artifact.
Legitimate File Pattern
In a file built incrementally over time, the shared string table reflects the natural order of data entry. Headers appear early, data added later appears later in the table. Strings from different editing sessions are interspersed naturally.
Fabricated File Pattern
A file created in one session by pasting data has a string table that mirrors the cell layout exactly—left to right, top to bottom. There is no interspersion. The string order is too perfect, suggesting a single-pass creation.
```bash
# Extract and examine the shared string table (first 30 text runs)
grep -oE '<t( [^>]*)?>[^<]*</t>' xlsx_extract/xl/sharedStrings.xml | head -30
# Count total unique strings (grep -c would count lines, and this file
# is often a single line, so count matches instead)
grep -o '<si>' xlsx_extract/xl/sharedStrings.xml | wc -l
# Look for the uniqueCount vs count attributes
# count = total references, uniqueCount = unique strings
# A count/uniqueCount ratio close to 1 means few strings are reused
grep -oE '<sst [^>]*>' xlsx_extract/xl/sharedStrings.xml
```
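A bash sketch that reports the count and uniqueCount attributes, their ratio, and the last few strings in table order, assuming an extracted sharedStrings.xml. The function name sst_summary and its optional second argument (how many trailing strings to print, default 5) are hypothetical. Rich-text strings split across several t runs are printed run by run.

```bash
# sst_summary SHARED_STRINGS_XML [N]
sst_summary() {
  local sst="$1" tail_n="${2:-5}" header count unique
  header=$(grep -oE '<sst [^>]*>' "$sst" | head -n 1)
  count=$(printf '%s' "$header" | sed -nE 's/.* count="([0-9]+)".*/\1/p')
  unique=$(printf '%s' "$header" | sed -nE 's/.* uniqueCount="([0-9]+)".*/\1/p')
  echo "count=$count uniqueCount=$unique"
  if [[ -n "$count" && -n "$unique" && "$unique" -gt 0 ]]; then
    LC_ALL=C awk -v c="$count" -v u="$unique" 'BEGIN { printf "reuse ratio=%.2f\n", c / u }'
  fi
  # Last N text runs in table order; text added late is expected near the end
  grep -oE '<t( [^>]*)?>[^<]*</t>' "$sst" \
    | sed -E 's/^<t[^>]*>//; s/<\/t>$//' | tail -n "$tail_n"
}
```

Usage: `sst_summary xlsx_extract/xl/sharedStrings.xml 10`.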
The styles file (xl/styles.xml) contains every cell format, number format, font, fill, and border used in the workbook. The structure of this file reveals how the document was built.
Format Proliferation
A legitimate document that evolved over time accumulates styles incrementally. It typically has unused styles from earlier iterations, duplicate-but-slightly-different formats from different editors, and a large cellXfs (cell format) count relative to the actual variety of visible formatting. A fabricated document has minimal, clean styles with no orphaned formats.
Number Format Clues
Custom number formats in numFmts can reveal locale information, for example currency symbols or locale tags embedded in the format code, such as [$-407] (German). A document claiming to be from the US whose formats carry European currency symbols or European locale tags was likely created on a machine with European locale settings.
Font Availability
The fonts referenced in styles.xml reflect the defaults of the software and platform the creator used. Default fonts change between Office versions (newer Microsoft 365 builds default to the Aptos font family, where earlier versions used Calibri), so a default font that was not yet shipping at the claimed creation date, or that does not fit the claimed platform, contradicts the claimed history.
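The three style clues can be summarised with a bash sketch like the following, assuming an extracted package directory. The function name is hypothetical; the used-style count covers only explicit s attributes on cells (cells without one use style 0) and assumes the attribute is preceded by a space, as Excel writes it.

```bash
# styles_summary DIR: declared vs used cell formats, custom number formats, fonts
styles_summary() {
  local dir="$1" declared used
  declared=$(grep -oE '<cellXfs count="[0-9]+"' "$dir/xl/styles.xml" | grep -oE '[0-9]+')
  # Distinct explicit s="N" style indices on cells across all worksheets
  used=$(cat "$dir"/xl/worksheets/*.xml 2>/dev/null \
           | grep -oE '<c [^>]*[ ]s="[0-9]+"' | grep -oE 's="[0-9]+"' \
           | sort -u | wc -l | tr -d ' ')
  echo "cellXfs declared=$declared used=$used"
  echo "custom number formats:"
  grep -oE '<numFmt [^>]*formatCode="[^"]*"' "$dir/xl/styles.xml" \
    | sed -E 's/.*formatCode="([^"]*)"/  \1/'
  echo "fonts:"
  grep -oE '<name val="[^"]*"' "$dir/xl/styles.xml" \
    | sed -E 's/.*val="([^"]*)"/  \1/' | sort -u
}
```

A large gap between declared and used formats is consistent with a long editing history; a very small, exact match fits the "too clean" pattern.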
The calculation chain (xl/calcChain.xml) records the order in which Excel recalculates formulas. This ordering is determined by cell dependencies and is rebuilt when the workbook structure changes. It provides a subtle but reliable indicator of document authenticity.
What to Look For
```bash
# Check if calculation chain exists
ls -la xlsx_extract/xl/calcChain.xml 2>/dev/null
# Count formula references in calc chain (count matches, not lines)
grep -o '<c ' xlsx_extract/xl/calcChain.xml 2>/dev/null | wc -l
# Compare with actual formulas across all sheets, since the chain spans the workbook
cat xlsx_extract/xl/worksheets/*.xml | grep -oE '<f[ >/]' | wc -l
```
The Excel version that created a document is recorded in the file and is one of the most reliable fraud indicators. Fraudsters can change the dates in a document, but they rarely think to change the application version—and even when they do, the internal XML schemas and feature usage betray the true version.
Cross-referencing the AppVersion value with the claimed history can immediately expose backdating. AppVersion records the application that last saved the file, so if that Excel version did not exist when the document was supposedly last saved, the claimed history is false.
| AppVersion Value | Excel Version | Release Date |
|---|---|---|
| 12.0000 | Excel 2007 | January 2007 |
| 14.0300 | Excel 2010 | June 2010 |
| 15.0300 | Excel 2013 | January 2013 |
| 16.0300 | Excel 2016 | September 2015 |
| 16.0300 | Excel 2019 | September 2018 |
| 16.0300 (typically) | Excel 2021, Excel 2024, Microsoft 365 | Varies by release and update channel |

Important: Excel 2016, 2019, 2021, 2024, and Microsoft 365 all use major version 16.0 and frequently write the same AppVersion string, so the digits after 16.0 are not a reliable way to tell them apart. Use XML namespaces and feature usage to narrow the version within the 16.x family.
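A bash sketch that encodes the AppVersion table and compares it with a claimed last-saved date. It keys only on the major version, because 16.x spans Excel 2016 through Microsoft 365, and it uses the earliest release month listed for each major version (September 2015 for 16.x). The function names are hypothetical, and the mapping is only as accurate as the release dates in the table.

```bash
# Earliest release month (YYYY-MM) for the major part of an AppVersion string.
appversion_earliest() {
  case "${1%%.*}" in
    12) echo "2007-01" ;;
    14) echo "2010-06" ;;
    15) echo "2013-01" ;;
    16) echo "2015-09" ;;
    *)  echo "unknown" ;;
  esac
}

# check_appversion APPVERSION CLAIMED_LAST_SAVE (YYYY-MM or YYYY-MM-DD)
check_appversion() {
  local earliest claimed first
  earliest=$(appversion_earliest "$1")
  if [[ "$earliest" == unknown ]]; then
    echo "UNKNOWN: AppVersion $1 is not in the table"
    return 0
  fi
  claimed=${2:0:4}${2:5:2}               # YYYYMM
  first=${earliest:0:4}${earliest:5:2}
  if (( 10#$claimed < 10#$first )); then
    echo "FLAG: AppVersion $1 first shipped $earliest, after the claimed date $2"
  else
    echo "OK: AppVersion $1 (available from $earliest) fits the claimed date $2"
  fi
}
```

Usage: `check_appversion 15.0300 2012-03-15` reports a FLAG, matching the backdating example discussed later in this guide.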
```bash
# Extract application name and version
grep -oE '<(Application|AppVersion)>[^<]*</(Application|AppVersion)>' \
  xlsx_extract/docProps/app.xml
# Example output showing Excel version:
# <Application>Microsoft Excel</Application>
# <AppVersion>16.0300</AppVersion>
# Check XML namespace declarations for additional clues
# (newer Excel versions declare additional namespace URIs)
grep -oE 'xmlns(:[A-Za-z0-9]+)?="[^"]+"' xlsx_extract/xl/workbook.xml
```
The editing history of a document tells a story about how it was used. A legitimate business document that has been actively used for months shows evidence of repeated editing. A fabricated document that was created in one session lacks this depth of history—and the absence itself is evidence.
Signs of Genuine Editing History

- TotalTime consistent with the file's age
- Multiple lastModifiedBy users over time

Signs of Fabrication

- TotalTime of zero or near-zero minutes
- dc:creator and cp:lastModifiedBy identical

```bash
# Check total editing time
grep -oE '<TotalTime>[^<]*</TotalTime>' xlsx_extract/docProps/app.xml
# Check for hidden sheets
grep -oE '<sheet [^>]*state="(hidden|veryHidden)"[^>]*>' xlsx_extract/xl/workbook.xml
# Check for named ranges
grep -oE '<definedName [^>]*>[^<]*</definedName>' xlsx_extract/xl/workbook.xml
# Count worksheets (count matches rather than lines)
grep -o '<sheet ' xlsx_extract/xl/workbook.xml | wc -l
# Check for printer settings
ls xlsx_extract/xl/printerSettings/ 2>/dev/null
```
Paradoxically, one of the strongest indicators of a fabricated document is that it is too clean. Real business documents are messy. They accumulate artifacts from multiple users, multiple editing sessions, format changes, and iterative development. A document that lacks all of these artifacts despite claiming a long history is suspicious.
The Cleanliness Checklist
Backdating—creating a document now but claiming it existed at an earlier date—is one of the most common forms of document fraud. It appears in insurance claims, regulatory filings, financial audits, and contract disputes. Metadata analysis provides multiple independent methods to detect it.
1. Application Version vs. Claimed Date
The most definitive test. If the AppVersion value identifies an Excel version released after the date the document was supposedly last saved, the document cannot have existed in that form at that date. This artifact is rarely manipulated because most fraudsters are unaware it exists.
```bash
# A document claiming creation in March 2012, with no later edits,
# but showing AppVersion 15.0300 (Excel 2013, released January 2013)
# could not have been saved at the claimed date
grep -o '<AppVersion>[^<]*</AppVersion>' xlsx_extract/docProps/app.xml
```
2. XML Schema Namespace Analysis
Different Excel versions use different XML namespace URIs. A document whose XML uses namespace URIs introduced in Excel 2019 cannot have been last saved in 2016. Namespace URIs are deeply embedded in the XML structure and are almost never manipulated by fraudsters.
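```bash
# Check namespace declarations (quote the path: [ and ] are glob characters)
grep -oE 'xmlns(:[A-Za-z0-9]+)?="[^"]+"' xlsx_extract/xl/workbook.xml
grep -oE 'xmlns(:[A-Za-z0-9]+)?="[^"]+"' "xlsx_extract/[Content_Types].xml"
# Look for newer content types or relationships
cat "xlsx_extract/[Content_Types].xml"
```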
3. Feature Usage Analysis
Excel features such as XLOOKUP, dynamic-array functions like FILTER and UNIQUE, newer chart types (funnel, map, treemap), or Power Query connections were introduced in specific versions. A document claiming to be from 2015 that uses XLOOKUP (introduced in 2020) is definitively backdated.
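A bash sketch for listing newer functions used in formulas. It assumes that, as Excel typically does, functions introduced after the original Office Open XML function set are stored with an _xlfn. prefix (for example _xlfn.XLOOKUP), with some dynamic-array functions carrying an extra _xlws. prefix. Mapping each function to its release date is left to the investigator; the function name is hypothetical.

```bash
# list_newer_functions DIR: count prefixed ("future") functions in all worksheets
list_newer_functions() {
  cat "$1"/xl/worksheets/*.xml 2>/dev/null \
    | grep -oE '_xlfn\.(_xlws\.)?[A-Z][A-Z0-9.]*' \
    | sed -E 's/^_xlfn\.(_xlws\.)?//' \
    | sort | uniq -c | sort -rn
}
```

Usage: `list_newer_functions xlsx_extract`. A hit for XLOOKUP or FILTER in a file claimed to predate those functions is the feature-usage contradiction described above.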
4. Theme and Template Dating
Excel themes change between versions. The default color palette, font selections, and theme names in xl/theme/theme1.xml can reveal which version of Excel created the document, independent of the claimed date.
```bash
# Check the theme file for version-specific defaults
# (theme name, then heading and body fonts)
grep -oE '<a:theme [^>]*name="[^"]*"|<a:latin typeface="[^"]*"' \
  xlsx_extract/xl/theme/theme1.xml | head -10
```
5. ZIP Archive Metadata
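XLSX files are ZIP archives, and each entry in the ZIP carries its own date field, independent of the document properties. Files saved by Microsoft Office typically record the same placeholder date (1980-01-01 00:00) on every entry rather than the real save time, whereas general-purpose ZIP tools stamp entries with the actual time. Real, recent dates on some or all entries therefore suggest that the package was unpacked and repacked outside Excel, for example after someone hand-edited core.xml. Check the Application field as well, because files legitimately produced by other spreadsheet software can also carry real dates.
```bash
# List ZIP entries with their dates
unzip -l suspect.xlsx
# Files saved by Excel typically show 1980-01-01 00:00 on every entry;
# real, recent dates point to repackaging by another tool
```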
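A bash sketch that lists ZIP entries whose dates are not the 1980-01-01 placeholder that files saved by Microsoft Office typically carry. It reads the output of Info-ZIP's `zipinfo -T`, which prints dates as yyyymmdd.hhmmss; the assumption that the date is the seventh field of zipinfo's default listing, and the function name, are illustrative. Real dates are not proof of fraud on their own.

```bash
# Read `zipinfo -T` output on stdin; print date and name of non-placeholder entries.
nonplaceholder_zip_dates() {
  awk '$7 ~ /^[0-9]+\.[0-9]+$/ && $7 !~ /^19800101\./ {
    name = $8
    for (i = 9; i <= NF; i++) name = name " " $i   # re-join names containing spaces
    print $7, name
  }'
}
```

Usage: `zipinfo -T suspect.xlsx | nonplaceholder_zip_dates`. A file whose only dated entry is docProps/core.xml is a strong hint that the properties were edited after the fact.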
Sometimes the fraud is not in creating a new document but in modifying an existing one—changing a few key values while leaving the rest intact to maintain an appearance of authenticity. Detecting selective alterations requires comparing what the metadata says about the editing pattern with the document's content.
Formula vs. Hardcoded Value Inconsistency
In a legitimate financial spreadsheet, totals are calculated by formulas. If specific total cells contain hardcoded values while surrounding cells use formulas, the totals may have been manually overwritten. Check whether sum cells actually contain =SUM() formulas or just static numbers.
Formatting Inconsistencies in Modified Cells
When individual cells are edited, they sometimes acquire different formatting from surrounding cells—a different number format, precision, or style index. These formatting anomalies can pinpoint exactly which cells were modified after the original document was created.
Shared String Table Position
If a cell value was changed, the new text is appended to the end of the shared string table while the old value may still exist as an orphan. Finding the replacement value at the end of the string table—far from where similar data appears—suggests a later modification.
```bash
# Cells look like <c r="B7" s="3"><f>SUM(B2:B6)</f><v>1500</v></c>;
# a <v> without an <f> means the value is hardcoded
# Extract cells that hold a value (with or without a formula)
grep -oE '<c r="[^"]+"[^>]*>(<f[^>]*/>|<f[^>]*>[^<]*</f>)?<v>[^<]*</v></c>' \
  xlsx_extract/xl/worksheets/sheet1.xml | head -20
# Keep only value cells with no formula; look for these in columns
# where most other cells use formulas
grep -oE '<c r="[^"]+"[^>]*>(<f[^>]*/>|<f[^>]*>[^<]*</f>)?<v>[^<]*</v></c>' \
  xlsx_extract/xl/worksheets/sheet1.xml | grep -v '<f'
# Compare the style index (s attribute) of modified vs neighboring cells
```
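A bash sketch of the column comparison, assuming an extracted worksheet and a column letter. It only considers cells that contain a v value, optionally preceded by a formula (including shared-formula children such as `<f t="shared" si="0"/>`), and the function name column_report is hypothetical. Cells with t="s" hold shared-string indexes, so text cells also appear as HARDCODED.

```bash
# column_report SHEET_XML COLUMN_LETTERS: one line per value cell in the column
column_report() {
  local sheet="$1" col="$2" cell ref style kind
  grep -oE "<c r=\"${col}[0-9]+\"[^>]*>(<f[^>]*/>|<f[^>]*>[^<]*</f>)?<v>[^<]*</v></c>" "$sheet" \
    | while IFS= read -r cell; do
        ref=$(printf '%s' "$cell" | sed -E 's/^<c r="([^"]+)".*/\1/')
        style=$(printf '%s' "$cell" | sed -nE 's/^<c [^>]* s="([0-9]+)".*/\1/p')
        if [[ "$cell" == *"<f"* ]]; then kind=formula; else kind=HARDCODED; fi
        echo "$ref s=${style:-0} $kind"       # cells without s use style 0
      done
}
```

In a totals column, a single HARDCODED line whose style index differs from its formula neighbours combines both alteration signals described above.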
The strongest fraud detection combines internal metadata analysis with external corroboration. When a document's metadata contradicts external records, the case becomes significantly stronger.
External Corroboration Sources
Hash Comparison
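If a copy of the original document exists in backups, email attachments, or cloud storage, comparing file hashes immediately shows whether the current version is byte-for-byte identical. Even a single changed byte produces a completely different hash, so a matching hash is the most definitive proof that the file has not been altered. A mismatch shows only that the bytes differ (re-saving a file without visible changes also alters them), so follow it with a comparison of the extracted contents.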
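When hashes differ, a part-by-part comparison of the two unzipped packages narrows down what changed. A bash sketch, assuming both files have been extracted to separate directories; the function name diff_parts and its output labels are hypothetical. It reports byte-level differences per part and does not interpret them, so a part may differ merely because it was re-serialized on save.

```bash
# diff_parts ORIGINAL_DIR SUSPECT_DIR: list parts that differ or exist on one side only
diff_parts() {
  local a="$1" b="$2" rel
  { (cd "$a" && find . -type f); (cd "$b" && find . -type f); } | LC_ALL=C sort -u \
    | while IFS= read -r rel; do
        rel=${rel#./}
        if [[ ! -f "$a/$rel" ]]; then echo "only in suspect: $rel"
        elif [[ ! -f "$b/$rel" ]]; then echo "only in original: $rel"
        elif ! cmp -s "$a/$rel" "$b/$rel"; then echo "differs: $rel"
        fi
      done
}
```

Usage: `diff_parts backup_extract xlsx_extract`. If only a worksheet part and its string table differ, the change is confined to cell content rather than to the whole file.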
Preserve and hash the original file
Create a forensic copy immediately. Hash both the original and your working copy (MD5 and SHA-256). Document the chain of custody from the moment you receive the file.
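A bash sketch of this step, assuming GNU coreutils (md5sum, sha256sum); on macOS, md5 and `shasum -a 256` are the usual equivalents. The function name forensic_copy is hypothetical. `cp -p` keeps the modification and access times on the copy but not necessarily the original's creation time, so record file system timestamps from the original as well.

```bash
# forensic_copy ORIGINAL WORKING_COPY: copy, then log and compare MD5 and SHA-256
forensic_copy() {
  local src="$1" dst="$2" algo h1 h2
  cp -p "$src" "$dst" || return 1
  for algo in md5sum sha256sum; do
    h1=$("$algo" "$src" | cut -d' ' -f1)
    h2=$("$algo" "$dst" | cut -d' ' -f1)
    echo "$algo $h1 $src"
    echo "$algo $h2 $dst"
    [[ "$h1" == "$h2" ]] || { echo "MISMATCH ($algo)"; return 1; }
  done
}
```

Usage: `forensic_copy suspect.xlsx work/suspect.xlsx | tee custody_hashes.txt`.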
Extract and examine document properties
Unzip the XLSX and review docProps/core.xml and docProps/app.xml. Record the creator, last modified by, creation date, modification date, total editing time, and application version.
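Recording these values can be scripted. A bash sketch, assuming an extracted package; the function names are hypothetical. An empty value means the element is absent, which is itself worth noting (for example, a missing TotalTime).

```bash
# Print the text of the first <TAG ...>text</TAG> element in FILE.
xml_text() {
  grep -oE "<$2( [^>]*)?>[^<]*</$2>" "$1" 2>/dev/null | head -n 1 \
    | sed -E 's/^<[^>]*>//; s/<\/[^>]*>$//'
}

# record_properties DIR: one property per line for case notes
record_properties() {
  local core="$1/docProps/core.xml" app="$1/docProps/app.xml"
  printf 'creator: %s\n'        "$(xml_text "$core" 'dc:creator')"
  printf 'lastModifiedBy: %s\n' "$(xml_text "$core" 'cp:lastModifiedBy')"
  printf 'created: %s\n'        "$(xml_text "$core" 'dcterms:created')"
  printf 'modified: %s\n'       "$(xml_text "$core" 'dcterms:modified')"
  printf 'TotalTime: %s\n'      "$(xml_text "$app" 'TotalTime')"
  printf 'Application: %s\n'    "$(xml_text "$app" 'Application')"
  printf 'AppVersion: %s\n'     "$(xml_text "$app" 'AppVersion')"
}
```

Usage: `record_properties xlsx_extract > properties.txt`.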
Verify timestamp consistency
Compare the timestamp layers: file system, OPC core properties, extended properties (TotalTime), ZIP entry dates, and any internal date references. Flag any discrepancies between creation date, modification date, and total editing time.
Validate application version against claimed date
Check that the AppVersion value corresponds to an Excel version that existed when the file was supposedly last saved. Examine XML namespaces and feature usage for additional version indicators.
Analyze author and origin metadata
Verify that the creator, company, printer paths, and comment authors are consistent with the claimed origin. Cross-reference with known organizational data.
Examine structural artifacts
Review the shared string table ordering, style complexity, calculation chain, named ranges, and hidden sheets. Assess whether the structural complexity matches the claimed document history.
Check for selective alterations
Look for hardcoded values where formulas are expected, formatting inconsistencies in individual cells, and string table positions that suggest late modifications.
Cross-reference with external records
Compare the file against backups, email attachments, cloud storage versions, print logs, and any other external records that can corroborate or contradict the document's claimed history.
Document findings with evidence preservation
Compile all findings into a structured report. Include exact metadata values, screenshots, hash values, and the specific inconsistencies found. Ensure all evidence is preserved in a manner suitable for legal proceedings.
Understanding how these techniques work in practice helps investigators know what to look for. These scenarios illustrate common fraud patterns and the metadata artifacts that expose them.
A claimant submits an inventory spreadsheet supposedly created before a loss event, listing valuable items for an insurance claim. The spreadsheet's creation date in core.xml shows a date three months before the loss.
Detection Evidence
- AppVersion shows an Excel version released two months after the claimed creation date
- TotalTime is 8 minutes for a 200-row inventory that would take hours to compile

Conclusion: Five independent metadata indicators contradict the claimed creation date. The document was fabricated after the loss event and backdated.
A quarterly financial report submitted to auditors shows favorable revenue figures. A whistleblower alleges the numbers were changed after the reporting period closed.
Detection Evidence
- cp:lastModifiedBy shows the CFO's username; earlier backup copies show a different preparer

Conclusion: Selective cell modifications with formatting inconsistencies, confirmed by comparison with the pre-modification backup, establish that revenue figures were manually overwritten after the reporting period.
An employee submits a vendor quote spreadsheet to justify a procurement decision, claiming it was received from the vendor. Investigation reveals the employee has a financial relationship with the vendor.
Detection Evidence
- dc:creator is the employee's username, not the vendor's
- Company field shows the employee's organization, not the vendor's

Conclusion: Multiple origin indicators confirm the document was created internally by the employee, not received from the vendor. The quote was fabricated.
Metadata-based fraud detection often produces evidence intended for legal proceedings. The way evidence is collected, preserved, and documented determines whether it will be admissible and persuasive.
Chain of Custody
Report Documentation
Reproducibility
Important Caveats
Fraudulent Excel documents carry the seeds of their own detection. The very metadata systems that make Excel files functional—timestamps, author records, version tracking, style management, and internal XML structures—also create an audit trail that is extraordinarily difficult to fake completely.
The key principle for investigators is convergence. No single metadata artifact is conclusive on its own. File system timestamps can be manipulated. Document properties can be edited. Even application versions could theoretically be changed by someone with XML editing skills. But when multiple independent indicators all point to the same conclusion—that a document was created at a different time, by a different person, or on a different machine than claimed—the convergence of evidence becomes compelling.
Fraudsters think about the visible content: the numbers in cells, the dates in headers, the names on the document. They rarely think about the invisible layer of metadata that records the true history of every file they create. That asymmetry—between what the fraudster controls and what they overlook—is what makes metadata analysis one of the most powerful tools in document fraud detection.