When you receive an Excel file claiming to be an original financial report, a contract appendix, or an audit workpaper, how do you know it is genuine? Metadata analysis provides a systematic, evidence-based approach to verifying document authenticity—examining internal properties, structural artifacts, application fingerprints, and temporal consistency to determine whether a file is what it claims to be or has been fabricated, altered, or misrepresented.
Document authenticity verification is a cornerstone of digital forensics, legal proceedings, regulatory compliance, and business due diligence. In a world where anyone can create, copy, and modify Excel files with ease, the ability to determine whether a document is genuine—and has not been tampered with—is essential. Metadata provides the forensic fingerprints that make this possible.
Unlike the visible content of a spreadsheet, which can be altered freely, metadata is written automatically by the application, the operating system, and the file format specification. These metadata layers create an internal consistency that is extremely difficult to fabricate convincingly. A forger may change cell values and formatting, but matching every metadata field, XML namespace, ZIP archive timestamp, internal relationship record, and application version signature is a far more demanding task—and usually one where mistakes reveal the truth.
The first layer of authenticity verification begins with the document's core properties stored in docProps/core.xml. These fields record the claimed author, creation date, last editor, and modification date. While these are the easiest properties to manipulate, they also provide the baseline claims that all subsequent checks will validate against.
The dc:creator and cp:lastModifiedBy fields identify the original author and most recent editor. These should match known user identities within the organization. Inconsistencies here are often the first sign that something is wrong.
What to Check
- dc:creator matches a known user identity for the claimed author
- cp:lastModifiedBy is consistent with the document's claimed revision workflow
Red Flags
- Generic account names such as "User" or "Admin" where the workflow involves named reviewers
- A creator field that does not match the person the document is attributed to
# Extract and examine core properties
unzip -o document.xlsx -d doc_extract/
cat doc_extract/docProps/core.xml
# Example output revealing a mismatch:
# <dc:creator>Sarah Johnson</dc:creator>
# <cp:lastModifiedBy>User</cp:lastModifiedBy>
# The generic "User" as last editor is suspicious
# when the claimed workflow involves named reviewers
The creation and modification timestamps must be logically consistent with each other and with the document's claimed history. Both are stored in ISO 8601 format, and the creation date must be at or before the modification date.
Logical Consistency Rules
- dcterms:created must be at or before dcterms:modified
- Neither timestamp should fall in the future
- Identical created and modified values imply a single save, implausible for a document with a claimed revision history
# Parse and compare timestamps
grep -oP '<dcterms:created[^>]*>\K[^<]+' \
  doc_extract/docProps/core.xml
grep -oP '<dcterms:modified[^>]*>\K[^<]+' \
  doc_extract/docProps/core.xml
# Suspicious example:
# Created: 2025-03-15T08:30:00Z
# Modified: 2025-03-15T08:30:00Z
# Identical timestamps on a complex workbook suggest
# it was saved exactly once — unusual for a document
# that claims multiple rounds of review
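The consistency rules above are straightforward to automate. Below is a minimal Python sketch (standard library only; the `check_timestamps` helper is illustrative, not part of any forensic toolkit) that applies them to the ISO 8601 strings extracted from core.xml:

```python
from datetime import datetime, timezone

def check_timestamps(created: str, modified: str) -> list[str]:
    """Apply basic logical-consistency rules to core.xml timestamps
    (ISO 8601 strings such as '2025-03-15T08:30:00Z')."""
    flags = []
    c = datetime.fromisoformat(created.replace("Z", "+00:00"))
    m = datetime.fromisoformat(modified.replace("Z", "+00:00"))
    if m < c:
        flags.append("modified precedes created")
    if m == c:
        flags.append("single-save signature (identical timestamps)")
    if max(c, m) > datetime.now(timezone.utc):
        flags.append("timestamp in the future")
    return flags

# The suspicious example from the text: one save, never revised
print(check_timestamps("2025-03-15T08:30:00Z", "2025-03-15T08:30:00Z"))
```

Flags are cumulative by design: a pattern of small inconsistencies is stronger evidence than any single anomaly.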
Every application that creates or modifies an Excel file leaves a distinctive fingerprint in the file's metadata. The extended properties in docProps/app.xml record the application name, version, and other characteristics that can be used to confirm or contradict claims about how the file was created. This is one of the most powerful authenticity checks because forgers rarely think to match application signatures.
The Application and AppVersion fields identify the software that last saved the file. Each version of Excel produces a specific version string, and the format of the generated XML varies slightly between versions. Mismatches between the claimed creation environment and the actual application fingerprint are strong evidence of inauthenticity.
Known Excel Version Signatures
| AppVersion | Excel Version | Release Year |
|---|---|---|
| 12.0000 | Excel 2007 | 2007 |
| 14.0300 | Excel 2010 | 2010 |
| 15.0300 | Excel 2013 | 2013 |
| 16.0300 | Excel 2016/2019/365 | 2016+ |
# Check application fingerprint
cat doc_extract/docProps/app.xml
# Look for key fields:
# <Application>Microsoft Excel</Application>
# <AppVersion>16.0300</AppVersion>
# <TotalTime>487</TotalTime>
# A file claiming to be from 2009 but showing
# AppVersion 16.0300 was actually last saved
# with Excel 2016 or later
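As a sketch of how this check might be scripted, the following Python snippet maps the AppVersion strings from the table above to release years and flags anachronisms. The `check_app_version` helper and its version table are illustrative, built only from the values listed here:

```python
# AppVersion strings from the table above, with earliest release year
APP_VERSIONS = {
    "12.0000": ("Excel 2007", 2007),
    "14.0300": ("Excel 2010", 2010),
    "15.0300": ("Excel 2013", 2013),
    "16.0300": ("Excel 2016 or later", 2016),
}

def check_app_version(app_version: str, claimed_year: int) -> str:
    """Flag files whose AppVersion postdates the year they claim."""
    product, earliest = APP_VERSIONS.get(app_version, ("unknown", None))
    if earliest is None:
        return f"unrecognized AppVersion {app_version}: possible third-party generator"
    if claimed_year < earliest:
        return (f"ANOMALY: claims {claimed_year} but was last saved with "
                f"{product}, not released until {earliest}")
    return f"consistent: {product}"

# The example from the text: a file claiming 2009 origin
print(check_app_version("16.0300", 2009))
```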
The TotalTime property in app.xml records the cumulative minutes the document was open for editing. This value should be proportional to the document's complexity and claimed history. It is one of the hardest properties for forgers to set correctly because it requires understanding the realistic editing time for the document in question.
Plausibility Checks
- A complex, heavily formatted workbook with a TotalTime of a few minutes is implausible
- TotalTime cannot exceed the wall-clock span between the created and modified timestamps
Cross-Reference Technique
- Compare TotalTime against the document's claimed development history: a report said to have taken weeks of work should show editing time to match
Real-World Example
An employee submitted a quarterly financial report claiming it had been developed over three weeks. Metadata analysis revealed a TotalTime of 7 minutes and an AppVersion corresponding to LibreOffice Calc rather than the company's standard Microsoft Excel installation. Further investigation showed the file was generated from a Python script and manually saved once to add superficial metadata, but the editing time and application fingerprint betrayed its true origin.
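One plausibility rule can be automated directly: cumulative editing time can never exceed the wall-clock span between creation and last save. A minimal Python sketch (the `total_time_plausible` helper is hypothetical):

```python
from datetime import datetime

def total_time_plausible(created: str, modified: str, total_time_min: int) -> bool:
    """TotalTime is cumulative editing minutes, so it can never exceed the
    wall-clock span between dcterms:created and dcterms:modified."""
    c = datetime.fromisoformat(created.replace("Z", "+00:00"))
    m = datetime.fromisoformat(modified.replace("Z", "+00:00"))
    elapsed_min = (m - c).total_seconds() / 60
    return 0 <= total_time_min <= elapsed_min

# 487 claimed editing minutes inside a 10-minute save window is impossible
print(total_time_plausible("2025-03-15T08:30:00Z", "2025-03-15T08:40:00Z", 487))  # prints False
```

Note the converse does not hold: a plausible TotalTime proves nothing on its own; it only fails to contradict the claim.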
Files created by third-party libraries (openpyxl, Apache POI, SheetJS) or non-Excel applications (Google Sheets, LibreOffice Calc) have distinctive signatures that differ from genuine Microsoft Excel output. Detecting these signatures is critical when a document claims to have been created in Excel.
Common Third-Party Signatures
- openpyxl: sets Application to "Microsoft Excel" but uses different XML formatting, namespace prefixes, and element ordering than real Excel
- Apache POI: may report Application as "Apache POI" or attempt to mimic Excel; check for non-standard XML comments or processing instructions
- Google Sheets: exports may omit app.xml entirely or include Google-specific custom properties
- LibreOffice Calc: includes meta:generator tags identifying the LibreOffice version

# Check for non-standard content types
xmllint --format "doc_extract/[Content_Types].xml"
# Check for Google Sheets signature
grep -r "google\|Sheets\|spreadsheets.google" doc_extract/
# Check for LibreOffice signature
grep -r "LibreOffice\|meta:generator" doc_extract/
# Check for openpyxl/POI artifacts
grep -r "openpyxl\|Apache POI" doc_extract/
Every version of Excel produces XLSX files with characteristic XML patterns—specific namespace declarations, element ordering, default style definitions, and relationship structures. These patterns form a structural fingerprint that is extremely difficult to replicate perfectly. Analyzing these structural elements is one of the most reliable methods for detecting fabricated documents.
XLSX files declare XML namespaces that correspond to specific Office Open XML schema versions. These namespaces must be consistent across all XML files within the archive. Mismatched namespaces are a strong indicator that files were manually assembled from different sources.
Key Namespace Checks
# Extract all namespace declarations across the archive
grep -rh "xmlns" doc_extract/ | sort -u
# Check for mixed transitional/strict namespaces
# Transitional uses: schemas.openxmlformats.org
# Strict uses: purl.oclc.org/ooxml/
grep -r "purl.oclc.org" doc_extract/
grep -r "schemas.openxmlformats.org" doc_extract/
# A genuine file uses one or the other consistently
# Finding both is a strong fabrication indicator
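The mixed-namespace check can also be expressed in Python. This sketch assumes you have already read each XML part into a path-to-text mapping; `namespace_flavor` is an illustrative helper:

```python
TRANSITIONAL = "schemas.openxmlformats.org"
STRICT = "purl.oclc.org/ooxml"

def namespace_flavor(xml_parts: dict[str, str]) -> str:
    """Classify namespace usage across the archive's XML parts
    (path -> file text). Genuine files use one flavor throughout."""
    uses_transitional = any(TRANSITIONAL in text for text in xml_parts.values())
    uses_strict = any(STRICT in text for text in xml_parts.values())
    if uses_transitional and uses_strict:
        return "mixed (fabrication indicator)"
    return "strict" if uses_strict else "transitional"

# Toy archive mixing a transitional workbook with a strict worksheet
parts = {
    "xl/workbook.xml":
        '<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"/>',
    "xl/worksheets/sheet1.xml":
        '<worksheet xmlns="http://purl.oclc.org/ooxml/spreadsheetml/main"/>',
}
print(namespace_flavor(parts))  # prints: mixed (fabrication indicator)
```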
XLSX files use relationship files (.rels) to define how internal components connect to each other. Every sheet, style definition, shared string table, and embedded object must have a corresponding relationship entry. Missing, orphaned, or inconsistent relationships reveal manual file manipulation.
Integrity Checks
- Every Target declared in a .rels file resolves to a file that exists in the archive
- Every sheet, style definition, shared string table, and embedded object has a corresponding relationship entry
Fabrication Indicators
- Orphaned files that no relationship references
- Relationship targets that do not exist on disk
# List all relationship files
find doc_extract/ -name "*.rels" \
  -exec sh -c 'echo "=== $1 ==="; cat "$1"' _ {} \;
# Cross-reference: list all files in the archive
find doc_extract/ -type f | sort
# Check for orphaned files not in any .rels
# or .rels targets that don't exist on disk
# Mismatches indicate manual file assembly
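The cross-referencing described above can be sketched in Python. The illustrative `missing_rel_targets` helper assumes you have already collected the archive's member list and each .rels file's Target attributes:

```python
import posixpath

def missing_rel_targets(members: list[str], rels: dict[str, list[str]]) -> list[str]:
    """Resolve each Target declared in a .rels file (path -> list of
    Target attributes) and report any absent from the archive's
    member list -- a sign of manual file assembly."""
    member_set = set(members)
    missing = []
    for rels_path, targets in rels.items():
        # A part's .rels file lives in a _rels/ directory beside the part,
        # and its targets are relative to the part's own directory.
        base = posixpath.dirname(posixpath.dirname(rels_path))
        for target in targets:
            resolved = posixpath.normpath(posixpath.join(base, target))
            if resolved not in member_set:
                missing.append(resolved)
    return missing

# Toy archive whose workbook relationships declare a styles part
# that does not actually exist on disk
members = ["[Content_Types].xml", "_rels/.rels", "xl/workbook.xml",
           "xl/_rels/workbook.xml.rels", "xl/worksheets/sheet1.xml"]
rels = {"xl/_rels/workbook.xml.rels": ["worksheets/sheet1.xml", "styles.xml"]}
print(missing_rel_targets(members, rels))  # prints ['xl/styles.xml']
```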
The shared string table (xl/sharedStrings.xml) stores all unique text strings used in the workbook. Excel adds strings to this table in the order they are first entered, creating a chronological record of data entry. The count attribute (total string references) and uniqueCount attribute (unique strings) should match the actual cell content of the workbook.
Verification Technique
- Compare the total cell string references in the worksheets against the count attribute. Mismatches indicate post-processing
- Count the actual <si> elements and compare against uniqueCount. A discrepancy means the file was manually edited

# Check shared string table attributes
head -5 doc_extract/xl/sharedStrings.xml
# Example: <sst count="1847" uniqueCount="523">
# Count actual <si> elements
grep -o "<si[ >]" doc_extract/xl/sharedStrings.xml | wc -l
# If uniqueCount says 523 but there are 519 <si>
# elements, the file has been manually edited after
# the shared string table was generated
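This check is more reliable in a proper XML parser, since sharedStrings.xml is often one long line, which defeats line-based counting. A minimal Python sketch using only the standard library (the sample table is a toy fragment):

```python
import xml.etree.ElementTree as ET

NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

def check_shared_strings(xml_text: str):
    """Return (declared uniqueCount, actual <si> count, match?)."""
    root = ET.fromstring(xml_text)
    declared = int(root.get("uniqueCount", "0"))
    actual = len(root.findall(f"{NS}si"))
    return declared, actual, declared == actual

# Toy table claiming 3 unique strings but containing only 2 <si> entries
sst = (
    '<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" '
    'count="5" uniqueCount="3">'
    "<si><t>Revenue</t></si><si><t>Q1</t></si></sst>"
)
print(check_shared_strings(sst))  # prints (3, 2, False)
```

A `False` result means the table was edited after Excel generated it.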
XLSX files are ZIP archives containing the XML files and other resources that compose the workbook. The ZIP format itself contains metadata—file modification timestamps, compression methods, and internal ordering—that provides an independent layer of forensic evidence. Because most forgers focus on the XML content, ZIP-level metadata often preserves the truth about a file's actual history.
Each file within a ZIP archive has its own modification timestamp. When Excel saves a workbook, it writes all internal files with timestamps that should be close to the save time. Inconsistencies in these timestamps reveal manual archive assembly.
What to Look For
- All entry timestamps should cluster within a minute or two of a single save time
- Entry timestamps should be close to the dcterms:modified value from core.xml

# List all ZIP entries with timestamps
unzip -l document.xlsx
# Detailed view including compression method
zipinfo document.xlsx
# Example output showing suspicious mixed dates:
# Length Method Size Cmpr Date Time Name
# -------- ------ ------- ---- ---------- ----- ----
# 1368 Defl:N 432 68% 2025-03-15 08:30 [Content_Types].xml
# 588 Defl:N 243 59% 2025-03-15 08:30 _rels/.rels
# 23847 Defl:N 5692 76% 2024-11-02 14:22 xl/worksheets/sheet1.xml
# ^^^^^^^^^
# This sheet has a timestamp 4 months older than
# the rest of the archive — it was likely copied
# from a different file and inserted manually
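Python's zipfile module reads the same per-entry DOS timestamps that zipinfo displays, so the outlier check is easy to script. A self-contained sketch in which a toy in-memory archive stands in for the suspect file:

```python
import io
import zipfile
from datetime import datetime

def entry_outliers(zf: zipfile.ZipFile, tolerance_days: int = 1) -> list[str]:
    """Flag members whose DOS timestamp diverges from the newest entry --
    Excel writes all members at save time, so divergence suggests
    manual archive assembly."""
    anchor = max(datetime(*info.date_time) for info in zf.infolist())
    return [info.filename for info in zf.infolist()
            if (anchor - datetime(*info.date_time)).days > tolerance_days]

# Build a toy archive reproducing the suspicious listing above
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for name, stamp in [("[Content_Types].xml", (2025, 3, 15, 8, 30, 0)),
                        ("_rels/.rels", (2025, 3, 15, 8, 30, 0)),
                        ("xl/worksheets/sheet1.xml", (2024, 11, 2, 14, 22, 0))]:
        info = zipfile.ZipInfo(name, date_time=stamp)
        zf.writestr(info, "<xml/>")

with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    print(entry_outliers(zf))  # prints ['xl/worksheets/sheet1.xml']
```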
Excel uses specific compression settings when creating XLSX files. The compression method (Deflate) and compression level should be consistent across all entries. Different compression methods or levels for different files within the archive suggest that the archive was assembled using a tool other than Excel.
Excel Defaults
- Every entry compressed with Deflate (shown as "Defl:N" in zipinfo output)
- Uniform compression settings across all entries of a single save
Anomaly Indicators
- Stored (uncompressed) entries mixed with Deflate entries
- Compression levels that vary between files in the same archive
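The zipfile module also exposes each entry's compression method, so this consistency check can be scripted. A minimal sketch using a toy in-memory archive:

```python
import io
import zipfile

def compression_methods(zf: zipfile.ZipFile) -> set[str]:
    """Collect the compression methods used across archive members.
    Genuine Excel output uses Deflate throughout; a mix of Stored and
    Deflate suggests the archive was repacked by another tool."""
    names = {zipfile.ZIP_STORED: "Stored", zipfile.ZIP_DEFLATED: "Deflate"}
    return {names.get(i.compress_type, str(i.compress_type))
            for i in zf.infolist()}

# Toy archive with one uncompressed (Stored) entry slipped in
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.xml", "<a/>", compress_type=zipfile.ZIP_DEFLATED)
    zf.writestr("b.xml", "<b/>", compress_type=zipfile.ZIP_STORED)  # anomaly

with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    print(sorted(compression_methods(zf)))  # prints ['Deflate', 'Stored']
```

A result set with more than one method warrants closer inspection of the oddly compressed entries.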
The styles file (xl/styles.xml) contains every cell format, number format, font definition, fill pattern, and border style used in the workbook. Excel generates this file with a set of default styles that varies by version and locale. The presence, absence, and ordering of these styles provides another layer of authenticity evidence.
Every version of Excel includes a specific set of built-in styles (Normal, Comma, Currency, Percent, etc.) with version-specific default fonts and formatting. These defaults serve as a version fingerprint that should match the AppVersion claim.
Version-Specific Style Indicators
# Check default font in styles
grep -i "font" doc_extract/xl/styles.xml | head -10
# Count number formats
grep -c "numFmt" doc_extract/xl/styles.xml
# Check for theme reference
head -20 doc_extract/xl/theme/theme1.xml
# A file claiming Excel 2007 origin but containing
# Excel 2016 theme colors was not created when claimed
When cells are deleted or reformatted, their style definitions often remain in the styles file. These orphaned styles can reveal the history of editing and indicate whether content was imported from other workbooks.
What Orphaned Styles Reveal
- Formats no cell currently references, pointing to deleted or reworked content
- Style definitions matching another workbook's defaults, suggesting content was imported from elsewhere
The most powerful authenticity verification comes from cross-referencing evidence across all the layers examined above. No single metadata field proves or disproves authenticity on its own, but inconsistencies between independent evidence sources create a compelling case. This is where forensic analysis becomes truly effective.
Build a verification matrix that compares claims across all metadata layers. Each row represents a factual claim about the document, and each column represents an independent source of evidence. Contradictions between sources are highlighted as anomalies.
| Claim | core.xml | app.xml | ZIP Headers | Structural | Verdict |
|---|---|---|---|---|---|
| Created March 2025 | 2025-03-15 | N/A | 2026-01-08 | N/A | CONFLICT |
| Created in Excel 2016 | N/A | AppVersion: 16.0300 | N/A | 2016+ themes | Consistent |
| Author: Sarah Johnson | dc:creator match | N/A | N/A | Default printer: Home_HP | Questionable |
| Reviewed by 3 people | lastModifiedBy: User | TotalTime: 487 min | Single timestamp | N/A | CONFLICT |
Follow this structured workflow to systematically verify document authenticity using all available metadata evidence.
Extract and Preserve
Create a forensic copy of the file. Extract the ZIP archive to a working directory. Record file hashes (SHA-256) before any analysis to establish evidence integrity.
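Hashing is worth scripting so it is reproducible in your report. A minimal Python sketch of chunked SHA-256 hashing (the `sha256_file` helper name is illustrative; chunking keeps large workbooks out of memory):

```python
import hashlib

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA-256 digest in chunks, so evidence files of
    any size can be hashed without loading them fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Record the digest before extraction and again when reporting, so any later dispute about the working copy can be settled objectively.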
Document the Claims
Record what the document claims to be: who created it, when, using what software, and what its revision history should look like. These are the assertions you will test.
Extract All Metadata Layers
Examine core properties, extended properties, ZIP archive metadata, XML structure, shared strings, and styles. Record all findings in a structured format.
Cross-Reference and Identify Conflicts
Build the verification matrix. Compare each claim against every evidence source. Flag all inconsistencies, no matter how minor. A pattern of small inconsistencies is more significant than any single anomaly.
Assess and Report
Classify the document as authentic, questionable, or fabricated based on the weight of evidence. Document your methodology, findings, and conclusions in a forensic report that can withstand scrutiny.
Beyond the standard verification steps, several advanced techniques can provide additional evidence when standard checks are inconclusive or when dealing with sophisticated forgeries.
One of the most effective advanced techniques is comparing the suspect document against a known-authentic reference document from the same environment. If the claimed author creates documents regularly on the same system, comparing metadata patterns between the suspect file and a verified file can reveal inconsistencies that are invisible in isolation.
What to Compare
# Compare core properties between suspect and reference
diff suspect_extract/docProps/core.xml \
     reference_extract/docProps/core.xml
# Compare style definitions
diff suspect_extract/xl/styles.xml reference_extract/xl/styles.xml
# Compare theme files
diff suspect_extract/xl/theme/theme1.xml reference_extract/xl/theme/theme1.xml
# Identical themes confirm same environment;
# different themes suggest different origin
The calculation chain file (xl/calcChain.xml) records the order in which Excel evaluates formulas. This chain is built incrementally as formulas are added and provides a hidden chronological record of formula creation. Its presence and structure can help verify that a complex workbook was built over time rather than generated all at once.
Calculation Chain Indicators
# Check if calculation chain exists
ls -la doc_extract/xl/calcChain.xml
# View calculation chain structure
cat doc_extract/xl/calcChain.xml
# Count formula cells in sheet vs. chain entries
grep -o "<f[ >]" doc_extract/xl/worksheets/sheet1.xml | wc -l
grep -o "<c " doc_extract/xl/calcChain.xml | wc -l
# Mismatched counts indicate post-generation editing
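Because Excel writes worksheet and calcChain XML without line breaks, line-based counting with `grep -c` can undercount badly; counting matches directly is safer. A minimal Python sketch of the same comparison on toy XML fragments:

```python
import re

def count_occurrences(xml_text: str, tag: str) -> int:
    """Count element occurrences even when the XML is a single line --
    the reason line-based grep -c undercounts on Excel's unwrapped output."""
    return len(re.findall(rf"<{tag}[ >/]", xml_text))

# Toy fragments: two formula cells in the sheet, one calcChain entry
sheet = ('<worksheet><c r="A1"><f>SUM(B1:B9)</f></c>'
         '<c r="A2"><f>A1*2</f></c></worksheet>')
chain = '<calcChain><c r="A1" i="1"/></calcChain>'
print(count_occurrences(sheet, "f"), count_occurrences(chain, "c"))  # prints: 2 1
```

A mismatch like this (two formulas, one chain entry) indicates post-generation editing.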
Excel workbooks often contain references to external files, data connections, and linked objects. These references embed file paths, server names, and network locations that reveal the environment where the document was actually used—regardless of what the core properties claim.
Environmental Evidence in Links
- Local file paths such as C:\Users\RealAuthor\Documents\source.xlsx that reveal the actual user and system
- UNC paths such as \\server\share\folder that identify the network environment where the file was used

# Search for external references
grep -r "externalLink\|connection\|oleObject" doc_extract/
# Look for file paths in sheet XML
grep -rE 'C:\\|/Users/|\\\\' doc_extract/xl/
# Check for data connection files
ls doc_extract/xl/connections.xml 2>/dev/null
ls doc_extract/xl/externalLinks/ 2>/dev/null
# A document supposedly created by "Sarah Johnson"
# but with external links referencing
# C:\Users\MikeW\Desktop\ reveals the true author
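Pulling the embedded usernames out of such paths can be scripted with a couple of regular expressions. A minimal Python sketch (the `extract_user_artifacts` helper and the sample string are illustrative):

```python
import re

def extract_user_artifacts(text: str) -> set[str]:
    """Extract usernames from embedded Windows (C:\\Users\\...) and
    macOS (/Users/...) paths found in workbook XML."""
    users = set(re.findall(r'C:\\Users\\([^\\<>"]+)', text))
    users |= set(re.findall(r'/Users/([^/<>"]+)', text))
    return users

# Toy defined-name XML embedding a path from the real editing environment
xml = "<definedName>'C:\\Users\\MikeW\\Desktop\\[source.xlsx]Q1'!A1</definedName>"
print(extract_user_artifacts(xml))  # prints {'MikeW'}
```

Any username recovered this way should be reconciled with dc:creator and cp:lastModifiedBy in the verification matrix.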
Understanding the common patterns used to forge or misrepresent Excel documents helps investigators know what to look for. Each pattern has characteristic metadata signatures that betray the forgery.
A document is created today but the creation date is changed to make it appear older. This is the most common forgery pattern, used to fabricate evidence of prior knowledge, meet deadlines retroactively, or establish false timelines.
Detection Signatures
- ZIP entry timestamps later than the claimed creation date
- An AppVersion, theme, or default style set from software released after the claimed date
- File system timestamps that contradict dcterms:created
A document created by one person has its author metadata changed to attribute it to someone else. This is used to fabricate work product, avoid accountability, or create false evidence of authorship.
Detection Signatures
- A dc:creator that conflicts with cp:lastModifiedBy or with usernames embedded in file paths and external links
- Environment artifacts (default printer, network shares) that point to a different user's system
- Metadata patterns that do not match known-authentic documents from the claimed author
A document is generated by a script or application but modified to appear as if it was created manually in Excel. This pattern is used to fabricate financial reports, audit evidence, and compliance documentation.
Detection Signatures
- A TotalTime of zero or a few minutes on a complex workbook
- Third-party library signatures (openpyxl, Apache POI) in the XML structure
- Identical created and modified timestamps indicating a single save
- A missing calcChain.xml in a workbook full of formulas
An authentic document is modified to change specific values while preserving the overall appearance of authenticity. This is the hardest pattern to detect because most metadata remains genuine.
Detection Signatures
- count or uniqueCount values in sharedStrings.xml that no longer match the actual <si> elements
- A single ZIP entry whose timestamp diverges from the rest of the archive
- calcChain.xml entries that no longer correspond to the formulas in the sheets
- Orphaned styles or relationship records left over from the edit
When document authenticity verification is performed for legal, regulatory, or corporate governance purposes, the findings must be documented in a structured forensic report that can withstand cross-examination and peer review.
1. Evidence Preservation Record
Document the chain of custody: how you received the file, the original file hash (SHA-256), where the working copy is stored, and what tools were used for analysis. This establishes that your analysis was performed on an unmodified copy.
2. Claims Under Test
Explicitly state what the document claims to be: the claimed author, creation date, modification history, application used, and any other assertions. These form the hypotheses that your analysis will test.
3. Methodology
Describe the analysis steps performed, the tools used (including versions), and the order of operations. This allows another examiner to reproduce your findings independently.
4. Findings and Evidence
Present each metadata finding with its source, the expected value, the actual value, and the significance of any discrepancy. Include raw XML excerpts, ZIP listings, and command outputs as supporting evidence.
5. Cross-Reference Matrix
Include the full verification matrix showing how each claim was tested against multiple evidence sources, with clear highlighting of contradictions and anomalies.
6. Conclusion and Confidence Level
State your conclusion about the document's authenticity and assign a confidence level. Use qualified language: "The metadata evidence is consistent with fabrication" rather than "The document is fake." Distinguish between what the evidence shows and what it suggests.
While manual forensic analysis gives you the deepest understanding of file metadata, MetaData Analyzer automates many of these verification steps, providing rapid assessment that can guide deeper investigation.
Core Property Extraction
Instantly view author, creation date, modification date, and revision count with visual highlighting of suspicious patterns.
Application Fingerprint Display
See the application name, version, and editing time at a glance. Cross-reference against the claimed creation environment.
Timestamp Consistency Analysis
Automatic comparison of creation, modification, and access timestamps to flag logical impossibilities and suspicious patterns.
Hidden Content Detection
Detect hidden sheets, comments, tracked changes, and embedded objects that may contradict the document's claimed content.
Metadata Removal for Clean Sharing
Once verification is complete, strip all metadata before sharing to prevent exposing your forensic methodology or sensitive investigation details.
Never rely on a single metadata field. True authenticity verification requires cross-referencing evidence from core properties, application fingerprints, XML structure, ZIP archives, and file content to build a complete picture.
Each Excel version produces distinctive XML patterns, default styles, and structural characteristics. These fingerprints are often overlooked by forgers and provide some of the most reliable authenticity evidence.
Core properties, ZIP headers, and file system metadata all record timestamps independently. A forger who changes one source rarely changes all three consistently, creating detectable contradictions.
Forensic findings are only valuable if they can withstand scrutiny. Always preserve evidence integrity, document your analysis steps, and present findings with appropriate confidence qualifiers.