In forensic investigations, establishing when events occurred is often more important than establishing what happened. Excel files carry a rich layer of temporal metadata—creation timestamps, modification records, author sequences, editing durations, and internal structural artifacts—that can be used to reconstruct a detailed chronology of document activity. This guide teaches you how to extract, correlate, and interpret these temporal signals to build forensically sound timelines.
Every investigation revolves around a question of sequence: what happened first, what caused what, and who knew what when. In disputes involving Excel files—whether financial audits, intellectual property claims, regulatory investigations, or litigation—the timeline of document creation and modification is often the central piece of evidence. A reconstructed timeline can prove that a report was altered after a deadline, that a financial model was created before a supposedly independent decision, or that multiple files were fabricated in a single session despite claiming different origins.
Excel files are particularly rich sources of temporal evidence because the XLSX format stores time-related data in multiple independent locations. These locations are written by different parts of the Excel application, making them difficult to manipulate consistently. By extracting and correlating timestamps across these locations—and across multiple files—investigators can build timelines that are far more reliable than any single timestamp.
The first step in timeline reconstruction is identifying and extracting every source of temporal information from each file under investigation. Excel files contain at least six independent categories of time-related data, each written by a different mechanism and each offering a different perspective on the document's history.
The docProps/core.xml file contains the primary document timestamps: creation date, last modification date, and authorship records. These are the most visible timestamps and the ones most likely to have been manipulated in a fraud scenario—but they still form the backbone of any timeline.
Key Fields
- dcterms:created — Document creation timestamp
- dcterms:modified — Last modification timestamp
- dc:creator — Original author identity
- cp:lastModifiedBy — Last editor identity
- cp:revision — Revision counter (if present)

Timeline Value
# Extract the XLSX archive
mkdir timeline_extract && unzip target.xlsx -d timeline_extract/
# View all core properties with timestamps
cat timeline_extract/docProps/core.xml
# Example output:
# <dcterms:created xsi:type="dcterms:W3CDTF">
# 2025-09-15T14:23:00Z
# </dcterms:created>
# <dcterms:modified xsi:type="dcterms:W3CDTF">
# 2026-01-08T09:45:00Z
# </dcterms:modified>
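The same fields can be pulled programmatically with Python's standard library. A minimal sketch (the read_core_properties helper and its return shape are illustrative choices, not a fixed API; the namespace URIs are the standard OPC ones):

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard OPC namespaces used by docProps/core.xml
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

def read_core_properties(path):
    """Return the key temporal fields of an XLSX file as a dict of strings."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))

    def text(tag):
        el = root.find(tag, NS)
        return el.text if el is not None else None

    return {
        "created": text("dcterms:created"),
        "modified": text("dcterms:modified"),
        "creator": text("dc:creator"),
        "lastModifiedBy": text("cp:lastModifiedBy"),
        "revision": text("cp:revision"),
    }
```

Reading through zipfile rather than unzipping first also avoids touching file system timestamps on the extracted copies.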
The docProps/app.xml file records the cumulative editing time, the application version, and document structure statistics. The TotalTime field is particularly valuable for timeline reconstruction because it records total minutes the document was open for editing—independent of the creation and modification dates.
Using TotalTime for Timeline Validation
- If TotalTime is 15 minutes but the created-to-modified span is 6 months, the document was not edited regularly over that period
- Dividing TotalTime by a typical session length (30-60 min) gives a rough minimum session count

# Extract editing time and application info
grep -i "TotalTime\|Application\|AppVersion" \
timeline_extract/docProps/app.xml
# Example output:
# <TotalTime>342</TotalTime>
# <Application>Microsoft Excel</Application>
# <AppVersion>16.0300</AppVersion>
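The plausibility check above is easy to automate. A sketch, assuming the 30-minute session length and the "more than a month with under 30 minutes of editing" threshold as rough heuristics rather than fixed rules:

```python
from datetime import datetime

def editing_time_consistency(created_iso, modified_iso, total_time_min,
                             typical_session_min=30):
    """Compare cumulative editing time against the claimed document lifespan.

    Returns the created-to-modified span in days, a rough minimum session
    count (TotalTime divided by an assumed typical session length), and a
    flag for spans that are implausibly long for the recorded editing time.
    """
    created = datetime.fromisoformat(created_iso.replace("Z", "+00:00"))
    modified = datetime.fromisoformat(modified_iso.replace("Z", "+00:00"))
    span_days = (modified - created).days
    min_sessions = max(1, total_time_min // typical_session_min)
    # A multi-month lifespan with only minutes of editing is a red flag
    suspicious = span_days > 30 and total_time_min < 30
    return {"span_days": span_days,
            "min_sessions": min_sessions,
            "suspicious": suspicious}
```

Applied to the case study below, a file claiming a January-to-June lifespan with a TotalTime of 3 minutes is flagged immediately.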
XLSX files are ZIP archives, and every file entry within the archive carries its own modification timestamp. These timestamps are set by Excel when writing the file and are independent of the document properties in core.xml. Most investigators overlook these, and most fraudsters never think to manipulate them.
What ZIP Timestamps Reveal
- ZIP entry timestamps record when Excel last wrote each archive entry
- Entries whose dates differ from dcterms:modified suggest post-save XML editing

Limitations

- ZIP timestamps use the DOS format: 2-second precision and no timezone information
- They reflect the clock of the machine that saved the file
# List all ZIP entries with timestamps
unzip -l target.xlsx
# Example output:
# Length Date Time Name
# -------- ---- ---- ----
# 1580 01-08-26 09:45 [Content_Types].xml
# 590 01-08-26 09:45 _rels/.rels
# 2841 01-08-26 09:45 xl/workbook.xml
# 14523 01-08-26 09:45 xl/worksheets/sheet1.xml
# All entries should show the same timestamp
# Mixed timestamps indicate post-save XML manipulation
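Python's zipfile module exposes these per-entry DOS timestamps directly, which makes the uniformity check scriptable. A sketch (has_mixed_zip_dates compares at day granularity, an assumption you may want to tighten to the full timestamp):

```python
import zipfile

def zip_entry_dates(path):
    """Return {entry_name: (Y, M, D, h, m, s)} for every entry in the archive.

    ZIP timestamps use the DOS format: 2-second precision, no timezone.
    """
    with zipfile.ZipFile(path) as zf:
        return {info.filename: info.date_time for info in zf.infolist()}

def has_mixed_zip_dates(path):
    """A normal Excel save stamps every entry in one pass; differing dates
    across entries suggest post-save manipulation of individual XML parts."""
    days = {dt[:3] for dt in zip_entry_dates(path).values()}  # compare Y/M/D
    return len(days) > 1
```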
The operating system records its own timestamps for every file: creation time, last modification time, and last access time. These are independent of both the document properties and the ZIP entry timestamps, providing a third layer of temporal data.
Important Caveats for File System Timestamps
- File system timestamps are easily reset by copying, downloading, or syncing the file
- On Linux, stat reports change time (ctime), which is the inode change time, not a true creation time

# macOS/Linux: Get file system timestamps
stat target.xlsx
# Windows PowerShell: Get detailed timestamps
# Get-ItemProperty target.xlsx | Select-Object *Time*
# Compare file system modified time with internal metadata
# They should be close (within seconds) for an unmanipulated file
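This cross-check can also be scripted. A sketch using os.stat (note the assumption that st_birthtime is only available on macOS/BSD; on Linux there is generally no creation time to read):

```python
import os
from datetime import datetime, timezone

def fs_timestamps_utc(path):
    """Return file system timestamps normalized to UTC.

    Always includes the modification time; includes a creation time only
    where the platform exposes one (st_birthtime on macOS/BSD).
    """
    st = os.stat(path)
    out = {"modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc)}
    birth = getattr(st, "st_birthtime", None)  # macOS/BSD only
    if birth is not None:
        out["created"] = datetime.fromtimestamp(birth, tz=timezone.utc)
    return out
```

The UTC-normalized result can then be compared directly against the dcterms:modified value from core.xml; for an unmanipulated file saved in place, the two should agree to within seconds.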
Beyond explicit timestamps, the internal structure of an XLSX file contains implicit temporal information. The order of entries in the shared string table, the sequence of style definitions, and the structure of the calculation chain all reflect the order in which content was added to the workbook over time.
Shared String Table Ordering
Text values in xl/sharedStrings.xml are stored in the order they were first entered. Headers entered when the workbook was created appear near the beginning; data added in later sessions appears later. This ordering creates a chronological record of data entry that survives even if timestamps are manipulated.
Style Accumulation Patterns
Cell formats in xl/styles.xml accumulate over time. Early styles reflect the original formatting; later additions reflect modifications by different editors or in different sessions. The complexity and layering of styles can indicate whether a document was built incrementally or all at once.
Named Range and Defined Name History
Named ranges in xl/workbook.xml accumulate as a document evolves. Stale named ranges pointing to deleted sheets or moved ranges indicate iterative development. Their absence in a document claiming a long history is a temporal red flag.
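The shared string ordering described above is straightforward to extract for analysis. A minimal sketch (the helper name is illustrative; it concatenates the text runs inside each si entry, which covers both plain and rich-text strings):

```python
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace, used throughout xl/sharedStrings.xml
MAIN = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

def shared_strings_in_order(path):
    """Return the workbook's shared strings in storage order.

    Storage order reflects the order in which each distinct value was
    first entered into the workbook.
    """
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("xl/sharedStrings.xml"))
    return ["".join(t.text or "" for t in si.iter(f"{MAIN}t"))
            for si in root.findall(f"{MAIN}si")]
```

The resulting list is the raw material for the cross-file prefix comparison shown later in this guide.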
Temporal data from outside the file itself often provides the most trustworthy anchoring points for a timeline. These external records are typically outside the control of anyone who might manipulate the file's internal metadata.
Email Records
Cloud Storage
System Logs
Raw timestamps from multiple sources must be organized into a coherent chronology. This requires a systematic process: extract all temporal data, normalize it to a common timezone, assess the reliability of each source, and then construct the timeline with appropriate confidence indicators.
For each file under investigation, create a comprehensive inventory of every temporal data point available. Use a consistent extraction process to ensure nothing is missed.
# Complete temporal data extraction script
# Run this for each file under investigation
FILE="target.xlsx"
EXTRACT_DIR="${FILE%.xlsx}_extracted"
# 1. File system timestamps
echo "=== File System Timestamps ==="
stat "$FILE"
# 2. Extract archive
mkdir -p "$EXTRACT_DIR"
unzip -o "$FILE" -d "$EXTRACT_DIR"
# 3. ZIP entry timestamps
echo "=== ZIP Entry Timestamps ==="
unzip -l "$FILE"
# 4. Core properties
echo "=== Core Properties ==="
cat "$EXTRACT_DIR/docProps/core.xml"
# 5. Extended properties
echo "=== Extended Properties ==="
cat "$EXTRACT_DIR/docProps/app.xml"
# 6. Custom properties (if present)
echo "=== Custom Properties ==="
cat "$EXTRACT_DIR/docProps/custom.xml" 2>/dev/null
Forensic note: Always work on a forensic copy, never the original. Hash the original file before extraction and verify the hash has not changed after your analysis. Document the extraction process with exact commands for reproducibility.
Timestamps from different sources may use different timezone conventions. OPC core properties use UTC. File system timestamps use local time. Email headers may use the sender's timezone. All timestamps must be normalized to a single reference (typically UTC) before comparison.
Common Timezone Pitfalls
- core.xml timestamps are UTC (ending in Z)
- stat output uses the examiner's local timezone

Normalization Checklist
# Convert a local timestamp to UTC (GNU date on Linux;
# BSD/macOS date uses different flags)
# If the creating machine was in US Eastern (UTC-5):
TZ="America/New_York" date -d "2026-01-08 09:45:00" -u
# Output: Thu Jan 8 14:45:00 UTC 2026
# The ZIP timestamp of 09:45 local = 14:45 UTC
# The core.xml timestamp should show 14:45:00Z
# If they match: timestamps are consistent
# If they differ: one layer has been manipulated
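For batch work, the same conversion is less error-prone in Python, since zoneinfo handles DST transitions automatically (requires Python 3.9+; the function name and format string are illustrative):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def to_utc(local_str, tz_name, fmt="%Y-%m-%d %H:%M:%S"):
    """Interpret a naive local timestamp in the given IANA zone and
    return the equivalent timezone-aware UTC datetime."""
    local = datetime.strptime(local_str, fmt).replace(tzinfo=ZoneInfo(tz_name))
    return local.astimezone(timezone.utc)
```

For example, 09:45 US Eastern on January 8 (EST, UTC-5) normalizes to 14:45 UTC, matching the manual calculation above; the same wall-clock time in July would normalize to 13:45 UTC because of DST.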
Not all temporal data sources are equally reliable. Some are trivially manipulated; others are deeply embedded and rarely altered even by sophisticated actors. Assigning reliability tiers to each data source allows you to weight evidence appropriately.
| Source | Reliability | Manipulation Difficulty | Notes |
|---|---|---|---|
| External server logs | High | Requires server access | Email servers, cloud platforms, backup systems |
| AppVersion / XML namespaces | High | Requires XML knowledge | Rarely manipulated; version-locks the document |
| Shared string ordering | High | Requires deep XML editing | Reordering would break cell references |
| ZIP entry timestamps | Medium | Requires ZIP tools | Rarely manipulated; low precision (2-second) |
| TotalTime | Medium | Requires XML editing | Cumulative; hard to fake a realistic value |
| OPC core timestamps | Medium | Simple XML edit | Easy to change but often forgotten by fraudsters |
| File system timestamps | Low | Trivial with OS tools | Reset by copying, downloading, syncing |
With all temporal data extracted, normalized, and assessed for reliability, construct the timeline as a sequence of events. Each event should include the timestamp, the source of the timestamp, the reliability tier, and any corroborating or contradicting evidence.
Timeline Entry Template
Event: Document created
Timestamp: 2025-09-15T14:23:00Z
Source: dcterms:created (core.xml)
Reliability: Medium
Corroboration:
+ AppVersion 16.0300 existed at this date (consistent)
+ Email sent 2025-09-15T15:01:00Z with file attached
- File system creation date is 2026-01-05 (inconsistent,
but explainable by file copy operation)
Confidence: High (corroborated by email record)
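The template above maps naturally onto a small structured record, which makes sorting and reporting easier once a timeline has dozens of entries. A sketch (the dataclass and its confidence heuristic are illustrative; a real assessment also weighs whether each contradiction is explainable, as in the file-copy example above):

```python
from dataclasses import dataclass, field

@dataclass
class TimelineEntry:
    event: str
    timestamp: str              # normalized UTC, ISO 8601
    source: str                 # e.g. "dcterms:created (core.xml)"
    reliability: str            # tier of the source: High / Medium / Low
    corroborating: list = field(default_factory=list)
    contradicting: list = field(default_factory=list)

    @property
    def confidence(self):
        """Deliberately simple heuristic: corroboration raises confidence,
        unresolved contradictions lower it."""
        if self.corroborating and not self.contradicting:
            return "High"
        if self.contradicting and not self.corroborating:
            return "Low"
        return "Medium"

def build_timeline(entries):
    """Order events chronologically; ISO 8601 UTC strings sort lexically."""
    return sorted(entries, key=lambda e: e.timestamp)
```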
Key Principles for Timeline Construction
Many investigations involve multiple Excel files that are claimed to have been created independently at different times. Cross-file analysis can reveal hidden relationships between files—shared origins, common authors, simultaneous creation sessions, or copy-and-modify patterns—that contradict the claimed narrative.
Files derived from a common template or parent file share structural DNA that betrays their relationship, even if their timestamps have been manipulated to appear independent.
Shared String Table Overlap
If two files share an unusual sequence of strings in the same order at the beginning of their shared string tables, they were likely derived from the same template or one was copied from the other. The probability of two independently created files having identical string table prefixes is extremely low.
Style Fingerprinting
The styles.xml file accumulates formatting definitions over a document's lifetime. If two files have identical or near-identical style definitions—including the same custom number formats, the same font list, and the same cell format combinations—they share a common ancestor. This is a strong indicator even if the visible content differs.
Theme and Template Identity
The xl/theme/theme1.xml file is copied from the template when a workbook is created. Files created from the same template have identical theme files. If two supposedly independent files have byte-identical theme files with non-default customizations, they share an origin.
# Compare shared string tables between files
diff <(head -20 file1_extract/xl/sharedStrings.xml) \
<(head -20 file2_extract/xl/sharedStrings.xml)
# Compare style definitions
diff file1_extract/xl/styles.xml file2_extract/xl/styles.xml
# Compare theme files (byte-level)
md5sum file1_extract/xl/theme/theme1.xml \
file2_extract/xl/theme/theme1.xml
# Compare printer settings (binary)
md5sum file1_extract/xl/printerSettings/*.bin \
file2_extract/xl/printerSettings/*.bin 2>/dev/null
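The string-table comparison can be made quantitative by measuring the length of the identical leading run of two tables. A sketch (it takes the string lists from whatever extraction you use, such as the shared-string helper earlier in this guide):

```python
def common_prefix_length(strings_a, strings_b):
    """Length of the identical leading run of two shared-string tables.

    A long common prefix across files claiming independent origins
    suggests a shared template or a copy-and-modify relationship.
    """
    n = 0
    for a, b in zip(strings_a, strings_b):
        if a != b:
            break
        n += 1
    return n
```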
When multiple files are fabricated in a single session but claim different creation dates, temporal analysis across the file set reveals the clustering.
Clustering Indicators
- Identical ZIP entry dates across files claiming different creation dates
- The same AppVersion build number across all files (updates happen frequently)
- The same dc:creator across files claiming different authors

TotalTime Distribution Analysis
- Sum TotalTime across all files; for files written out in a single session, the combined TotalTime values should be plausible for one person

# Extract TotalTime from multiple files for comparison
for file in *.xlsx; do
dir="${file%.xlsx}_ext"
mkdir -p "$dir" && unzip -o "$file" -d "$dir" >/dev/null 2>&1
time=$(grep -oP '<TotalTime>\K[^<]+' "$dir/docProps/app.xml" 2>/dev/null)
created=$(grep -oP 'created[^>]*>\K[^<]+' "$dir/docProps/core.xml" 2>/dev/null)
echo "$file | Created: $created | TotalTime: ${time:-N/A} min"
done
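The clustering check itself, grouping files by the save date recorded in their ZIP entries, can be sketched as follows (grouping by the first entry's date assumes a normal single-pass Excel save, where all entries carry the same date):

```python
import zipfile
from collections import defaultdict

def cluster_by_zip_date(paths):
    """Group files by the (year, month, day) of their first ZIP entry.

    Files claiming different creation dates that land in one cluster were
    plausibly written out in a single session.
    """
    clusters = defaultdict(list)
    for path in paths:
        with zipfile.ZipFile(path) as zf:
            day = zf.infolist()[0].date_time[:3]
        clusters[day].append(path)
    return dict(clusters)
```

In the case study below, all five "monthly" reports would fall into a single June 12 cluster despite their claimed January-to-May creation dates.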
When one file was created by modifying another, the derivation relationship can often be established through metadata comparison. This is critical in intellectual property disputes, where proving that Document B was derived from Document A establishes priority and ownership.
Derivation Evidence Hierarchy
- If File B has the same dcterms:created as File A but a later dcterms:modified, File B is likely a modified copy of File A
- If File A's dc:creator matches File B's dc:creator but File B's cp:lastModifiedBy is different, File B was created by one person and modified by another

When someone has deliberately altered timestamps to construct a false timeline, the manipulation leaves characteristic patterns. These patterns arise because it is nearly impossible to modify all temporal data sources consistently—there are too many independent locations recording time information.
Timestamp Layer Disagreement
The most common manipulation pattern: core.xml timestamps have been edited to show desired dates, but ZIP entry timestamps still show the actual save date. A document claiming creation in June with ZIP entries dated in December has had its core properties manipulated.
TotalTime Impossibility
A file with a dcterms:created of January 1 and a dcterms:modified of June 30 (a 180-day span) but a TotalTime of 3 minutes cannot have been actively used over that period. The editing time implies the document was open for only one brief session, contradicting the claimed six-month lifespan.
Version Anachronism
The AppVersion identifies an Excel version that did not exist at the claimed creation date. Similarly, XML features, namespace URIs, or function usage (like XLOOKUP) that were introduced after the claimed date prove the timeline is fabricated.
Creation After Modification
If dcterms:created is later than dcterms:modified, the creation timestamp has been manually set. Excel never naturally produces this condition because modification always occurs at or after creation.
Unnatural Timestamp Precision
Legitimate Excel timestamps typically have second-level precision with non-round values (e.g., 14:23:17Z). Manually entered timestamps often use round values like 14:00:00Z or 09:30:00Z. A pattern of suspiciously round timestamps across core properties suggests manual editing.
Structural Immaturity
A document claiming a long editing history but showing the structural simplicity of a newly created file: minimal styles, no orphaned strings, no stale named ranges, no printer settings, and a perfectly ordered shared string table. The structural age does not match the claimed chronological age.
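Two of these signatures, creation after modification and unnaturally round timestamps, lend themselves to a quick automated screen. A sketch (treating a zero-second value on a quarter-hour boundary as "round" is an assumed heuristic, not a rule; legitimate files can occasionally produce such values):

```python
from datetime import datetime

def _parse(ts):
    """Parse a W3CDTF/ISO 8601 timestamp, accepting the trailing Z form."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def manipulation_signatures(created_iso, modified_iso):
    """Return a list of flags for two manipulation signatures:
    creation-after-modification and suspiciously round timestamps."""
    flags = []
    created, modified = _parse(created_iso), _parse(modified_iso)
    if created > modified:
        # Excel never naturally writes created later than modified
        flags.append("created_after_modified")
    for name, ts in (("created", created), ("modified", modified)):
        if ts.second == 0 and ts.minute in (0, 15, 30, 45):
            flags.append(f"round_{name}_timestamp")
    return flags
```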
For a systematic approach to detecting manipulation, construct a consistency matrix that cross-references every temporal source against every other source. A legitimate file shows consistency across the matrix; a manipulated file shows a characteristic pattern of isolated inconsistencies.
| Source | core.xml | ZIP dates | TotalTime | AppVersion | Structure | File system |
|---|---|---|---|---|---|---|
| core.xml | — | CONFLICT | CONFLICT | CONFLICT | CONFLICT | OK |
| ZIP dates | CONFLICT | — | OK | OK | OK | OK |
| TotalTime | CONFLICT | OK | — | OK | OK | OK |
Reading the matrix: In this example, core.xml conflicts with every other source, but all other sources are consistent with each other. This pattern isolates core.xml as the manipulated layer—someone edited the document properties XML to show false dates, but the ZIP timestamps, editing time, application version, and structural artifacts all tell the true story.
To illustrate how these techniques work together, consider a scenario involving a financial reporting dispute. A company is being investigated for submitting altered monthly reports. The investigation has five Excel files that are claimed to represent five months of independent financial reporting.
The five monthly report files claim creation dates spanning January through May 2025. Extracting metadata from all five files reveals the following:
| File | Claimed Created | ZIP Date | TotalTime | Creator |
|---|---|---|---|---|
| Q1-Jan-Report.xlsx | 2025-01-31 | 2025-06-12 | 4 min | jsmith |
| Q1-Feb-Report.xlsx | 2025-02-28 | 2025-06-12 | 6 min | jsmith |
| Q1-Mar-Report.xlsx | 2025-03-31 | 2025-06-12 | 5 min | jsmith |
| Q1-Apr-Report.xlsx | 2025-04-30 | 2025-06-12 | 3 min | jsmith |
| Q1-May-Report.xlsx | 2025-05-31 | 2025-06-12 | 4 min | jsmith |
Immediate red flags: All five files have the same ZIP entry date (June 12, 2025) despite claiming creation dates spanning January through May. All have TotalTime values of 3–6 minutes—impossibly short for monthly financial reports that should each involve hours of work. All share the same creator.
Deeper analysis of the five files reveals additional evidence of batch creation:
Findings
- All five files have identical styles.xml content—the same styles, same order, same count. Independently created monthly reports would accumulate different formatting over time
- The AppVersion in all five files is 16.0089, a build number released in May 2025. The January and February files could not have been created with this version

Based on the convergence of evidence from multiple independent sources, the reconstructed timeline tells a very different story from the claimed narrative:
All five files created in a single session
User "jsmith" created all five files in approximately 22 minutes total (sum of TotalTime values), using a common template. The core.xml creation dates were manually edited to show January through May. The ZIP timestamps and AppVersion confirm the true creation date.
Files submitted as historical records
Email records show all five files were attached to a single email sent at 16:42 UTC on June 12—consistent with the ZIP timestamps and inconsistent with the claimed creation dates.
No evidence of document activity
No email records, file server logs, backup snapshots, or print logs show any of these files existing before June 12. The claimed five-month history is unsupported by any external evidence.
Conclusion: Seven independent lines of evidence (ZIP timestamps, TotalTime values, AppVersion dating, structural identity, string table patterns, absence of editing artifacts, and email timing) converge on the same conclusion: all five files were fabricated on June 12, 2025, and backdated to create a false appearance of monthly reporting.
Preserve evidence and establish chain of custody
Hash all original files (SHA-256). Create forensic copies. Document how and when files were received. Work only on copies.
Extract all six categories of temporal data from each file
OPC core properties, extended properties (TotalTime, AppVersion), ZIP entry timestamps, file system timestamps, internal structural artifacts, and external correlation records.
Normalize all timestamps to UTC
Identify the timezone of each source. Convert everything to UTC. Note DST transitions. Record original values alongside converted values.
Build the consistency matrix
Cross-reference every temporal source against every other source. Identify consistent clusters and isolated outliers. The outlier sources are the manipulated ones.
Perform cross-file analysis (if multiple files)
Compare string tables, styles, themes, printer settings, and author metadata across files. Identify common origins, simultaneous creation, and derivation chains.
Validate against external records
Cross-reference with email logs, cloud version history, backup snapshots, file server audits, and print logs. External records provide the strongest anchoring points.
Construct the reconstructed timeline
Build the chronology event by event, noting the source, reliability, corroboration, and confidence for each entry. Distinguish facts from inferences.
Document findings for legal or audit use
Include raw metadata values, extraction commands, hash values, consistency matrices, and the complete reconstructed timeline. Ensure reproducibility by a qualified peer reviewer.
Timeline reconstruction from Excel metadata is one of the most powerful techniques in document forensics because it leverages the fundamental architecture of the XLSX format. Every Excel file records its history in multiple independent locations: document properties, ZIP archive metadata, editing time counters, application version stamps, and structural artifacts in the XML layers. These sources are written by different mechanisms, stored in different locations, and require different tools and knowledge to manipulate.
The strength of a reconstructed timeline comes from convergence. A single timestamp can be wrong for innocent reasons—clock skew, timezone confusion, file copying. But when ZIP entry dates, TotalTime values, AppVersion dating, structural maturity analysis, cross-file comparison, and external records all point to the same conclusion, the convergence of independent evidence creates a timeline that is difficult to dispute.
For investigators, the key insight is that manipulating a timeline requires changing not just one timestamp but every temporal trace across all six categories of evidence. Most people who alter document timestamps only modify the most visible layer—the document properties in core.xml—while leaving the deeper layers untouched. It is this gap between what was manipulated and what was overlooked that makes timeline reconstruction possible, and it is the systematic extraction and correlation of all temporal evidence that makes it reliable.