
Timeline Reconstruction Using Excel File Metadata

In forensic investigations, establishing when events occurred is often more important than establishing what happened. Excel files carry a rich layer of temporal metadata—creation timestamps, modification records, author sequences, editing durations, and internal structural artifacts—that can be used to reconstruct a detailed chronology of document activity. This guide teaches you how to extract, correlate, and interpret these temporal signals to build forensically sound timelines.

By Forensics Team · February 22, 2026 · 22 min read

Why Timeline Reconstruction Matters

Every investigation revolves around a question of sequence: what happened first, what caused what, and who knew what when. In disputes involving Excel files—whether financial audits, intellectual property claims, regulatory investigations, or litigation—the timeline of document creation and modification is often the central piece of evidence. A reconstructed timeline can prove that a report was altered after a deadline, that a financial model was created before a supposedly independent decision, or that multiple files were fabricated in a single session despite claiming different origins.

Excel files are particularly rich sources of temporal evidence because the XLSX format stores time-related data in multiple independent locations. These locations are written by different parts of the Excel application, making them difficult to manipulate consistently. By extracting and correlating timestamps across these locations—and across multiple files—investigators can build timelines that are far more reliable than any single timestamp.

What a Reconstructed Timeline Can Establish

  • Document creation order: Which file was genuinely created first when multiple files claim similar dates
  • Editing sessions: How many distinct sessions occurred and when, based on timestamp gaps and editing time accumulation
  • Author activity windows: When specific individuals interacted with documents, corroborated by author metadata
  • Modification sequences: The order in which changes were made, even when modification dates have been manipulated
  • Fabrication timing: When a backdated document was actually created, using version-locked artifacts
  • Cross-file relationships: Whether files that claim independence were actually created together or derived from common sources
  • Gap analysis: Periods of suspicious inactivity or impossibly compressed activity in a document's history

Extracting Temporal Data from Excel Files

The first step in timeline reconstruction is identifying and extracting every source of temporal information from each file under investigation. Excel files contain at least six independent categories of time-related data, each written by a different mechanism and each offering a different perspective on the document's history.

Source 1: OPC Core Properties

The docProps/core.xml file contains the primary document timestamps: creation date, last modification date, and authorship records. These are the most visible timestamps and the ones most likely to have been manipulated in a fraud scenario—but they still form the backbone of any timeline.

Key Fields

  • dcterms:created — Document creation timestamp
  • dcterms:modified — Last modification timestamp
  • dc:creator — Original author identity
  • cp:lastModifiedBy — Last editor identity
  • cp:revision — Revision counter (if present)

Timeline Value

  • Establishes the claimed creation and last-edit dates
  • Identifies the first and most recent editors
  • Revision count indicates the minimum number of save operations
  • Comparing created and modified reveals the document's total lifespan
  • Author transitions indicate handoff points

# Extract the XLSX archive
mkdir timeline_extract && unzip target.xlsx -d timeline_extract/

# View all core properties with timestamps
cat timeline_extract/docProps/core.xml

# Example output:
# <dcterms:created xsi:type="dcterms:W3CDTF">
#   2025-09-15T14:23:00Z
# </dcterms:created>
# <dcterms:modified xsi:type="dcterms:W3CDTF">
#   2026-01-08T09:45:00Z
# </dcterms:modified>

Source 2: Extended Properties and Editing Time

The docProps/app.xml file records the cumulative editing time, the application version, and document structure statistics. The TotalTime field is particularly valuable for timeline reconstruction because it records total minutes the document was open for editing—independent of the creation and modification dates.

Using TotalTime for Timeline Validation

  • Expected range: A document created 4 months ago and used weekly should show hundreds of minutes of editing time
  • Compression indicator: If TotalTime is 15 minutes but the created-to-modified span is 6 months, the document was not edited regularly over that period
  • Session estimation: Dividing TotalTime by typical session length (30-60 min) gives a rough minimum session count
  • Cross-file comparison: Documents from the same workflow should show proportional editing times relative to their complexity
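The session-estimation heuristic above can be sketched as a small shell function. This is an illustrative aid only: the 45-minute default session length is an assumption for the example, not a forensic constant.

```shell
# Rough minimum session count from a TotalTime value, assuming a
# typical session length (default 45 min -- an illustrative assumption)
estimate_sessions() {
  total_min=$1
  session_len=${2:-45}
  # Ceiling division: any partial session still counts as one
  echo $(( (total_min + session_len - 1) / session_len ))
}

estimate_sessions 342     # 342 recorded minutes -> at least 8 sessions
estimate_sessions 15 60   # 15 minutes at 60-min sessions -> 1 session
```

Treat the result as a lower bound: a single long session can account for many minutes, but many minutes cannot come from fewer sessions than this estimate.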

# Extract editing time and application info
grep -i "TotalTime\|Application\|AppVersion" \
  timeline_extract/docProps/app.xml

# Example output:
# <TotalTime>342</TotalTime>
# <Application>Microsoft Excel</Application>
# <AppVersion>16.0300</AppVersion>

Source 3: ZIP Archive Entry Timestamps

XLSX files are ZIP archives, and every file entry within the archive carries its own modification timestamp. These timestamps are set by Excel when writing the file and are independent of the document properties in core.xml. Most investigators overlook these, and most fraudsters never think to manipulate them.

What ZIP Timestamps Reveal

  • The actual date and time the file was last saved
  • Whether all components were written simultaneously (normal) or at different times (suspicious)
  • Discrepancies with dcterms:modified suggest direct XML editing

Limitations

  • ZIP timestamps have 2-second resolution (less precise)
  • They record only the last save, not the full history
  • They can be manipulated with ZIP editing tools (but rarely are)

# List all ZIP entries with timestamps
unzip -l target.xlsx

# Example output:
#   Length  Date      Time   Name
# --------  ----      ----   ----
#     1580  01-08-26  09:45  [Content_Types].xml
#      590  01-08-26  09:45  _rels/.rels
#     2841  01-08-26  09:45  xl/workbook.xml
#    14523  01-08-26  09:45  xl/worksheets/sheet1.xml

# All entries should show the same timestamp
# Mixed timestamps indicate post-save XML manipulation
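The uniformity check can be automated by counting the distinct date/time pairs in the listing. This is a sketch: the awk field positions assume Info-ZIP's default `unzip -l` output format, which may differ between versions.

```shell
# Count distinct ZIP entry timestamps in `unzip -l` output.
# 1 = all components written in a single save (normal);
# >1 = mixed timestamps, suggesting post-save manipulation
# of individual XML parts.
distinct_zip_timestamps() {
  awk '$1 ~ /^[0-9]+$/ && $2 ~ /-/ { print $2, $3 }' | sort -u | wc -l
}

# Usage: unzip -l target.xlsx | distinct_zip_timestamps
```

The filter keeps only rows whose first field is a numeric length and whose second field looks like a date, skipping the header, separator, and summary lines.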

Source 4: File System Metadata

The operating system records its own timestamps for every file: creation time, last modification time, and last access time. These are independent of both the document properties and the ZIP entry timestamps, providing a third layer of temporal data.

Important Caveats for File System Timestamps

  • Copying resets creation time: When a file is copied, most operating systems set the creation time to the copy time, not the original creation time
  • Email attachments: Saving an email attachment creates a new file system creation time at the time of saving, not the original authoring time
  • Cloud sync: Files synced via OneDrive, Dropbox, or Google Drive may have file system timestamps reflecting the sync time
  • NTFS vs. other filesystems: NTFS stores a true creation time; traditional Linux filesystems expose only inode change time (ctime), which records metadata changes rather than creation
  • Manipulation: File system timestamps can be changed with readily available tools, so they are the least reliable layer

# macOS/Linux: get file system timestamps
stat target.xlsx

# Windows PowerShell: get detailed timestamps
# Get-ItemProperty target.xlsx | Select-Object *Time*

# Compare the file system modified time with internal metadata
# They should be close (within seconds) for an unmanipulated file

Source 5: Internal Structural Artifacts

Beyond explicit timestamps, the internal structure of an XLSX file contains implicit temporal information. The order of entries in the shared string table, the sequence of style definitions, and the structure of the calculation chain all reflect the order in which content was added to the workbook over time.

Shared String Table Ordering

Text values in xl/sharedStrings.xml are stored in the order they were first entered. Headers entered when the workbook was created appear near the beginning; data added in later sessions appears later. This ordering creates a chronological record of data entry that survives even if timestamps are manipulated.
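One way to make this ordering visible is to number the `<t>` entries in the order they appear. This is a simplified sketch: it assumes plain `<t>` elements, whereas real files can contain rich-text runs that need more careful parsing.

```shell
# Number shared strings in first-entry order; lower indexes were
# entered earlier in the document's life (assumes simple <t> elements)
dump_string_order() {
  grep -o '<t>[^<]*' | sed 's/<t>//' | nl -ba
}

# Usage: dump_string_order < timeline_extract/xl/sharedStrings.xml | head -20
```

Header strings typically appear in the first few indexes; data added in later sessions shows up with higher indexes, giving a rough entry chronology.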

Style Accumulation Patterns

Cell formats in xl/styles.xml accumulate over time. Early styles reflect the original formatting; later additions reflect modifications by different editors or in different sessions. The complexity and layering of styles can indicate whether a document was built incrementally or all at once.

Named Range and Defined Name History

Named ranges in xl/workbook.xml accumulate as a document evolves. Stale named ranges pointing to deleted sheets or moved ranges indicate iterative development. Their absence in a document claiming a long history is a temporal red flag.
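Defined names, including stale ones, can be surfaced with a quick grep over workbook.xml. A sketch under the assumption of typical attribute ordering; an XML-aware parser is more robust for production use.

```shell
# List defined names from workbook.xml; stale names whose targets
# point at deleted sheets typically contain #REF! in their ranges,
# a sign of iterative development
list_defined_names() {
  grep -o '<definedName[^>]*name="[^"]*"' | sed 's/.*name="//; s/"$//'
}

# Usage: list_defined_names < timeline_extract/xl/workbook.xml
```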

Source 6: External Correlation Points

Temporal data from outside the file itself often provides the most trustworthy anchoring points for a timeline. These external records are typically outside the control of anyone who might manipulate the file's internal metadata.

Email Records

  • Sent/received timestamps
  • Attachment file sizes
  • SMTP header timestamps
  • Server-side delivery logs

Cloud Storage

  • Version history snapshots
  • Sync timestamps
  • Access logs
  • Sharing event records

System Logs

  • File server audit logs
  • Backup system records
  • Print server logs
  • DLP event logs

Building the Timeline: A Structured Approach

Raw timestamps from multiple sources must be organized into a coherent chronology. This requires a systematic process: extract all temporal data, normalize it to a common timezone, assess the reliability of each source, and then construct the timeline with appropriate confidence indicators.

Step 1: Extract All Temporal Data Points

For each file under investigation, create a comprehensive inventory of every temporal data point available. Use a consistent extraction process to ensure nothing is missed.

# Complete temporal data extraction script
# Run this for each file under investigation

FILE="target.xlsx"
EXTRACT_DIR="${FILE%.xlsx}_extracted"

# 1. File system timestamps
echo "=== File System Timestamps ==="
stat "$FILE"

# 2. Extract archive
mkdir -p "$EXTRACT_DIR"
unzip -o "$FILE" -d "$EXTRACT_DIR"

# 3. ZIP entry timestamps
echo "=== ZIP Entry Timestamps ==="
unzip -l "$FILE"

# 4. Core properties
echo "=== Core Properties ==="
cat "$EXTRACT_DIR/docProps/core.xml"

# 5. Extended properties
echo "=== Extended Properties ==="
cat "$EXTRACT_DIR/docProps/app.xml"

# 6. Custom properties (if present)
echo "=== Custom Properties ==="
cat "$EXTRACT_DIR/docProps/custom.xml" 2>/dev/null

Forensic note: Always work on a forensic copy, never the original. Hash the original file before extraction and verify the hash has not changed after your analysis. Document the extraction process with exact commands for reproducibility.

Step 2: Normalize Timestamps to a Common Reference

Timestamps from different sources may use different timezone conventions. OPC core properties record UTC. File system timestamps are displayed in the examiner's local timezone. ZIP entry timestamps use the local time of the machine that saved the file. Email headers may use the sender's timezone. All timestamps must be normalized to a single reference (typically UTC) before comparison.

Common Timezone Pitfalls

  • core.xml timestamps are UTC (ending in Z)
  • ZIP timestamps use the local time of the creating machine
  • stat output uses the examiner's local timezone
  • Windows file properties display in local time
  • Daylight saving transitions can create apparent 1-hour shifts

Normalization Checklist

  • Identify the timezone of the creating machine
  • Convert all timestamps to UTC
  • Note DST transitions that fall within the timeline
  • Record the original timezone alongside the UTC value
  • Flag any timestamps that lack timezone information

# Convert a local timestamp to UTC (Linux / GNU date;
# stock macOS date lacks -d, so install coreutils and use gdate)
# If the creating machine was in US Eastern (UTC-5 in January):
TZ="America/New_York" date -d "2026-01-08 09:45:00" -u
# Output: Thu Jan 8 14:45:00 UTC 2026

# The ZIP timestamp of 09:45 local = 14:45 UTC
# The core.xml timestamp should show 14:45:00Z
# If they match: timestamps are consistent
# If they differ: one layer has been manipulated

Step 3: Assess Source Reliability

Not all temporal data sources are equally reliable. Some are trivially manipulated; others are deeply embedded and rarely altered even by sophisticated actors. Assigning reliability tiers to each data source allows you to weight evidence appropriately.

| Source | Reliability | Manipulation Difficulty | Notes |
| --- | --- | --- | --- |
| External server logs | High | Requires server access | Email servers, cloud platforms, backup systems |
| AppVersion / XML namespaces | High | Requires XML knowledge | Rarely manipulated; version-locks the document |
| Shared string ordering | High | Requires deep XML editing | Reordering would break cell references |
| ZIP entry timestamps | Medium | Requires ZIP tools | Rarely manipulated; low precision (2-second) |
| TotalTime | Medium | Requires XML editing | Cumulative; hard to fake a realistic value |
| OPC core timestamps | Medium | Simple XML edit | Easy to change but often forgotten by fraudsters |
| File system timestamps | Low | Trivial with OS tools | Reset by copying, downloading, syncing |

Step 4: Construct the Chronology

With all temporal data extracted, normalized, and assessed for reliability, construct the timeline as a sequence of events. Each event should include the timestamp, the source of the timestamp, the reliability tier, and any corroborating or contradicting evidence.

Timeline Entry Template

Event: Document created
Timestamp: 2025-09-15T14:23:00Z
Source: dcterms:created (core.xml)
Reliability: Medium
Corroboration:
  + AppVersion 16.0300 existed at this date (consistent)
  + Email sent 2025-09-15T15:01:00Z with file attached
  - File system creation date is 2026-01-05 (inconsistent,
    but explainable by a file copy operation)
Confidence: High (corroborated by email record)

Key Principles for Timeline Construction

  • Anchor with high-reliability sources first: Start with external records and version-locked artifacts, then fill in with less reliable sources
  • Flag contradictions explicitly: When two sources disagree, record both and explain which you consider more reliable and why
  • Distinguish established facts from inferences: "The file was emailed at 15:01 UTC" is a fact; "the file was likely created shortly before the email" is an inference
  • Account for innocent explanations: A timestamp discrepancy might indicate fraud, but it might also indicate file copying, timezone confusion, or clock skew

Cross-File Timeline Analysis

Many investigations involve multiple Excel files that are claimed to have been created independently at different times. Cross-file analysis can reveal hidden relationships between files—shared origins, common authors, simultaneous creation sessions, or copy-and-modify patterns—that contradict the claimed narrative.

Detecting Common Origin

Files derived from a common template or parent file share structural DNA that betrays their relationship, even if their timestamps have been manipulated to appear independent.

Shared String Table Overlap

If two files share an unusual sequence of strings in the same order at the beginning of their shared string tables, they were likely derived from the same template or one was copied from the other. The probability of two independently created files having identical string table prefixes is extremely low.

Style Fingerprinting

The styles.xml file accumulates formatting definitions over a document's lifetime. If two files have identical or near-identical style definitions—including the same custom number formats, the same font list, and the same cell format combinations—they share a common ancestor. This is a strong indicator even if the visible content differs.

Theme and Template Identity

The xl/theme/theme1.xml file is copied from the template when a workbook is created. Files created from the same template have identical theme files. If two supposedly independent files have byte-identical theme files with non-default customizations, they share an origin.

# Compare shared string tables between files
diff <(head -20 file1_extract/xl/sharedStrings.xml) \
     <(head -20 file2_extract/xl/sharedStrings.xml)

# Compare style definitions
diff file1_extract/xl/styles.xml file2_extract/xl/styles.xml

# Compare theme files (byte-level)
md5sum file1_extract/xl/theme/theme1.xml \
       file2_extract/xl/theme/theme1.xml

# Compare printer settings (binary)
md5sum file1_extract/xl/printerSettings/*.bin \
       file2_extract/xl/printerSettings/*.bin 2>/dev/null

Detecting Simultaneous Creation

When multiple files are fabricated in a single session but claim different creation dates, temporal analysis across the file set reveals the clustering.

Clustering Indicators

  • ZIP timestamps cluster within minutes despite claimed dates weeks or months apart
  • Same AppVersion build number across all files (builds update frequently)
  • Identical dc:creator across files claiming different authors
  • File system creation times within the same session window
  • Total editing times are all very short (single-session creation)

TotalTime Distribution Analysis

  • Compare TotalTime across all files
  • Files created in one batch show similar, short editing times
  • Legitimate files from different periods show varied editing times
  • The sum of all TotalTime values should be plausible for one person
  • Example: 10 files at 5 minutes each implies a single 50-minute batch-creation session

# Extract TotalTime from multiple files for comparison
# (the -P option requires GNU grep)
for file in *.xlsx; do
  dir="${file%.xlsx}_ext"
  mkdir -p "$dir" && unzip -o "$file" -d "$dir" >/dev/null 2>&1
  time=$(grep -oP '<TotalTime>\K[^<]+' "$dir/docProps/app.xml" 2>/dev/null)
  created=$(grep -oP 'created[^>]*>\K[^<]+' "$dir/docProps/core.xml" 2>/dev/null)
  echo "$file | Created: $created | TotalTime: ${time:-N/A} min"
done

Establishing Document Derivation Chains

When one file was created by modifying another, the derivation relationship can often be established through metadata comparison. This is critical in intellectual property disputes, where proving that Document B was derived from Document A establishes priority and ownership.

Derivation Evidence Hierarchy

  • Identical creation timestamps: If File B has the same dcterms:created as File A but a later dcterms:modified, File B is likely a modified copy of File A
  • Shared string table prefix: If File B's string table begins with File A's complete string table plus additional entries, File B was built from File A
  • Style superset: If File B's styles contain all of File A's styles plus additional ones, File B extends File A
  • Author transition: If File A's dc:creator matches File B's dc:creator but File B's cp:lastModifiedBy is different, File B was created by one person and modified by another
  • Printer setting inheritance: If File B carries printer settings from a printer that the claimed author has never used, the file was inherited from someone who has
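The first indicator in the hierarchy lends itself to a direct check: extract dcterms:created from each file's core.xml and compare. A sketch assuming the standard core-properties layout and the extraction directory names used earlier in this guide.

```shell
# Compare the claimed creation timestamps of two extracted files.
# Identical created values (combined with different modified values)
# suggest one file is a modified copy of the other.
same_created() {
  c1=$(grep -o '<dcterms:created[^>]*>[^<]*' "$1" | sed 's/.*>//')
  c2=$(grep -o '<dcterms:created[^>]*>[^<]*' "$2" | sed 's/.*>//')
  [ -n "$c1" ] && [ "$c1" = "$c2" ]
}

# Usage:
# same_created fileA_extract/docProps/core.xml \
#              fileB_extract/docProps/core.xml && echo "possible derivation"
```

A match is only the starting point; corroborate with the string-table prefix and style-superset checks before concluding derivation.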

Detecting Manipulated Timelines

When someone has deliberately altered timestamps to construct a false timeline, the manipulation leaves characteristic patterns. These patterns arise because it is nearly impossible to modify all temporal data sources consistently—there are too many independent locations recording time information.

Red Flags for Timeline Manipulation

Timestamp Layer Disagreement

The most common manipulation pattern: core.xml timestamps have been edited to show desired dates, but ZIP entry timestamps still show the actual save date. A document claiming creation in June with ZIP entries dated in December has had its core properties manipulated.

TotalTime Impossibility

A file with a dcterms:created of January 1 and a dcterms:modified of June 30 (a 180-day span) but a TotalTime of 3 minutes cannot have been actively used over that period. The editing time implies the document was open for only one brief session, contradicting the claimed six-month lifespan.
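This impossibility check can be expressed numerically: convert both timestamps to epoch seconds and compare the claimed lifespan against the recorded editing minutes. A sketch using GNU date (Linux rather than stock macOS); the 30-day/30-minute thresholds are illustrative assumptions, not forensic standards.

```shell
# Flag files whose recorded editing time is implausibly low for the
# claimed created-to-modified lifespan (requires GNU date).
# Thresholds below are illustrative, not standards.
flag_compressed_history() {
  created_s=$(date -u -d "$1" +%s)
  modified_s=$(date -u -d "$2" +%s)
  days=$(( (modified_s - created_s) / 86400 ))
  echo "Lifespan: ${days} days, recorded editing: $3 min"
  if [ "$days" -gt 30 ] && [ "$3" -lt 30 ]; then
    echo "FLAG: editing time implausible for claimed lifespan"
  fi
}

flag_compressed_history "2025-01-01T00:00:00Z" "2025-06-30T00:00:00Z" 3
# -> Lifespan: 180 days, recorded editing: 3 min
#    FLAG: editing time implausible for claimed lifespan
```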

Version Anachronism

The AppVersion identifies an Excel version that did not exist at the claimed creation date. Similarly, XML features, namespace URIs, or functions (such as XLOOKUP) that were introduced after the claimed date prove the timeline is fabricated.

Creation After Modification

If dcterms:created is later than dcterms:modified, the creation timestamp has been manually set. Excel never naturally produces this condition because modification always occurs at or after creation.
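Because W3CDTF timestamps in UTC sort lexicographically, this condition can be tested with a plain string comparison. A sketch assuming both values end in Z (i.e., are already in UTC).

```shell
# dcterms:created later than dcterms:modified never occurs naturally;
# ISO-8601 UTC strings compare correctly as plain strings
check_created_vs_modified() {
  if [ "$1" \> "$2" ]; then
    echo "SUSPICIOUS: created ($1) is after modified ($2)"
  else
    echo "OK"
  fi
}

check_created_vs_modified "2026-01-08T09:45:00Z" "2025-09-15T14:23:00Z"
```

If the timestamps carry mixed timezone offsets, normalize them to UTC first; string comparison is only valid when both use the same convention.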

Unnatural Timestamp Precision

Legitimate Excel timestamps typically have second-level precision with non-round values (e.g., 14:23:17Z). Manually entered timestamps often use round values like 14:00:00Z or 09:30:00Z. A pattern of suspiciously round timestamps across core properties suggests manual editing.
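A grep pass over the core properties can count how many timestamps land exactly on a round quarter-hour. A sketch only: round values are a heuristic signal of manual entry, not proof.

```shell
# Count timestamps that fall exactly on :00 seconds at a round
# quarter-hour -- a pattern typical of manually typed values
count_round_timestamps() {
  grep -oE '[0-9]{2}:[0-9]{2}:[0-9]{2}Z' |
    grep -cE ':(00|15|30|45):00Z'
}

# Usage: count_round_timestamps < timeline_extract/docProps/core.xml
```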

Structural Immaturity

A document claiming a long editing history but showing the structural simplicity of a newly created file: minimal styles, no orphaned strings, no stale named ranges, no printer settings, and a perfectly ordered shared string table. The structural age does not match the claimed chronological age.

Consistency Matrix Analysis

For a systematic approach to detecting manipulation, construct a consistency matrix that cross-references every temporal source against every other source. A legitimate file shows consistency across the matrix; a manipulated file shows a characteristic pattern of isolated inconsistencies.

| Source | core.xml | ZIP dates | TotalTime | AppVersion | Structure | File system |
| --- | --- | --- | --- | --- | --- | --- |
| core.xml | - | CONFLICT | CONFLICT | CONFLICT | CONFLICT | OK |
| ZIP dates | CONFLICT | - | OK | OK | OK | OK |
| TotalTime | CONFLICT | OK | - | OK | OK | OK |

Reading the matrix: In this example, core.xml conflicts with every internal source except the low-reliability file system timestamps, while all other sources are consistent with each other. This pattern isolates core.xml as the manipulated layer: someone edited the document properties XML to show false dates, but the ZIP timestamps, editing time, application version, and structural artifacts all tell the true story.

Case Study: Reconstructing a Financial Reporting Timeline

To illustrate how these techniques work together, consider a scenario involving a financial reporting dispute. A company is being investigated for submitting altered quarterly reports. The investigation has five Excel files that are claimed to represent five months of independent financial reporting.

Phase 1: Initial Metadata Extraction

The five quarterly report files claim creation dates spanning January through May 2025. Extracting metadata from all five files reveals the following:

| File | Claimed Created | ZIP Date | TotalTime | Creator |
| --- | --- | --- | --- | --- |
| Q1-Jan-Report.xlsx | 2025-01-31 | 2025-06-12 | 4 min | jsmith |
| Q1-Feb-Report.xlsx | 2025-02-28 | 2025-06-12 | 6 min | jsmith |
| Q1-Mar-Report.xlsx | 2025-03-31 | 2025-06-12 | 5 min | jsmith |
| Q1-Apr-Report.xlsx | 2025-04-30 | 2025-06-12 | 3 min | jsmith |
| Q1-May-Report.xlsx | 2025-05-31 | 2025-06-12 | 4 min | jsmith |

Immediate red flags: All five files have the same ZIP entry date (June 12, 2025) despite claiming creation dates spanning January through May. All have TotalTime values of 3–6 minutes—impossibly short for monthly financial reports that should each involve hours of work. All share the same creator.

Phase 2: Structural Comparison

Deeper analysis of the five files reveals additional evidence of batch creation:

Findings

  • All five files have identical styles.xml content: the same styles, same order, same count. Independently created monthly reports would accumulate different formatting over time
  • All five theme files are byte-identical, including the same custom color modifications
  • The shared string tables begin with the same 47 header strings in the same order, followed by month-specific data, indicating they were created from a common template in a single session
  • None of the five files contains printer settings, hidden sheets, named ranges, or comments: structural artifacts that accumulate in documents used in normal business operations
  • The AppVersion in all five files is 16.0089, a build number released in May 2025; the January and February files could not have been created with this version

Phase 3: Reconstructed Timeline

Based on the convergence of evidence from multiple independent sources, the reconstructed timeline tells a very different story from the claimed narrative:

June 12, 2025

All five files created in a single session

User "jsmith" created all five files in approximately 22 minutes total (sum of TotalTime values), using a common template. The core.xml creation dates were manually edited to show January through May. The ZIP timestamps and AppVersion confirm the true creation date.

June 12, 2025

Files submitted as historical records

Email records show all five files were attached to a single email sent at 16:42 UTC on June 12—consistent with the ZIP timestamps and inconsistent with the claimed creation dates.

Jan–May 2025

No evidence of document activity

No email records, file server logs, backup snapshots, or print logs show any of these files existing before June 12. The claimed five-month history is unsupported by any external evidence.

Conclusion: Seven independent lines of evidence (ZIP timestamps, TotalTime values, AppVersion dating, structural identity, string table patterns, absence of editing artifacts, and email timing) converge on the same conclusion: all five files were fabricated on June 12, 2025, and backdated to create a false appearance of monthly reporting.

Timeline Reconstruction Checklist

Complete Investigation Process

Preserve evidence and establish chain of custody

Hash all original files (SHA-256). Create forensic copies. Document how and when files were received. Work only on copies.

Extract all six categories of temporal data from each file

OPC core properties, extended properties (TotalTime, AppVersion), ZIP entry timestamps, file system timestamps, internal structural artifacts, and external correlation records.

Normalize all timestamps to UTC

Identify the timezone of each source. Convert everything to UTC. Note DST transitions. Record original values alongside converted values.

Build the consistency matrix

Cross-reference every temporal source against every other source. Identify consistent clusters and isolated outliers. The outlier sources are the manipulated ones.

Perform cross-file analysis (if multiple files)

Compare string tables, styles, themes, printer settings, and author metadata across files. Identify common origins, simultaneous creation, and derivation chains.

Validate against external records

Cross-reference with email logs, cloud version history, backup snapshots, file server audits, and print logs. External records provide the strongest anchoring points.

Construct the reconstructed timeline

Build the chronology event by event, noting the source, reliability, corroboration, and confidence for each entry. Distinguish facts from inferences.

Document findings for legal or audit use

Include raw metadata values, extraction commands, hash values, consistency matrices, and the complete reconstructed timeline. Ensure reproducibility by a qualified peer reviewer.

Conclusion

Timeline reconstruction from Excel metadata is one of the most powerful techniques in document forensics because it leverages the fundamental architecture of the XLSX format. Every Excel file records its history in multiple independent locations: document properties, ZIP archive metadata, editing time counters, application version stamps, and structural artifacts in the XML layers. These sources are written by different mechanisms, stored in different locations, and require different tools and knowledge to manipulate.

The strength of a reconstructed timeline comes from convergence. A single timestamp can be wrong for innocent reasons—clock skew, timezone confusion, file copying. But when ZIP entry dates, TotalTime values, AppVersion dating, structural maturity analysis, cross-file comparison, and external records all point to the same conclusion, the convergence of independent evidence creates a timeline that is difficult to dispute.

For investigators, the key insight is that manipulating a timeline requires changing not just one timestamp but every temporal trace across all six categories of evidence. Most people who alter document timestamps only modify the most visible layer—the document properties in core.xml—while leaving the deeper layers untouched. It is this gap between what was manipulated and what was overlooked that makes timeline reconstruction possible, and it is the systematic extraction and correlation of all temporal evidence that makes it reliable.

Reconstruct Document Timelines with Metadata Analysis

Use our metadata analyzer to extract timestamps, author records, and structural artifacts from Excel files and build forensically sound document timelines