Every XLSX with a single live formula carries a small, machine-generated file called xl/calcChain.xml. It is not a feature most users have ever heard of, and it does not appear anywhere in the Excel UI. But it is one of the most informative metadata layers in the entire workbook: a precise, topologically sorted record of every formula cell in the file, in the exact order Excel last evaluated them. The chain exposes which sheets feed which, where deleted formulas used to live, which cells participate in array calculations, which formulas Excel believes are volatile, and which calculation engine version produced the file. This post walks through what the calculation chain is, where it lives, what each attribute means, what it leaks, and how to actually clean or strip it before sharing workbooks externally.
When Excel opens a workbook with formulas, it has to decide which cell to compute first. =A1+B1 cannot be evaluated until both A1 and B1 are known. If A1 itself contains =Inputs!$D$2*1.1, the engine has to evaluate the input first. Multiply this across thousands of cells and dozens of sheets, and you have a directed acyclic graph that the calculation engine has to topologically sort before it can run any number through any formula.
Excel does that sort once, at save time, and writes the result into xl/calcChain.xml. The next time the file opens, the engine does not have to re-derive the dependency graph from scratch — it walks the chain in order and recomputes whatever has changed. That is a pure optimization. But it is also a remarkably detailed snapshot of how the workbook is wired internally and how the formulas were last evaluated.
No Excel ribbon command creates, edits, or removes calcChain.xml directly. It is generated automatically by the calculation engine and rebuilt whenever Excel decides the cached order is stale. From a privacy standpoint that means it is also never sanitized as a side effect of the things users do see — clearing a cell, deleting a sheet, or saving to a new filename. The chain just gets quietly updated to reflect the new state, often retaining traces of the old one.
An XLSX is a ZIP archive. Inside it, the calculation chain occupies a single file at the path xl/calcChain.xml. It is referenced by xl/_rels/workbook.xml.rels with the relationship type .../relationships/calcChain and listed in [Content_Types].xml as a content type override.
// XLSX layout fragment showing where calcChain lives
workbook.xlsx (zip)
├── [Content_Types].xml // declares the calcChain content type
├── xl/
│ ├── workbook.xml
│ ├── calcChain.xml // <-- the calculation chain
│ ├── sharedStrings.xml
│ ├── _rels/
│ │ └── workbook.xml.rels // links workbook to calcChain
│ └── worksheets/
│ ├── sheet1.xml
│ └── sheet2.xml ...
The presence of calcChain.xml is itself a small piece of metadata. Workbooks created by some non-Microsoft writers (older versions of openpyxl, xlsxwriter, LibreOffice, certain server-side libraries) do not emit a calculation chain at all; the file still opens fine in Excel, which simply rebuilds the chain on first open. A workbook missing calcChain.xml is a strong signal that the file did not last pass through Microsoft Excel before being saved.
The XML is a simple flat list. The root element is <calcChain>, and each child <c> represents one formula cell in the order Excel evaluated it.
// Excerpt from xl/calcChain.xml
<calcChain xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<c r="B2" i="3"/>
<c r="C2" i="3"/>
<c r="D2" i="1" l="1"/>
<c r="D3" i="1" t="1"/>
<c r="F18" i="2" a="1" s="1"/>
</calcChain>
Each <c> element carries a small set of attributes. Most are short single-character names because the chain in a real workbook can contain hundreds of thousands of entries.
| Attribute | Meaning |
|---|---|
| r | Cell reference, e.g. B2. Always present. |
| i | Sheet ID. Index into workbook.xml’s sheet list. Omitted when the entry is on the same sheet as the previous one (a compression trick). |
| l | 1 if the cell is the start of a new thread or chain segment — the engine treats this as a calculation entry point. |
| t | 1 if the cell’s formula is part of a shared formula group (multiple cells sharing one formula expression with offset references). |
| a | 1 if the cell is part of an array formula. Marks dynamic-array spills and legacy CSE arrays alike. |
| s | 1 if the entry should be evaluated in a child chain — nested array calculations and certain volatile chains use this. |
The order of the <c> elements is the calculation order. Walking from top to bottom of calcChain.xml is exactly walking the topological sort Excel produced when it last evaluated the workbook.
The chain is a topological sort, which means dependencies always come before the cells that use them. That single property gives a reader a compact map of the model:
i values cluster at the top of the chain on sheet 4 and at the bottom on sheet 1, sheet 4 is an input source and sheet 1 is downstream.l="1" are roots — calculation entry points where Excel started a new thread of dependent evaluation.i changes from one row to the next indicate heavy cross-sheet dependencies; long runs without a sheet change indicate a self-contained sheet doing local arithmetic.In a model the author considers proprietary, the calculation chain is effectively a free architecture diagram. Sheets do not need readable names — the order of evaluation between them tells the recipient which feeds which, where the bottleneck calculations are, and which inputs the entire model rests on. A competitor who reads the chain can rebuild the structure of the model from scratch, even if the formulas themselves are unintelligible.
Excel maintains calcChain.xml incrementally. When a formula is deleted, the corresponding entry is supposed to disappear; in practice, several deletion paths leave the chain inconsistent. Common causes of phantom entries:
A user pastes-special “values” over a formula range. The cell now contains a number, but if the workbook closes before a full recalc, the chain entry can persist for a save or two.
Some Excel builds and certain third-party libraries delete a sheet without immediately rewriting calcChain.xml. References to i values that no longer correspond to any sheet sit in the chain until Excel rebuilds it — which only happens on the next full recalc.
Files reopened after a crash or recovered through AutoRecover often carry chain entries that no longer match the current worksheet contents. The chain represents the state of calculations before the crash, not after.
Workbooks edited outside Excel — with scripts, server-side libraries, or text editors — often leave the chain stale. The worksheet XML and the calculation chain disagree, sometimes for the entire remaining life of the file.
Forensically, phantom calcChain entries are evidence that formulas used to exist at specific cell coordinates. The entry does not contain the formula text — that lived in the worksheet XML and is gone — but it pins down the precise cell that once held a formula and whether it was an array, shared, or volatile calculation. Combined with the orphaned shared strings and phantom defined names this paints a remarkably complete picture of what was removed.
Each entry’s i attribute is an index into the workbook’s ordered sheet list. The i value is small and dense, but combined with the order of entries it describes the entire dependency graph between sheets without ever quoting a formula.
// Reading sheet transitions in calcChain.xml
sheet i=4 (Inputs) : 187 entries // leaves
sheet i=2 (Assumptions) : 42 entries // depends on i=4
sheet i=5 (Pricing) : 1,204 entries // depends on i=2, i=4
sheet i=3 (CommissionPlan): 318 entries // depends on i=5
sheet i=6 (BoardDeck) : 27 entries // depends on everything above
sheet i=1 (Cover) : 4 entries // top-level outputs
The chain tells a reader which sheet is upstream of which other, even when the sheets have neutral names like Sheet1 through Sheet6, or have been deliberately renamed to obscure their roles. The structure is in the order, not the labels.
Volatile functions — NOW(), TODAY(), RAND(), RANDBETWEEN(), OFFSET(), INDIRECT(), CELL(), and certain forms of INFO() — recompute on every calculation cycle regardless of whether their inputs changed. Excel marks the cells that contain them, and every cell that transitively depends on them, as part of the volatile-recalc chain.
In calcChain.xml, volatile cells often appear with the l="1" flag because the engine starts a new evaluation thread from each volatile root. A reader counting l="1" entries gets a quick proxy for how volatile the workbook actually is — far more precise than scanning worksheets for function names, which misses indirect dependencies.
Heavy use of OFFSET and INDIRECT is often a signature of older modeling styles that predate dynamic arrays and structured references. The volatility footprint in calcChain.xml — visible without ever reading a formula — tells a recipient roughly when the model was built and how disciplined the author was about avoiding recalculation costs. In due-diligence contexts, this is a quick fingerprint of model quality.
The a, t, and s attributes encode the kind of formula at the cell. Reading them in aggregate gives a sense of how the workbook was authored:
t="1" entries — the workbook relies on shared formula groups, which Excel emits when a single formula is filled across many cells. Common in classic financial models.a="1" entries — the workbook uses array formulas, including dynamic-array spills (FILTER, SEQUENCE, UNIQUE, SORT, XLOOKUP with array results). A clear fingerprint of a modern Microsoft 365 build.s="1" entries — the workbook contains nested array calculations, often produced by LAMBDA or LET. A signature of advanced authors.The flags do not reveal which formulas the cells contain, but they do reveal the style of the model: legacy versus modern, simple versus advanced, cell-by-cell versus array-driven. For a recipient evaluating the sophistication of a financial model or the maintainability of a calculation, that signal is a useful starting point.
The chain is plain XML inside the ZIP. You can audit it in seconds without ever opening Excel. Three approaches:
# 1. Quick command-line look at the chain
unzip -p workbook.xlsx "xl/calcChain.xml" | \
xmllint --format -
# 2. Count entries per sheet to see the shape of the model
unzip -p workbook.xlsx "xl/calcChain.xml" | \
grep -oE 'i="[0-9]+"' | sort | uniq -c
# 3. Find calculation roots (entries marked l="1")
unzip -p workbook.xlsx "xl/calcChain.xml" | \
grep -oE '<c[^/]*l="1"[^/]*/>'
For programmatic processing, a small Python script gives you a structured view:
# Walk every entry in calcChain.xml and group by sheet
import zipfile, xml.etree.ElementTree as ET
from collections import Counter
with zipfile.ZipFile("workbook.xlsx") as z:
xml = z.read("xl/calcChain.xml").decode()
ns = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}
root = ET.fromstring(xml)
cur_sheet, by_sheet, flags = None, Counter(), Counter()
for c in root.findall("m:c", ns):
cur_sheet = c.get("i", cur_sheet)
by_sheet[cur_sheet] += 1
for attr in ("l", "t", "a", "s"):
if c.get(attr) == "1": flags[attr] += 1
print("Entries per sheet:", by_sheet)
print("Flag totals:", flags)
The output is a compact summary of the model: how many formula cells are on each sheet, how many calculation roots exist, and how heavily array, shared, and child-chain mechanisms are used. None of this requires opening Excel, none of it requires the user’s permission, and none of it is visible to the author of the file unless they specifically look for it.
In real audits, the calculation chain reveals a consistent set of things authors did not realize they were sharing:
i entries in the chain, and its calculation order relative to other sheets reveals it as the source of feeder values the recipient is not supposed to see.i values that no longer correspond to any sheet in workbook.xml — clear evidence the workbook used to be larger.l="1" entries is leaking that the model recalculates aggressively on every change — usually a sign of INDIRECT-heavy design that a recipient can use to time how much CPU the model demands.s attributes, ordering of array spills, omission of redundant i values) varies by Excel build — another fingerprint of the author’s Office version.The Document Inspector in Excel does not remove or sanitize calcChain.xml. Saving the file does not regenerate it from scratch unless the engine has decided the chain is stale. Cleaning it requires explicit action.
Press Ctrl+Alt+F9 (or Cmd+Option+F9 on macOS) before saving. Excel will rebuild the chain from scratch, which usually clears phantom entries from cells whose formulas have been replaced with values.
For deliverables that should ship without any live formulas, paste-special values over every formula cell, then delete xl/calcChain.xml from the ZIP. Excel does not require the chain to open the file — if the workbook contains any formulas it will rebuild on first open, and if there are none, the chain stays absent.
A short Python script removes calcChain.xml and the corresponding relationship and content-type entries:
import zipfile, shutil, re
from pathlib import Path
src, dst = Path("in.xlsx"), Path("out.xlsx")
with zipfile.ZipFile(src) as zin, \
zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
for item in zin.infolist():
data = zin.read(item.filename)
if item.filename == "xl/calcChain.xml": continue
if item.filename == "xl/_rels/workbook.xml.rels":
data = re.sub(rb"<Relationship[^/]*calcChain[^/]*/>", b"", data)
if item.filename == "[Content_Types].xml":
data = re.sub(rb"<Override[^/]*calcChain[^/]*/>", b"", data)
zout.writestr(item, data)
The script removes the chain file and the references to it. Excel happily rebuilds the chain on the next open if it needs one.
For internal pipelines, switching workbook.xml’s <calcPr fullCalcOnLoad="1"/> ensures every recipient’s Excel rebuilds the chain on first open. This does not strip the existing chain on its own, but it pairs well with deletion: a freshly-rebuilt chain on the recipient’s machine carries no traces of phantom entries from the author’s file.
Run this checklist against any workbook with formulas leaving your organization:
Ctrl+Alt+F9 to force a full recalc and rebuild the chain before saving?<c> entries pointing to cells whose formulas I replaced with values?i values reference sheets that have been deleted from workbook.xml?xl/calcChain.xml entirely?xl/workbook.xml defined names and xl/sharedStrings.xml for the parallel set of leaks the chain alone does not cover?xl/calcChain.xml is the most invisible layer of the workbook to its author and one of the most informative to anyone who unzips the file. It is generated automatically, never surfaced in the UI, never sanitized by the Document Inspector, and almost never thought about — but it carries the precise topological order of every formula cell, the cross-sheet dependency graph, traces of deleted formulas, the volatility footprint, and a fingerprint of the calculation engine that produced it.
For workbooks that are meant to ship as static deliverables, the simplest defense is to convert formulas to values and delete the chain entirely. For workbooks that need to remain live, force a full recalculation before save and treat the chain as a piece of metadata that travels with the file — one that future-proofs the workbook’s performance for the recipient and, in equal measure, hands them a free architecture diagram of how the model is wired.
The parallel symbol-table layer that calcChain entries refer into.
Where every metadata layer lives inside an XLSX, including the calculation chain.
How phantom calcChain entries pair with orphaned shared strings to reconstruct deleted work.