A cell that reads ='[Budget_2026.xlsx]Summary'!$B$4 looks innocent. What Excel actually stores to make that formula work is a full description of the source workbook — its UNC path, the colleague’s username embedded in the path, the sheet name, every cell referenced, and a cached copy of every value that was last pulled. This post walks through where external link metadata lives inside an XLSX, what it exposes, and how to actually remove it before sharing.
Excel calls any formula that reads a value from a different workbook an external reference or link. The most common forms are formulas like ='[Budget_2026.xlsx]Summary'!$B$4, =SUM('C:\Users\jsmith\Docs\[Regional.xlsx]North'!A:A), or =VLOOKUP(A2, '\\hq-fs01\finance\[Rates.xlsx]Rates'!A:C, 3, FALSE). Less visibly, a chart series that points at another workbook, a conditional format that references another file, or a named range that resolves to an external workbook is an external link too.
For each unique source workbook, Excel writes a dedicated part to the XLSX package describing that source: where it lives, which sheet is referenced, which cells are touched, and — critically — a cached copy of the values that were last retrieved. The link survives even after the formula that created it is deleted, because Excel treats the link as a workbook-level object, not a cell-level one.
- xl/externalLinks/externalLink1.xml: one file per unique source workbook, naming every sheet and cached value referenced
- xl/externalLinks/_rels/externalLink1.xml.rels: the relationship file holding the actual URL or file path pointing at the source
- xl/workbook.xml: the <externalReferences> element listing every externalLink id the workbook knows about
- xl/calcChain.xml: calculation ordering hints that name every cell that depends on an external link, even when the formula is compressed
Users assume “Break Links” in Data → Edit Links fully deletes the reference. It does not delete the externalLink part. It replaces the formula with its cached value, leaves the externalLink XML in place if any reference elsewhere in the workbook (charts, named ranges, conditional formats, validation lists) still touches the source, and in many cases leaves orphaned entries pointing at source workbooks that the author no longer even has access to. The link ships with the file until it is deliberately stripped.
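The part locations listed above can be checked without renaming or extracting anything. The following is a minimal sketch, assuming only the Python standard library; the function name `external_link_inventory` and the shape of the returned dictionary are illustrative choices, not part of any Excel tooling.

```python
import zipfile

def external_link_inventory(xlsx):
    """Report which external-link parts an XLSX package carries.

    `xlsx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(xlsx) as z:
        names = z.namelist()
        # One externalLinkN.xml per unique source workbook
        parts = sorted(
            n for n in names
            if n.startswith("xl/externalLinks/") and n.endswith(".xml")
        )
        # The matching rels files hold the source path or URL
        rels = sorted(n for n in names if n.startswith("xl/externalLinks/_rels/"))
        workbook = z.read("xl/workbook.xml") if "xl/workbook.xml" in names else b""
    return {
        "parts": parts,
        "rels": rels,
        # True when workbook.xml still lists external references
        "workbook_declares_links": b"<externalReferences" in workbook,
    }
```

An empty `parts` list together with `workbook_declares_links` being false is the quick negative signal; anything else deserves a closer look at the individual parts.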
The externalLink XML is where Excel writes a structured description of the source. Open one in a text editor and you see something like this:
// Representative excerpt from xl/externalLinks/externalLink1.xml
<externalLink xmlns="…/spreadsheetml/2006/main">
  <externalBook r:id="rId1">
    <sheetNames>
      <sheetName val="Summary"/>
      <sheetName val="Q1_Forecast_DRAFT"/>
      <sheetName val="DO_NOT_SHARE"/>
    </sheetNames>
    <sheetDataSet>
      <sheetData sheetId="0">
        <row r="4">
          <cell r="B4" t="n"><v>4823117.55</v></cell>
        </row>
      </sheetData>
    </sheetDataSet>
  </externalBook>
</externalLink>
Notice what just leaked. The reference named one sheet — Summary — but the XML enumerated all three sheet names from the source workbook, because Excel needs the ordered list to resolve sheetId="0". The recipient now knows the source workbook has tabs called Q1_Forecast_DRAFT and DO_NOT_SHARE — names the author almost certainly intended to keep private.
The cached value 4823117.55 is present even if the formula ='[Budget_2026.xlsx]Summary'!$B$4 was later replaced by the user with "[redacted]" on the worksheet. The cache is what Excel uses to display a value when the source is unavailable; it is written once at save time and not cleared when the visible formula changes.
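To see the sheet list and cached values without reading raw XML, the part can be parsed directly. This is a sketch under stated assumptions: the element and attribute names follow the excerpt above, the `{*}` namespace wildcard needs Python 3.8 or later, and `summarize_external_link` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def summarize_external_link(xml_bytes):
    """Return (sheet_names, cached_cells) for one externalLinkN.xml part.

    cached_cells is a list of (sheet_name, cell_ref, cached_value) tuples.
    """
    root = ET.fromstring(xml_bytes)
    # "{*}" matches the element in any namespace, so the exact
    # SpreadsheetML namespace URI does not have to be spelled out here.
    sheets = [s.get("val") for s in root.findall(".//{*}sheetNames/{*}sheetName")]
    cached = []
    for data in root.findall(".//{*}sheetDataSet/{*}sheetData"):
        # sheetId is an index into the ordered sheetNames list
        idx = int(data.get("sheetId", "0"))
        sheet = sheets[idx] if idx < len(sheets) else f"#{idx}"
        for cell in data.findall(".//{*}cell"):
            value = cell.findtext("{*}v", default="")
            cached.append((sheet, cell.get("r"), value))
    return sheets, cached
```

Run against the excerpt above, the first element of the result would list all three sheet names even though only Summary was referenced.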
The externalLink XML describes what is referenced. The relationship file at xl/externalLinks/_rels/externalLink1.xml.rels describes where — and this is where the most damaging metadata lives.
// A UNC-path reference as written to externalLink1.xml.rels
<Relationships>
  <Relationship Id="rId1"
                Type="…/externalLinkPath"
                Target="file:///\\hq-fs01.contoso.local\finance\2026\working\Budget_2026.xlsx"
                TargetMode="External"/>
</Relationships>
Three lines of XML just handed the recipient a map of your filesystem: the internal DNS name of a file server, the department that owns it, the year/scenario the working budget is kept under, and the exact filename of the authoritative source. An attacker who obtains this workbook, and who also has a foothold inside the network, knows precisely where to go next.
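Pulling the targets out of a rels file takes a few lines. The sketch below assumes the attribute names shown in the excerpt above (`Target`, `TargetMode`) and Python 3.8 or later for the `{*}` wildcard; `external_targets` is an illustrative name.

```python
import xml.etree.ElementTree as ET

def external_targets(rels_bytes):
    """Return every Target marked TargetMode="External" in a .rels file."""
    root = ET.fromstring(rels_bytes)
    return [
        rel.get("Target")
        # "{*}" matches Relationship with or without a namespace
        for rel in root.findall(".//{*}Relationship")
        if rel.get("TargetMode") == "External"
    ]
```

Printing this list for every file under xl/externalLinks/_rels/ shows exactly what a recipient would read.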
The variants are worse. Each is common, each is written verbatim:
| Source Location | What the Path Discloses |
|---|---|
| C:\Users\jsmith\OneDrive - Contoso\Finance\… | The author’s Windows username (jsmith), their employer, and the folder structure of their OneDrive |
| \\hq-fs01.contoso.local\finance\… | Internal DNS name, departmental share, internal folder hierarchy |
| https://contoso.sharepoint.com/sites/finance/… | Tenant URL, site path, library and folder names, sometimes document IDs |
| /Users/jsmith/Library/CloudStorage/Box-Box/… | Mac username, Box tenant presence, folder structure inside Box |
| \\tsclient\c\projects\… | The user was working inside a Remote Desktop session, mounting their local machine into the server |
| Z:\shared\clients\AcmeCorp\… | A mapped drive letter, plus the name of a specific client — often disclosing client relationships |
A link that points at a path the recipient cannot reach is not a link that was removed. Excel shows an “update links” prompt, the recipient clicks “Don’t Update,” and the workbook opens as if nothing is wrong. The broken rels file is still inside the XLSX, and the reason it is broken — a path the recipient cannot see — is now fully visible to them as text. Broken links leak more aggressively than working ones because users stop paying attention to them.
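The path shapes in the table above can be told apart mechanically, which helps when triaging many files. The following is a rough sketch, not a complete parser: the labels (`cloud`, `tsclient`, `unc`, `local`, `relative`) and the function name are assumptions made for illustration, and a mapped drive such as Z:\ cannot be distinguished from a local disk by text alone.

```python
def classify_link_target(target):
    """Roughly classify an external-link target by the shape of its path."""
    t = target.strip()
    # Drop the file:/// prefix that precedes filesystem targets
    if t.lower().startswith("file:///"):
        t = t[len("file:///"):]
    low = t.lower()
    if low.startswith(("http://", "https://")):
        return "cloud"       # SharePoint, OneDrive web URLs and similar
    if low.startswith("\\\\tsclient\\"):
        return "tsclient"    # Remote Desktop drive redirection
    if t.startswith("\\\\"):
        return "unc"         # \\server\share\...
    if (len(t) >= 3 and t[0].isalpha() and t[1] == ":") or t.startswith("/"):
        return "local"       # drive letter (local or mapped) or POSIX path
    return "relative"        # bare filename or relative path
```

The order of the checks matters: `tsclient` is tested before the generic UNC case because it is itself a UNC-style path.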
The sheetDataSet inside each externalLink is not a pointer — it is a snapshot. Every cell that the referencing workbook ever touched in the source is written in full, with the value that was current the last time the link was refreshed. A VLOOKUP against a rate table caches the entire range of the rate table. A SUMIF over a column caches every cell in that column. A chart series that references an external range caches the whole series.
That means a workbook with a single visible formula =VLOOKUP(A2, '[Salaries.xlsx]Sheet1'!A:C, 3, FALSE) can easily ship with several thousand cached rows from the source salary table — the rows the user never saw, never intended to share, and does not realize are in the file. Open the externalLink XML and they are there, row by row, value by value.
Take a 12-row workbook whose only external reference is =VLOOKUP(A2, '[Compensation.xlsx]Grades'!A:G, 4, FALSE) copied down 12 times. Save it. Unzip it. Read xl/externalLinks/externalLink1.xml. Count the <cell> elements. In a typical compensation workbook the count lands in the low thousands — because Excel caches the full range A:G across every row that VLOOKUP might need to hit. The visible workbook is 12 rows. The shipped workbook carries a grade band for every job family the source defined.
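The counting step in that experiment can be done straight from the package. This is a minimal sketch assuming the `<cell>` element name shown earlier and Python 3.8 or later for the `{*}` wildcard; `cached_cell_counts` is a hypothetical helper.

```python
import zipfile
import xml.etree.ElementTree as ET

def cached_cell_counts(xlsx):
    """Map each externalLinkN.xml part to the number of cached cells it holds."""
    counts = {}
    with zipfile.ZipFile(xlsx) as z:
        for name in z.namelist():
            if name.startswith("xl/externalLinks/externalLink") and name.endswith(".xml"):
                root = ET.fromstring(z.read(name))
                # Count elements rather than lines: a part is often one long line
                counts[name] = len(root.findall(".//{*}cell"))
    return counts
```

Comparing these counts with the number of visible rows is a quick way to spot a workbook that ships far more data than it shows.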
The pivot cache problem is well-known: a pivot ships with a full copy of its source. External links have the same shape of risk but come from a different code path — they cache source-workbook cells, not source-sheet cells. A workbook can be clean of pivot caches and still disclose thousands of rows through externalLinks. Treat the two as independent exposure surfaces.
External links are not only formulas. Named ranges and defined names can also resolve across workbooks, and they are far easier to miss because they never appear in any cell. A defined name like FX_Rates whose formula is ='\\hq-fs01\treasury\[FX.xlsx]Rates'!$A:$C puts the UNC path and filename into the workbook’s name table even if no cell uses the name.
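Defined names can be audited from xl/workbook.xml without opening Excel. The sketch below rests on an assumption worth checking against your own files: that in the saved XML an external workbook appears inside a name's formula as a bracketed index such as `[1]`, with the actual path held in the rels file. The function name is illustrative, and a table column with a purely numeric name could produce a false positive.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# Assumed stored form of an external workbook reference: [1], [2], ...
EXTERNAL_REF = re.compile(r"\[\d+\]")

def external_defined_names(xlsx):
    """Return {name: formula} for defined names that look like external references."""
    with zipfile.ZipFile(xlsx) as z:
        root = ET.fromstring(z.read("xl/workbook.xml"))
    hits = {}
    for dn in root.findall(".//{*}definedName"):
        formula = dn.text or ""
        if EXTERNAL_REF.search(formula):
            hits[dn.get("name")] = formula
    return hits
```

Any name this returns keeps its source workbook alive even when no cell uses it.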
Excel also writes a <definedNames> section inside each externalLinkN.xml part enumerating the names that live in the source workbook, so a recipient reading the file learns which names the source defines.
Four representative scenarios to make the risk concrete:
A consultant builds an engagement deliverable for Client A. One formula uses a VLOOKUP into a reference rate card that the consultant maintains in their private OneDrive. The formula is later replaced with Paste Values; the consultant believes the link is gone. The deliverable is emailed to Client A.
Leak: The externalLink XML is still present, the rels file still carries the path C:\Users\consultant\OneDrive - BigFirm\Rates\ClientB_Rates.xlsx, and the sheetNames list advertises the existence of a competitor client’s pricing sheet in the same folder. Client A now knows Client B is a client of the same consultant.
A corporate development team prepares a one-page financial summary for a potential acquirer. The summary is built in a workbook that references the master model. The team generates the teaser, emails it to a shortlist of acquirers, and notices too late that Excel prompted the acquirer’s machine to “Update Links?”
Leak: The rels file contains the path to the master model, including the project codename (\\corpdev\projects\Project_Falcon\model_v47.xlsx). Every acquirer now knows the internal name of the deal, the version of the model, and that the seller is on version 47.
A recruiter maintains a master candidate database in one workbook and prepares a shortlist for a client in another, using lookups to pull name, role, and expected compensation from the master.
Leak: The shortlist’s externalLink XML caches every candidate in the master — not just the shortlisted ones — because the VLOOKUP is range-based and Excel caches the whole range. The client receives a six-row shortlist and a file containing 400 cached candidate rows, including rejected applicants, salary expectations, and private notes.
A government agency releases a redacted budget workbook under a public records request. The visible cells are redacted. The external links are not.
Leak: The rels files reference UNC paths into a restricted internal share containing the full unredacted working budgets, the cached values include non-redacted totals, and the sheetNames list enumerates tabs like Layoffs_Scenario_B. A journalist files a follow-up request naming those specific files and tab names, which the agency can no longer argue it does not hold.
A useful inspection does three things: enumerate every link, resolve every path, and count the cached cells. Three practical approaches:
Open the workbook and go to Data → Edit Links (grayed out if no links exist — a quick negative signal). The dialog lists every source the workbook depends on. Clicking Check Status tells you which are reachable and which are broken. This is necessary but not sufficient: Edit Links does not show cached values, does not always show orphaned externalLinks that no visible formula references, and does not show the path components that will leak in the rels file.
Rename a copy of the XLSX to .zip, extract, and read the files directly:
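# List every external link part in the package
unzip -l report.xlsx | grep externalLink
# The loops below run from the root of the extracted package
# Extract every target path from the rels files
for f in xl/externalLinks/_rels/*.rels; do
  echo "--- $f"
  grep -oE 'Target="[^"]+"' "$f"
done
# Count cached cells in each externalLink; large counts mean lots of leaked data.
# Parts are typically written as one long line, and grep -c counts matching
# lines rather than matches, so count the individual matches instead.
for f in xl/externalLinks/externalLink*.xml; do
  n=$(grep -o '<cell ' "$f" | wc -l)
  printf "%s: %d cached cells\n" "$f" "$((n))"
done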
A dedicated metadata tool such as MetaData Analyzer enumerates every externalLink, classifies each path (local, UNC, cloud, tsclient), reports cached cell counts, flags broken-but-still-present references, and produces a pre-share report — the only approach that scales beyond a handful of files.
There is no single button that removes all traces of an external link. Depending on what the file needs to preserve, combine the following:
Check at the ZIP level whether an xl/externalLinks/ folder exists. If it does, something else in the workbook (a chart, a named range, a conditional format, a data validation list, a pivot table) is still referencing the source and keeping it alive.
Break Links is a good start but routinely leaves orphaned references. Always verify at the ZIP level. Pay particular attention to charts and named ranges, the two most common reasons a link survives.
Look for any formula or defined name that contains [ or contains a path.
In a values-only copy, the queries, named ranges, charts, and external links do not come along. For most external deliverables, this is the right default: accept the loss of interactivity for the gain of certainty.
For pipelines — DLP integrations, CI/CD exports, pre-send scanners — remove the externalLinks parts directly and update the package relationships. A minimal sketch:
# Strip external link parts and their references from an XLSX
import re
import zipfile

# Package-level files that point at the externalLink parts
PACKAGE_INDEXES = ("xl/workbook.xml", "xl/_rels/workbook.xml.rels", "[Content_Types].xml")

def strip_external_links(src, dst):
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            # Drop the externalLink parts and their rels files
            if item.filename.startswith("xl/externalLinks/"):
                continue
            data = zin.read(item.filename)
            if item.filename in PACKAGE_INDEXES:
                data = re.sub(rb'<externalReference[^/>]*/>', b'', data)
                data = re.sub(rb'<externalReferences>.*?</externalReferences>', b'', data, flags=re.S)
                # Attribute values here contain "/" (URIs, part names), so only ">" is excluded;
                # \s keeps the pattern from matching the <Relationships> root element
                data = re.sub(rb'<Relationship\s[^>]*externalLink[^>]*/>', b'', data)
                data = re.sub(rb'<Override\s[^>]*externalLink[^>]*/>', b'', data)
            zout.writestr(item, data)
Removing the externalLink parts without updating [Content_Types].xml, workbook.xml, and the workbook-level rels will trigger a repair prompt when Excel opens the file. Any cell formula that still references the now-deleted link will become #REF!. For production pipelines, prefer a library like openpyxl that normalizes the package, or finish by having Excel do a Save As once to clean up.
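Whichever removal route is used, the result can be checked at the package level before the file leaves a pipeline. This is a minimal sketch using only the standard library; `leftover_link_traces` is a hypothetical name, and the check covers the three package-level files named above, not cell formulas or defined names that may still mention the deleted link.

```python
import zipfile

# Package-level files that can still point at externalLink parts
INDEX_PARTS = ("xl/workbook.xml", "xl/_rels/workbook.xml.rels", "[Content_Types].xml")

def leftover_link_traces(xlsx):
    """Return a list of findings; an empty list means no package-level traces."""
    findings = []
    with zipfile.ZipFile(xlsx) as z:
        names = z.namelist()
        for name in names:
            if name.startswith("xl/externalLinks/"):
                findings.append(f"part still present: {name}")
        for name in INDEX_PARTS:
            if name in names:
                data = z.read(name)
                if b"externalLink" in data or b"externalReference" in data:
                    findings.append(f"reference still present in {name}")
    return findings
```

A pipeline could fail the export whenever this list is non-empty, which catches a partially stripped package before Excel's repair prompt does.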
Run through this checklist before releasing any workbook that ever referenced another file, even if you believe the references are gone:
Does the unzipped package still contain an xl/externalLinks/ folder?
External links accumulate quietly in every team that builds models across multiple files. Organizations that manage the risk tend to combine several of the following practices:
Set a policy that workbooks leaving the organization must have no external references — enforced by an egress scanner that rejects any XLSX containing an xl/externalLinks/ folder.
For models shared externally, build the whole thing inside one file from the start. Tabs and named ranges scale further than most teams realize, and the metadata risk is dramatically smaller than a multi-file model.
When links are unavoidable internally, reference files on a neutral named share (\\shares\models\) rather than a personal OneDrive. The path that ships with the workbook then reveals much less about individuals and sensitive folder names.
Users do not believe the path is in the file until they see it. A three-minute demo that unzips a familiar workbook and prints the externalLink rels file is worth more than a page of policy text.
An external link looks like a formula. It is really a structured reference to another file, stored in the XLSX as a path, a sheet list, and a cached snapshot of every value that was ever pulled. A recipient of the workbook reads all of it with a text editor.
The fix is neither dramatic nor expensive — break the links, delete the defined names, rebuild charts without cross-file series, verify at the ZIP level, and prefer a values-only copy when the recipient does not need interactivity. But none of it happens by accident. Every one of these steps requires the author to know the problem exists. Treat every workbook that ever linked to another file as if it carried a map of your filesystem and a cached slice of the linked data, because it does — until you deliberately remove it.