Modern Excel ships two completely separate comment systems inside the same workbook. The legacy notes that yellow-stickie veterans remember live in xl/comments1.xml. The newer threaded comments — the ones that look like Word review bubbles, support replies, and let you @-mention colleagues — live in a parallel folder, xl/threadedComments/, alongside a separate directory of personas in xl/persons/person.xml. The persona file is the part nobody looks at. Inside it, each commenter is identified by a long persona ID that resolves directly to a Microsoft 365 tenant GUID, an Active Directory object ID, or an email address — even when the visible display name has been edited or the comment thread itself has been resolved and hidden. This post walks through the structure of threaded comments, what each field reveals about your organisation and your collaborators, why the Document Inspector’s comment toggle does not fully erase the persona layer, and how to strip every trace before sharing workbooks externally.
Excel’s comment story changed in 2018. The legacy “notes” (the yellow stickies attached to a single cell, addressed to nobody, never threaded) still ship for backwards compatibility, but the default in Microsoft 365 is now a thread-aware comment system designed for real-time co-authoring. The two systems coexist in the same workbook, are stored in different folders, and have completely different metadata implications.
| Feature | Legacy notes | Threaded comments |
|---|---|---|
| XML location | xl/comments1.xml | xl/threadedComments/threadedComment1.xml |
| Author identity | Free-text string in authors/author | Persona ID resolved through xl/persons/person.xml |
| Threading | No | Parent / reply tree by GUID |
| Mentions (@) | No | Yes — references additional persona IDs |
| Resolved state | No | Yes — done="1" attribute on the root comment |
| Hidden from default UI when resolved | N/A | Yes — user must explicitly show resolved threads |
| Document Inspector reach | Removed by “Comments and annotations” | Removed unevenly: xl/persons/person.xml frequently survives |
The mismatch matters because authors who use legacy notes assume their commenter identity is a single editable string, while threaded comments embed a structurally richer identity that points outwards into Microsoft 365 and Active Directory. Strip the visible display name and you have not removed the link.
Workbooks created before threaded comments shipped occasionally carry a transitional layer in xl/commentsExt.xml that pre-dates the formal threadedComments folder. It is structurally similar but uses different namespace URIs. Treat any commentsExt file as part of the same audit surface.
An XLSX is a ZIP archive. Threaded comments and personas live in two parallel folders, both wired to worksheets and to each other through OPC relationships.
// XLSX layout fragment showing threaded comment storage
workbook.xlsx (zip)
├── [Content_Types].xml                // declares threadedComment + person content types
├── xl/
│   ├── workbook.xml
│   ├── persons/
│   │   └── person.xml                 // <-- persona directory
│   ├── threadedComments/
│   │   ├── threadedComment1.xml       // <-- per-sheet threads
│   │   └── threadedComment2.xml
│   ├── comments1.xml                  // legacy notes (often a stub)
│   └── worksheets/
│       ├── sheet1.xml
│       └── _rels/
│           └── sheet1.xml.rels        // links sheet1 to threadedComment1.xml
The wiring is deliberately layered. Each worksheet’s _rels/sheetN.xml.rels points to a threadedComment part. That part references persona IDs, which in turn resolve through xl/persons/person.xml. Deleting the threaded comment file alone leaves orphaned personas behind. Deleting the persons file alone leaves the threaded comments referring to phantom IDs that Excel silently renders as “Unknown User”, while the original personId values stay readable in the comment XML.
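The sheet-to-thread wiring described above can be followed programmatically. The sketch below is a minimal illustration, not a full OPC resolver: the function name threaded_comment_links is hypothetical, and it assumes the relationship Type ends in "/threadedComment", which is worth confirming against your own files.

```python
import posixpath
import zipfile
from xml.etree import ElementTree as ET

# Standard OPC namespace used by every .rels part.
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def threaded_comment_links(xlsx):
    """Map each worksheet part to the threaded-comment parts its .rels file targets."""
    links = {}
    with zipfile.ZipFile(xlsx) as z:
        for name in z.namelist():
            if not (name.startswith("xl/worksheets/_rels/") and name.endswith(".rels")):
                continue
            # "sheet1.xml.rels" describes the part "xl/worksheets/sheet1.xml".
            sheet = "xl/worksheets/" + posixpath.basename(name)[: -len(".rels")]
            root = ET.fromstring(z.read(name))
            for rel in root.iter(REL_NS + "Relationship"):
                # Assumption: the relationship Type ends in "/threadedComment".
                if not rel.get("Type", "").endswith("/threadedComment"):
                    continue
                # Targets are usually relative to the worksheet folder.
                target = posixpath.normpath(
                    posixpath.join("xl/worksheets", rel.get("Target", ""))
                ).lstrip("/")
                links.setdefault(sheet, []).append(target)
    return links
```

Calling threaded_comment_links("workbook.xlsx") returns a dictionary keyed by worksheet part, which makes it easy to see which sheets carry threads before opening any comment XML.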
xl/persons/person.xml is the single most identity-rich file most XLSX workbooks contain. Each <person> element carries a display name, a persona provider, a stable persona ID, and frequently an email-shaped userId that Excel uses to look the user up in Microsoft Graph.
// xl/persons/person.xml
<personList xmlns="http://schemas.microsoft.com/office/spreadsheetml/2018/threadedcomments">
  <person
      displayName="Alice Chen"
      id="{5f3c8b81-3f17-4a8a-91b8-4c3b71a1e3d9}"
      userId="alice.chen@contoso.com"
      providerId="AD"/>
  <person
      displayName="Bob Marlow"
      id="{9d8a4f1a-2c5e-4e63-b12c-7f0b2c8c8a13}"
      userId="S-1-5-21-3623811015-3361044348-30300820-2113"
      providerId="AD"/>
</personList>
| Attribute | What it carries | What it reveals |
|---|---|---|
| displayName | User-facing label | Full name as it appears in the directory or local profile. |
| id | Per-document persona GUID | A stable identifier used to wire comments to authors. Cross-document correlation by GUID frequently works for the same user across workbooks created in the same session. |
| userId | Provider-specific user identifier | UPN/email for AAD users, NT-style SID for AD users, an SMTP address for ad-hoc people-picker entries, or an objectId GUID for Graph users. The most directly identifying field in the entire workbook. |
| providerId | Identity provider | Common values: AD (Azure AD/AD), PeoplePicker, None. Lets a reader infer whether the author signed in to Microsoft 365 or simply typed a name into a local Office install. |
When the provider is AD and the user signed in to Microsoft 365, userId is the user’s UPN: alice.chen@contoso.com. The domain part is the tenant’s primary domain. A workbook stripped of every creator and lastModifiedBy tag still names the tenant and the individual user the moment one threaded comment exists. For on-premises AD setups, the same field carries the user’s SID, which uniquely identifies the user against the corporate domain controller.
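To triage a persona list quickly, it helps to sort each userId by shape. The following is a rough sketch under stated assumptions: the function name classify_user_id and the three regular expressions are illustrative, and they only cover the forms described above (email-shaped UPN, NT-style SID, bare GUID). Real files may carry other formats, which fall through to "other".

```python
import re

# Illustrative patterns only; they match the userId shapes discussed in the text.
UPN_RE = re.compile(r"^[^@\s]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})$")
SID_RE = re.compile(r"^S-1-\d+(?:-\d+)+$")
GUID_RE = re.compile(r"^\{?[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\}?$")


def classify_user_id(user_id):
    """Return (kind, domain); domain is set only for email-shaped values."""
    value = (user_id or "").strip()
    match = UPN_RE.match(value)
    if match:
        # The domain part is what names the tenant.
        return "upn_or_email", match.group(1).lower()
    if SID_RE.match(value):
        return "sid", None
    if GUID_RE.match(value):
        return "guid", None
    return "other", None
```

Running this over every userId in person.xml gives a short list of distinct domains, which is usually the fastest way to spot a tenant name or an external collaborator.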
Each xl/threadedComments/threadedCommentN.xml stores all the threads attached to one worksheet. A thread is a chain of <threadedComment> elements bound by a parent ID. The root comment has no parent; each reply names the thread root in its parentId attribute.
// xl/threadedComments/threadedComment1.xml
<ThreadedComments xmlns="http://schemas.microsoft.com/office/spreadsheetml/2018/threadedcomments">
  <threadedComment
      ref="C7"
      dT="2026-04-19T14:32:11.45"
      personId="{5f3c8b81-3f17-4a8a-91b8-4c3b71a1e3d9}"
      id="{aa11bb22-cc33-44dd-ee55-ff6677889900}"
      done="1">
    <text>Are we still using the Q3 forecast assumptions here?</text>
  </threadedComment>
  <threadedComment
      ref="C7"
      dT="2026-04-19T14:41:02.10"
      personId="{9d8a4f1a-2c5e-4e63-b12c-7f0b2c8c8a13}"
      parentId="{aa11bb22-cc33-44dd-ee55-ff6677889900}"
      id="{bb22cc33-dd44-55ee-ff66-001122334455}">
    <text>@Carlos Iglesias confirmed the cost basis was updated yesterday.</text>
    <mentions>
      <mention mentionpersonId="{cc33dd44-ee55-66ff-0011-223344556677}"
               mentionId="{dd44ee55-ff66-7700-1122-334455667788}" startIndex="0" length="16"/>
    </mentions>
  </threadedComment>
</ThreadedComments>
The fields are individually small, but together they reconstruct a surprisingly complete audit trail of the conversation that produced the file.
| Attribute | Meaning |
|---|---|
| ref | A1-style cell reference the thread is attached to. Reveals which cells attracted discussion and, by extension, which numbers were uncertain. |
| dT | UTC timestamp to hundredths of a second. Far more precise than the workbook’s coarse core.xml timestamps, and not a value the Document Inspector touches. |
| personId | Foreign key into person.xml. The actual identity is one indirection away. |
| id | Per-comment GUID. Stable across saves, used to anchor replies. |
| parentId | GUID of the root comment in the thread. Lets you reconstruct the entire reply tree. |
| done | Set to 1 when the user marks the thread resolved. The thread disappears from the default UI but the XML is preserved. |
| mentions/mention | Each @-mention names the persona it points at, plus a span (startIndex, length) inside the comment text. |
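The attributes in the table are enough to rebuild each conversation. As a minimal sketch (the function name build_threads and the dictionary layout are illustrative choices, not anything Excel defines), the code below groups the comments in one threadedComment part into root comments with their replies, keyed on parentId.

```python
from collections import defaultdict
from xml.etree import ElementTree as ET

NS = "{http://schemas.microsoft.com/office/spreadsheetml/2018/threadedcomments}"


def build_threads(xml_bytes):
    """Group the comments in one threadedComment part into root -> replies."""
    roots, replies = {}, defaultdict(list)
    for c in ET.fromstring(xml_bytes).iter(NS + "threadedComment"):
        entry = {
            "id": c.get("id"),
            "ref": c.get("ref"),
            "dT": c.get("dT"),
            "personId": c.get("personId"),
            "text": (c.findtext(NS + "text") or "").strip(),
        }
        parent = c.get("parentId")
        if parent is None:
            # Only the root comment carries the resolved flag.
            entry["resolved"] = c.get("done") == "1"
            roots[entry["id"]] = entry
        else:
            replies[parent].append(entry)
    threads = []
    for root_id, entry in roots.items():
        # dT strings sort chronologically as long as they share one format.
        entry["replies"] = sorted(replies.get(root_id, []), key=lambda r: r["dT"] or "")
        threads.append(entry)
    return threads
```

Feeding it the bytes of xl/threadedComments/threadedComment1.xml yields one entry per thread, with resolved threads flagged rather than hidden.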
The <mentions> child of every threaded comment is the part most authors do not realise persists. When a user types @Carlos into a comment, Excel records:
- A persona entry for Carlos in person.xml, even if Carlos has never opened the workbook himself.
- His display name, userId, and provider, sourced from the typing user’s directory lookup.
- A mentionId GUID for the specific occurrence of the mention in the text, allowing Excel to drive notification dispatch and badge UI.

Multiplied across a workbook that has been collaborated on for weeks, the persona list ends up reading like a roll call of the project team, including people who only ever appeared as @-mentions and never typed a word in the file. A reader of the XLSX learns who reviewed the file, who was tagged for follow-up, and which questions were directed at whom. Even after the visible thread is resolved or deleted, the persona records frequently survive.
Deleting a single threaded comment that contained an @-mention removes the comment from the worksheet, but Excel does not garbage-collect the corresponding persona from person.xml. The mentioned colleague’s display name, UPN, and provider continue to ship inside the file. Audit a recently-edited workbook and you will frequently find personas with no remaining comment references — ghosts of conversations that were nominally erased.
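Ghost personas of this kind are easy to detect by comparing the persona directory against the IDs that surviving comments and mentions still reference. The sketch below is one way to do it; ghost_personas is a hypothetical helper name, and it assumes the persona part sits at xl/persons/person.xml as shown earlier.

```python
import zipfile
from xml.etree import ElementTree as ET

NS = "{http://schemas.microsoft.com/office/spreadsheetml/2018/threadedcomments}"
PERSONS = "xl/persons/person.xml"  # assumed location, as in the layout above


def _norm(guid):
    """Compare GUIDs case-insensitively."""
    return (guid or "").lower()


def ghost_personas(xlsx):
    """Return display names of personas that no surviving comment or mention references."""
    with zipfile.ZipFile(xlsx) as z:
        names = z.namelist()
        if PERSONS not in names:
            return []
        people = {
            _norm(p.get("id")): p.get("displayName") or ""
            for p in ET.fromstring(z.read(PERSONS)).iter(NS + "person")
        }
        used = set()
        for name in names:
            if not name.startswith("xl/threadedComments/"):
                continue
            root = ET.fromstring(z.read(name))
            for c in root.iter(NS + "threadedComment"):
                used.add(_norm(c.get("personId")))
            for m in root.iter(NS + "mention"):
                used.add(_norm(m.get("mentionpersonId")))
    return sorted(name for pid, name in people.items() if pid not in used)
```

A non-empty result means the file still names people whose comments or mentions have already been deleted.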
Excel’s “Resolve thread” option flips the done attribute on the root comment to 1 and hides the bubble from the default review pane. The XML is otherwise untouched. A reader who unzips the workbook still sees:
- The done="1" flag itself, which signals to a forensic reader that the conversation reached closure. That is useful intelligence in disputed-document scenarios.

Resolving a thread is a UI convenience, not a metadata-removal step. For workbooks heading outside the company, every resolved thread is a piece of internal deliberation hiding behind a Show resolved comments checkbox in Excel.
A motivated reader who unzips an XLSX with active threaded comments can reconstruct, with no special tooling, a very detailed picture of the workbook’s authoring history.
- A @contoso.com userId pinpoints the tenant and lets the reader probe public Graph endpoints for tenant ID, default domain, and federation status.
- Display names such as Last, First (External) reveal organisational role.
- The distribution of dT timestamps maps directly to working hours, time zones, and quiet windows.
- providerId="PeoplePicker" and free-text email addresses indicate external collaborators that the org may not have realised were inside the workbook’s metadata.

Three quick recipes pull the comment and persona data out of a workbook without ever launching Office.
# 1. List every threaded comment and persona file
unzip -l workbook.xlsx | grep -E "persons|threadedComments"
# 2. Dump the persona directory
unzip -p workbook.xlsx "xl/persons/person.xml"
# 3. Pull every comment text plus its timestamp
unzip -p workbook.xlsx "xl/threadedComments/threadedComment1.xml" | \
xmlstarlet sel -t -m "//*[local-name()='threadedComment']" -v "@dT" -o " | " -v "." -n
For programmatic auditing, a small Python script joins personas to comments and prints the reconstructed conversation:
import zipfile
from xml.etree import ElementTree as ET

NS = "{http://schemas.microsoft.com/office/spreadsheetml/2018/threadedcomments}"

with zipfile.ZipFile("workbook.xlsx") as z:
    # Build the persona lookup: persona GUID -> (displayName, userId, providerId).
    people = {}
    if "xl/persons/person.xml" in z.namelist():
        root = ET.fromstring(z.read("xl/persons/person.xml"))
        for p in root.iter(NS + "person"):
            people[p.get("id")] = (p.get("displayName"),
                                   p.get("userId"), p.get("providerId"))

    # Walk every threaded-comment part and join each comment to its persona.
    for name in z.namelist():
        if name.startswith("xl/threadedComments/"):
            root = ET.fromstring(z.read(name))
            for c in root.iter(NS + "threadedComment"):
                person = people.get(c.get("personId"), ("?", "?", "?"))
                text = (c.findtext(NS + "text") or "").strip()
                print(c.get("dT"), person, c.get("ref"), text)
The output is a flat log of every threaded comment ever made in the workbook, with the full identity of every commenter, the cell they targeted, and the text they wrote — including the resolved threads that the Excel UI hides by default.
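The dT column of that log is also enough to surface working-hours patterns. As a small sketch (hour_histogram is an illustrative name, and the parsing assumes the dT format shown in the earlier XML sample), the function below counts comments per hour of day and skips anything it cannot parse.

```python
from collections import Counter
from datetime import datetime


def hour_histogram(timestamps):
    """Count comments per hour of day from dT strings such as 2026-04-19T14:32:11.45."""
    hours = Counter()
    for ts in timestamps:
        try:
            # Only the date and hour are needed, so parse the first 13 characters.
            hours[datetime.strptime(ts[:13], "%Y-%m-%dT%H").hour] += 1
        except (TypeError, ValueError):
            continue  # missing or malformed timestamp
    return dict(sorted(hours.items()))
```

A cluster of late-evening hours in the result is exactly the kind of detail a recipient could read out of the file.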
Document Inspector exposes a single “Comments and annotations” toggle. In practice its behaviour against threaded comments has shifted across Office builds and is inconsistent enough that you should not rely on it as the last line of defence.
- Resolved (done="1") threads are sometimes left behind in older Office builds because the inspector enumerates only the visible comments.
- xl/persons/person.xml frequently survives the cleanup, with the directory of every persona who ever participated or was mentioned still inside the file. The persona list is then orphaned but readable.
- The legacy comments1.xml file is removed by the same toggle, which can leave a dangling entry in [Content_Types].xml that some downstream tools warn on.

A clean removal touches four artefacts: the threaded comment files, the persona directory, the relationship entries that point at them, and the content-type overrides. Four approaches follow, in increasing order of robustness.
The Review > Comments pane has a “Show resolved comments” toggle and a delete option for each thread. Manually resolve everything, show resolved, delete each one, then save. This removes the visible threads but is slow and frequently leaves orphaned persona records, especially around @-mentions.
File > Info > Check for Issues > Inspect Document, then tick “Comments and annotations” and run. Verify by unzipping the saved file and confirming both xl/threadedComments/ and xl/persons/ are gone. If the persons file remains, fall back to one of the next options.
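The unzip-and-check step can be scripted so that it runs the same way every time. This is a minimal sketch; leftover_comment_parts is a hypothetical helper that simply lists any ZIP entries still sitting under the two folders named above.

```python
import zipfile

# Folders that should be absent after a successful cleanup.
SENSITIVE_PREFIXES = ("xl/threadedComments/", "xl/persons/")


def leftover_comment_parts(xlsx):
    """Return the threaded-comment and persona parts still present in the archive."""
    with zipfile.ZipFile(xlsx) as z:
        return sorted(n for n in z.namelist() if n.startswith(SENSITIVE_PREFIXES))
```

An empty list means both folders are gone. Anything else, typically xl/persons/person.xml, is the cue to fall back to a ZIP-level removal.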
A short Python script removes every threaded comment artefact in a single pass and rewrites the relationship and content-type files:
import re
import zipfile
from pathlib import Path

src, dst = Path("in.xlsx"), Path("out.xlsx")

# Parts to drop outright.
DROP = ("xl/threadedComments/", "xl/persons/", "xl/commentsExt.xml")

# Self-closing <Override>/<Relationship> elements that mention a dropped part.
# [^>]* is used rather than [^/]* because PartName, Type, and Target values contain slashes.
PATTERN = re.compile(
    rb"<(?:Override|Relationship)\b[^>]*(?:threadedComment|person|commentsExt)[^>]*/>"
)

with zipfile.ZipFile(src) as zin, \
        zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
    for item in zin.infolist():
        if item.filename.startswith(DROP):
            continue  # skip the comment and persona parts entirely
        data = zin.read(item.filename)
        if item.filename.endswith((".rels", "[Content_Types].xml")):
            data = PATTERN.sub(b"", data)
        zout.writestr(item, data)
Excel reopens the resulting file cleanly. Because no worksheet XML element references the threaded-comment relationship by ID, no further fix-up is required.
For organisations that share workbooks regularly, build a server-side step into the file-sharing pipeline that strips both xl/threadedComments/ and xl/persons/, then runs a final scan to confirm no residual personId attributes remain in worksheet XML. This composes well with the parallel sanitisation steps for external links, defined names, and printer settings — the same pipeline can clean all four layers in one pass.
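The final scan in such a pipeline can be a plain byte-level search across every XML part. The sketch below is one possible gate, with residual_persona_hits as a hypothetical function name; the markers it looks for (personId attributes and a personList element) are taken from the XML samples earlier in this post.

```python
import re
import zipfile

# 'personId="' also matches inside 'mentionpersonId="', so one pattern covers both.
RESIDUE = re.compile(rb'personId="|<(?:\w+:)?personList\b')


def residual_persona_hits(xlsx):
    """Return {part name: hit count} for XML parts that still carry persona markers."""
    hits = {}
    with zipfile.ZipFile(xlsx) as z:
        for name in z.namelist():
            if not name.endswith((".xml", ".rels")):
                continue
            count = len(RESIDUE.findall(z.read(name)))
            if count:
                hits[name] = count
    return hits
```

A pipeline could refuse to release any workbook for which this returns a non-empty dictionary.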
Run this checklist against any workbook leaving your organisation, especially if it has been collaborated on through Microsoft 365.
- Do xl/threadedComments/ and xl/persons/ exist in the ZIP?
- Have I reviewed every persona’s displayName, userId, and providerId and verified none point to colleagues, mentions, or external email addresses that should not leave the organisation?
- Does any userId contain a UPN that names the corporate Microsoft 365 tenant domain, or an AD SID that could be correlated against the directory?
- Have I searched for done="1" and confirmed the resolved-thread text contains nothing internal that I do not want a recipient to read?
- Have I reviewed the comment timestamps (dT) for working-hours patterns or after-hours edits that I would not want to disclose?
- Have I removed xl/threadedComments/ and xl/persons/ entirely, plus their .rels and [Content_Types].xml entries?
- Have I confirmed that the legacy xl/comments1.xml file does not contain residual notes that the threaded-comment cleanup ignored?

Threaded comments and the persona directory are the most identity-rich metadata layer the modern XLSX format introduces. Designed for real-time co-authoring, they bind every comment to a Microsoft 365 or Active Directory persona that includes a stable user identifier, a tenant-naming domain, and a display name. On top of that sits a parallel @-mention graph that pulls in colleagues who never typed a single character into the file. The Document Inspector handles the visible threads inconsistently and frequently leaves the persona directory behind, where it sits as a complete name-and-tenant beacon long after the conversation it documented has been resolved or deleted.
For workbooks staying inside an organisation, the layer is exactly what it claims to be: a useful collaboration record. For workbooks crossing the perimeter, it is a tenant fingerprint, an org-chart fragment, and a transcript of internal deliberation, all in one. The only durable defence is to strip xl/threadedComments/ and xl/persons/ entirely, along with the relationship and content-type entries that point at them — a small ZIP-level operation that closes a leak most security teams never inspect.