When a user pastes a photo into a worksheet, drags a Word document onto a tab, or uses Insert → Object to attach a PDF, Excel does not flatten any of it. The original image arrives with its EXIF, IPTC, and XMP blocks intact — including GPS coordinates, the camera serial number, and the originating Photoshop session. The embedded document arrives as a complete file inside the XLSX, carrying its own author, revision history, and tracked changes. This post walks through where embedded-object metadata lives inside an XLSX, what it exposes, and how to strip it before sharing.
Excel treats embedded content as two separate things. Pictures — PNG, JPEG, GIF, BMP, TIFF files dragged into a sheet, pasted from clipboard, or inserted via Insert → Pictures — are stored byte-for-byte inside the XLSX package. OLE objects — Word documents, PDFs, Outlook .msg files, other XLSX files, even custom application files registered with the OS — are stored as full embedded files (or a wrapper around them) plus a thumbnail rendered for preview.
In neither case does Excel re-encode, re-render, or normalize the embedded content. The 12-megapixel JPEG you dropped on a quarterly review tab is the same JPEG, with the same EXIF block written by your phone’s camera firmware. The 80-page negotiation draft you embedded on the cover sheet is the entire .docx file, with its full revision tree and every commenter’s name. The XLSX is a transport container, not a sanitizer.
- xl/media/image1.png: one file per unique embedded picture, in its original encoding
- xl/embeddings/oleObject1.bin: one file per OLE-embedded document, often a Compound File Binary stream wrapping the original
- xl/drawings/drawing1.xml: the placement, sizing, anchor, alt text, and title for every shape on a worksheet
- xl/drawings/_rels/drawing1.xml.rels: the relationship pointing each shape at its underlying media or embedding
- [Content_Types].xml: declares MIME types for every embedded format, which can itself disclose the source application

When a user crops an image inside Excel, the visible region changes but the underlying file does not. The full original picture remains in xl/media/, and the crop is recorded as a clipping rectangle in the drawing XML. A recipient who unzips the file and opens the image gets the uncropped original: faces that were cropped out, sensitive text outside the visible window, the rest of a whiteboard photo. Use Picture Format → Compress Pictures → Delete cropped areas of pictures as a deliberate step; the default behavior is to keep everything.
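To make the crop behavior concrete, here is a minimal sketch that looks for crop rectangles in the drawing parts of a workbook. It assumes the crop is stored as a DrawingML `a:srcRect` element whose `l`, `t`, `r`, `b` attributes are typically expressed in thousandths of a percent of the source image; the function name `find_cropped_pictures` is hypothetical, not part of any library.

```python
import zipfile
import xml.etree.ElementTree as ET

# DrawingML main namespace (the "a:" prefix in drawing XML)
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
CROP_ATTRS = {"l", "t", "r", "b"}


def find_cropped_pictures(xlsx):
    """Return (drawing part, crop attributes) for every non-zero crop rectangle.

    `xlsx` may be a path or a binary file-like object.
    """
    hits = []
    with zipfile.ZipFile(xlsx) as z:
        for name in z.namelist():
            # Skip the _rels/*.rels files and anything that is not drawing XML
            if not (name.startswith("xl/drawings/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(z.read(name))
            for rect in root.iter(f"{{{A_NS}}}srcRect"):
                crop = {k: v for k, v in rect.attrib.items() if k in CROP_ATTRS}
                if any(v not in ("", "0") for v in crop.values()):
                    hits.append((name, crop))
    return hits
```

Any hit means the full uncropped picture is probably still sitting in xl/media/ and is worth opening by hand before the workbook is shared.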
A JPEG taken on a modern phone is not just pixels. It carries up to four parallel metadata blocks: EXIF (camera settings), GPS (location), IPTC (caption, byline, copyright), and XMP (Adobe-style structured metadata, often added during editing). Most users do not know any of these exist; all four travel intact when the photo is dropped into Excel.
// Representative output from `exiftool xl/media/image1.jpg` after unzipping an XLSX
Make : Apple
Model : iPhone 15 Pro
Software : 18.4.1
Date/Time Original : 2026:03:18 14:22:07
GPS Latitude : 47 deg 36' 32.41" N
GPS Longitude : 122 deg 19' 59.04" W
GPS Altitude : 56.3 m Above Sea Level
Lens Model : iPhone 15 Pro back triple camera 6.86mm f/1.78
Camera Owner Name : Jane Smith
Body Serial Number : C7QXM9XJYP
Image Description : IMG_4421 - draft P&L screenshot - DO NOT SHARE
Creator Tool : Adobe Photoshop 25.6 (Macintosh)
History Action : saved, edited, saved (3x)
Every line in that block is a real disclosure. The GPS coordinates resolve in any mapping tool to a specific street address. The camera owner name and serial number identify the device that took the photo. The image description, written by the user during a sort or favorites pass, contains an internal warning that the recipient now reads. The creator tool reveals that the image was edited in Photoshop, and the history action count tells the recipient there were edits worth saving three times — a hint that something was retouched, painted out, or cleaned up.
| Metadata Block | Typical Source | What Often Leaks |
|---|---|---|
| EXIF | Phone or camera firmware | Camera make/model, serial number, lens, exposure, original timestamp |
| GPS (in EXIF) | Phone GPS or geotagging service | Latitude, longitude, altitude, heading, sometimes a place name |
| IPTC | DAM systems, photo libraries | Byline, headline, caption, keywords, copyright owner, source organization |
| XMP | Adobe apps, Lightroom, Photos.app | Edit history, original document ID, color profile, ratings, faces, person tags |
| tEXt / iTXt (PNG) | Snipping tools, screen recorders | Source application name, screen capture timestamp, device info |
Many users believe a screenshot has no metadata. Most snipping tools write a PNG tEXt chunk identifying the tool, sometimes the OS user name, and the capture timestamp. macOS’s screenshot service writes XMP. The Snipping Tool on Windows writes Software, CreationTime, and on some versions a thumbnail of the screen. A screenshot of a draft pasted into Excel can ship with the username of the person who took the screenshot embedded in the PNG.
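A screenshot can be checked for text chunks without any external tool. The sketch below walks the PNG chunk list and reports the tEXt, zTXt, and iTXt chunks it finds; it relies only on the standard PNG layout (8-byte signature, then length/type/data/CRC chunks), and the function name `png_text_chunks` is an illustrative choice.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
TEXT_CHUNKS = (b"tEXt", b"zTXt", b"iTXt")


def png_text_chunks(data):
    """Return (chunk type, keyword, text) for each textual chunk in PNG bytes.

    The text is decoded only for uncompressed tEXt chunks; it is None otherwise.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    found = []
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype in TEXT_CHUNKS:
            keyword, _, rest = body.partition(b"\x00")
            text = rest.decode("latin-1") if ctype == b"tEXt" else None
            found.append((ctype.decode("ascii"), keyword.decode("latin-1"), text))
        if ctype == b"IEND":
            break
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return found
```

Running it over each PNG extracted from xl/media/ shows which keywords (for example a software or creation-time field) a given capture tool actually wrote.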
GPS leakage from embedded photos deserves to be called out separately because the consequence is so direct. A latitude/longitude pair written to six decimal places resolves to steps of roughly 10 centimeters; the phone's actual fix is usually accurate only to a few meters, which is still enough to single out a building. Embedded inside a workbook that ships to a counterparty, that translates to specific home addresses, hotel rooms, suppliers, dispute sites, and field locations.
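The arithmetic behind that precision figure can be sketched as follows. The example converts the degrees/minutes/seconds values from the sample exiftool output into decimal degrees and estimates how much ground one step in the sixth decimal place covers. It assumes a spherical Earth with a mean radius of 6,371 km, which is an approximation, and the helper names are illustrative.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean radius; a spherical approximation


def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds plus a hemisphere letter to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if ref in ("S", "W") else value


def resolution_m(lat_deg, step_deg=1e-6):
    """Ground distance (north-south, east-west) covered by one step of `step_deg`."""
    per_degree = 2 * math.pi * EARTH_RADIUS_M / 360
    north_south = per_degree * step_deg
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


lat = dms_to_decimal(47, 36, 32.41, "N")
lon = dms_to_decimal(122, 19, 59.04, "W")
print(f"{lat:.6f}, {lon:.6f}")  # prints 47.609003, -122.333067
ns, ew = resolution_m(lat)
print(f"one 1e-6 degree step: {ns:.3f} m north-south, {ew:.3f} m east-west")
```

At this latitude one step is about 11 cm north-south and somewhat less east-west, so the stored number is far finer than the receiver's real accuracy.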
Take a workbook used to track a sales team’s field visits. Each row references an account; each row has a small thumbnail photo of a meeting whiteboard or signed delivery slip dragged from a phone. Unzip the XLSX, run exiftool -GPSLatitude -GPSLongitude xl/media/*, and you have an unintended dataset: the home address of every customer whose photo was taken at their home, every regional office, every supplier site visited that week. The workbook’s visible columns may carry “City” only; the embedded photos quietly carry full coordinates.
Three categories of workbooks are particularly dangerous to share without sanitization:
Construction site visits, environmental sampling, insurance loss adjustment, real-estate inspections. Photos are taken on phones with location services on, dropped into a tab, and the workbook is shared with regulators or counsel. Every photo is a precise location pin.
HR investigations, harassment complaints, security incidents. A photo taken inside a sensitive location — a private office, a witness’s home, a location the investigator does not want disclosed — embeds the coordinates of where the photo was taken, even if the photo itself shows only an anonymized scene.
Equipment, art, fleet, fine wine, server rooms. The photo of the asset taken on a smartphone records where the asset is stored. An asset register shared with an insurer, an auditor, a bank, or a buyer carries a precise location for every photographed item.
Insert → Object → Create from File → check “Display as icon” looks innocent. The user sees a small icon on the worksheet labeled with the filename. What Excel did under the hood is far more involved: the entire source file was read in, wrapped in a Compound File Binary container, written to xl/embeddings/oleObject1.bin, and a rendered preview image was generated and stored alongside it.
The wrapper is a thin one. Inside the .bin a tool like olefile or ssview finds the original document streams — for a Word doc, the entire docx package; for an Outlook message, the full .msg with attachments and recipients; for a PDF, the original PDF bytes. Each of those carries its own metadata tree:
| Embedded Type | Metadata Tree Inside |
|---|---|
| .docx (Word) | Authors, last-modified-by, revision count, total edit time, comments, tracked changes, settings.xml, custom properties — the full Word document carries everything its own author left behind |
| .pdf | Author, Producer, CreationDate, ModDate, Title, Keywords, plus optional XMP block; PDFs from Word also carry the original docx’s metadata via /Custom entries |
| .msg (Outlook) | Sender, all recipients (including BCC if the sender saved a sent copy), full message headers, attachments with their own metadata trees, conversation thread index |
| .xlsx | All the metadata of a regular workbook — authors, hidden sheets, pivot caches, external links, defined names — recursively |
| .eml | Full RFC822 headers including X-Mailer, Received chains, message-id, original sender IP in some configurations |
An XLSX with an embedded .msg with an attached .docx with an embedded image is not three artifacts — it is one delivery vehicle for all of them, with all of their metadata. Document Inspector cleans the outer XLSX. It does not open and clean each embedded file. The author of the workbook can run Inspect Document and pass; the recipient can extract a chain of nested attachments still carrying everyone’s names, all the way down.
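Before opening any embedding with a format-specific tool, it helps to know what kind of container each entry is. The sketch below guesses the container type of every file under xl/embeddings/ from its leading bytes. The signature table is a small assumption-laden sample (Compound File Binary, ZIP, PDF), the labels are arbitrary, and `classify_embeddings` is a hypothetical helper name.

```python
import zipfile

# (leading bytes, label): a deliberately short list, not an exhaustive one
SIGNATURES = [
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "cfb"),  # Compound File Binary (OLE wrapper, .msg, legacy Office)
    (b"PK\x03\x04", "zip"),                        # ZIP package such as .docx or .xlsx
    (b"%PDF-", "pdf"),                             # bare PDF
]


def classify_embeddings(xlsx):
    """Map each xl/embeddings/ entry to a guessed container type."""
    result = {}
    with zipfile.ZipFile(xlsx) as z:
        for name in z.namelist():
            if not name.startswith("xl/embeddings/") or name.endswith("/"):
                continue
            head = z.read(name)[:8]
            result[name] = next(
                (label for magic, label in SIGNATURES if head.startswith(magic)),
                "unknown",
            )
    return result
```

A "cfb" result is the cue to list the streams with a tool like olefile; a "zip" result means the embedding is itself an Office package that needs the same inspection as the outer workbook.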
Beyond the embedded files, Excel writes its own descriptive metadata in xl/drawings/drawing1.xml for every shape, picture, and OLE object. Two fields here matter for disclosure: the shape’s name attribute (often the original filename) and the accessibility descr field (alt text, used by screen readers).
// Excerpt from xl/drawings/drawing1.xml
<xdr:nvPicPr>
<xdr:cNvPr id="3"
name="IMG_4421 (1) - draft Q2 forecast - confidential.png"
descr="Whiteboard photo from offsite at lakehouse, March 18"/>
<xdr:cNvPicPr>
<a:picLocks noChangeAspect="1"/>
</xdr:cNvPicPr>
</xdr:nvPicPr>
The shape name is whatever the source filename was when the picture was inserted — including parenthetical version markers like (1) that suggest a duplicate-rename pattern, descriptive suffixes the user added, and the file extension. The descr alt text is whatever the user (or Excel’s automatic captioning, if enabled) typed for accessibility. Both fields are visible to anyone who unzips the file and reads the drawing XML — and to many automated ingest pipelines that index alt text as caption text.
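Reading those two fields out of a workbook takes only the standard library. A minimal sketch, assuming the shapes use the `xdr:cNvPr` element shown in the excerpt above (the function name `list_shape_labels` is hypothetical):

```python
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML drawing namespace (the "xdr:" prefix in drawing XML)
XDR_NS = "http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing"


def list_shape_labels(xlsx):
    """Return (drawing part, name, descr) for every xdr:cNvPr element in the workbook."""
    rows = []
    with zipfile.ZipFile(xlsx) as z:
        for part in z.namelist():
            if not (part.startswith("xl/drawings/") and part.endswith(".xml")):
                continue
            root = ET.fromstring(z.read(part))
            for el in root.iter(f"{{{XDR_NS}}}cNvPr"):
                rows.append((part, el.get("name", ""), el.get("descr", "")))
    return rows
```

Printing the rows for a workbook gives a quick view of every filename-like shape name and every alt-text caption a recipient could read.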
Recent versions of Excel offer auto-generated alt text via the cloud accessibility service. The auto-text is structurally informative (“A graph showing increasing revenue,” “A man in a red shirt next to a whiteboard with handwritten numbers”) and tells the recipient something about the picture even before they look at it — useful for accessibility, less useful when the recipient was not supposed to know what was on the whiteboard.
Four scenarios drawn from real patterns of disclosure:
A broker prepares an investor pitch in Excel with thumbnail photos of comparable properties. The photos were taken on the broker’s phone during site visits; some are from properties on different deals.
Leak: Each photo’s GPS coordinates pinpoint the comparable property, but a few photos were taken inside another investor’s building during a previous showing. The recipient now has a precise list of what other deals the broker is also working, complete with addresses.
A finance team builds a deal model and uses Insert → Object to attach the latest term sheet (.docx) to the cover sheet for convenience. The model is sent to outside counsel for review.
Leak: The embedded .docx carries every tracked change, every comment, the “last modified by” name of an associate at the opposing firm who edited the doc earlier in the cycle, and a custom property naming the document repository it came from. Outside counsel now has the redline history they were never meant to see.
An HR investigator builds a workbook to track interviews and evidence. Witness statements are attached as embedded .msg files; photos taken during a site walk are pasted into the evidence tab. The workbook is shared with outside counsel.
Leak: The embedded .msg files carry the full BCC list of every email the investigator sent, including the names of executives who were privately copied. The site-walk photos carry GPS coordinates resolving to specific witnesses’ offices. The investigator believed the workbook had been sanitized; only the outer Excel layer was inspected.
A product manager photographs a strategy whiteboard, drops the photo into a roadmap workbook, and crops it inside Excel to remove a section showing competitor-specific pricing. The workbook is sent to a partner.
Leak: The crop is a display rectangle, not a file edit. The full photo — including the cropped-out competitor pricing column — is in xl/media/image1.jpg at full resolution. The partner unzips the file, opens the JPEG, and sees the original whiteboard.
A useful inspection enumerates every embedded artifact, dumps the metadata of each, and flags GPS coordinates and known-sensitive fields. Three practical approaches:
File → Info → Check for Issues → Inspect Document finds and offers to remove some image-related items, but it does not remove EXIF/GPS from the underlying media files, does not crack open OLE-embedded documents to inspect them, and treats the embedded objects as opaque blobs. Use it as a baseline pass, not a final answer.
Rename a copy of the XLSX to .zip, extract, and walk the xl/media/ and xl/embeddings/ trees:
# List every embedded image and embedding
unzip -l report.xlsx | grep -E 'xl/(media|embeddings)/'
# Extract the package and run exiftool over every embedded image
unzip -q report.xlsx -d report_unpacked
exiftool -r -G report_unpacked/xl/media/
# Flag GPS coordinates anywhere in the package
exiftool -if '$GPSLatitude' -GPSLatitude -GPSLongitude -filename -r report_unpacked/
# Inspect OLE-embedded objects to identify the wrapped format
for f in report_unpacked/xl/embeddings/*.bin; do
echo "--- $f"
file "$f"
olemeta "$f" 2>/dev/null || true
done
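Where exiftool is not available (for example inside a locked-down pipeline), a rough presence check is still possible with the standard library. The sketch below walks the JPEG segment headers that precede the image data and reports which metadata containers are present. It only detects that a block exists and does not decode it; the labels and the function name `jpeg_metadata_segments` are illustrative assumptions.

```python
import struct

XMP_HEADER = b"http://ns.adobe.com/xap/1.0/\x00"


def jpeg_metadata_segments(data):
    """Return labels for metadata-bearing segments found before the start-of-scan marker."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file")
    labels = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break  # lost sync; stop rather than guess
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker == 0xDA:  # start of scan: compressed image data follows
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]  # length includes its own 2 bytes
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            labels.append("EXIF")
        elif marker == 0xE1 and payload.startswith(XMP_HEADER):
            labels.append("XMP")
        elif marker == 0xED and payload.startswith(b"Photoshop 3.0\x00"):
            labels.append("IPTC/Photoshop")
        elif marker == 0xFE:
            labels.append("comment")
        pos += 2 + length
    return labels
```

An empty list for every file in xl/media/ is a reasonable first signal that the JPEGs were cleaned; anything else is a prompt to run a full exiftool pass.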
A dedicated metadata tool such as MetaData Analyzer enumerates every xl/media/ entry, runs an EXIF/IPTC/XMP pass on each, recursively extracts and inspects OLE embeddings, flags GPS coordinates and high-risk metadata fields, and produces a pre-share report — the only approach that scales beyond a handful of files.
There is no single button that removes all of it. Depending on what the file needs to preserve, combine the following:
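- Run Picture Format → Compress Pictures with “Delete cropped areas of pictures” checked, save, then confirm that xl/media/ file sizes have dropped and that exiftool shows fewer fields. Compress Pictures is the most accessible step but is not a guaranteed metadata stripper; results vary by Excel version. Always verify with exiftool afterward; do not assume.
- Clean each image before inserting it (exiftool -all= file.jpg, ImageOptim on macOS, the “Remove Properties” menu in Windows Explorer, or a print-to-PNG round trip). This is the only fully reliable path for high-sensitivity workflows. The discipline is to clean images before they enter the workbook, not after.
- For embedded documents, convert the source to PDF, strip the PDF's metadata (exiftool -all= file.pdf or pdftk file.pdf output clean.pdf), and embed the cleaned PDF instead of the source document. ExifTool's documentation describes its PDF edits as reversible, so rewrite the file afterward (for example with qpdf --linearize) and re-check it before treating it as clean. For embedded .msg or .eml files, consider whether the file needs to be embedded at all; an extracted, sanitized text excerpt is often the correct artifact.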
For pipelines — DLP integrations, CI/CD exports, pre-send scanners — rewrite the package: replace each xl/media/ file with an EXIF-stripped copy, optionally remove xl/embeddings/ entirely, and clear the descr attributes in xl/drawings/. A minimal sketch:
# Strip EXIF from media and clear drawing alt text inside an XLSX
import io
import re
import zipfile

from PIL import Image

def strip_image_metadata(data):
    # Copy pixel data into a fresh image so no EXIF/IPTC/XMP block is carried over.
    # JPEGs are re-compressed (lossy); animated files keep only the first frame.
    # Raises if Pillow cannot decode the file (e.g. EMF or WMF media).
    img = Image.open(io.BytesIO(data))
    clean = Image.new(img.mode, img.size)
    clean.putdata(list(img.getdata()))
    if img.mode == "P":
        clean.putpalette(img.getpalette())  # palette images need their palette copied too
    buf = io.BytesIO()
    clean.save(buf, format=img.format)
    return buf.getvalue()

def sanitize_xlsx(src, dst):
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename.startswith("xl/media/"):
                data = strip_image_metadata(data)
            elif item.filename.startswith("xl/embeddings/"):
                # Drop OLE embeddings entirely. Relationships that pointed at them
                # are left in place, so open the result in Excel to confirm it loads.
                continue
            elif item.filename.startswith("xl/drawings/drawing") and item.filename.endswith(".xml"):
                data = re.sub(rb' descr="[^"]*"', b'', data)
                data = re.sub(rb' name="[^"]*"', b' name="Picture"', data)
            zout.writestr(item, data)
Sanitizing the metadata of every embedded file is hard — each format has its own metadata model and its own gotchas. For most external deliverables the safer default is to not embed in the first place: link to the document in a separate share, paste a sanitized PDF as a picture, or omit it. If a workbook ever embedded a document for “convenience,” that convenience is paid for in metadata exposure.
Run through this checklist before releasing any workbook that contains pictures, screenshots, or embedded objects:
- Have you unzipped the workbook and listed everything under xl/media/ and xl/embeddings/?
- Have you run exiftool over every embedded image and confirmed no GPS, no camera serial, no creator-tool history?
- Have you opened each xl/embeddings/ binary, identified the wrapped format, and inspected its own metadata?
- Have you checked xl/drawings/*.xml for shape names and alt text that disclose filenames or context?

Image and OLE leakage is unusually concentrated in a handful of teams (sales, field operations, HR, legal, finance) and unusually invisible to the people creating the files. Organizations that manage the risk tend to combine several of the following practices:
Block location services for the camera app on managed devices, or enforce a Camera Roll cleaner that strips GPS before photos sync. The cost of this change is low; the upside is that the most damaging single field never enters the document supply chain.
Set a DLP rule that flags any XLSX leaving the perimeter with non-empty xl/media/ containing GPS data, or any non-empty xl/embeddings/ folder. Most teams find the volume manageable and the alerts highly actionable.
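A rule of that shape could be prototyped as below. This is a sketch under stated assumptions, not a production scanner: it treats a JPEG as carrying GPS data when its EXIF block has the GPS pointer tag (0x8825) in the first image directory, it ignores other image formats, and the names `has_gps_ifd` and `dlp_reasons` are hypothetical.

```python
import struct
import zipfile

GPS_IFD_TAG = 0x8825  # EXIF tag in IFD0 that points at the GPS sub-directory


def has_gps_ifd(image_bytes):
    """Heuristic: True if the EXIF block in these JPEG bytes has a GPS pointer in IFD0."""
    start = image_bytes.find(b"Exif\x00\x00")
    if start < 0:
        return False
    tiff = image_bytes[start + 6:]  # EXIF payload is a TIFF structure
    if tiff[:2] == b"II":
        endian = "<"
    elif tiff[:2] == b"MM":
        endian = ">"
    else:
        return False
    try:
        (ifd_offset,) = struct.unpack(endian + "I", tiff[4:8])
        (count,) = struct.unpack(endian + "H", tiff[ifd_offset:ifd_offset + 2])
        for i in range(count):
            entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
            (tag,) = struct.unpack(endian + "H", tiff[entry:entry + 2])
            if tag == GPS_IFD_TAG:
                return True
    except struct.error:
        return False  # truncated or malformed EXIF
    return False


def dlp_reasons(xlsx):
    """Return human-readable reasons to hold a workbook; an empty list means no flag."""
    reasons = []
    with zipfile.ZipFile(xlsx) as z:
        names = [n for n in z.namelist() if not n.endswith("/")]
        if any(n.startswith("xl/embeddings/") for n in names):
            reasons.append("embedded objects present")
        for n in names:
            if n.startswith("xl/media/") and has_gps_ifd(z.read(n)):
                reasons.append(f"GPS data in {n}")
    return reasons
```

Returning reasons rather than a bare boolean keeps the alert actionable: the sender sees which picture or embedding triggered the hold.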
The Display as icon path is the most common way to embed a full document for “convenience.” A simple policy — no embedded objects in workbooks shared externally — eliminates an entire class of leakage with negligible workflow cost.
A two-minute demo — unzip a familiar workbook, run exiftool, project the GPS coordinate that maps to someone’s house — persuades more effectively than any policy memo. Run it once a year for the teams that share the most files.
An embedded image looks like a picture. A pasted document looks like an icon. Excel does not flatten either; it carries the original bytes intact, including every metadata block the source application wrote. A recipient with unzip and exiftool reads the GPS coordinates, the camera serial, the alt-text caption, the embedded Word doc’s tracked changes, and the BCC list of an embedded email.
The fix is mechanical — clean images before they enter the workbook, avoid embedding documents for convenience, run Compress Pictures with Delete cropped areas, and verify at the ZIP level. None of it happens by accident. Treat every workbook with media or embeddings as if it were a multi-format archive carrying the original metadata of every file inside it, because it is — until you deliberately remove it.