Excel Embedded Images and OLE Objects: The Hidden Metadata in Pictures and Files Inside Your Workbooks

When a user pastes a photo into a worksheet, drags a Word document onto a tab, or uses Insert → Object to attach a PDF, Excel does not flatten any of it. The original image arrives with its EXIF, IPTC, and XMP blocks intact — including GPS coordinates, the camera serial number, and the originating Photoshop session. The embedded document arrives as a complete file inside the XLSX, carrying its own author, revision history, and tracked changes. This post walks through where embedded-object metadata lives inside an XLSX, what it exposes, and how to strip it before sharing.

Technical Team
April 26, 2026
21 min read

Two Categories of Embedded Object, Both Untouched

Excel treats embedded content as two separate things. Pictures — PNG, JPEG, GIF, BMP, TIFF files dragged into a sheet, pasted from clipboard, or inserted via Insert → Pictures — are stored byte-for-byte inside the XLSX package. OLE objects — Word documents, PDFs, Outlook .msg files, other XLSX files, even custom application files registered with the OS — are stored as full embedded files (or a wrapper around them) plus a thumbnail rendered for preview.

In neither case does Excel re-encode, re-render, or normalize the embedded content. The 12-megapixel JPEG you dropped on a quarterly review tab is the same JPEG, with the same EXIF block written by your phone’s camera firmware. The 80-page negotiation draft you embedded on the cover sheet is the entire .docx file, with its full revision tree and every commenter’s name. The XLSX is a transport container, not a sanitizer.

  • xl/media/image1.png — one file per unique embedded picture, in its original encoding
  • xl/embeddings/oleObject1.bin — one file per OLE-embedded document, often a Compound File Binary stream wrapping the original
  • xl/drawings/drawing1.xml — the placement, sizing, anchor, alt text, and title for every shape on a worksheet
  • xl/drawings/_rels/drawing1.xml.rels — the relationship pointing each shape at its underlying media or embedding
  • [Content_Types].xml — declares MIME types for every embedded format, which can itself disclose the source application
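
The package layout above can be checked without renaming or extracting anything, because an XLSX is an ordinary ZIP archive. The following is a minimal sketch using only the Python standard library; the function name list_embedded_parts and the demo package contents are illustrative assumptions, not part of any Excel tooling.

```python
import io
import zipfile

EMBEDDED_PREFIXES = ("xl/media/", "xl/embeddings/")

def list_embedded_parts(xlsx):
    """Return (part name, uncompressed size) for each picture or embedding.

    `xlsx` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(xlsx) as package:
        return [
            (info.filename, info.file_size)
            for info in package.infolist()
            if info.filename.startswith(EMBEDDED_PREFIXES) and not info.is_dir()
        ]

# Demo on a tiny in-memory package (made-up contents, not a valid workbook).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as package:
    package.writestr("xl/workbook.xml", "<workbook/>")
    package.writestr("xl/media/image1.png", b"\x89PNG placeholder")
    package.writestr("xl/embeddings/oleObject1.bin", b"placeholder bytes")
demo.seek(0)

for name, size in list_embedded_parts(demo):
    print(f"{size:>8}  {name}")
```

Pointing the same function at a real workbook path gives a quick inventory of what would travel with the file.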

Cropping Does Not Crop

When a user crops an image inside Excel, the visible region changes but the underlying file does not. The full original picture remains in xl/media/, and the crop is recorded as a clipping rectangle in the drawing XML. A recipient who unzips the file and opens the image gets the uncropped original — faces that were cropped out, sensitive text outside the visible window, the rest of a whiteboard photo. Use Picture Format → Compress Pictures → Delete cropped areas of pictures as a deliberate step; the default behavior is to keep everything.
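
To see which pictures are cropped for display only, one option is to look for the clipping rectangle in the drawing XML. The sketch below assumes the crop is stored as an a:srcRect element inside the picture's xdr:blipFill, with l/t/r/b attributes expressed in thousandths of a percent, which is how transitional OOXML files usually write it; the function name and sample XML are illustrative.

```python
import xml.etree.ElementTree as ET

NS = {
    "xdr": "http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
}

def find_cropped_pictures(drawing_xml):
    """Return (shape name, crop percentages) for pictures with a nonzero crop."""
    root = ET.fromstring(drawing_xml)
    cropped = []
    for pic in root.iter(f"{{{NS['xdr']}}}pic"):
        rect = pic.find("xdr:blipFill/a:srcRect", NS)
        if rect is None:
            continue
        # Assumed unit: 1/1000 of a percent of the original image size.
        crop = {side: int(rect.get(side, "0")) / 1000 for side in "ltrb"}
        if any(crop.values()):
            props = pic.find("xdr:nvPicPr/xdr:cNvPr", NS)
            name = props.get("name", "") if props is not None else ""
            cropped.append((name, crop))
    return cropped

# Made-up sample: a picture with 25% hidden on the left and 10% on the right.
SAMPLE = f"""
<xdr:wsDr xmlns:xdr="{NS['xdr']}" xmlns:a="{NS['a']}">
  <xdr:twoCellAnchor>
    <xdr:pic>
      <xdr:nvPicPr><xdr:cNvPr id="2" name="Picture 1"/><xdr:cNvPicPr/></xdr:nvPicPr>
      <xdr:blipFill>
        <a:blip/>
        <a:srcRect l="25000" r="10000"/>
        <a:stretch><a:fillRect/></a:stretch>
      </xdr:blipFill>
    </xdr:pic>
  </xdr:twoCellAnchor>
</xdr:wsDr>
"""

for name, crop in find_cropped_pictures(SAMPLE):
    print(name, crop)
```

Any picture this reports still has its hidden region sitting in xl/media/ until the cropped areas are deliberately deleted.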

EXIF, IPTC, and XMP: What Phones and Cameras Wrote Into Your Picture

A JPEG taken on a modern phone is not just pixels. It carries up to four kinds of metadata: EXIF (camera settings), GPS (location, stored as a sub-directory of the EXIF block), IPTC (caption, byline, copyright), and XMP (Adobe-style structured metadata, often added during editing). Most users do not know any of these exist; all four travel intact when the photo is dropped into Excel.

// Representative output from `exiftool xl/media/image1.jpg` after unzipping an XLSX
Make               : Apple
Model              : iPhone 15 Pro
Software           : 18.4.1
Date/Time Original : 2026:03:18 14:22:07
GPS Latitude       : 47 deg 36' 32.41" N
GPS Longitude      : 122 deg 19' 59.04" W
GPS Altitude       : 56.3 m Above Sea Level
Lens Model         : iPhone 15 Pro back triple camera 6.86mm f/1.78
Camera Owner Name  : Jane Smith
Body Serial Number : C7QXM9XJYP
Image Description  : IMG_4421 - draft P&L screenshot - DO NOT SHARE
Creator Tool       : Adobe Photoshop 25.6 (Macintosh)
History Action     : saved, edited, saved (3x)

Every line in that block is a real disclosure. The GPS coordinates resolve in any mapping tool to a specific street address. The camera owner name and serial number identify the device that took the photo. The image description, written by the user during a sort or favorites pass, contains an internal warning that the recipient now reads. The creator tool reveals that the image was edited in Photoshop, and the history action count tells the recipient there were edits worth saving three times — a hint that something was retouched, painted out, or cleaned up.

Metadata Block | Typical Source | What Often Leaks
EXIF | Phone or camera firmware | Camera make/model, serial number, lens, exposure, original timestamp
GPS (in EXIF) | Phone GPS or geotagging service | Latitude, longitude, altitude, heading, sometimes a place name
IPTC | DAM systems, photo libraries | Byline, headline, caption, keywords, copyright owner, source organization
XMP | Adobe apps, Lightroom, Photos.app | Edit history, original document ID, color profile, ratings, faces, person tags
tEXt / iTXt (PNG) | Snipping tools, screen recorders | Source application name, screen capture timestamp, device info

Screenshots Are Not Always Clean Either

Many users believe a screenshot has no metadata. Most snipping tools write a PNG tEXt chunk identifying the tool, sometimes the OS user name, and the capture timestamp. macOS’s screenshot service writes XMP. The Snipping Tool on Windows writes Software, CreationTime, and on some versions a thumbnail of the screen. A screenshot of a draft pasted into Excel can ship with the username of the person who took the screenshot embedded in the PNG.
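
A PNG's text chunks can be listed with a few lines of standard-library Python, since every chunk is simply a length, a four-letter type, a body, and a CRC. The sketch below is illustrative: the function name and the sample "Software" value are made up, and only tEXt bodies are decoded (iTXt and zTXt are reported by keyword, tIME and eXIf by presence).

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
METADATA_CHUNKS = {b"tEXt", b"iTXt", b"zTXt", b"tIME", b"eXIf"}

def png_metadata_chunks(data):
    """Return (chunk type, keyword, text) tuples for metadata chunks in a PNG."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG")
    found = []
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            # tEXt body: Latin-1 keyword, a NUL byte, then Latin-1 text.
            keyword, _, text = body.partition(b"\x00")
            found.append(("tEXt", keyword.decode("latin-1"), text.decode("latin-1")))
        elif ctype in METADATA_CHUNKS:
            keyword = body.split(b"\x00", 1)[0] if ctype in (b"iTXt", b"zTXt") else b""
            found.append((ctype.decode("ascii"), keyword.decode("latin-1"), None))
        if ctype == b"IEND":
            break
        pos += 12 + length             # length + type + body + CRC
    return found

def make_chunk(ctype, body):
    """Build one PNG chunk with a valid CRC; used only for the sample below."""
    crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)

# Made-up sample: header, one text chunk, end marker (no pixel data).
SAMPLE = (
    PNG_SIGNATURE
    + make_chunk(b"IHDR", b"\x00" * 13)
    + make_chunk(b"tEXt", b"Software\x00Example Capture Tool")
    + make_chunk(b"IEND", b"")
)

print(png_metadata_chunks(SAMPLE))
```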

The GPS Problem in Concrete Numbers

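GPS leakage from embedded photos deserves to be called out separately because the consequence is so direct. A latitude/longitude pair written to six decimal places resolves to roughly 11 centimeters north-south, and somewhat less east-west away from the equator. That is the precision of the notation, not of the fix: a phone's GPS is typically accurate to within a few meters, which is still enough to identify a single building. Embedded inside a workbook that ships to a counterparty, that translates to specific home addresses, hotel rooms, suppliers, dispute sites, and field locations.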
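
The arithmetic behind that precision figure is short enough to check directly. The sketch below converts the degrees/minutes/seconds values from the sample exiftool output to decimal degrees and estimates the ground distance covered by one step in the sixth decimal place. The constant of about 111,320 meters per degree of latitude is an approximation, and the function names are illustrative.

```python
import math

METERS_PER_DEGREE_LAT = 111_320  # approximate; varies slightly with latitude

def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value

def resolution_in_meters(decimal_places, latitude_deg):
    """Ground distance spanned by one unit in the last decimal place."""
    step = 10 ** -decimal_places
    north_south = step * METERS_PER_DEGREE_LAT
    # Meridians converge toward the poles, so east-west steps shrink with cos(latitude).
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west

lat = dms_to_decimal(47, 36, 32.41, "N")
lon = dms_to_decimal(122, 19, 59.04, "W")
ns, ew = resolution_in_meters(6, lat)

print(f"decimal coordinates: {lat:.6f}, {lon:.6f}")
print(f"one step in the 6th decimal: {ns * 100:.1f} cm north-south, {ew * 100:.1f} cm east-west")
```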

A Concrete Measurement

Take a workbook used to track a sales team’s field visits. Each row references an account; each row has a small thumbnail photo of a meeting whiteboard or signed delivery slip dragged from a phone. Unzip the XLSX, run exiftool -GPSLatitude -GPSLongitude xl/media/*, and you have an unintended dataset: the home address of every customer whose photo was taken at their home, every regional office, every supplier site visited that week. The workbook’s visible columns may carry “City” only; the embedded photos quietly carry full coordinates.

Three categories of workbooks are particularly dangerous to share without sanitization:

Field Reports and Inspection Logs

Construction site visits, environmental sampling, insurance loss adjustment, real-estate inspections. Photos are taken on phones with location services on, dropped into a tab, and the workbook is shared with regulators or counsel. Every photo is a precise location pin.

Investigations and Incident Reports

HR investigations, harassment complaints, security incidents. A photo taken inside a sensitive location — a private office, a witness’s home, a location the investigator does not want disclosed — embeds the coordinates of where the photo was taken, even if the photo itself shows only an anonymized scene.

Asset Registers With Photos

Equipment, art, fleet, fine wine, server rooms. The photo of the asset taken on a smartphone records where the asset is stored. An asset register shared with an insurer, an auditor, a bank, or a buyer carries a precise location for every photographed item.

OLE Objects: Whole Documents Riding Inside the Workbook

Insert → Object → Create from File → check “Display as icon” looks innocent. The user sees a small icon on the worksheet labeled with the filename. What Excel did under the hood is far more involved: the entire source file was read in, wrapped in a Compound File Binary container, written to xl/embeddings/oleObject1.bin, and a rendered preview image was generated and stored alongside it.

The wrapper is a thin one. Inside the .bin a tool like olefile or ssview finds the original document streams — for a Word doc, the entire docx package; for an Outlook message, the full .msg with attachments and recipients; for a PDF, the original PDF bytes. Each of those carries its own metadata tree:

Embedded Type | Metadata Tree Inside
.docx (Word) | Authors, last-modified-by, revision count, total edit time, comments, tracked changes, settings.xml, custom properties; the full Word document carries everything its own author left behind
.pdf | Author, Producer, CreationDate, ModDate, Title, Keywords, plus optional XMP block; PDFs from Word also carry the original docx’s metadata via /Custom entries
.msg (Outlook) | Sender, all recipients (including BCC if the sender saved a sent copy), full message headers, attachments with their own metadata trees, conversation thread index
.xlsx | All the metadata of a regular workbook (authors, hidden sheets, pivot caches, external links, defined names), recursively
.eml | Full RFC822 headers including X-Mailer, Received chains, message-id, original sender IP in some configurations

Embedding Is Recursive

An XLSX with an embedded .msg with an attached .docx with an embedded image is not four separate artifacts; it is one delivery vehicle for all of them, with all of their metadata. Document Inspector cleans the outer XLSX. It does not open and clean each embedded file. The author of the workbook can run Inspect Document and pass; the recipient can extract a chain of nested attachments still carrying everyone’s names, all the way down.
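
The recursion can be made visible with a small walker that descends into any part that is itself a ZIP package (an embedded .docx or .xlsx, for example) and flags parts that start with the Compound File Binary signature. This is a sketch under stated assumptions: walk_package and its max_depth guard are illustrative names, everything is read into memory, and CFB containers are only identified, not opened, since that needs an OLE parser such as olefile.

```python
import io
import zipfile

CFB_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # Compound File Binary header

def walk_package(data, path="", depth=0, max_depth=5):
    """Yield (nested path, kind) for every entry, descending into ZIP-based parts."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        for info in package.infolist():
            if info.is_dir():
                continue
            inner = package.read(info)
            full = f"{path}/{info.filename}" if path else info.filename
            if inner.startswith(CFB_MAGIC):
                yield full, "cfb"   # OLE wrapper, .msg, or legacy Office file
            elif depth < max_depth and zipfile.is_zipfile(io.BytesIO(inner)):
                yield full, "zip"
                yield from walk_package(inner, full, depth + 1, max_depth)
            else:
                yield full, "file"

def _package(entries):
    """Build an in-memory ZIP from {name: bytes}; used only for the demo."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        for name, payload in entries.items():
            z.writestr(name, payload)
    return buf.getvalue()

# Made-up nesting: a workbook holding a .docx that holds a picture.
inner_doc = _package({"word/media/image1.jpg": b"\xff\xd8 placeholder"})
outer = _package({
    "xl/embeddings/draft.docx": inner_doc,
    "xl/embeddings/oleObject1.bin": CFB_MAGIC + b"\x00" * 8,
})

for nested_path, kind in walk_package(outer):
    print(f"{kind:<4} {nested_path}")
```

The depth limit is there because nested archives from untrusted sources can be arbitrarily deep.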

Drawing XML: The Metadata Excel Wrote Itself

Beyond the embedded files, Excel writes its own descriptive metadata in xl/drawings/drawing1.xml for every shape, picture, and OLE object. Two fields here matter for disclosure: the shape’s name attribute (often the original filename) and the accessibility descr field (alt text, used by screen readers).

// Excerpt from xl/drawings/drawing1.xml
<xdr:nvPicPr>
  <xdr:cNvPr id="3"
             name="IMG_4421 (1) - draft Q2 forecast - confidential.png"
             descr="Whiteboard photo from offsite at lakehouse, March 18"/>
  <xdr:cNvPicPr>
    <a:picLocks noChangeAspect="1"/>
  </xdr:cNvPicPr>
</xdr:nvPicPr>

The shape name is whatever the source filename was when the picture was inserted — including parenthetical version markers like (1) that suggest a duplicate-rename pattern, descriptive suffixes the user added, and the file extension. The descr alt text is whatever the user (or Excel’s automatic captioning, if enabled) typed for accessibility. Both fields are visible to anyone who unzips the file and reads the drawing XML — and to many automated ingest pipelines that index alt text as caption text.

Recent versions of Excel offer auto-generated alt text via the cloud accessibility service. The auto-text is structurally informative (“A graph showing increasing revenue,” “A man in a red shirt next to a whiteboard with handwritten numbers”) and tells the recipient something about the picture even before they look at it — useful for accessibility, less useful when the recipient was not supposed to know what was on the whiteboard.
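
Reviewing these fields across a workbook is easier with a short script than by reading raw XML. The following sketch parses one drawing part and returns the name, descr, and title attributes of every xdr:cNvPr element; the function name is illustrative and the sample reuses the values from the excerpt above.

```python
import xml.etree.ElementTree as ET

XDR = "http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing"
A = "http://schemas.openxmlformats.org/drawingml/2006/main"

def shape_labels(drawing_xml):
    """Return name / alt text / title for every shape in one drawing part."""
    root = ET.fromstring(drawing_xml)
    return [
        {
            "name": props.get("name", ""),
            "descr": props.get("descr", ""),
            "title": props.get("title", ""),
        }
        for props in root.iter(f"{{{XDR}}}cNvPr")
    ]

SAMPLE = f"""
<xdr:wsDr xmlns:xdr="{XDR}" xmlns:a="{A}">
  <xdr:twoCellAnchor>
    <xdr:pic>
      <xdr:nvPicPr>
        <xdr:cNvPr id="3"
                   name="IMG_4421 (1) - draft Q2 forecast - confidential.png"
                   descr="Whiteboard photo from offsite at lakehouse, March 18"/>
        <xdr:cNvPicPr><a:picLocks noChangeAspect="1"/></xdr:cNvPicPr>
      </xdr:nvPicPr>
    </xdr:pic>
  </xdr:twoCellAnchor>
</xdr:wsDr>
"""

for label in shape_labels(SAMPLE):
    print(label)
```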

How This Goes Wrong in Practice

Four scenarios drawn from real patterns of disclosure:

Scenario 1: The Real-Estate Pitch With a Photo Tour

A broker prepares an investor pitch in Excel with thumbnail photos of comparable properties. The photos were taken on the broker’s phone during site visits; some are from properties on different deals.

Leak: Each photo’s GPS coordinates pinpoint the comparable property, but a few photos were taken inside another investor’s building during a previous showing. The recipient now has a precise list of what other deals the broker is also working, complete with addresses.

Scenario 2: The Embedded Term Sheet

A finance team builds a deal model and uses Insert → Object to attach the latest term sheet (.docx) to the cover sheet for convenience. The model is sent to outside counsel for review.

Leak: The embedded .docx carries every tracked change, every comment, the “last modified by” name of an associate at the opposing firm who edited the doc earlier in the cycle, and a custom property naming the document repository it came from. Outside counsel now has the redline history they were never meant to see.

Scenario 3: The HR Investigation Workbook

An HR investigator builds a workbook to track interviews and evidence. Witness statements are attached as embedded .msg files; photos taken during a site walk are pasted into the evidence tab. The workbook is shared with outside counsel.

Leak: The embedded .msg files carry the full BCC list of every email the investigator sent, including the names of executives who were privately copied. The site-walk photos carry GPS coordinates resolving to specific witnesses’ offices. The investigator believed the workbook had been sanitized; only the outer Excel layer was inspected.

Scenario 4: The Cropped Whiteboard

A product manager photographs a strategy whiteboard, drops the photo into a roadmap workbook, and crops it inside Excel to remove a section showing competitor-specific pricing. The workbook is sent to a partner.

Leak: The crop is a display rectangle, not a file edit. The full photo — including the cropped-out competitor pricing column — is in xl/media/image1.jpg at full resolution. The partner unzips the file, opens the JPEG, and sees the original whiteboard.

How to Inspect Embedded Objects in an XLSX

A useful inspection enumerates every embedded artifact, dumps the metadata of each, and flags GPS coordinates and known-sensitive fields. Three practical approaches:

Method 1: Document Inspector (Insufficient on Its Own)

File → Info → Check for Issues → Inspect Document finds and offers to remove some image-related items, but it does not remove EXIF/GPS from the underlying media files, does not crack open OLE-embedded documents to inspect them, and treats the embedded objects as opaque blobs. Use it as a baseline pass, not a final answer.

Method 2: Unzip and Use ExifTool

Rename a copy of the XLSX to .zip, extract, and walk the xl/media/ and xl/embeddings/ trees:

# List every embedded image and embedding
unzip -l report.xlsx | grep -E 'xl/(media|embeddings)/'

# Extract the package and run exiftool over every embedded image
unzip -q report.xlsx -d report_unpacked
exiftool -r -G report_unpacked/xl/media/

# Flag GPS coordinates anywhere in the package
exiftool -if '$GPSLatitude' -GPSLatitude -GPSLongitude -filename -r report_unpacked/

# Inspect embedded objects to identify the wrapped format.
# Not every embedding is an oleObjectN.bin: embedded Office documents can be
# stored as plain .docx/.xlsx packages, so loop over everything in the folder.
for f in report_unpacked/xl/embeddings/*; do
  echo "--- $f"
  file "$f"
  # olemeta ships with oletools; it fails harmlessly on non-OLE files
  olemeta "$f" 2>/dev/null || true
done

Method 3: Automated Analysis

A dedicated metadata tool such as MetaData Analyzer enumerates every xl/media/ entry, runs an EXIF/IPTC/XMP pass on each, recursively extracts and inspects OLE embeddings, flags GPS coordinates and high-risk metadata fields, and produces a pre-share report — the only approach that scales beyond a handful of files.

How to Actually Remove Embedded-Object Metadata

There is no single button that removes all of it. Depending on what the file needs to preserve, combine the following:

Option A: Compress Pictures and Delete Cropped Areas

  1. Select a picture, then Picture Format → Compress Pictures.
  2. Uncheck Apply only to this picture to apply across the workbook.
  3. Check Delete cropped areas of pictures.
  4. Pick a target resolution; lower resolutions force re-encoding and frequently strip EXIF as a side effect.
  5. Save and verify that xl/media/ file sizes have dropped and that exiftool shows fewer fields.

Compress Pictures is the most accessible step but is not a guaranteed metadata stripper — results vary by Excel version. Always verify with exiftool afterward; do not assume.

Option B: Strip EXIF and Re-Encode the Image Before Inserting

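  1. Run the source images through a metadata stripper (exiftool -all= -overwrite_original file.jpg, ImageOptim on macOS, the “Remove Properties and Personal Information” link on the Details tab of the Windows Explorer Properties dialog, or a print-to-PNG round trip). Without -overwrite_original, exiftool keeps the untouched file next to the cleaned one as file.jpg_original, so delete that backup as well.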
  2. Insert the cleaned images into the workbook.
  3. If images were already inserted, delete them, clean the originals, and reinsert.
  4. Verify after saving that the embedded media files no longer carry EXIF/GPS.

This is the only fully reliable path for high-sensitivity workflows. The discipline is to clean images before they enter the workbook, not after.
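
For JPEG sources, the cleaning step can also be done without re-encoding the pixels, by copying the file and leaving out the segments that carry metadata. The sketch below is a standard-library illustration of that idea, not a replacement for a maintained tool: it drops APP1 (EXIF and XMP), APP13 (IPTC), and comment segments only, does not examine other APPn segments, and removing EXIF also removes the orientation tag, so a rotated photo may display differently afterward. The function name and sample are assumptions.

```python
import struct

STRIP_MARKERS = {0xE1, 0xED, 0xFE}  # APP1 (EXIF/XMP), APP13 (IPTC), COM

def strip_jpeg_metadata(data):
    """Return a copy of a JPEG without its APP1, APP13, and COM segments."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    out = bytearray(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("unexpected byte between segments")
        marker = data[pos + 1]
        if marker == 0xFF:             # fill byte before a marker
            pos += 1
            continue
        if marker == 0xDA:             # start of scan: copy the rest verbatim
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker not in STRIP_MARKERS:
            out += data[pos:pos + 2 + length]
        pos += 2 + length
    out += data[pos:]                  # scan data and end marker, untouched
    return bytes(out)

def _segment(marker, payload):
    """Build one JPEG segment; used only to assemble the sample below."""
    return b"\xff" + bytes([marker]) + struct.pack(">H", len(payload) + 2) + payload

# Hand-built sample (not a decodable image): JFIF header, EXIF, a comment, scan.
SAMPLE = (
    b"\xff\xd8"
    + _segment(0xE0, b"JFIF\x00" + b"\x00" * 9)
    + _segment(0xE1, b"Exif\x00\x00" + b"II*\x00")
    + _segment(0xFE, b"internal note")
    + b"\xff\xda\x00\x02"
    + b"\xff\xd9"
)

cleaned = strip_jpeg_metadata(SAMPLE)
print(len(SAMPLE), "->", len(cleaned), "bytes")
```

Whatever tool does the stripping, the verification step stays the same: run exiftool on the cleaned file before it goes into the workbook.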

Option C: Replace OLE Embeddings With Sanitized PDF Snapshots

  1. Open each embedded document in its source application.
  2. Run that application’s own Inspect Document / sanitize routine.
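  3. Print to PDF, run the PDF through a metadata stripper, and embed the cleaned PDF instead of the source document. Note that exiftool -all= file.pdf is reversible on its own, because ExifTool appends an incremental update and leaves the old metadata in the file; follow it with a full rewrite such as qpdf --linearize file.pdf clean.pdf. A plain pdftk file.pdf output clean.pdf rewrite should not be relied on to remove the document information fields.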
  4. Re-verify by extracting and inspecting the new embedding.

For embedded .msg or .eml files, consider whether the file needs to be embedded at all — an extracted, sanitized text excerpt is often the correct artifact.

Option D: Programmatic Sanitization

For pipelines — DLP integrations, CI/CD exports, pre-send scanners — rewrite the package: replace each xl/media/ file with an EXIF-stripped copy, optionally remove xl/embeddings/ entirely, and clear the descr attributes in xl/drawings/. A minimal sketch:

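# Strip EXIF from media and clear drawing alt text inside an XLSX
import io
import re
import zipfile

from PIL import Image, UnidentifiedImageError

def strip_image_metadata(data):
    """Rebuild a raster image from its pixels only, dropping EXIF/XMP/text chunks."""
    img = Image.open(io.BytesIO(data))
    clean = Image.new(img.mode, img.size)
    clean.putdata(list(img.getdata()))
    if img.mode == "P":
        clean.putpalette(img.getpalette())  # palette images need their color table
    buf = io.BytesIO()
    clean.save(buf, format=img.format)      # note: JPEGs are lossily re-encoded
    return buf.getvalue()

def sanitize_xlsx(src, dst):
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename.startswith("xl/media/"):
                try:
                    data = strip_image_metadata(data)
                except UnidentifiedImageError:
                    pass  # EMF/WMF/SVG and similar: left untouched, inspect separately
            elif item.filename.startswith("xl/embeddings/"):
                # Drop OLE embeddings entirely. Caveat: the .rels entries and
                # worksheet references that point at them are not rewritten
                # here, so Excel may offer to repair the file on open.
                continue
            elif item.filename.startswith("xl/drawings/drawing") and item.filename.endswith(".xml"):
                data = re.sub(rb' descr="[^"]*"', b'', data)                # clear alt text
                data = re.sub(rb' name="[^"]*"', b' name="Picture"', data)  # neutral shape names
            zout.writestr(item, data)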

Dropping Embeddings Is the Safer Default

Sanitizing the metadata of every embedded file is hard — each format has its own metadata model and its own gotchas. For most external deliverables the safer default is to not embed in the first place: link to the document in a separate share, paste a sanitized PDF as a picture, or omit it. If a workbook ever embedded a document for “convenience,” that convenience is paid for in metadata exposure.

Pre-Share Checklist for Workbooks With Embedded Content

Run through this checklist before releasing any workbook that contains pictures, screenshots, or embedded objects:

  • Have I unzipped a copy of the file and listed everything under xl/media/ and xl/embeddings/?
  • Have I run exiftool over every embedded image and confirmed no GPS, no camera serial, no creator-tool history?
  • For any cropped picture, have I run Compress Pictures → Delete cropped areas?
  • Have I extracted each xl/embeddings/ binary, identified the wrapped format, and inspected its own metadata?
  • Have I reviewed xl/drawings/*.xml for shape names and alt text that disclose filenames or context?
  • Did Document Inspector run? (Necessary but not sufficient — do not stop here.)
  • For high-sensitivity files, did I rebuild from a clean workbook with re-cleaned source images and no OLE embeddings?
  • Have I verified the final file with an automated metadata analyzer to catch anything I missed?

Organizational Recommendations

Image and OLE leakage is unusually concentrated in a handful of teams — sales, field operations, HR, legal, finance — and unusually invisible to the people creating the files. Organizations that manage the risk tend to combine several of the following practices:

Disable Location Tagging on Corporate Phones

Block location services for the camera app on managed devices, or enforce a Camera Roll cleaner that strips GPS before photos sync. The cost of this change is low; the upside is that the most damaging single field never enters the document supply chain.

Egress Scan for Embedded Media

Set a DLP rule that flags any XLSX leaving the perimeter with non-empty xl/media/ containing GPS data, or any non-empty xl/embeddings/ folder. Most teams find the volume manageable and the alerts highly actionable.
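
What such a rule has to compute can be sketched in standard-library Python. The example below flags any entry under xl/embeddings/ and any JPEG under xl/media/ whose EXIF block contains a GPS directory pointer (tag 0x8825 in the first image directory). It is a sketch under stated assumptions, not a DLP product integration: the function names are hypothetical, only JPEG media is examined, and PNG, TIFF, or HEIC pictures would need their own checks.

```python
import io
import struct
import zipfile

GPS_IFD_TAG = 0x8825  # EXIF tag that points at the GPS sub-directory

def exif_payload(jpeg):
    """Return the TIFF-structured EXIF bytes from a JPEG, or None if absent."""
    if jpeg[:2] != b"\xff\xd8":
        return None
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker == 0xFF:              # fill byte before a marker
            pos += 1
            continue
        if marker == 0xDA:              # start of scan: no metadata beyond here
            return None
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        payload = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return payload[6:]
        pos += 2 + length
    return None

def has_gps_ifd(tiff):
    """True if the first image directory contains a GPS directory pointer."""
    if len(tiff) < 8 or tiff[:2] not in (b"II", b"MM"):
        return False
    endian = "<" if tiff[:2] == b"II" else ">"
    (ifd0,) = struct.unpack(endian + "I", tiff[4:8])
    if ifd0 + 2 > len(tiff):
        return False
    (count,) = struct.unpack(endian + "H", tiff[ifd0:ifd0 + 2])
    for i in range(count):
        start = ifd0 + 2 + 12 * i       # each directory entry is 12 bytes
        if start + 12 > len(tiff):
            break
        (tag,) = struct.unpack(endian + "H", tiff[start:start + 2])
        if tag == GPS_IFD_TAG:
            return True
    return False

def egress_findings(xlsx_bytes):
    """Return (reason, part name) pairs that should hold the file for review."""
    findings = []
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as package:
        for info in package.infolist():
            if info.is_dir():
                continue
            if info.filename.startswith("xl/embeddings/"):
                findings.append(("embedding", info.filename))
            elif info.filename.startswith("xl/media/"):
                tiff = exif_payload(package.read(info))
                if tiff is not None and has_gps_ifd(tiff):
                    findings.append(("gps", info.filename))
    return findings
```

An empty result means neither condition was found in the parts this sketch knows how to read, which is weaker than a guarantee that the file is clean.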

Ban “Display as icon” in External Deliverables

The Display as icon path is the most common way to embed a full document for “convenience.” A simple policy — no embedded objects in workbooks shared externally — eliminates an entire class of leakage with negligible workflow cost.

Train With the Demonstration

A two-minute demo — unzip a familiar workbook, run exiftool, project the GPS coordinate that maps to someone’s house — persuades more effectively than any policy memo. Run it once a year for the teams that share the most files.

Conclusion

An embedded image looks like a picture. A pasted document looks like an icon. Excel does not flatten either; it carries the original bytes intact, including every metadata block the source application wrote. A recipient with unzip and exiftool reads the GPS coordinates, the camera serial, the alt-text caption, the embedded Word doc’s tracked changes, and the BCC list of an embedded email.

The fix is mechanical — clean images before they enter the workbook, avoid embedding documents for convenience, run Compress Pictures with Delete cropped areas, and verify at the ZIP level. None of it happens by accident. Treat every workbook with media or embeddings as if it were a multi-format archive carrying the original metadata of every file inside it, because it is — until you deliberately remove it.

Detect Embedded Image and OLE Metadata in Your Excel Files

Use MetaData Analyzer to inspect your XLSX files for embedded EXIF data, GPS coordinates, OLE-embedded documents, and drawing alt-text leaks before sharing. See exactly what your workbook’s pictures and embeddings are disclosing.