
Excel Power Query Metadata: What Your Data Transformations Reveal

Power Query turned Excel into an ETL platform. Analysts pull from SQL Server, SharePoint lists, REST APIs, CSVs on network shares, and OneDrive folders, shape the data with a few clicks, and ship a polished workbook to stakeholders. What goes with that workbook is not just the shaped result — it is the full recipe: the M code, the server names, the user paths, the credential handles, and a timeline of every refresh. Anyone who opens the file can read the recipe. This post walks through where Power Query metadata lives inside an XLSX, what each part reveals, and how to remove it before sharing.

Technical Team
April 20, 2026
21 min read

What Power Query Actually Stores in an XLSX

Power Query — surfaced in Excel under Data → Get Data, Get & Transform, or the older New Query menu — is a data connection and transformation engine built into Excel. When you use it, three things get written into the workbook: a query (the M code that describes your transformation), a connection (the description of the source you are reading from), and optionally a data model entry if the query loads into Power Pivot.

Each of these is persisted to a specific part of the XLSX archive. None of them is visible on the worksheet grid. All of them are plain XML or DataMashup binary blobs that anyone with a text editor and a ZIP tool can inspect.

  • xl/connections.xml — every connection the workbook maintains, including Power Query queries exposed as OLEDB connections. Contains server names, database names, command text, connection strings, and refresh flags.
  • xl/customXml/item1.xml and siblings — the DataMashup payload. This is where the raw M code of every Power Query lives, wrapped in a small XML envelope but readable as plain text.
  • xl/model/item.data — the compressed Analysis Services tabular model used by Power Pivot. Contains the materialized tables, relationships, DAX measures, and the identities of the users who refreshed each table.
  • xl/queryTables/queryTable1.xml — one file per query loaded to a worksheet table. Records the connection id, refresh timestamps, and the “refreshedBy” username.
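Because an XLSX is just a ZIP archive, the presence of these parts can be confirmed with nothing but the standard library. A minimal sketch — the helper name and the prefix list are illustrative, mirroring the parts listed above:

```python
import zipfile

# Package prefixes where Power Query artifacts live (per the list above).
PQ_PREFIXES = ("xl/connections.xml", "xl/customXml/", "xl/queryTables/", "xl/model/")

def list_power_query_parts(xlsx):
    """Return every part name in the package that may carry query metadata."""
    with zipfile.ZipFile(xlsx) as z:
        return [name for name in z.namelist() if name.startswith(PQ_PREFIXES)]
```

An empty result does not prove the file is clean — it only means none of the standard locations are populated — but a non-empty result is a reliable signal that the workbook carries query metadata.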

The Core Misconception

Users believe Power Query is just “how I got the data in.” The assumption is that once the query runs and the table is populated, the transformation is just history. It is not. Every query you have ever defined in the workbook — including queries that are disabled, set to “Connection Only,” or loaded to the Data Model instead of a sheet — stays in the file as M code until you explicitly delete it. The query is not a log; it is a persistent part of the workbook.

Inside connections.xml: Server Names, Paths, and Command Text

Every external connection a workbook knows about — legacy ODBC queries, Power Query queries, OLAP cubes, and web queries — gets an entry in xl/connections.xml. For a Power Query loaded into a table, Excel writes a synthetic OLEDB connection that points Analysis Services at the in-workbook Mashup engine, and it stores the target source description alongside.

// Representative excerpt from xl/connections.xml
<connection id="1" name="Query - Sales"
    description="Connection to the 'Sales' query in the workbook."
    type="5" refreshedVersion="7"
    background="1" saveData="1">
  <dbPr connection="Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location=Sales;Extended Properties=&quot;&quot;"
        command="SELECT * FROM [Sales]"/>
</connection>

For native connections — a legacy Microsoft Query against SQL Server, for instance — the contents are much more revealing. Excel writes the full connection string verbatim:

// A SQL Server connection as written to connections.xml
<dbPr connection="Provider=SQLOLEDB.1;Integrated Security=SSPI;
        Persist Security Info=False;Initial Catalog=ContosoFinance;
        Data Source=prod-sql-01.contoso.local;Use Procedure for Prepare=1;
        Auto Translate=True;Packet Size=4096;Workstation ID=LAPTOP-ANALYST42"
      command="EXEC dbo.usp_GetMonthlyCompensation @Month=3, @Year=2026"
      commandType="1"/>

That single connection entry leaks: the internal DNS name of a production SQL Server, the name of the finance database, the name of a stored procedure that an attacker now knows to look for, the parameters it takes, and even the hostname of the analyst’s laptop (via Workstation ID). None of that is visible on any sheet. All of it is in the XLSX you emailed out.
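Pulling those details out of connections.xml is a few lines of standard-library code. A sketch, assuming the usual spreadsheetml namespace; the helper name and the simple `Data Source=` regex are illustrative (connection strings vary by provider):

```python
import re
import xml.etree.ElementTree as ET

def audit_connections(connections_xml):
    """List the server and command text of every <dbPr> in connections.xml."""
    findings = []
    for el in ET.fromstring(connections_xml).iter():
        if el.tag.rpartition("}")[2] != "dbPr":  # tolerate the spreadsheetml namespace
            continue
        conn_str = el.get("connection", "")
        server = re.search(r'Data Source=([^;"]+)', conn_str)
        findings.append({"server": server.group(1) if server else None,
                         "command": el.get("command")})
    return findings
```

Run against the excerpt above, this surfaces the server hostname and the stored-procedure call without ever opening Excel.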

What About Credentials?

Modern Excel does not embed passwords in connections.xml; credentials are stored in Windows Credential Manager keyed to the user who authored the query. The risk is not a plaintext password leaking — it is the structural disclosure: server hostnames, database names, object names, and parameters that hand an attacker a map of your environment. Older files saved with “Save password” enabled, or web query files, can still contain credentials. Always inspect before sharing.

Reading the M Code Out of a Workbook

The most revealing part of a Power Query workbook is the M code itself. Excel stores the full source of every query in a Custom XML Part called the DataMashup. The part lives at xl/customXml/item1.xml (the number varies by workbook), and it contains a ZIP archive encoded inside an XML wrapper — yes, a ZIP inside a ZIP.

Inside the DataMashup payload sits a file called Formulas/Section1.m, which is the literal source of every Power Query in the workbook, concatenated. A reader who knows the format — or who uses any of the dozen public tools that decode it — gets something like this:

// Excerpt from Formulas/Section1.m inside the DataMashup
section Section1;

shared Sales = let
    Source = Sql.Database("prod-sql-01.contoso.local", "ContosoFinance"),
    dbo_SalesTxn = Source{[Schema="dbo",Item="SalesTxn"]}[Data],
    Removed = Table.RemoveColumns(dbo_SalesTxn, {"InternalMarginPct", "LoyaltyRiskScore"}),
    Filtered = Table.SelectRows(Removed, each [CustomerTier] <> "Internal"),
    Merged = Table.NestedJoin(Filtered, "CustomerId",
        SharePoint.Contents("https://contoso.sharepoint.com/sites/finance"),
        …)
in
    Merged;

Look at what just leaked. The Source step names the production SQL Server and database. The next step names the source table. The Table.RemoveColumns step names two columns that were deliberately removed because they were sensitive — now the recipient knows those columns exist and can guess their role. The Table.SelectRows step reveals that a customer tier called “Internal” is excluded from external reporting. The Table.NestedJoin step points to an internal SharePoint site. The recipient did not ask for any of this. It came for free with the workbook.
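The DataMashup binary layout is not formally documented, but its shape is stable enough to script against: the custom XML part wraps a base64 payload, and somewhere inside the decoded bytes sits an embedded ZIP containing Formulas/Section1.m. A hedged sketch that locates the ZIP by its "PK" signature rather than assuming a fixed header size (the helper name and the loose base64 regex are illustrative):

```python
import base64
import io
import re
import zipfile

def extract_m_source(item_xml):
    """Pull Formulas/Section1.m out of a DataMashup custom XML part.

    Assumes the part wraps a base64 body whose decoded bytes embed a ZIP;
    we find the ZIP by signature instead of parsing the undocumented header.
    """
    text = item_xml.decode("utf-8", errors="replace")
    b64 = re.search(r">([A-Za-z0-9+/=\s]+)<", text)
    if not b64:
        return None
    blob = base64.b64decode(b64.group(1))
    start = blob.find(b"PK\x03\x04")  # local file header signature
    if start == -1:
        return None
    with zipfile.ZipFile(io.BytesIO(blob[start:])) as z:
        return z.read("Formulas/Section1.m").decode("utf-8-sig")
```

This is roughly what the public decoder tools do; the point is that nothing about the format resists a determined reader with twenty lines of Python.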

Removed Columns Still Leak

The M step Table.RemoveColumns literally names every column you stripped. “I removed the salary and SSN columns before exporting” leaves behind an M step that says “I removed the Salary and SSN columns” — in source code inside the file. Renaming and removing data in Power Query does not hide the fact that you had it. It advertises it.

What M Code Reveals Beyond the Result Set

The result table on the worksheet is only one dimension of what the query discloses. The M code itself is a structured narrative of your data environment. Each function call is a breadcrumb:

  • Sql.Database(…) — Server hostname, database name, sometimes a direct query string
  • SharePoint.Contents(…) — Tenant URL, site path, list/library name, internal collaboration structure
  • File.Contents(…) — UNC paths (\\fileserver01\finance\…), usernames in profile paths (C:\Users\jsmith\…)
  • Web.Contents(…) — Internal API URLs, query parameters, bearer token placeholder names
  • OData.Feed(…) — Dynamics, Salesforce, or other CRM/ERP service endpoints and entity names
  • Table.SelectRows(…) — Business logic: who gets filtered out, which thresholds are applied, what counts as “valid”
  • Table.RemoveColumns(…) — Names of sensitive columns the author chose to drop
  • #"Merged Queries" — The full set of other queries (and therefore other data sources) combined into this one

In a workbook with ten queries, the DataMashup typically contains a hundred or more M steps. Taken together they form a detailed blueprint of the analyst’s data environment: which systems of record they pull from, which files they have access to, which business rules they apply, and which fields they consider sensitive enough to exclude. That blueprint is exactly the kind of intelligence a competitor or attacker would spend effort to acquire. In a Power Query workbook, it ships for free.
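That list of functions doubles as a scan list: grepping extracted M source for those call sites yields a quick disclosure report. A sketch over a decoded Section1.m string — the call list mirrors the table above, and the regex only catches literal string arguments (real M can build source names dynamically, which this will miss):

```python
import re

# Call sites whose string arguments tend to disclose infrastructure details.
REVEALING_CALLS = ["Sql.Database", "SharePoint.Contents", "File.Contents",
                   "Web.Contents", "OData.Feed"]

def flag_m_disclosures(m_source):
    """Return (function, first string argument) pairs found in M source."""
    hits = []
    for fn in REVEALING_CALLS:
        pattern = re.escape(fn) + r'\(\s*"([^"]*)"'
        for match in re.finditer(pattern, m_source):
            hits.append((fn, match.group(1)))
    return hits
```

Feeding the earlier Sales query through this would flag both the SQL Server hostname and the SharePoint site URL in one pass.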

The Data Model: A Whole Database Inside the File

When a query is loaded “To Data Model” rather than to a worksheet, the result goes into an embedded Analysis Services tabular engine that ships inside the XLSX as xl/model/item.data. This file is a compressed Analysis Services backup. It contains:

  • The materialized tables — every row that was loaded, compressed but not encrypted, independent of whatever is visible on a worksheet. Exactly the same data-with-the-file problem as the pivot cache, but for arbitrarily many tables.
  • Relationships between tables — foreign keys, cardinality, and cross-filter directions that describe your star schema.
  • DAX measures — the full source of every calculation the analyst defined, including sensitive KPIs, allocations, and financial formulas.
  • Refresh history — per-table “refreshed at” timestamps and the identity of the refreshing user.
  • Display labels and descriptions — friendly names analysts added for measures, often revealing internal terminology, project codenames, and jargon that would otherwise stay inside the team.

A workbook that uses the Data Model as its sole backing store may look deceptively simple on the surface — a few pivot tables, a couple of charts, no visible “data sheet.” The full database is still in there. Treat every file containing a Data Model as if it contained its source tables in full, because it does.
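One cheap tell for a deceptively simple workbook is to compare how many bytes the embedded model occupies against the visible worksheets. A sketch, assuming the default part locations (the helper name is illustrative; a model orders of magnitude larger than the sheets is the signature of a full database hiding behind a few pivot tables):

```python
import zipfile

def data_model_footprint(xlsx):
    """Total uncompressed bytes in the Data Model vs. the visible worksheets."""
    model = sheets = 0
    with zipfile.ZipFile(xlsx) as z:
        for info in z.infolist():
            if info.filename.startswith("xl/model/"):
                model += info.file_size
            elif info.filename.startswith("xl/worksheets/"):
                sheets += info.file_size
    return {"model_bytes": model, "worksheet_bytes": sheets}
```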

Refresh Metadata: Who Ran It, When, From Where

Power Query and its related objects leave a surprisingly rich audit trail on every refresh. These attributes tend to survive normal cleanup workflows because users do not realize they exist:

  • queryTable.refreshedBy — Display name of the Windows/AAD account that last refreshed this table
  • queryTable.refreshedDate — OLE date of the last refresh (down to the second)
  • connection.credentials — A Credential Manager handle referencing who authorized the connection
  • connection.background — Whether refresh runs in the background (implying the data was intended to live-refresh, not be static)
  • Data Model .sqlLog — In some versions, the SQL statements generated against the source, including parameter values
  • Workstation ID in the connection string — The hostname of the laptop that issued the last query

Combine these with the core document properties (docProps/core.xml creator, last-modified-by, and save timestamps) and the recipient has a timeline: who created the analysis, who authored the queries, who refreshed them, and when. That is exactly the metadata forensic investigators use to reconstruct events — and exactly the metadata competitors use to map an organization’s analyst bench.
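Because the exact part carrying refresh identity varies by Excel version and load destination, a robust audit scans every XML part for the refreshedBy / refreshedDate attributes instead of trusting a single location. A sketch (helper name illustrative; the regex approach deliberately avoids caring which schema the attribute appears in):

```python
import re
import zipfile

def find_refresh_identities(xlsx):
    """Map each XML part to any refreshedBy / refreshedDate attributes it carries."""
    hits = {}
    with zipfile.ZipFile(xlsx) as z:
        for name in z.namelist():
            if not name.endswith(".xml"):
                continue
            text = z.read(name).decode("utf-8", errors="replace")
            found = dict(re.findall(r'\b(refreshedBy|refreshedDate)="([^"]*)"', text))
            if found:
                hits[name] = found
    return hits
```

Any non-empty result means the file still names a person and a timestamp — exactly the timeline material described above.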

How This Goes Wrong in Practice

A handful of representative scenarios to make the risk concrete:

Scenario 1: The RFP Response Dashboard

A sales engineer builds a polished pricing dashboard by combining three internal sources: a product catalog from SharePoint, a customer-tier table from SQL Server, and a discount schedule from a CSV on a team network share. The workbook is attached to an RFP response delivered to a prospective customer.

Leak: The M code in the DataMashup names the SharePoint tenant URL and library, the internal SQL Server hostname and discount-schedule file path, and lists the rows excluded for “internal use only.” The prospect now has a map of the sales organization’s pricing infrastructure and knows which discount tiers exist that they were not offered.

Scenario 2: The Board Materials Leak

The CFO’s team ships a board pack containing a financial model backed by a Power Pivot Data Model. A copy is stolen from a directors’ email inbox.

Leak: The Data Model contains the raw general ledger extract in full, every DAX measure with proprietary allocation logic, and the refresh history naming the controller and senior analyst who maintain the model. The attacker obtains far more than the ten slides of summary they expected.

Scenario 3: The FOIA-Released Workbook

A government agency releases a redacted Excel workbook in response to a public records request. The worksheet cells are dutifully redacted. The Power Query metadata is not.

Leak: The DataMashup contains the M code that reads from an internal database with non-redacted column names visible in the M source, plus the email addresses of the staff who refreshed the queries. Journalists extract the recipe, identify the source system, and file follow-up requests more precisely scoped to what the agency hoped to keep private.

Scenario 4: The Downstream Recruiter

An analyst shares a workbook with a staffing agency to demonstrate the types of reports they produce, as part of a job application.

Leak: The M code names every internal system the analyst has accessed, down to UNC paths with their Windows username embedded (\\hq-file01\teams\analytics\sandbox\jsmith\…). The recruiter, and whoever they forward it to, now has a complete picture of the analyst’s prior employer’s stack and internal folder structure.

How to Inspect Power Query Metadata in an XLSX

Confirming what is inside a file takes only a few minutes. Three reliable approaches, in increasing order of automation:

Method 1: The Power Query Editor

The fastest inspection is inside Excel itself. Data → Queries & Connections opens a pane listing every query in the workbook. Right-click any query and select Edit to open the Power Query Editor, then use View → Advanced Editor to see the full M source. This shows you exactly what a motivated recipient will see if they do the same thing after receiving your file. If you do not recognize a query, do not trust that it is harmless — inspect it first.

Method 2: Unzip and Read the XML

Make a copy of the file, rename from .xlsx to .zip, and open it in any archive viewer. The interesting files are:

# Quick command-line audit (Mac/Linux/WSL)
unzip -l report.xlsx | grep -E 'connections|customXml|model'

# Show every server name and path referenced anywhere in the package
unzip -p report.xlsx xl/connections.xml | grep -Eo 'Data Source=[^;"]+'

# Decode the DataMashup container (it contains a zip-in-a-zip)
unzip -p report.xlsx xl/customXml/item1.xml > mashup.xml

# Tools like pqm (Python) or the built-in Power Query Editor will decode Section1.m

Method 3: A Metadata Analysis Tool

For large numbers of files or for compliance workflows, a dedicated tool such as MetaData Analyzer will enumerate every Power Query, every connection, and every Data Model table in a workbook, flag external sources and embedded credentials, and produce a report suitable for attaching to a pre-share approval. This is the only approach that scales beyond a handful of files.

How to Actually Remove Power Query Metadata

There is no single “clean queries” button that gets everything. Depending on what you need to preserve in the file, combine the following techniques:

Option A: Delete the Queries Explicitly

  1. Open Data → Queries & Connections.
  2. Right-click each query and choose Delete.
  3. Also delete any “Connection Only” queries — these do not populate a sheet but still live in the DataMashup.
  4. Save the file.
  5. Verify: unzip the XLSX and confirm the xl/customXml/ DataMashup part is gone, or contains only an empty “section Section1;” stub.

Deleting queries is sufficient for simple cases but does not touch the Data Model. If any query was loaded to the Data Model, the resulting tables survive even after the query is deleted.

Option B: Clear the Data Model

  1. Open the Power Pivot window (enable the add-in if it is not visible).
  2. For each table, right-click the table tab and choose Delete.
  3. Delete any DAX measures you do not want to ship.
  4. Close Power Pivot and save.

This removes the materialized tables and measures from xl/model/item.data. Any pivot tables that depended on the Data Model will break — consider converting them to static values (Copy → Paste Values) before clearing the model.

Option C: Save as a New Workbook

  1. Create a new blank workbook.
  2. Copy the result sheets from the original as Values only (Paste Special → Values).
  3. Save the new file. The queries, connections, Data Model, and DataMashup do not come along.

This is the simplest and most reliable method when the recipient only needs the numbers. You lose interactivity but gain certainty that no query metadata escapes. For most external deliveries, this is the right default.

Option D: Programmatic Sanitization

For automated pipelines — export jobs, pre-send scanning, DLP integrations — remove the relevant parts of the package directly:

# Strip Power Query artifacts from an XLSX in place
import zipfile, shutil

STRIP_PREFIXES = (
    "xl/connections.xml",
    "xl/customXml/",    # DataMashup lives here
    "xl/queryTables/",
    "xl/model/",        # Power Pivot tabular model
)

def strip_power_query(path):
    tmp = path + ".tmp"
    with zipfile.ZipFile(path) as zin, \
         zipfile.ZipFile(tmp, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            if any(item.filename.startswith(p) for p in STRIP_PREFIXES):
                continue  # skip — do not copy to output
            zout.writestr(item, zin.read(item.filename))
    shutil.move(tmp, path)

Brute-Force Stripping Breaks Relationships

Removing parts of the package without also updating [Content_Types].xml and the workbook’s relationship files can leave Excel showing a repair prompt when the recipient opens the file. For production pipelines either (a) use a library like openpyxl that rewrites relationships correctly, or (b) route all sanitized files through Excel’s own Save As once before shipping, which normalizes the package.
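Where a Save As round trip is not available, the [Content_Types].xml part can be rewritten alongside the strip. A sketch that drops the Override entries pointing at removed parts — note this covers content types only; the relationship files under _rels/ would need equivalent treatment, and the prefix list mirrors the stripper above:

```python
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
STRIP_PREFIXES = ("xl/connections.xml", "xl/customXml/", "xl/queryTables/", "xl/model/")

def rewrite_content_types(ct_xml):
    """Remove <Override> entries that point at stripped package parts."""
    ET.register_namespace("", CT_NS)  # keep the default namespace on output
    root = ET.fromstring(ct_xml)
    for el in list(root):
        part = el.get("PartName", "").lstrip("/")  # <Default> has no PartName
        if part.startswith(STRIP_PREFIXES):
            root.remove(el)
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```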

Pre-Share Checklist for Power Query Workbooks

Run through this checklist before releasing any workbook that ever used Power Query, even if the queries look “gone” from the user interface:

  • Have I opened Data → Queries & Connections and reviewed every query, including “Connection Only” ones?
  • Have I read each query’s M source in the Advanced Editor and confirmed it does not disclose sources, paths, or column names I need to protect?
  • Have I opened Power Pivot and verified the Data Model is empty or contains only what the recipient needs?
  • Have I reviewed xl/connections.xml for hostnames, database names, UNC paths, and stored procedure names that should not leave the building?
  • Have I checked DAX measures in Power Pivot for proprietary formulas I do not want disclosed?
  • Have I removed refreshedBy and other identity metadata from any remaining query tables?
  • Have I considered delivering a new workbook with values only, rather than the working file, when the recipient does not need interactivity?
  • Have I verified the final file with an automated metadata analyzer to catch anything I missed?

Organizational Recommendations

Power Query is everywhere, and its metadata is invisible to the people who share files most. Organizations that take this seriously tend to combine several of the following:

Separate Working and Published Files

Make “the model” and “the distributable” two different files. The working workbook contains queries, Data Models, and formulas; the distributable is produced by copying values out. Never share the working file externally.

Egress Scanning for DataMashup

Integrate a Power Query detector into DLP and email-attachment pipelines. Flag any outbound XLSX whose DataMashup is non-trivial in size, and require explicit approval before delivery.

Publish to a Dataset, Not a File

For recurring analyses, move the M logic into a Power BI dataset or Fabric dataflow that is not distributed directly. End-user workbooks connect to the published model, leaving no M code in any shared Excel file.

Training With the Demo

Most users do not believe the M code is still in the file until they see it extracted. A five-minute demonstration that unzips a familiar workbook and prints the source paths is the most effective training tool available.

Conclusion

Power Query is one of Excel’s most productive features. It is also one of the most revealing. The recipe you use to produce a workbook ships with that workbook: the server names you connect to, the files you read, the transformations you apply, the columns you decided to strip, the KPIs you encoded as DAX, and the identities of the people who refreshed the data. None of this is on a worksheet. All of it is in the XLSX.

The fix is not complicated, but it has to be deliberate. Before sharing, inspect the queries, connections, and Data Model; remove anything that is not strictly needed; and when in doubt, deliver values in a new workbook rather than the working file. Treat every workbook that ever touched Power Query as if it contained a blueprint of your data environment, because it does — until you take a specific step to remove it.

The broader pattern repeats across Excel metadata risks: features designed for analyst productivity build up persistence layers that survive casual cleanup. Author names, pivot caches, custom XML parts, the DataMashup, the Data Model — each is invisible in the UI, each ships with the file, and each requires a specific action to remove. Build that action into the workflow, not just the checklist.

Detect Power Query Metadata in Your Excel Files

Use MetaData Analyzer to inspect your XLSX files for Power Query M code, embedded connection strings, Data Model tables, and refresh identities before sharing. See exactly what your transformations are leaking.