
Excel Metadata Considerations for Cloud Storage Services

Cloud storage has become the default home for business spreadsheets. But when you upload an Excel file to OneDrive, Google Drive, SharePoint, or Dropbox, the metadata story gets more complex—not simpler. Cloud platforms add their own metadata layers, preserve version histories, and synchronize file properties across devices in ways that can expose sensitive information long after you thought it was removed.

By Technical Team | March 10, 2026 | 21 min read

The Cloud Metadata Problem

When Excel files lived on local drives and file servers, metadata risks were contained. An author name, a file path, a hidden sheet—these stayed wherever the file was stored. Cloud storage fundamentally changes this equation. Files are now synchronized across multiple devices, shared via links with configurable permissions, indexed by search engines, and preserved in version histories that may be impossible to fully purge.

The challenge is that each cloud platform handles Excel metadata differently. Some preserve every byte of the original file's metadata. Others strip certain properties during conversion. Some add entirely new metadata layers—collaboration timestamps, access logs, and sharing histories—that create additional privacy concerns. Understanding these differences is essential for anyone who stores sensitive spreadsheets in the cloud.

Cloud Metadata Risks at a Glance

  • Version history retention: Cloud platforms may retain prior versions of a file for anywhere from 30 days to indefinitely, so metadata you “removed” can still exist in earlier versions that anyone with access to the version history can open.
  • Sync-propagated metadata: Cleaning metadata on one device doesn't guarantee it's cleaned across all synced copies. Race conditions can restore old metadata from another device.
  • Platform-added metadata: Cloud services add their own metadata layers—who accessed the file, when, from where—creating new privacy exposure beyond what the original file contained.
  • Sharing link persistence: Revoking a sharing link does not recall copies that were already downloaded, and cached previews in browsers or CDN edge caches may persist for a time, so the file and its full metadata can outlive the link.

How Major Cloud Platforms Handle Excel Metadata

Each cloud storage provider has its own approach to handling the metadata embedded in Excel files. Some treat XLSX files as opaque blobs and preserve everything. Others parse the file format and add, modify, or strip metadata during upload, conversion, or collaborative editing.

Microsoft OneDrive and SharePoint

As Microsoft's own ecosystem, OneDrive and SharePoint have the deepest integration with Excel file metadata. They preserve all core and extended properties from the XLSX format and add additional metadata layers on top.

Metadata Behavior on OneDrive/SharePoint

  • Core properties preserved: Author, title, subject, keywords, comments, and category fields from core.xml are fully retained and indexed for search.
  • Extended properties preserved: Application name, company, manager, and document statistics from app.xml remain intact.
  • Version history: Saves create new versions. OneDrive for work or school retains up to 500 versions by default, while personal OneDrive accounts keep the 25 most recent. SharePoint retains versions according to library settings (500 major versions by default in SharePoint Online, adjustable by administrators).
  • Co-authoring metadata: When multiple users edit simultaneously, SharePoint tracks each user's session, creating metadata about who edited which cells and when.
  • SharePoint columns: Site administrators can define custom metadata columns that attach to files independently of the file's internal metadata, creating a secondary metadata layer.
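To see what the core and extended properties listed above look like inside a file, the following is a minimal sketch that reads docProps/core.xml and docProps/app.xml straight from the XLSX package using only the Python standard library. The function name read_embedded_properties is hypothetical, and the sketch assumes the usual OOXML package layout; it is an inspection aid, not a complete metadata audit.

```python
import zipfile
import xml.etree.ElementTree as ET

def read_embedded_properties(xlsx):
    """Return {part name: {property: value}} for the core and extended property parts.

    `xlsx` may be a file path or a binary file-like object.
    """
    found = {}
    with zipfile.ZipFile(xlsx) as package:
        names = set(package.namelist())
        for part in ("docProps/core.xml", "docProps/app.xml"):
            if part not in names:
                continue  # a cleaned or unusual file may lack this part
            root = ET.fromstring(package.read(part))
            props = {}
            for child in root:
                tag = child.tag.rsplit("}", 1)[-1]  # drop the XML namespace prefix
                if child.text and child.text.strip():
                    props[tag] = child.text.strip()
            found[part] = props
    return found
```

Calling read_embedded_properties("Q4-Budget.xlsx") on a typical workbook would be expected to surface fields such as creator, lastModifiedBy, and Company when they are set.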

# Accessing OneDrive file metadata via Microsoft Graph API

GET https://graph.microsoft.com/v1.0/me/drive/items/{item-id}

Response includes:
{
  "name": "Q4-Budget.xlsx",
  "createdBy": {
    "user": {
      "displayName": "Jane Smith",
      "email": "jane.smith@company.com"
    }
  },
  "lastModifiedBy": {
    "user": {
      "displayName": "Mike Johnson",
      "email": "mike.j@company.com"
    }
  },
  "createdDateTime": "2026-01-15T09:30:00Z",
  "lastModifiedDateTime": "2026-03-08T14:22:00Z",
  "shared": {
    "scope": "organization",
    "sharedDateTime": "2026-02-01T10:00:00Z"
  }
}
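As a rough illustration of how much identity information a response like the one above carries, the sketch below pulls out the person-identifying and sharing fields from a driveItem-style dictionary. The function name identity_fields is hypothetical, and the field names are taken from the sample response shown here rather than from an exhaustive reading of the Graph schema.

```python
def identity_fields(drive_item):
    """Collect the fields of a driveItem-style dict that identify people or sharing scope."""
    exposed = {}
    for facet in ("createdBy", "lastModifiedBy"):
        user = drive_item.get(facet, {}).get("user", {})
        for key in ("displayName", "email"):
            if key in user:
                exposed[f"{facet}.{key}"] = user[key]
    scope = drive_item.get("shared", {}).get("scope")
    if scope:
        exposed["shared.scope"] = scope
    return exposed

# Sample item mirroring the response above
sample_item = {
    "name": "Q4-Budget.xlsx",
    "createdBy": {"user": {"displayName": "Jane Smith", "email": "jane.smith@company.com"}},
    "lastModifiedBy": {"user": {"displayName": "Mike Johnson", "email": "mike.j@company.com"}},
    "shared": {"scope": "organization", "sharedDateTime": "2026-02-01T10:00:00Z"},
}
exposed_fields = identity_fields(sample_item)
```

None of these values live inside the XLSX file itself, so cleaning the workbook does not change them.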

Google Drive and Google Sheets

Google Drive handles Excel metadata differently depending on whether you keep the file in XLSX format or convert it to Google Sheets. This distinction is critical for metadata management.

XLSX Files Stored Without Conversion

  • The file is stored as an opaque binary blob, so all internal metadata (author, company, hidden sheets, comments) is preserved exactly as uploaded.
  • Google Drive adds its own metadata layer: uploader identity, upload timestamp, sharing permissions, and access activity.
  • Version history for non-Google files such as XLSX keeps older versions for 30 days or up to 100 versions (whichever comes first); after that they may be deleted unless a version is marked “Keep forever.” Google Workspace organizations can additionally apply their own retention rules.
  • Google's search indexes file names and some document properties but does not deeply index XLSX internal metadata.

XLSX Files Converted to Google Sheets

  • Conversion strips most XLSX-native metadata (author, company, application version from core.xml and app.xml).
  • Hidden sheets are preserved in the converted Google Sheet and may become visible to collaborators who know how to look.
  • Cell comments are converted to Google Sheets comments, which are attributed to the Google account that performed the conversion, not the original commenter.
  • The original XLSX file may still exist as a separate file in Google Drive, retaining all its original metadata.
  • Google Sheets adds comprehensive revision history that tracks every edit by every collaborator, including cell-level changes.

Dropbox

Dropbox takes a file-preserving approach: it stores Excel files as-is without parsing or modifying their internal structure. This means all XLSX metadata is preserved exactly as it exists in the original file.

Dropbox Metadata Behavior

  • Full preservation: All XLSX metadata (core properties, extended properties, custom properties, hidden sheets, comments, VBA macros) is preserved without modification.
  • Version history: Dropbox Basic retains 30 days of version history. Dropbox Professional and Business retain 180 days. Extended version history add-on provides up to 10 years.
  • Event log: Dropbox Business provides detailed event logs showing who viewed, downloaded, shared, or modified files—creating an administrative metadata trail.
  • Shared link metadata: When you create a shared link, Dropbox records who created it, when, what permissions were set, and who accessed it.

The Version History Problem

Version history is arguably the most significant metadata risk in cloud storage. Even if you meticulously clean an Excel file's metadata before sharing it, previous versions of that file—complete with all their original metadata—may remain accessible to anyone with file access.

Version History Retention by Platform

Platform     | Free Tier              | Business Tier             | Can Purge History?
OneDrive     | 25 versions            | 500 versions              | Individual versions only
SharePoint   | N/A                    | Configurable (often 500+) | Admin-configurable
Google Drive | 100 versions / 30 days | Configurable retention    | Keep forever or auto-delete
Dropbox      | 30 days                | 180 days (up to 10 years) | Permanent delete available

This creates a practical problem: if you upload an Excel file with sensitive metadata on Monday, realize the issue on Wednesday, and upload a cleaned version, anyone with access to the file can still retrieve Monday's version with all the original metadata intact. Simply “replacing” a file does not remove its history.

# Scenario: Metadata persists in version history

Day 1: Upload "Sales-Forecast.xlsx"
  - Author: "Sarah Chen, VP Sales"
  - Company: "Acme Corp"
  - Hidden sheet: "Internal-Targets" with margin data
  - Comments: Notes from executive review

Day 3: Realize metadata exposure, clean file
  - Remove author, company, hidden sheets, comments
  - Upload cleaned version to same location

Day 5: Competitor accesses shared file
  - Downloads current (clean) version ✓
  - Clicks "Version history" → downloads Day 1 version
  - Extracts all original metadata including:
    → Author identity and role
    → Company name
    → Hidden sheet with margin targets
    → Executive review comments

Mitigating Version History Risks

  • Clean before first upload: Always strip metadata before the file ever reaches the cloud. The cleanest version should be the first version.
  • Upload to a new location: Instead of replacing a file that has sensitive history, upload the cleaned version as a new file with a new sharing link.
  • Purge version history: Where possible, delete old versions manually. On SharePoint, administrators can configure version limits.
  • Use short retention policies: Configure your cloud platform to automatically delete versions older than your compliance minimum.
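The “purge version history” step can be made concrete with a small sketch that decides which versions predate the cleaned upload. It assumes each version record is a dictionary with id and lastModifiedDateTime keys, the shape a version listing typically returns; the function name versions_to_purge and the sample IDs are hypothetical, and the actual deletion call depends on your platform.

```python
from datetime import datetime

def _parse(timestamp):
    # Older Python versions reject a trailing "Z" in fromisoformat()
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))

def versions_to_purge(versions, cleaned_at):
    """Return IDs of versions saved before the cleaned file was uploaded."""
    cutoff = _parse(cleaned_at)
    return [v["id"] for v in versions
            if _parse(v["lastModifiedDateTime"]) < cutoff]

# Hypothetical listing for the Day 1 / Day 3 scenario above
history = [
    {"id": "3.0", "lastModifiedDateTime": "2026-03-03T09:00:00Z"},  # cleaned upload
    {"id": "2.0", "lastModifiedDateTime": "2026-03-01T16:45:00Z"},
    {"id": "1.0", "lastModifiedDateTime": "2026-03-01T08:30:00Z"},
]
stale_ids = versions_to_purge(history, "2026-03-03T09:00:00Z")
```

Everything in stale_ids still carries the original metadata and is a candidate for deletion, subject to any retention or litigation-hold obligations.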

Synchronization and Metadata Propagation

Cloud sync clients create local copies of files on each device. When you modify a file's metadata on one device, the sync engine must propagate that change to every other device and the cloud copy. This process introduces several metadata risks.

Sync Conflict Scenarios

Race Condition: Metadata Restoration

When metadata is cleaned on one device but another device still has the old version synced locally, the sync engine may detect a conflict. Depending on the platform's conflict resolution strategy, the old metadata-rich version could overwrite the cleaned version.

Laptop A: Cleans metadata from Budget.xlsx, syncs to cloud
Laptop B: Opens old Budget.xlsx (with metadata) offline
Laptop B: Makes a small cell edit, comes back online
Cloud: Conflict detected → may merge or keep “latest”
Result: Metadata from Laptop B's copy may be restored
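One way to notice that a stale copy has come back is to compare each synced copy against the cleaned reference file by content hash. The sketch below does this with SHA-256; the function names and paths are hypothetical. A mismatch only means the copy differs from the cleaned file (a legitimate later edit would also differ), so treat results as prompts for review.

```python
import hashlib

def sha256_of(path):
    """Hash a file in chunks so large workbooks are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_stale_copies(clean_path, synced_paths):
    """Return the synced copies whose content differs from the cleaned reference."""
    expected = sha256_of(clean_path)
    return [str(path) for path in synced_paths if sha256_of(path) != expected]
```

Running find_stale_copies against the synced folder on each device after a cleanup gives a quick signal that every device has actually received the cleaned version.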

Desktop Sync Client Risks

Desktop sync clients like OneDrive, Google Drive for Desktop, and Dropbox create local file system mirrors. These local copies retain all file metadata including filesystem-level attributes that the cloud platform might not display in its web interface.

Metadata Exposed Through Local Sync

  • File system timestamps: Created and modified dates from the original file system may differ from the cloud platform's timestamps, revealing when the file truly originated.
  • Extended attributes: On macOS, extended attributes (xattr) can carry additional metadata like where the file was downloaded from, the quarantine flag, and Finder tags.
  • NTFS alternate data streams: On Windows, NTFS alternate data streams can hold zone information (the Zone.Identifier stream that marks downloaded files) and custom metadata that stays attached to the local copy.
  • Thumbnail caches: Operating systems generate thumbnail previews of Excel files that may persist in cache directories even after the file is deleted from the synced folder.
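The file system timestamps mentioned above are easy to inspect. This is a minimal sketch using os.stat from the standard library; the function name filesystem_timestamps is hypothetical. Creation time is only reported where the operating system exposes it (st_birthtime is not available on every platform), and extended attributes or alternate data streams need platform-specific tools that are not covered here.

```python
import os
from datetime import datetime, timezone

def filesystem_timestamps(path):
    """Return the local file system timestamps for a synced file as ISO 8601 strings (UTC)."""
    info = os.stat(path)

    def iso(seconds):
        return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()

    stamps = {"modified": iso(info.st_mtime), "accessed": iso(info.st_atime)}
    # st_birthtime (true creation time) exists only on some platforms
    birth = getattr(info, "st_birthtime", None)
    if birth is not None:
        stamps["created"] = iso(birth)
    return stamps
```

Comparing these values with the timestamps the cloud platform displays can show whether a local copy reveals an earlier origin than the cloud record suggests.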

Platform-Added Metadata: The Second Layer

Beyond preserving the metadata embedded in your Excel files, cloud platforms generate their own metadata about file activity. This “second layer” can reveal sensitive information about your organization's operations, workflows, and personnel.

Access and Activity Logs

  • Who viewed the file and when
  • Who downloaded a copy
  • IP addresses and geographic locations of viewers
  • Device types and operating systems used
  • Duration of file access sessions
  • Whether the file was previewed or fully opened

Sharing and Permission History

  • Complete sharing history (who shared with whom)
  • Permission changes over time
  • Sharing link creation and revocation timestamps
  • External vs. internal sharing patterns
  • Forwarded sharing link chains
  • Group vs. individual access grants

This platform-level metadata often reveals more about your organization than the file's internal metadata. For example, if a financial model is accessed by three executives at 2 AM on a Sunday before an acquisition announcement, the access pattern itself is sensitive information—even if the file's internal metadata has been thoroughly cleaned.

# SharePoint audit log revealing sensitive access patterns

{
  "Operation": "FileAccessed",
  "ItemName": "Project-Falcon-Valuation.xlsx",
  "UserId": "cfo@company.com",
  "ClientIP": "203.0.113.42",
  "UserAgent": "Microsoft Office/16.0",
  "EventTime": "2026-03-09T02:15:00Z",
  "SiteUrl": "https://company.sharepoint.com/sites/ma-team",
  "SourceRelativeUrl": "Shared Documents/Confidential"
}

// This single log entry reveals:
// - The CFO accessed a valuation file
// - At 2:15 AM (unusual hours = urgency)
// - From an IP outside the corporate network
// - The file is in a folder named "Confidential"
// - Within an M&A team site
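To show how easily such patterns can be extracted from exported logs, here is a small sketch that flags audit records with access outside business hours. It assumes records shaped like the entry above (an EventTime field in ISO 8601, UTC); the function names and the 08:00 to 18:00 UTC window are hypothetical choices for illustration.

```python
from datetime import datetime

def is_off_hours(event_time, start_hour=8, end_hour=18):
    """True if the timestamp falls on a weekend or outside start_hour..end_hour (UTC)."""
    moment = datetime.fromisoformat(event_time.replace("Z", "+00:00"))
    is_weekend = moment.weekday() >= 5  # Saturday = 5, Sunday = 6
    return is_weekend or not (start_hour <= moment.hour < end_hour)

def flag_off_hours(records):
    """Return the audit records whose EventTime is off-hours."""
    return [record for record in records if is_off_hours(record["EventTime"])]

audit_records = [
    {"UserId": "cfo@company.com", "ItemName": "Project-Falcon-Valuation.xlsx",
     "EventTime": "2026-03-09T02:15:00Z"},
    {"UserId": "analyst@company.com", "ItemName": "Q4-Budget.xlsx",
     "EventTime": "2026-03-09T10:00:00Z"},
]
flagged = flag_off_hours(audit_records)
```

Anyone with access to the logs can run the same kind of filter, which is why audit data deserves the same access controls as the files it describes.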

Sharing Links and External Access Risks

Cloud sharing links are the primary way Excel files move between organizations. Each sharing method carries different metadata implications.

Sharing Methods Compared

Method           | File Metadata Exposed       | Version History      | Activity Tracked
View-only link   | Visible in web preview      | Usually hidden       | Views logged
Edit link        | Fully accessible            | Accessible           | Edits and views logged
Download link    | Full file with all metadata | Only current version | Downloads logged
Email attachment | Full file with all metadata | None                 | Not tracked by cloud

The “Anyone with the link” Trap

Many organizations use “Anyone with the link” sharing for convenience. This creates several metadata risks:

  • The link can be forwarded without your knowledge, giving unintended recipients access to all file metadata.
  • Search engines can index publicly shared links, making your files (and their metadata) discoverable.
  • Even after revoking a link, recipients who downloaded the file retain their copy with all metadata intact.
  • Some platforms allow link recipients to see other files in the same folder if permissions are misconfigured.

Best Practices for Cloud-Stored Excel Files

Protecting metadata in cloud-stored Excel files requires a combination of preventive measures, platform configuration, and organizational policies. Here are the essential practices for each stage of the file lifecycle.

Before Uploading

Pre-Upload Checklist

  • Run Document Inspector to remove personal information
  • Check for and remove hidden sheets and named ranges
  • Clear cell comments and revision marks
  • Remove external data connections and linked workbooks
  • Strip VBA macros if not needed
  • Verify the author and company fields are appropriate
  • Check for sensitive data in header/footer fields

Automated Pre-Upload Cleaning

  • Implement a metadata scrubbing script in your upload workflow
  • Use PowerShell or Python to strip properties programmatically
  • Configure Data Loss Prevention (DLP) policies to scan uploads
  • Set up automated alerts for files with sensitive metadata patterns
  • Create organization templates with clean default metadata
  • Use our MetaData Analyzer tool to inspect files before uploading
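Several of the checklist items can be checked automatically before a file is uploaded. The sketch below inspects the XLSX package with the standard library and reports hidden sheets, named ranges, comment parts, external links, and a VBA project. The function name pre_upload_findings is hypothetical, and the sketch assumes the common (transitional) OOXML namespace and part names, so a clean result is not proof that a file is safe.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by workbook.xml in the common (transitional) XLSX format
MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

def pre_upload_findings(xlsx):
    """Return a list of human-readable findings for a workbook (path or file-like)."""
    findings = []
    with zipfile.ZipFile(xlsx) as package:
        names = package.namelist()
        if "xl/workbook.xml" in names:
            root = ET.fromstring(package.read("xl/workbook.xml"))
            for sheet in root.iter(MAIN_NS + "sheet"):
                state = sheet.get("state", "visible")  # hidden or veryHidden otherwise
                if state != "visible":
                    findings.append(f"{state} sheet: {sheet.get('name')}")
            for defined in root.iter(MAIN_NS + "definedName"):
                findings.append(f"named range: {defined.get('name')}")
        if any(re.match(r"xl/comments\d*\.xml$", name) for name in names):
            findings.append("cell comments present")
        if any(name.startswith("xl/externalLinks/") for name in names):
            findings.append("external workbook links present")
        if "xl/vbaProject.bin" in names:
            findings.append("VBA project present")
    return findings
```

An upload workflow could refuse or quarantine any file for which pre_upload_findings returns a non-empty list, leaving the final decision to a reviewer.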

Platform Configuration

Recommended Cloud Platform Settings

  • Limit version history retention: Set the minimum retention period required by your compliance framework. Fewer retained versions means less metadata exposure.
  • Restrict external sharing: Default to “Specific people” rather than “Anyone with the link.” Require authentication for all shared links.
  • Enable audit logging: On enterprise plans, enable comprehensive audit logging so you can track who accessed files and when—important for incident response.
  • Configure DLP policies: Use Microsoft Purview, Google DLP, or third-party tools to scan Excel files for sensitive metadata patterns before they can be shared externally.
  • Disable link forwarding: Where possible, configure shared links to be non-transferable so they only work for the intended recipients.
  • Set expiration dates: Configure shared links to expire automatically after a set period to limit long-term metadata exposure.
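The sharing settings above can also be enforced as a periodic audit. The following sketch checks link records against a simple policy. The record shape (scope and expires_in_days keys), the function name, and the 30-day limit are all hypothetical; real platforms expose sharing links through their own admin APIs with different field names.

```python
def link_policy_violations(link, max_days=30):
    """Return the policy problems found in one sharing-link record."""
    problems = []
    if link.get("scope") == "anonymous":
        problems.append("link works for anyone who has it")
    lifetime = link.get("expires_in_days")
    if lifetime is None:
        problems.append("link never expires")
    elif lifetime > max_days:
        problems.append(f"link lifetime exceeds {max_days} days")
    return problems

# Hypothetical link records
open_link = {"scope": "anonymous", "expires_in_days": None}
scoped_link = {"scope": "users", "expires_in_days": 14}
open_link_problems = link_policy_violations(open_link)
scoped_link_problems = link_policy_violations(scoped_link)
```

Links that return any problems are candidates for revocation or for replacement with a “Specific people” link that has an expiration date.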

When Sharing Externally

External sharing carries the highest metadata risk because you lose control of the file once it leaves your cloud environment. A script like the following can strip the most common metadata before a file is shared:

# Python script to clean metadata before cloud upload

import openpyxl
from openpyxl.packaging.core import DocumentProperties

def clean_for_cloud_sharing(input_path, output_path):
    """Write a metadata-stripped copy of an Excel file for cloud upload."""

    # Load the original and save under a new name; the source file is not modified.
    # Note: openpyxl drops VBA macros unless load_workbook(..., keep_vba=True) is used.
    wb = openpyxl.load_workbook(input_path)

    # Replace core document properties with a fresh object so that
    # no values from the source file carry over, then blank the text fields
    wb.properties = DocumentProperties()
    wb.properties.creator = ""
    wb.properties.lastModifiedBy = ""
    wb.properties.title = ""
    wb.properties.subject = ""
    wb.properties.description = ""
    wb.properties.keywords = ""
    wb.properties.category = ""

    # Remove comments from all worksheets
    for sheet in wb.worksheets:
        for row in sheet.iter_rows():
            for cell in row:
                if cell.comment:
                    cell.comment = None

    # Remove hidden sheets, including 'veryHidden' ones (optional - based on policy)
    hidden_sheets = [s for s in wb.sheetnames
                     if wb[s].sheet_state != 'visible']
    for name in hidden_sheets:
        del wb[name]

    wb.save(output_path)
    print(f"Cleaned file saved to: {output_path}")

# Usage
clean_for_cloud_sharing(
    "Sales-Forecast-Internal.xlsx",
    "Sales-Forecast-External.xlsx"
)

Compliance and Regulatory Considerations

Storing Excel files in the cloud intersects with multiple regulatory frameworks. The combination of file-level metadata and platform-level metadata creates a complex compliance landscape.

GDPR Implications

  • Author names and email addresses in metadata are personal data under GDPR
  • Cloud platform access logs containing user identifiers are also personal data
  • Data subject access requests (DSARs) must include metadata if it contains personal data
  • Right to erasure applies to metadata, including version history copies
  • Cross-border data transfers apply to cloud-stored files and their metadata
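As a starting point for locating personal data in metadata, the sketch below scans a dictionary of metadata values for email addresses. The function name and the sample properties are hypothetical, and the regular expression is a deliberately simple approximation. It will not find names, user IDs, or other personal data, so it supports a DSAR or erasure review rather than replacing one.

```python
import re

# Simple approximation of an email address; not a full RFC 5322 matcher
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def emails_in_metadata(properties):
    """Return {property name: [email addresses found]} for a metadata dictionary."""
    hits = {}
    for name, value in properties.items():
        found = EMAIL_PATTERN.findall(str(value))
        if found:
            hits[name] = found
    return hits

# Hypothetical metadata extracted from a workbook
sample_properties = {
    "creator": "Jane Smith <jane.smith@company.com>",
    "lastModifiedBy": "mike.j@company.com",
    "title": "Q4 Budget",
}
email_hits = emails_in_metadata(sample_properties)
```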

Data Residency and Sovereignty

  • Cloud storage regions determine where file metadata physically resides
  • CDN edge caches may store file previews (with metadata) in different jurisdictions
  • Backup and disaster recovery copies may exist in additional regions
  • Some industries require metadata to remain within national borders
  • Audit logs may be stored in different regions than the files themselves

Litigation Hold and eDiscovery

Cloud-stored Excel files and their metadata are discoverable in litigation. Cloud platforms provide eDiscovery tools that can search across file metadata, version history, access logs, and sharing records. Organizations should be aware that:

  • Litigation holds can prevent deletion of files and their version history, including metadata-rich older versions.
  • Cloud platform audit logs can be subpoenaed, revealing who accessed sensitive files and when.
  • Metadata that was “deleted” may still exist in backup systems and be recoverable during forensic examination.
  • Intentionally destroying metadata after a litigation hold is in place constitutes spoliation and can result in sanctions.

Metadata During Cloud Platform Migrations

Migrating Excel files between cloud platforms—from Google Drive to OneDrive, or from Dropbox to SharePoint—introduces unique metadata challenges. File-level metadata may be preserved, modified, or lost depending on the migration method.

Common Migration Metadata Issues

  • Timestamp changes: Migration tools often reset creation dates to the migration date, losing the original file history. The internal XLSX metadata may retain the original dates, creating a mismatch.
  • Author mapping: User accounts on the source platform may not map to accounts on the destination platform, leading to orphaned author references or incorrect attribution.
  • Version history loss: Most migration tools only transfer the current version of each file. All version history—and the metadata it contains—is typically lost during migration.
  • Permission metadata: Sharing permissions rarely transfer cleanly between platforms. Files may become more or less accessible after migration.
  • Format conversion artifacts: If files are converted during migration (e.g., Google Sheets back to XLSX), new metadata may be injected by the conversion process.
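The author-mapping issue can be caught before a migration runs. This sketch maps source author identities to destination accounts and lists the ones that would be orphaned. The function name, the mapping table, and the sample addresses are hypothetical; it assumes the mapping keys are lowercase source identities.

```python
def map_authors(source_authors, account_map):
    """Split source authors into mapped accounts and orphaned references.

    account_map keys are expected to be lowercase source identities.
    """
    mapped, orphaned = {}, []
    for author in source_authors:
        target = account_map.get(author.lower())
        if target:
            mapped[author] = target
        else:
            orphaned.append(author)
    return mapped, sorted(orphaned)

# Hypothetical mapping from source-platform accounts to destination accounts
account_map = {"jane.smith@old.example": "jane.smith@company.com"}
mapped, orphaned = map_authors(
    ["Jane.Smith@old.example", "contractor@old.example"], account_map
)
```

Reviewing the orphaned list before migration gives a chance to decide whether those references should be reassigned or scrubbed.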

Platform migrations present a unique opportunity: they are a natural point to implement metadata cleaning as part of the migration workflow. Since files are already being processed in bulk, adding a metadata scrubbing step is relatively low-effort and can dramatically reduce your organization's metadata exposure on the new platform.

Taking Control of Cloud Metadata

Cloud storage makes Excel files more accessible, collaborative, and resilient. But it also multiplies the metadata surface area. Every upload, sync, share, and version creates new metadata that can reveal sensitive information about your organization, your people, and your operations.

The key takeaway is that metadata hygiene must happen before files reach the cloud, not after. Once a file is uploaded with sensitive metadata, version history, sync propagation, and platform-level activity logs make complete cleanup extremely difficult. Treat the first upload as the point of no return and ensure your files are clean before that moment.

Key Takeaways

  • Cloud platforms preserve and add to Excel file metadata; they don't reduce it.
  • Version history is the biggest risk: old versions with sensitive metadata persist for weeks, months, or indefinitely.
  • Each platform handles metadata differently, so understand your platform's specific behavior.
  • Clean metadata before the first upload, not after. Retroactive cleaning is unreliable.
  • Platform-added metadata (access logs, sharing history) creates a second layer of exposure beyond the file itself.
  • Sync conflicts can restore metadata you already removed. Confirm that every device has synced the cleaned version before anyone edits the file again.
  • Configure platform settings to minimize retention, restrict sharing, and enable audit logging.
