Forensic Analysis of Excel Macros and VBA Code

Why Macros Are a Critical Forensic Artifact

Excel macros, written in Visual Basic for Applications (VBA), occupy a unique position in digital forensics. Unlike passive metadata—author names, timestamps, or document properties—VBA code is active. It executes. It can download files, exfiltrate data, modify system settings, and communicate with remote servers. This makes macro forensics a discipline that sits at the intersection of spreadsheet analysis and malware reverse engineering.

At the same time, VBA macros are routinely used for legitimate business automation: generating reports, transforming data, interfacing with databases and APIs. The forensic challenge is distinguishing legitimate automation from malicious behavior, and understanding precisely what the code does—and what it was designed to hide.

What Macro Forensics Can Reveal

• Author identification: Code style, variable naming conventions, and comments that fingerprint the developer
• Malicious capability: Network connections, file system access, credential harvesting, and payload delivery
• Obfuscation intent: Deliberate attempts to hide code behavior using encoding, splitting, or misdirection
• Execution history: Log entries, registry artifacts, and file system traces that show when macros ran
• Code provenance: Copied code, borrowed modules, and reused logic that links to other files or actors
• Timestamp evidence: Module creation and modification times that establish when code was written
• External dependencies: Referenced libraries, COM objects, and API calls that characterize the attack surface

Understanding How VBA Is Stored in Excel Files

Before analyzing macros forensically, you need to understand how VBA code is physically stored inside an Excel file. The storage mechanism differs depending on the file format, and this has direct implications for what you can extract and how.

XLSM Files: ZIP Archive with Binary VBA Stream

Macro-enabled Excel files (.xlsm, .xlam) are ZIP archives in which the VBA project is stored as a single binary file, xl/vbaProject.bin. This binary file is a Compound Document File (OLE2 format), the same container format used by legacy .xls files. The VBA source code is compressed and stored inside this binary container.

# Extract the ZIP contents of an XLSM file

mkdir xlsm_contents

unzip suspicious.xlsm -d xlsm_contents/

# The VBA project is a binary OLE2 file

ls -la xlsm_contents/xl/vbaProject.bin

# Identify the file type

file xlsm_contents/xl/vbaProject.bin

# Output: Composite Document File V2 Document...
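The same check can be scripted so the archive never has to be unpacked to disk. The sketch below is a minimal illustration, not a replacement for the tools covered later: find_vba_projects and build_demo_archive are hypothetical helper names, and the demo archive is a synthetic stand-in rather than a real workbook. It looks for any vbaProject.bin member and tests whether it begins with the 8-byte OLE2 signature.

```python
import io
import zipfile

# First 8 bytes of every OLE2 / Compound File Binary document
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def find_vba_projects(xlsm_bytes: bytes) -> dict:
    """Return {archive member name: True if it starts with the OLE2 magic}."""
    results = {}
    with zipfile.ZipFile(io.BytesIO(xlsm_bytes)) as archive:
        for name in archive.namelist():
            if name.lower().endswith("vbaproject.bin"):
                with archive.open(name) as member:
                    results[name] = member.read(8) == OLE2_MAGIC
    return results


def build_demo_archive() -> bytes:
    """Create a tiny stand-in archive in memory (assumption: not a real workbook)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("xl/workbook.xml", "<workbook/>")
        archive.writestr("xl/vbaProject.bin", OLE2_MAGIC + b"\x00" * 504)
    return buffer.getvalue()


if __name__ == "__main__":
    print(find_vba_projects(build_demo_archive()))
```

Reading the file as bytes and passing it to find_vba_projects keeps the sample away from any application that might execute it. An empty result suggests the archive carries no VBA project under that name.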

Key Files Inside vbaProject.bin

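• VBA/ThisWorkbook: Workbook-level code
• VBA/Sheet1, VBA/Sheet2: Sheet code
• VBA/Module1: Standard modules
• VBA/UserForm1: Form code
• VBA/dir: Compressed project directory (module list, code page, library references)
• VBA/_VBA_PROJECT: Version-dependent cache data, including the VBA version that compiled the project; the p-code itself sits in each module stream ahead of the compressed source
• PROJECT: Module metadata
• PROJECTwm: Unicode name map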

Legacy XLS Format

Legacy .xls files are OLE2 compound documents throughout: the workbook data is stored in the binary BIFF format, and the VBA project sits in its own storage inside the same container. Tools like oledump.py and olevba can parse both formats. The XLS format is more common in older malware campaigns, in part because its macro security prompts differ from those shown for XLSM files.

P-Code vs Source Code: A Critical Distinction

VBA stores both the human-readable source code and a compiled intermediate representation called p-code. This distinction has profound forensic implications.

Source Code

• Human-readable VBA text
• Compressed using a proprietary algorithm
• What source-extraction tools such as olevba display
• Can be modified without affecting p-code
• Recompiled and run only when the stored p-code was produced by a different VBA version

P-Code (Compiled)

• Compiled bytecode for the VBA engine
• Executed directly when the file is opened by the same VBA version that compiled it
• Can differ from the source code if tampered
• Used by attackers to hide true behavior
• Requires specialized tools to disassemble

Forensic alert: A sophisticated attacker can replace the stored source code after compilation so that what source-extraction tools show does not match what actually executes. If the p-code and source code disagree, treat the file as highly suspicious. This technique, usually called "VBA stomping" and referred to in this guide as p-code stomping, is designed specifically to evade static analysis.

Essential Tools for Macro Extraction

Several purpose-built tools can extract and analyze VBA code from Excel files without opening them in Excel itself. Opening suspicious files directly is risky because macros can execute on open. Always use a controlled environment or purpose-built analysis tools.

olevba and oletools Suite

The oletools suite by Philippe Lagadec is the de facto standard for VBA extraction and analysis. The olevba tool extracts source code from all macro-enabled Office formats, detects suspicious keywords, and automatically flags obfuscation techniques.

# Install oletools

pip install oletools

# Extract all VBA code from a file

olevba suspicious.xlsm

# Detailed analysis with IOC extraction

olevba --reveal --decode suspicious.xlsm

# Output just the raw source code

olevba --code suspicious.xlsm > extracted_vba.txt

# Triage multiple files at once (one summary line per file)

olevba --triage *.xlsm > batch_analysis.txt

# Emit machine-readable JSON, including any autorun findings, and pretty-print it

olevba --json suspicious.xlsm | python3 -m json.tool

What olevba Automatically Flags

• AutoExec keywords: AutoOpen, Workbook_Open, Document_Open, AutoClose
• Suspicious keywords: Shell, CreateObject, WScript.Shell, PowerShell
• Network activity: XMLHTTP, WinHttp, URLDownloadToFile
• Obfuscation: Chr() concatenation, StrReverse, base64 strings
• File system access: Open, FileSystemObject, Environ

oledump.py for OLE Structure Analysis

oledump.py by Didier Stevens provides lower-level access to the OLE2 structure of the vbaProject.bin file, listing all streams and allowing individual stream extraction. This is particularly valuable when source code has been stripped or when you need to examine the raw compiled p-code.

# List all OLE streams in the VBA project

python3 oledump.py xl/vbaProject.bin

# Output example:

1:       4096 '\x01CompObj'

2:        512 '\x05DocumentSummaryInformation'

3:        512 '\x05SummaryInformation'

4: M     4096 'VBA/ThisWorkbook'

5: M     8192 'VBA/Module1'

6:       2048 'VBA/_VBA_PROJECT'

# Extract and decompress a specific module (stream 5)

python3 oledump.py -s 5 -v xl/vbaProject.bin

# Dump raw bytes of a stream for hex analysis

python3 oledump.py -s 5 -d xl/vbaProject.bin | xxd | head -50

Streams marked with M in oledump output contain VBA macros. The "M" flag means the stream has been identified as containing compressed VBA source code. Streams without this flag may still contain relevant forensic data, such as compiled p-code or OLE metadata.
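To make the "compressed VBA source code" concrete, the following is a minimal sketch of the chunk-based decompression scheme described in Microsoft's VBA file format specification (MS-OVBA). It assumes you already hold the bytes of one compressed container; in a real module stream the container starts at an offset recorded in the VBA/dir stream, which tools such as oledump.py and olevba resolve for you. The function name decompress_vba is hypothetical, and the sketch does little validation of malformed input.

```python
import struct


def decompress_vba(container: bytes) -> bytes:
    """Decompress one MS-OVBA compressed container into raw bytes."""
    if not container or container[0] != 0x01:
        raise ValueError("compressed container must start with signature byte 0x01")
    out = bytearray()
    pos = 1
    while pos + 2 <= len(container):
        # 16-bit header: bits 0-11 = chunk size - 3, bits 12-14 = 0b011, bit 15 = compressed
        (header,) = struct.unpack_from("<H", container, pos)
        if (header >> 12) & 0x07 != 0b011:
            raise ValueError("bad chunk signature")
        chunk_end = min(pos + (header & 0x0FFF) + 3, len(container))
        pos += 2
        chunk_out_start = len(out)
        if not header & 0x8000:
            # Uncompressed chunk: the payload is copied through unchanged
            out += container[pos:chunk_end]
            pos = chunk_end
            continue
        while pos < chunk_end:
            flags = container[pos]  # one flag bit per following token
            pos += 1
            for bit in range(8):
                if pos >= chunk_end:
                    break
                if not (flags >> bit) & 1:
                    out.append(container[pos])  # literal byte
                    pos += 1
                    continue
                (token,) = struct.unpack_from("<H", container, pos)
                pos += 2
                # The offset/length bit split depends on how much of this chunk is decoded
                produced = len(out) - chunk_out_start
                offset_bits = max((produced - 1).bit_length(), 4)
                length = (token & (0xFFFF >> offset_bits)) + 3
                offset = (token >> (16 - offset_bits)) + 1
                for _ in range(length):
                    out.append(out[-offset])  # byte-wise copy allows overlapping runs
    return bytes(out)
```

For example, the hand-built container 01 05 B0 08 61 62 63 03 20 holds three literal bytes ("abc") followed by one copy token that repeats them twice. In practice, prefer the maintained tools for real samples and use a sketch like this only to understand or sanity-check what they do.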

pcodedmp for P-Code Analysis

When you suspect that source code has been manipulated to hide the true behavior (p-code stomping), pcodedmp can disassemble the compiled p-code directly, bypassing the source code layer entirely.

# Install pcodedmp

pip install pcodedmp

# Disassemble p-code from the VBA project

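pcodedmp xl/vbaProject.bin

# Save both views to look for signs of p-code stomping

olevba --code suspicious.xlsm > source_code.txt

pcodedmp xl/vbaProject.bin > p_code.txt

# The two formats differ on every line, so compare identifiers and strings rather than running a plain diff

grep -ioE "shell|createobject|urldownloadtofile|xmlhttp" p_code.txt | sort -u

grep -ioE "shell|createobject|urldownloadtofile|xmlhttp" source_code.txt | sort -u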

Investigative note: If pcodedmp output shows function calls or API references that do not appear in the source code extracted by olevba, you have found evidence of p-code stomping. This is a strong indicator of deliberate obfuscation and a file that deserves intensive dynamic analysis in a sandboxed environment.
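The comparison described in the note can be roughed out in a few lines. The sketch below assumes you have already saved the olevba source and the pcodedmp disassembly as text; the function name keywords_only_in_pcode and the keyword list are illustrative choices, not part of either tool. It reports keywords that occur in the p-code text but nowhere in the source text.

```python
# Keywords worth checking; an illustrative starting list, not an exhaustive one
SUSPICIOUS = (
    "Shell",
    "CreateObject",
    "URLDownloadToFile",
    "XMLHTTP",
    "WinHttp",
    "PowerShell",
    "Environ",
)


def keywords_only_in_pcode(source_text: str, pcode_text: str, keywords=SUSPICIOUS) -> list:
    """Return keywords found in the p-code disassembly but absent from the source."""
    source_lower = source_text.lower()
    pcode_lower = pcode_text.lower()
    # Plain substring matching: "Shell" also matches inside "PowerShell"
    return [
        keyword
        for keyword in keywords
        if keyword.lower() in pcode_lower and keyword.lower() not in source_lower
    ]


if __name__ == "__main__":
    # Both strings are made-up stand-ins, not real tool output
    source = 'Sub Workbook_Open()\n    MsgBox "Hello"\nEnd Sub'
    pcode = 'LitStr "powershell"\nArgsCall Shell 0x0001'
    print(keywords_only_in_pcode(source, pcode))
```

A non-empty result is a prompt for closer manual review of the disassembly, not proof of tampering on its own, since this is only a coarse substring check.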

Identifying Malicious Macro Patterns

Malicious macros tend to follow recognizable patterns. Understanding these patterns allows investigators to quickly assess the threat level of a macro file and prioritize which elements to analyze in depth.

Auto-Execution Triggers

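The first thing any forensic analyst checks is whether macros are designed to run automatically, without the user explicitly clicking a button. Auto-execution is characteristic of malicious macros, because most legitimate automation is started by a deliberate user action. Legitimate workbooks do sometimes use Workbook_Open for initialization, so treat an autorun trigger as a reason to look closer rather than as proof of malice.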

Trigger Name	Location	Fires When
Workbook_Open	ThisWorkbook	File opens (most common)
Auto_Open	Any module	File opens (legacy)
Workbook_Activate	ThisWorkbook	Workbook gets focus
Workbook_BeforeClose	ThisWorkbook	File closes (persistence)
Worksheet_Activate	Sheet module	Sheet is selected
Worksheet_Change	Sheet module	Any cell is changed
Application.OnTime	Any	Scheduled delay (evasion)

# Grep for autorun triggers in extracted VBA

grep -iE "Workbook_Open|Auto_Open|Workbook_Activate|Worksheet_Activate|Worksheet_Change|Application\.OnTime" extracted_vba.txt
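When the line number of each hit matters for the report, the same search can be done in Python. This is a minimal sketch assuming the VBA has already been extracted to text; find_autorun_triggers is a hypothetical helper, and the pattern simply mirrors the trigger table above.

```python
import re

# Trigger names taken from the table above
AUTORUN_PATTERN = re.compile(
    r"\b(Workbook_Open|Auto_Open|Workbook_Activate|Workbook_BeforeClose|"
    r"Worksheet_Activate|Worksheet_Change|Application\.OnTime)\b",
    re.IGNORECASE,
)


def find_autorun_triggers(vba_text: str) -> list:
    """Return (line number, matched trigger) pairs, in order of appearance."""
    hits = []
    for lineno, line in enumerate(vba_text.splitlines(), start=1):
        for match in AUTORUN_PATTERN.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


if __name__ == "__main__":
    sample = (
        "Private Sub Workbook_Open()\n"
        '    Application.OnTime Now + TimeValue("00:00:30"), "Stage2"\n'
        "End Sub\n"
    )
    for lineno, trigger in find_autorun_triggers(sample):
        print(lineno, trigger)
```

The sketch does not skip comments or string literals, so a trigger name mentioned in a comment is also reported. That is usually acceptable for triage but should be checked by eye.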

Obfuscation Techniques

Malicious VBA code is almost always obfuscated to evade static analysis and make forensic examination harder. Recognizing these techniques is essential for understanding what the code actually does.

Chr() String Concatenation

This technique replaces string literals with chains of Chr() calls, each returning one character from its ASCII code, joined with the & operator. The string "PowerShell" becomes unrecognizable to simple keyword scanners.

' Obfuscated:

cmd = Chr(80) & Chr(111) & Chr(119) & Chr(101) & Chr(114) & Chr(83) & Chr(104) & Chr(101) & Chr(108) & Chr(108)

' Decoded: "PowerShell"

String Reversal

The payload string is stored in reverse order and reversed at runtime using StrReverse(). Simple but effective against pattern-matching.

' Obfuscated:

cmd = StrReverse("llehSrewoP")

' Decoded: "PowerShell"
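Both of the techniques above can be undone mechanically. The following sketch, assuming the obfuscated VBA is available as text, rewrites Chr() chains and StrReverse() calls on literal strings into plain string literals. The function name deobfuscate and the regular expressions are illustrative; real samples often nest or mix techniques in ways this does not handle.

```python
import re

# One Chr(...) call, also accepting the ChrW, Chr$ and ChrW$ variants
CHR_CALL = re.compile(r"Chrw?\$?\(\s*(\d+)\s*\)", re.IGNORECASE)
# A run of Chr(...) calls joined by & or +
CHR_CHAIN = re.compile(
    r"Chrw?\$?\(\s*\d+\s*\)(?:\s*[&+]\s*Chrw?\$?\(\s*\d+\s*\))*",
    re.IGNORECASE,
)
# StrReverse applied directly to a string literal
STRREVERSE = re.compile(r'StrReverse\(\s*"([^"]*)"\s*\)', re.IGNORECASE)


def deobfuscate(vba_line: str) -> str:
    """Replace Chr() chains and StrReverse("...") with the strings they produce."""

    def join_chain(match):
        codes = CHR_CALL.findall(match.group(0))
        return '"' + "".join(chr(int(code)) for code in codes) + '"'

    line = CHR_CHAIN.sub(join_chain, vba_line)
    return STRREVERSE.sub(lambda m: '"' + m.group(1)[::-1] + '"', line)


if __name__ == "__main__":
    print(deobfuscate("cmd = Chr(80) & Chr(111) & Chr(119) & Chr(101) & Chr(114)"))
    print(deobfuscate('cmd = StrReverse("llehSrewoP")'))
```

Running the decoded output back through a keyword scanner is often enough to surface strings such as "PowerShell" that the obfuscation was meant to hide.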

Base64 Encoded Payloads

The actual command or payload is stored as a base64-encoded string and decoded at runtime, often being passed directly to PowerShell or executed via Shell().

' Suspicious pattern:

Dim b64 As String

b64 = "cG93ZXJzaGVsbCAtZW5jb2RlZENvbW1hbmQ..."

Shell "powershell -EncodedCommand " & b64
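Recovering the command from such a string is a short decoding step. PowerShell's -EncodedCommand parameter expects Base64 of UTF-16LE text, while other base64 blobs in a macro are often plain single-byte text, so the sketch below guesses between the two. The function name decode_encoded_command and the zero-byte heuristic are assumptions made for illustration.

```python
import base64


def decode_encoded_command(b64: str) -> str:
    """Decode a base64 string recovered from a macro into readable text."""
    # Restore any padding that was stripped or truncated
    raw = base64.b64decode(b64 + "=" * (-len(b64) % 4))
    # Heuristic: UTF-16LE text made of ASCII characters has a zero in every second byte
    if len(raw) % 2 == 0 and raw[1::2].count(0) > len(raw) // 4:
        return raw.decode("utf-16-le", errors="replace")
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Build a harmless sample the same way PowerShell expects it
    sample = base64.b64encode("Write-Output 'hi'".encode("utf-16-le")).decode("ascii")
    print(decode_encoded_command(sample))
```

Decoding only reveals the text. Never execute the recovered command outside an isolated analysis environment.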

Variable Name Mangling

Variables and function names are replaced with meaningless strings (aA1bB2cC3, x_1_y_2) to make the code logic impossible to follow by reading alone. Combined with Chr() obfuscation, this can make even a short macro extremely difficult to analyze statically.

High-Risk API Calls and Patterns

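Certain API calls and code patterns appear far more often in malicious macros than in routine business automation. Some of them, such as FileSystemObject or XMLHTTP, also have legitimate uses, so read them in context. When they appear together with autorun triggers or obfuscation, they warrant immediate escalation and sandbox analysis.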

Network and Download

MSXML2.XMLHTTP
WinHttp.WinHttpRequest
URLDownloadToFile
InternetOpen / InternetReadFile
CreateObject("Msxml2.ServerXMLHTTP")

Process Execution

Shell() / WScript.Shell.Run()
CreateObject("WScript.Shell")
PowerShell.exe / cmd.exe
CreateProcess (Win32 API)
ShellExecute / WinExec

File System

FileSystemObject
Environ("TEMP") / Environ("APPDATA")
Open ... For Binary As
Put / Get (binary file I/O)
Kill / FileCopy

Persistence and Registry

RegWrite / RegRead
HKCU\Software\Microsoft\Windows\CurrentVersion\Run
Scheduled Task creation
Startup folder writes
COM object registration
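The four groups above lend themselves to a simple lookup table. The sketch below assumes the VBA has been extracted to text. The categorize function and the INDICATORS table are illustrative, and the table deliberately includes only a subset of the listed items, leaving out very short keywords such as Kill or Shell that would match too broadly as substrings.

```python
# Subset of the indicators listed above, grouped by capability
INDICATORS = {
    "network": ["XMLHTTP", "WinHttpRequest", "URLDownloadToFile", "InternetOpen", "InternetReadFile"],
    "process": ["WScript.Shell", "powershell", "cmd.exe", "CreateProcess", "ShellExecute", "WinExec"],
    "filesystem": ["FileSystemObject", "Environ(", "For Binary", "FileCopy"],
    "persistence": ["RegWrite", "RegRead", "CurrentVersion\\Run"],
}


def categorize(vba_text: str) -> dict:
    """Return {category: [indicators found]} using case-insensitive substring matching."""
    lowered = vba_text.lower()
    findings = {}
    for category, indicators in INDICATORS.items():
        found = [item for item in indicators if item.lower() in lowered]
        if found:
            findings[category] = found
    return findings


if __name__ == "__main__":
    sample = (
        'Set sh = CreateObject("WScript.Shell")\n'
        'sh.RegWrite "HKCU\\Software\\Microsoft\\Windows\\CurrentVersion\\Run\\updater", p\n'
    )
    print(categorize(sample))
```

The output is a starting point for prioritization. Obfuscated code will hide these strings, so run the check again after decoding.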

Author Identification Through Code Forensics

Beyond identifying what a macro does, forensic analysis can often identify who wrote it. VBA code carries numerous stylistic and technical fingerprints that can link a file to a specific developer, organization, or threat actor group.

Code Style Analysis

Every developer has unconscious habits. Variable naming conventions, indentation style, comment language, error handling approaches, and the choice of API calls over built-in functions are all characteristics that create a code "fingerprint."

Style Fingerprints to Examine

• Naming convention (camelCase, Hungarian, underscores)
• Comment language and style
• Error handling approach (On Error Resume Next vs structured)
• Indentation (tabs vs spaces, width)
• Line length and wrapping habits
• Preferred string manipulation methods
• Use of Option Explicit or lack thereof

Linguistic Markers

Comments and string literals in the code can reveal the author's native language. Look for:

• Error messages in non-English languages
• Variable names that are transliterations
• Comments that reveal cultural context
• Date formats embedded in strings (DD/MM vs MM/DD)

Metadata Inside the VBA Project

The VBA project itself contains metadata that identifies the author and development environment. This data is separate from the document properties and is not removed by Excel's built-in Document Inspector.

# List the streams, then dump the PROJECT stream by its index

# (replace N with the number shown next to 'PROJECT' in the listing)

python3 oledump.py xl/vbaProject.bin

python3 oledump.py -s N -d xl/vbaProject.bin

# The PROJECT stream (plain text) contains:

# - Module names and types

# - HelpFile path (may reveal developer file system)

# - Project name and protection state

# The compressed VBA/dir stream contains:

# - CodePage (reveals locale/language)

# - Library references

# Extract using strings utility for quick review

strings xl/vbaProject.bin | grep -iE "path|user|module|help"

HelpFile Path Artifact

If a developer associated a help file with the project, the path is stored in the PROJECT stream. Paths like C:\Users\JohnDoe\Documents\... directly reveal the developer's username and machine structure.
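Because the PROJECT stream is plain key=value text, pulling out the module list and any HelpFile path takes little code. The sketch below assumes the stream has already been dumped and decoded to a string. The names parse_project_stream and username_from_path are hypothetical, and the sample path used in the demo is made up.

```python
import re


def parse_project_stream(text: str) -> dict:
    """Split PROJECT stream text into module entries and other properties."""
    info = {"modules": [], "properties": {}}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("["):
            break  # bracketed sections (host extenders, workspace) follow the properties
        key, sep, value = line.partition("=")
        if not sep:
            continue
        value = value.strip('"')
        if key in ("Module", "Document", "Class", "BaseClass"):
            info["modules"].append((key, value.split("/")[0]))
        else:
            info["properties"][key] = value
    return info


def username_from_path(path: str):
    """Return the profile name from a C:\\Users\\<name>\\... style path, if present."""
    match = re.search(r"\\Users\\([^\\]+)\\", path, re.IGNORECASE)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Made-up sample; real streams carry more keys
    sample = (
        "Document=ThisWorkbook/&H00000000\n"
        "Module=Module1\n"
        'HelpFile="C:\\Users\\analyst\\Docs\\help.chm"\n'
        'Name="VBAProject"\n'
    )
    parsed = parse_project_stream(sample)
    print(parsed["modules"], username_from_path(parsed["properties"]["HelpFile"]))
```

An empty HelpFile value is the common case; a populated one is worth recording verbatim in the report.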

CodePage Analysis

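The CodePage value, recorded in the project's VBA/dir stream, identifies the character encoding in use when the VBA code was saved. Common forensic values: 1252 (Western European), 1251 (Cyrillic), 936 (Simplified Chinese), 932 (Japanese). Because it reflects the locale of the machine that saved the project, it can hint at the developer's likely region, but it is a weak indicator on its own.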

Code Reuse and Provenance

Malicious actors frequently reuse code from open-source repositories, paste it from online forums, or copy it between campaigns. Identifying code reuse can link a file to known threat actors, previous incidents, or public exploit code repositories.

Code Reuse Investigation Steps

• Hash distinctive code blocks: Extract unique function bodies and search threat intelligence platforms for matching hashes
• Search for unique strings: Error messages, URL patterns, registry keys, and file paths are often reused verbatim
• Check public repositories: GitHub, Pastebin, and VBA malware databases contain cataloged samples for comparison
• Compare with known campaigns: Threat intelligence reports from vendors often include VBA code snippets for attribution

# Extract unique strings for TI platform searching

strings xl/vbaProject.bin | sort | uniq | grep -v "^.$" | grep -v "^..$" > unique_strings.txt

# Hash the extracted VBA code

sha256sum extracted_vba.txt

md5sum extracted_vba.txt

# Submit binary to threat intelligence (via API)

# e.g., VirusTotal, Malware Bazaar, Hybrid Analysis
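Hashing the whole extracted file only matches exact copies. To catch reuse of individual routines, each procedure can be hashed on its own after light normalization. The sketch below assumes the extracted VBA is available as text; hash_procedures and its normalization rules (collapse whitespace, drop blank lines, lowercase) are illustrative choices rather than an established standard.

```python
import hashlib
import re

# Matches a Sub or Function from its declaration to the matching End line
PROCEDURE = re.compile(
    r"^\s*(?:(?:Public|Private|Friend)\s+)?(?:Static\s+)?"
    r"(Sub|Function)\s+(\w+).*?^\s*End\s+\1\b",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def hash_procedures(vba_text: str) -> dict:
    """Return {procedure name: SHA-256 of its whitespace- and case-normalized body}."""
    hashes = {}
    for match in PROCEDURE.finditer(vba_text):
        lines = [" ".join(line.split()).lower() for line in match.group(0).splitlines()]
        normalized = "\n".join(line for line in lines if line)
        hashes[match.group(2)] = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return hashes


if __name__ == "__main__":
    sample = "Sub Foo()\n    x = 1\nEnd Sub\n\nFunction Bar()\n    Bar = 2\nEnd Function\n"
    for name, digest in hash_procedures(sample).items():
        print(name, digest)
```

Two samples that share a digest for the same routine are likely to share code even when the rest of the file differs. Renamed variables will still defeat this, so treat a mismatch as inconclusive.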

Dynamic Analysis in a Safe Environment

Static analysis tells you what the code says. Dynamic analysis tells you what it actually does when it runs. For obfuscated or complex macros, dynamic analysis in a controlled sandbox environment is often the only way to fully understand the behavior.

Critical Safety Warning

Never open a suspected malicious Excel file on a production machine or any machine connected to your organization's network. Even with macro security settings, malicious files can exploit vulnerabilities in Excel itself.

Use an isolated virtual machine with no network access, no shared folders with the host, and no persistent state. Snapshot the VM before execution and revert after each test.

Sandbox Analysis Environments

Automated Sandboxes

• Any.run: Interactive sandbox with live monitoring
• Hybrid Analysis: Detailed behavior reports
• Cuckoo Sandbox: Self-hosted, network monitoring
• Joe Sandbox: Deep behavioral analysis
• VirusTotal: Multi-engine scan + basic behavior

What to Monitor

• Network connections (IP, domain, HTTP requests)
• Files created, modified, or deleted
• Registry keys read or written
• Processes spawned (especially powershell.exe)
• Windows API calls and parameters
• Memory allocations and injections

Manual Dynamic Analysis with Process Monitor

For more controlled dynamic analysis, running the file in an isolated VM with Sysinternals Process Monitor (ProcMon) and Wireshark provides granular visibility into every action the macro takes.

# Before opening the file in your isolated VM:

# 1. Start Process Monitor (filter for excel.exe and child processes)

Procmon.exe /Minimized /Quiet /BackingFile C:\evidence\procmon.pml

# 2. Start a network capture with dumpcap (installed with Wireshark)

dumpcap -i 1 -w C:\evidence\capture.pcap

# 3. Open the Excel file with macros enabled

# 4. Wait for activity to complete, then stop both captures

# 5. Export the capture files from the VM, then revert to the pre-execution snapshot

Process Monitor captures file system operations, registry operations, network activity, and process events in real time. Filter the output for the Excel process and any child processes it spawns to build a complete picture of the macro's behavior.

Tracing Macro Execution History

In many investigations, the question is not just whether a macro is malicious, but whether it has already executed—and when. Several artifact sources can answer this question even after the fact.

Windows Artifacts from Macro Execution

Prefetch Files

Where Prefetch is enabled (the default on Windows client editions), Windows records program executions, including processes spawned by macros. If a macro ran powershell.exe, expect a prefetch file for PowerShell whose last-run timestamps include the time the macro executed. Location: C:\Windows\Prefetch\

Windows Event Logs

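Event ID 4688 (Process Creation, if auditing is enabled) records every new process in the Security log. Command line arguments are included only when the separate policy setting for command-line auditing is also turned on. PowerShell logging (Event ID 4103 for module logging, 4104 for script block logging) captures the actual commands run by PowerShell, including those launched from macros.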

LNK Files and JumpLists

When a user opens a file in Excel, Windows typically creates a shortcut (LNK) file recording the file path, size, MAC (modified, accessed, created) timestamps, and the machine that opened it. JumpLists record the same information per application. Both sources can confirm that a file was opened even if it has since been deleted.

Office Trust Record

When a user clicks "Enable Content" to allow macros, Excel records this decision in the registry. The path is: HKCU\Software\Microsoft\Office\<version>\Excel\Security\Trusted Documents\TrustRecords. Each entry includes the file path and the timestamp when trust was granted, directly confirming when macros were enabled.

Excel-Specific Execution Artifacts

# Check Office Trust Records in registry

reg query "HKCU\Software\Microsoft\Office\16.0\Excel\Security\Trusted Documents\TrustRecords"

# Check Recent Files list

reg query "HKCU\Software\Microsoft\Office\16.0\Excel\File MRU"

# LNK files for Excel documents

dir C:\Users\%USERNAME%\AppData\Roaming\Microsoft\Windows\Recent\*.lnk

# Prefetch for processes macro may have spawned

dir C:\Windows\Prefetch\POWERSHELL*.pf

dir C:\Windows\Prefetch\CMD*.pf

Cross-reference the Trust Records timestamp with the file's modification time and any network or process activity from that time window. This triangulates the moment of macro execution.
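Each TrustRecords value is a small binary blob, so the timestamp has to be decoded before it can be cross-referenced. The sketch below relies on the layout commonly reported by forensic tooling, which is an assumption here rather than documented behavior: the first 8 bytes are a Windows FILETIME, and the last 4 bytes equal 0x7FFFFFFF when the user enabled content. The function name parse_trust_record is hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
# Assumption: this trailing value is reported to mean "Enable Content" was clicked
MACROS_ENABLED_FLAG = 0x7FFFFFFF


def parse_trust_record(value: bytes):
    """Return (UTC timestamp, macros_enabled) decoded from one TrustRecords value."""
    if len(value) < 12:
        raise ValueError("TrustRecords value is shorter than expected")
    (filetime,) = struct.unpack_from("<Q", value, 0)
    (flag,) = struct.unpack_from("<I", value, len(value) - 4)
    # FILETIME counts 100-nanosecond intervals since 1601-01-01 UTC
    timestamp = FILETIME_EPOCH + timedelta(microseconds=filetime // 10)
    return timestamp, flag == MACROS_ENABLED_FLAG


if __name__ == "__main__":
    # Synthetic value: 2020-01-01 00:00:00 UTC with the macros-enabled flag set
    demo = struct.pack("<Q", 132223104000000000) + b"\x00" * 12 + struct.pack("<I", 0x7FFFFFFF)
    print(parse_trust_record(demo))
```

Confirm the layout against a known test document on the same Office version before relying on decoded values in a report.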

Macro Forensics Investigation Checklist

Step-by-Step Macro Investigation Process

1. Preserve the original file

Hash the original file (MD5, SHA-256) immediately. Never work on the original. All analysis should be on copies.

2. Extract the VBA project

Unzip the XLSM to access xl/vbaProject.bin. Use oledump to list all streams. Identify all module streams marked with "M".

3. Run olevba for automated analysis

Extract all source code and review the automated flags for autorun triggers, suspicious keywords, obfuscation techniques, and network indicators.

4. Check for p-code stomping

Run pcodedmp and compare the p-code output with the extracted source. Keywords, strings, or calls that appear only in the p-code indicate source manipulation and require dynamic analysis.

5. Analyze VBA project metadata

Extract the PROJECT and VBA/dir streams. Check CodePage (language indicator), HelpFile paths (developer file system), and library references.

6. Decode obfuscated strings

Manually or programmatically decode Chr() concatenation, base64 strings, and StrReverse patterns to recover the actual payload commands and URLs.

7. Search threat intelligence databases

Submit the file hash and unique strings to VirusTotal, Malware Bazaar, and threat intelligence platforms. Look for code reuse linking to known campaigns.

8. Perform sandbox analysis

Run the file in an isolated sandbox environment. Capture network activity, file system changes, registry modifications, and process creation events.

9. Check host artifacts for execution evidence

Examine Trust Records, Prefetch, Event Logs, LNK files, and JumpLists on the victim machine to determine if and when macros actually executed.

10. Document and correlate findings

Compile all findings into a structured report. Include code excerpts, decoded payloads, execution artifacts, and author attribution evidence. Correlate with email delivery, download events, and network logs.

Distinguishing Legitimate Macros from Malicious Ones

Not every macro is malicious. Organizations use VBA extensively for legitimate business automation. The forensic challenge is distinguishing intentional misuse from routine automation, especially in insider threat investigations where an employee may have abused legitimate tools.

Characteristic	Likely Legitimate	Likely Malicious
Execution trigger	Button click, menu item, explicit call	Workbook_Open, Auto_Open, hidden triggers
Code readability	Clear variable names, comments explaining logic	Obfuscated, mangled names, no comments
Network access	To internal systems, documented purpose	External IPs, encoded URLs, no clear purpose
File system access	Reads/writes to expected locations	Writes to Temp, AppData, drops executables
Process creation	Rare or absent	Spawns PowerShell, cmd.exe, or other processes
Error suppression	Specific error handling with recovery	On Error Resume Next blanket suppression
Code signing	Signed with organizational certificate	Unsigned or signed with unknown certificate
Business context	Clear relationship to spreadsheet purpose	Unrelated to spreadsheet content, hidden

Conclusion

Excel macro forensics is one of the most technically demanding disciplines in spreadsheet investigation. The combination of a binary storage format, compiled p-code, multiple obfuscation layers, and runtime-only behavior means that understanding a macro requires a multi-stage approach: static extraction, metadata analysis, code style examination, and controlled dynamic execution.

The most important principle is never to trust the source code alone. The gap between what a macro appears to do in its stored source and what it actually executes can be vast. P-code stomping, encoded payloads, and delayed execution all ensure that simple inspection is not enough.

Whether you are investigating a suspected malware delivery, tracing the origins of fraudulent automation, or determining whether a macro has already executed on a compromised machine, the methods in this guide provide a systematic framework for answering those questions with evidence that can withstand scrutiny. The code was written by someone, it does something, and if it ran, it left traces. Macro forensics is the discipline of finding all three.