Excel macros and VBA code are among the most revealing forensic artifacts in a spreadsheet investigation. Whether you are examining a suspected malware delivery vehicle, investigating unauthorized automation, or tracing the origins of a fraudulent financial model, the macro layer of an Excel file can tell you who wrote the code, what it was designed to do, and whether it has been deliberately hidden or obfuscated.
Excel macros, written in Visual Basic for Applications (VBA), occupy a unique position in digital forensics. Unlike passive metadata—author names, timestamps, or document properties—VBA code is active. It executes. It can download files, exfiltrate data, modify system settings, and communicate with remote servers. This makes macro forensics a discipline that sits at the intersection of spreadsheet analysis and malware reverse engineering.
At the same time, VBA macros are routinely used for legitimate business automation: generating reports, transforming data, interfacing with databases and APIs. The forensic challenge is distinguishing legitimate automation from malicious behavior, and understanding precisely what the code does—and what it was designed to hide.
Before analyzing macros forensically, you need to understand how VBA code is physically stored inside an Excel file. The storage mechanism differs depending on the file format, and this has direct implications for what you can extract and how.
Macro-enabled Excel files (.xlsm, .xlam) are ZIP archives with a binary VBA storage embedded inside xl/vbaProject.bin. This binary file is a Compound Document File (OLE2 format)—the same format used by legacy .xls files. The VBA source code is compressed and stored inside this binary container.
# Extract the ZIP contents of an XLSM file
mkdir xlsm_contents
unzip suspicious.xlsm -d xlsm_contents/
# The VBA project is a binary OLE2 file
ls -la xlsm_contents/xl/vbaProject.bin
# Identify the file type
file xlsm_contents/xl/vbaProject.bin
# Output: Composite Document File V2 Document...
Key Files Inside vbaProject.bin
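- VBA/ThisWorkbook: Workbook-level code
- VBA/Sheet1, VBA/Sheet2: Sheet code
- VBA/Module1: Standard modules
- VBA/UserForm1: Form code
- VBA/_VBA_PROJECT: VBA version information and project-level compiled cache (each module's own p-code sits in its module stream, ahead of the compressed source)
- PROJECT: Module metadata
- PROJECTwm: Unicode name map
Legacy XLS Format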
Legacy .xls files use the binary BIFF format, and the entire file is an OLE2 compound document with the VBA project stored inside it. Tools like oledump.py and olevba can parse both formats. The XLS format is more common in older malware campaigns because macro security warnings differ from XLSM.
VBA stores both the human-readable source code and a compiled intermediate representation called p-code. This distinction has profound forensic implications.
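Source Code
Stored in compressed form inside each module stream. This is the text that extraction tools such as olevba recover.
P-Code (Compiled)
Stored in the same module stream as the source. When the file is opened in the same VBA version that compiled it, Office runs the p-code instead of recompiling the source.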
Forensic alert: A sophisticated attacker can replace the stored source code after compilation so that the source text recovered by static extraction tools does not match the p-code that actually executes. If the p-code and source code disagree, treat the file as highly suspicious. This technique, more widely known as "VBA stomping" and referred to in this guide as "p-code stomping," is designed specifically to evade static analysis.
Several purpose-built tools can extract and analyze VBA code from Excel files without opening them in Excel itself. Opening suspicious files directly is risky because macros can execute on open. Always use a controlled environment or purpose-built analysis tools.
The oletools suite by Philippe Lagadec is the de facto standard for VBA extraction and analysis. The olevba tool extracts source code from all macro-enabled Office formats, detects suspicious keywords, and automatically flags obfuscation techniques.
# Install oletools
pip install oletools
# Extract all VBA code from a file
olevba suspicious.xlsm
# Show decoded strings, and the source with obfuscated strings replaced by their decoded values
olevba --decode --reveal suspicious.xlsm
# Output just the raw source code
olevba --code suspicious.xlsm > extracted_vba.txt
# Analyze multiple files at once (triage mode prints a one-line summary per file)
olevba --triage *.xlsm > batch_analysis.txt
# Emit machine-readable JSON, which includes the autorun (AutoExec) findings
olevba --json suspicious.xlsm | python3 -m json.tool
What olevba Automatically Flags
- AutoOpen, Workbook_Open, Document_Open, AutoClose
- Shell, CreateObject, WScript.Shell, PowerShell
- XMLHTTP, WinHttp, URLDownloadToFile
- Chr() concatenation, StrReverse, base64 strings
- Open, FileSystemObject, Environ
oledump.py by Didier Stevens provides lower-level access to the OLE2 structure of the vbaProject.bin file, listing all streams and allowing individual stream extraction. This is particularly valuable when source code has been stripped or when you need to examine the raw compiled p-code.
# List all OLE streams in the VBA project
python3 oledump.py xl/vbaProject.bin
# Output example (the M indicator sits between the stream index and the size):
#   1:      4096 '\x01CompObj'
#   2:       512 '\x05DocumentSummaryInformation'
#   3:       512 '\x05SummaryInformation'
#   4: M    4096 'VBA/ThisWorkbook'
#   5: M    8192 'VBA/Module1'
#   6:      2048 'VBA/_VBA_PROJECT'
# Extract and decompress a specific module (stream 5)
python3 oledump.py -s 5 -v xl/vbaProject.bin
# Dump raw bytes of a stream for hex analysis
python3 oledump.py -s 5 -d xl/vbaProject.bin | xxd | head -50
Streams marked with M in oledump output contain VBA macros. The "M" flag means the stream has been identified as containing compressed VBA source code. Streams without this flag may still contain relevant forensic data, such as compiled p-code or OLE metadata.
When you suspect that source code has been manipulated to hide the true behavior (p-code stomping), pcodedmp can disassemble the compiled p-code directly, bypassing the source code layer entirely.
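# Install pcodedmp (pip installs a "pcodedmp" command)
pip install pcodedmp
# Disassemble p-code from the VBA project
pcodedmp xl/vbaProject.bin
# Compare with olevba output to detect p-code stomping
olevba --code suspicious.xlsm > source_code.txt
pcodedmp xl/vbaProject.bin > p_code.txt
# The two files use different notations (VBA text vs. p-code disassembly),
# so a plain diff flags every line. Review them side by side and look for
# calls, identifiers, and strings that appear in the p-code only.
diff -y source_code.txt p_code.txt | less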
Investigative note: If pcodedmp output shows function calls or API references that do not appear in the source code extracted by olevba, you have found evidence of p-code stomping. This is a strong indicator of deliberate obfuscation and a file that deserves intensive dynamic analysis in a sandboxed environment.
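Because source text and p-code disassembly are written in different notations, one rough way to automate this comparison is to reduce both outputs to sets of identifier-like tokens and report watched names that occur only on the p-code side. The sketch below assumes both outputs have already been saved as text. The watch list, the function names, and the hand-written sample that imitates disassembler output are all illustrative assumptions.

```python
import re

# Names worth flagging when they show up only in the compiled code.
# Illustrative assumption, not an exhaustive list.
WATCH_LIST = {
    "shell", "createobject", "urldownloadtofile",
    "powershell", "wscript", "xmlhttp",
}

TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokens(text: str) -> set:
    """Return the lower-cased identifier-like tokens found in text."""
    return {match.lower() for match in TOKEN.findall(text)}


def pcode_only_names(source_text: str, pcode_text: str) -> set:
    """Watched names present in the p-code dump but absent from the source."""
    return (tokens(pcode_text) & WATCH_LIST) - tokens(source_text)


# Harmless-looking source text, as a static extractor might report it
source = 'Sub Workbook_Open()\n    MsgBox "Report ready"\nEnd Sub\n'

# Hand-written imitation of a disassembly listing (not real tool output)
pcode = 'Line #1:\n    LitStr 0x0008 "calc.exe"\n    ArgsCall Shell 0x0001\n'

print(pcode_only_names(source, pcode))
```

A non-empty result is a lead to follow up manually, not proof on its own: the token comparison ignores context and will miss names built at runtime.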
Malicious macros tend to follow recognizable patterns. Understanding these patterns allows investigators to quickly assess the threat level of a macro file and prioritize which elements to analyze in depth.
The first thing any forensic analyst checks is whether macros are designed to run automatically, without the user explicitly clicking a button. Auto-execution is a hallmark of malicious macros, because legitimate automation generally requires deliberate user action. Some legitimate workbooks do use the same events for setup, so treat a trigger as a reason to look closer rather than as proof.
| Trigger Name | Location | Fires When |
|---|---|---|
| Workbook_Open | ThisWorkbook | File opens (most common) |
| Auto_Open | Any module | File opens (legacy) |
| Workbook_Activate | ThisWorkbook | Workbook gets focus |
| Workbook_BeforeClose | ThisWorkbook | File closes (persistence) |
| Worksheet_Activate | Sheet module | Sheet is selected |
| Worksheet_Change | Sheet module | Any cell is changed |
| Application.OnTime | Any | Scheduled delay (evasion) |
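# Grep for autorun triggers in extracted VBA (keep the pattern on one line:
# a line break inside the quotes would add an empty alternative that matches everything)
grep -iE "Workbook_Open|Auto_Open|Workbook_Activate|Workbook_BeforeClose|Worksheet_Activate|Worksheet_Change|Application\.OnTime" \
  extracted_vba.txt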
Malicious VBA code is almost always obfuscated to evade static analysis and make forensic examination harder. Recognizing these techniques is essential for understanding what the code actually does.
Chr() String Concatenation
Replaces string literals with sequences of Chr() function calls that concatenate ASCII codes. The string "PowerShell" becomes unrecognizable to simple keyword scanners.
' Obfuscated:
cmd = Chr(80) & Chr(111) & Chr(119) & Chr(101) & Chr(114) & Chr(83) & Chr(104) & Chr(101) & Chr(108) & Chr(108)
' Decoded: "PowerShell"
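Chains like this can be collapsed mechanically. The following sketch handles only the simplest case, runs of Chr(n) calls with literal decimal arguments joined by & or +, and rewrites each run as the quoted string it builds. The function name is an assumption made for the example; arithmetic inside the parentheses or values held in variables would need a real expression evaluator.

```python
import re

# One Chr / ChrW / ChrB call (optionally with a trailing $) and a decimal argument
CHR_CALL = re.compile(r"Chr[WB]?\$?\(\s*(\d+)\s*\)", re.IGNORECASE)

# Two or more such calls joined by & or +
CHR_CHAIN = re.compile(
    r"(?:Chr[WB]?\$?\(\s*\d+\s*\)\s*[&+]\s*)+Chr[WB]?\$?\(\s*\d+\s*\)",
    re.IGNORECASE,
)


def decode_chr_chains(vba_line: str) -> str:
    """Replace each run of concatenated Chr(n) calls with the literal it builds."""

    def collapse(match):
        codes = [int(code) for code in CHR_CALL.findall(match.group(0))]
        return '"' + "".join(chr(code) for code in codes) + '"'

    return CHR_CHAIN.sub(collapse, vba_line)


line = (
    "cmd = Chr(80) & Chr(111) & Chr(119) & Chr(101) & Chr(114) & "
    "Chr(83) & Chr(104) & Chr(101) & Chr(108) & Chr(108)"
)
print(decode_chr_chains(line))  # cmd = "PowerShell"
```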
String Reversal
The payload string is stored in reverse order and reversed at runtime using StrReverse(). Simple but effective against pattern-matching.
' Obfuscated:
cmd = StrReverse("llehSrewoP")
' Decoded: "PowerShell"
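Undoing this by hand is easy for one string and tedious for fifty. A minimal sketch, assuming the argument is a plain string literal rather than a variable or an expression, rewrites every StrReverse("...") call as the string it produces. The function name is an assumption made for the example.

```python
import re

# StrReverse applied directly to a string literal; "" is VBA's escaped quote
STRREVERSE = re.compile(r'StrReverse\(\s*"((?:[^"]|"")*)"\s*\)', re.IGNORECASE)


def resolve_strreverse(vba_line: str) -> str:
    """Replace StrReverse("literal") with the reversed literal."""

    def undo(match):
        return '"' + match.group(1)[::-1] + '"'

    return STRREVERSE.sub(undo, vba_line)


print(resolve_strreverse('cmd = StrReverse("llehSrewoP")'))  # cmd = "PowerShell"
```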
Base64 Encoded Payloads
The actual command or payload is stored as a base64-encoded string and decoded at runtime, often being passed directly to PowerShell or executed via Shell().
' Suspicious pattern:
Dim b64 As String
b64 = "cG93ZXJzaGVsbCAtZW5jb2RlZENvbW1hbmQ..."
Shell "powershell -EncodedCommand " & b64
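To recover such payloads from extracted source, a simple approach is to pull out every long string literal that uses only the base64 alphabet and try to decode it. The sketch below tries UTF-8 first and then UTF-16LE, the encoding PowerShell expects for -EncodedCommand. The 16-character minimum, the function names, and the "whoami" demo payload are assumptions chosen for illustration, and ordinary long words can occasionally decode to something printable, so results need a human look.

```python
import base64
import binascii
import re

# Quoted runs of base64 alphabet characters, at least 16 long (assumed threshold)
B64_LITERAL = re.compile(r'"([A-Za-z0-9+/]{16,}={0,2})"')


def decode_candidate(literal: str):
    """Return decoded text for a base64 literal, or None if it is not clean text."""
    try:
        raw = base64.b64decode(literal, validate=True)
    except (binascii.Error, ValueError):
        return None
    # UTF-16LE text decoded as UTF-8 contains NUL characters and fails the
    # printable check, so trying UTF-8 first does not hide UTF-16LE payloads.
    for encoding in ("utf-8", "utf-16-le"):
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        if text and all(ch.isprintable() or ch in "\r\n\t" for ch in text):
            return text
    return None


def find_payloads(vba_source: str) -> list:
    """List (literal, decoded_text) pairs for literals that decode cleanly."""
    found = []
    for literal in B64_LITERAL.findall(vba_source):
        decoded = decode_candidate(literal)
        if decoded is not None:
            found.append((literal, decoded))
    return found


# Demo payload built here so the example is self-checking
sample = base64.b64encode("whoami".encode("utf-16-le")).decode("ascii")
vba = f'b64 = "{sample}"\nShell "powershell -EncodedCommand " & b64'
print(find_payloads(vba))
```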
Variable Name Mangling
Variables and function names are replaced with meaningless strings (aA1bB2cC3, x_1_y_2) to make the code logic impossible to follow by reading alone. Combined with Chr() obfuscation, this can make even a short macro extremely difficult to analyze statically.
Certain API calls and code patterns are almost exclusively associated with malicious activity. Their presence warrants immediate escalation and sandbox analysis.
Network and Download
Process Execution
File System
Persistence and Registry
Beyond identifying what a macro does, forensic analysis can often identify who wrote it. VBA code carries numerous stylistic and technical fingerprints that can link a file to a specific developer, organization, or threat actor group.
Every developer has unconscious habits. Variable naming conventions, indentation style, comment language, error handling approaches, and the choice of API calls over built-in functions are all characteristics that create a code "fingerprint."
Style Fingerprints to Examine
- Error handling approach (On Error Resume Next vs structured)
- Option Explicit or lack thereof
Linguistic Markers
Comments and string literals in the code can reveal the author's native language. Look for:
The VBA project itself contains metadata that identifies the author and development environment. This data is separate from the document properties and is not removed by Excel's built-in Document Inspector.
# Extract PROJECT stream from vbaProject.bin
python3 oledump.py -s PROJECT xl/vbaProject.bin
# The PROJECT stream (plain text) contains:
# - Module names and types
# - HelpFile path (may reveal developer file system)
# - LastSaved timestamp (may differ from document properties)
# The compressed VBA/dir stream additionally records:
# - CodePage (reveals locale/language)
# - Library references
# Extract using strings utility for quick review
strings xl/vbaProject.bin | grep -iE "path|user|module|help"
HelpFile Path Artifact
If a developer associated a help file with the project, the path is stored in the PROJECT stream. Paths like C:\Users\JohnDoe\Documents\... directly reveal the developer's username and machine structure.
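Once the PROJECT stream has been dumped to text, its key=value lines are easy to parse. The sketch below collects the properties from the main section and pulls user names out of any Windows profile paths it finds. The sample text is hypothetical, including the help.chm file name and the placeholder GUID, and the function names are assumptions made for the example.

```python
import re

# User folder inside a Windows profile path
USER_IN_PATH = re.compile(
    r"[A-Za-z]:\\(?:Users|Documents and Settings)\\([^\\]+)\\", re.IGNORECASE
)


def project_properties(project_text: str) -> dict:
    """Collect key=value lines from the main section; repeated keys become lists."""
    properties = {}
    for line in project_text.splitlines():
        line = line.strip()
        if line.startswith("["):  # stop at the first bracketed section
            break
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        properties.setdefault(key, []).append(value.strip('"'))
    return properties


def usernames_in(text: str) -> set:
    """Return user names that appear in profile paths within text."""
    return set(USER_IN_PATH.findall(text))


# Hypothetical PROJECT stream content for illustration only
sample_project = r"""
ID="{00000000-0000-0000-0000-000000000000}"
Document=ThisWorkbook/&H00000000
Module=Module1
Module=Module2
HelpFile="C:\Users\JohnDoe\Documents\help.chm"
Name="VBAProject"
[Workspace]
Module1=0, 0, 0, 0, C
""".strip()

props = project_properties(sample_project)
print(props["Module"], props["HelpFile"], usernames_in(sample_project))
```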
CodePage Analysis
The CodePage value identifies the character encoding used when writing the VBA code. Common forensic values: 1252 (Western Europe), 1251 (Cyrillic), 936 (Simplified Chinese), 932 (Japanese). This can narrow the developer's likely geographic location.
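The code page matters in practice as well: string literals and comments pulled out as raw bytes only read correctly when decoded with the project's code page. A small sketch, using the four values listed above and Python's built-in cp codecs, shows the effect of choosing the right and the wrong one. The lookup table, the function names, and the sample word are illustrative assumptions.

```python
# Short lookup for the code pages mentioned above (not a complete table)
CODE_PAGES = {
    1252: "Western Europe",
    1251: "Cyrillic",
    936: "Simplified Chinese",
    932: "Japanese",
}


def describe_code_page(code_page: int) -> str:
    """Name the script family for a code page, if it is in the short table."""
    return CODE_PAGES.get(code_page, "not in this short table")


def decode_with_code_page(raw: bytes, code_page: int) -> str:
    """Decode raw VBA text bytes using the matching Python cpNNNN codec."""
    return raw.decode(f"cp{code_page}", errors="replace")


# Bytes as they would be stored for a Cyrillic comment (sample word: "Привет")
raw_comment = "Привет".encode("cp1251")

print(describe_code_page(1251), decode_with_code_page(raw_comment, 1251))
print(describe_code_page(1252), decode_with_code_page(raw_comment, 1252))
```

Decoding the same bytes as 1252 produces accented Latin gibberish, which is itself a useful hint that the assumed code page is wrong.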
Malicious actors frequently reuse code from open-source repositories, paste it from online forums, or copy it between campaigns. Identifying code reuse can link a file to known threat actors, previous incidents, or public exploit code repositories.
Code Reuse Investigation Steps
# Extract unique strings for TI platform searching
strings xl/vbaProject.bin | sort | uniq | \
grep -v "^.$" | grep -v "^..$" > unique_strings.txt
# Hash the extracted VBA code
sha256sum extracted_vba.txt
md5sum extracted_vba.txt
# Submit binary to threat intelligence (via API)
# e.g., VirusTotal, Malware Bazaar, Hybrid Analysis
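An exact hash changes if a single space changes, which makes it a weak signal for code reuse. One hedged alternative is to hash a normalized form of the source as well, so that copies differing only in letter case, indentation, blank lines, or full-line comments still match. The normalization rules and function names below are assumptions for illustration; they deliberately leave trailing comments alone because an apostrophe may sit inside a string literal.

```python
import hashlib
import re


def normalize_vba(source: str) -> str:
    """Drop blank and full-line comment lines, collapse whitespace, lower-case."""
    cleaned = []
    for line in source.splitlines():
        line = re.sub(r"\s+", " ", line).strip().lower()
        if not line or line.startswith("'") or line.startswith("rem "):
            continue
        cleaned.append(line)
    return "\n".join(cleaned)


def fingerprints(source: str) -> dict:
    """Return an exact hash and a normalization-tolerant hash of the source."""
    return {
        "sha256_exact": hashlib.sha256(source.encode("utf-8")).hexdigest(),
        "sha256_normalized": hashlib.sha256(
            normalize_vba(source).encode("utf-8")
        ).hexdigest(),
    }


variant_a = 'Sub Run()\n    Shell "calc.exe"\nEnd Sub\n'
variant_b = "' renamed copy\nsub run()\n\tShell   \"calc.exe\"\n\nend sub\n"

a, b = fingerprints(variant_a), fingerprints(variant_b)
print(a["sha256_exact"] == b["sha256_exact"])            # False
print(a["sha256_normalized"] == b["sha256_normalized"])  # True
```

The normalized hash is only useful for comparing files within your own case set; public threat intelligence platforms index the exact hash of the file itself.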
Static analysis tells you what the code says. Dynamic analysis tells you what it actually does when it runs. For obfuscated or complex macros, dynamic analysis in a controlled sandbox environment is often the only way to fully understand the behavior.
Never open a suspected malicious Excel file on a production machine or any machine connected to your organization's network. Even with macro security settings, malicious files can exploit vulnerabilities in Excel itself.
Use an isolated virtual machine with no network access, no shared folders with the host, and no persistent state. Snapshot the VM before execution and revert after each test.
Automated Sandboxes
What to Monitor
For more controlled dynamic analysis, running the file in an isolated VM with Sysinternals Process Monitor (ProcMon) and Wireshark provides granular visibility into every action the macro takes.
# Before opening the file in your isolated VM:
# 1. Start Process Monitor (filter for excel.exe and child processes)
Procmon.exe /Minimized /Quiet /BackingFile C:\evidence\procmon.pml
# 2. Start Wireshark capture on the network interface
dumpcap -i 1 -w C:\evidence\capture.pcap
# 3. Open the Excel file with macros enabled
# 4. Wait for activity to complete, then stop captures
# 5. Revert VM to pre-execution snapshot
Process Monitor captures file system operations, registry operations, network activity, and process events in real time. Filter the output for the Excel process and any child processes it spawns to build a complete picture of the macro's behavior.
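After the capture, the Process Monitor log can be exported to CSV and filtered offline. The sketch below assumes the default export columns (Time of Day, Process Name, PID, Operation, Path, Result, Detail) and that a Process Create event is logged against the parent with the child image in the Path column; verify both against your own export. The sample rows are hand-written, not real captured data.

```python
import csv
import io


def child_processes(procmon_csv: str, parent: str = "EXCEL.EXE") -> list:
    """Return (time, child image path, detail) for Process Create events by parent."""
    rows = csv.DictReader(io.StringIO(procmon_csv))
    return [
        (row["Time of Day"], row["Path"], row["Detail"])
        for row in rows
        if row["Operation"] == "Process Create"
        and row["Process Name"].upper() == parent.upper()
    ]


# Hand-written sample in the assumed column layout
sample_csv = r"""
"Time of Day","Process Name","PID","Operation","Path","Result","Detail"
"10:15:02.1234567","EXCEL.EXE","4120","ReadFile","C:\evidence\suspicious.xlsm","SUCCESS","Offset: 0"
"10:15:03.7654321","EXCEL.EXE","4120","Process Create","C:\Windows\System32\cmd.exe","SUCCESS","PID: 5332, Command line: cmd.exe /c whoami"
"10:15:03.9000000","cmd.exe","5332","Process Create","C:\Windows\System32\whoami.exe","SUCCESS","PID: 5400, Command line: whoami"
""".strip()

for event in child_processes(sample_csv):
    print(event)
```

When reading a real export from disk, open it with `encoding="utf-8-sig"` so that a leading byte-order mark does not end up inside the first column name. Call the function again with each child's name as `parent` to walk further down the process tree.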
In many investigations, the question is not just whether a macro is malicious, but whether it has already executed—and when. Several artifact sources can answer this question even after the fact.
Prefetch Files
When Prefetch is enabled (the default on Windows client editions), Windows records the execution of programs, including processes spawned by macros. If a macro ran powershell.exe, there will be a prefetch file for PowerShell whose run timestamps can be matched against the time the macro executed. Location: C:\Windows\Prefetch\
Windows Event Logs
Event ID 4688 (Process Creation, if auditing is enabled) records every new process; command line arguments are included only when command-line auditing is also enabled. PowerShell execution logging (Event ID 4103 for module logging, 4104 for script block logging) captures the actual commands run by PowerShell, including those launched from macros.
LNK Files and JumpLists
When Excel opens a file, Windows creates a shortcut (LNK) file recording the file path, size, MAC timestamps, and the machine that opened it. JumpLists record the same information per application. Both sources can confirm that a file was opened even if it has since been deleted.
Office Trust Record
When a user clicks "Enable Content" to allow macros, Excel records this decision in the registry. The path is: HKCU\Software\Microsoft\Office\<version>\Excel\Security\Trusted Documents\TrustRecords. Each entry includes the file path and the timestamp when trust was granted, directly confirming when macros were enabled.
# Check Office Trust Records in registry
reg query "HKCU\Software\Microsoft\Office\16.0\Excel\Security\Trusted Documents\TrustRecords"
# Check Recent Files list
reg query "HKCU\Software\Microsoft\Office\16.0\Excel\File MRU"
# LNK files for Excel documents
dir "C:\Users\%USERNAME%\AppData\Roaming\Microsoft\Windows\Recent\*.lnk"
# Prefetch for processes macro may have spawned
dir C:\Windows\Prefetch\POWERSHELL*.pf
dir C:\Windows\Prefetch\CMD*.pf
Cross-reference the Trust Records timestamp with the file's modification time and any network or process activity from that time window. This triangulates the moment of macro execution.
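The TrustRecords values are binary, so the timestamp has to be decoded before it can be cross-referenced. The sketch below rests on a layout described in community forensic write-ups rather than in official documentation, so treat it as an assumption to verify: the first 8 bytes are a little-endian Windows FILETIME, and the last 4 bytes read FF FF FF 7F when the user clicked "Enable Content". The function names and the synthetic record are made up for the example.

```python
import struct
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
MACROS_ENABLED_FLAG = 0x7FFFFFFF  # assumed marker for "Enable Content"


def parse_trust_record(value: bytes) -> dict:
    """Decode the assumed layout: FILETIME in bytes 0-7, flag in the last 4 bytes."""
    if len(value) < 12:
        raise ValueError("value too short to be a trust record")
    (filetime,) = struct.unpack_from("<Q", value, 0)
    (flag,) = struct.unpack_from("<I", value, len(value) - 4)
    return {
        # FILETIME counts 100-nanosecond ticks since 1601-01-01 UTC
        "trusted_at_utc": FILETIME_EPOCH + timedelta(microseconds=filetime // 10),
        "macros_enabled": flag == MACROS_ENABLED_FLAG,
    }


def to_filetime(moment: datetime) -> int:
    """Convert an aware datetime to FILETIME ticks using integer arithmetic."""
    delta = moment - FILETIME_EPOCH
    return (delta.days * 86400 + delta.seconds) * 10**7 + delta.microseconds * 10


# Synthetic 24-byte record built for the demo (not taken from a real registry)
when = datetime(2024, 1, 1, 12, 30, tzinfo=timezone.utc)
record = (
    struct.pack("<Q", to_filetime(when))
    + b"\x00" * 12
    + struct.pack("<I", MACROS_ENABLED_FLAG)
)
print(parse_trust_record(record))
```

The decoded time is in UTC; convert it to the time zone used by your other artifacts before building the timeline.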
1. Preserve the original file: Hash the original file (MD5, SHA-256) immediately. Never work on the original. All analysis should be on copies.
2. Extract the VBA project: Unzip the XLSM to access xl/vbaProject.bin. Use oledump to list all streams. Identify all module streams marked with "M".
3. Run olevba for automated analysis: Extract all source code and review the automated flags for autorun triggers, suspicious keywords, obfuscation techniques, and network indicators.
4. Check for p-code stomping: Run pcodedmp and compare the p-code output with the extracted source. Any discrepancy indicates source manipulation and requires dynamic analysis.
5. Analyze VBA project metadata: Extract the PROJECT stream. Check CodePage (language indicator), HelpFile paths (developer file system), and library references.
6. Decode obfuscated strings: Manually or programmatically decode Chr() concatenation, base64 strings, and StrReverse patterns to recover the actual payload commands and URLs.
7. Search threat intelligence databases: Submit the file hash and unique strings to VirusTotal, Malware Bazaar, and threat intelligence platforms. Look for code reuse linking to known campaigns.
8. Perform sandbox analysis: Run the file in an isolated sandbox environment. Capture network activity, file system changes, registry modifications, and process creation events.
9. Check host artifacts for execution evidence: Examine Trust Records, Prefetch, Event Logs, LNK files, and JumpLists on the victim machine to determine if and when macros actually executed.
10. Document and correlate findings: Compile all findings into a structured report. Include code excerpts, decoded payloads, execution artifacts, and author attribution evidence. Correlate with email delivery, download events, and network logs.
Not every macro is malicious. Organizations use VBA extensively for legitimate business automation. The forensic challenge is distinguishing intentional misuse from routine automation, especially in insider threat investigations where an employee may have abused legitimate tools.
| Characteristic | Likely Legitimate | Likely Malicious |
|---|---|---|
| Execution trigger | Button click, menu item, explicit call | Workbook_Open, Auto_Open, hidden triggers |
| Code readability | Clear variable names, comments explaining logic | Obfuscated, mangled names, no comments |
| Network access | To internal systems, documented purpose | External IPs, encoded URLs, no clear purpose |
| File system access | Reads/writes to expected locations | Writes to Temp, AppData, drops executables |
| Process creation | Rare or absent | Spawns PowerShell, cmd.exe, or other processes |
| Error suppression | Specific error handling with recovery | On Error Resume Next blanket suppression |
| Code signing | Signed with organizational certificate | Unsigned or signed with unknown certificate |
| Business context | Clear relationship to spreadsheet purpose | Unrelated to spreadsheet content, hidden |
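Some rows of this table can be turned into a rough first-pass check over extracted source. The sketch below tests for a handful of the "Likely Malicious" indicators with regular expressions and lists which ones are present. The indicator names and patterns are illustrative assumptions and cover only part of the table. The output is a prompt for closer review, not a verdict: legitimate macros can trip several of these checks.

```python
import re

# Patterns for a few "Likely Malicious" rows (illustrative, incomplete)
INDICATORS = {
    "auto_execution": r"\b(?:Workbook_Open|Auto_Open)\b",
    "process_creation": r"\b(?:Shell|powershell|cmd\.exe)\b",
    "blanket_error_suppression": r"On\s+Error\s+Resume\s+Next",
    "string_obfuscation": r"\b(?:Chr[WB]?\$?|StrReverse)\s*\(",
    "temp_or_appdata_paths": r"Environ\$?\(\s*\"(?:TEMP|TMP|APPDATA)\"\s*\)",
}


def triage(vba_source: str) -> list:
    """Return the names of the indicators whose pattern occurs in the source."""
    return [
        name
        for name, pattern in INDICATORS.items()
        if re.search(pattern, vba_source, re.IGNORECASE)
    ]


suspicious = """
Private Sub Workbook_Open()
    On Error Resume Next
    Shell Chr(99) & Chr(109) & Chr(100), vbHide
End Sub
"""

routine = """
Sub ExportReport_Click()
    ' Copy the summary range to the report sheet
    Range("A1:D20").Copy Destination:=Sheets("Report").Range("A1")
End Sub
"""

print(triage(suspicious))
print(triage(routine))
```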
Excel macro forensics is one of the most technically demanding disciplines in spreadsheet investigation. The combination of a binary storage format, compiled p-code, multiple obfuscation layers, and runtime-only behavior means that understanding a macro requires a multi-stage approach: static extraction, metadata analysis, code style examination, and controlled dynamic execution.
The most important principle is never to trust the source code alone. The gap between what a macro appears to do in the VBA editor and what it actually executes can be vast. P-code stomping, encoded payloads, and delayed execution all ensure that simple inspection is not enough.
Whether you are investigating a suspected malware delivery, tracing the origins of fraudulent automation, or determining whether a macro has already executed on a compromised machine, the methods in this guide provide a systematic framework for answering those questions with evidence that can withstand scrutiny. The code was written by someone, it does something, and if it ran, it left traces. Macro forensics is the discipline of finding all three.