The most dangerous security bugs are sometimes the ones that look completely harmless in code review.
I was working on a legacy C codebase once, the kind that had grown organically over years, with different developers layering features on top without much cleanup. There was no clean code culture, and technical debt had compounded quietly. You probably know the kind.
To get a sense of what I was dealing with, I added static analysis (SAST) to the CI pipeline. The tool had plenty to complain about, but one category of bug reigned supreme: format string vulnerabilities.
I knew format strings were supposed to be static, but I’d never really known why. A blind spot I had been wanting to close for a while - so I had a look right then and there.
What could an attacker do with a single printf call?
Turns out: quite a lot.
If user input ever becomes a format string, printf turns from a convenience into an exploit primitive.
The Harmless-Looking Bug
Consider this debug logging function. It’s the kind of utility code you’d find in any C project:
#include <stdbool.h>
#include <stdio.h>
#include <syslog.h>

static bool to_syslog = false;

void debug_log(const char *user_input) {
    char input[256];
    sprintf(input, "Debug: %s", user_input);
    if (to_syslog) {
        syslog(LOG_INFO, input);
    } else {
        printf(input);
    }
}
At first glance, this looks reasonable. It builds a debug message and outputs it to either the console or syslog.
Historically, passing user-controlled data directly to syslog has been a real-world source of exploits.
The observant reader will notice this contains a buffer overflow - if user_input is longer than 248 bytes (256 minus the 7-byte "Debug: " prefix and the terminating NUL), sprintf will write past the end of input.
But there’s another, subtler vulnerability hidden here.
What happens if you call it like this?
debug_log("%08x %08x %08x %08x");
Something very interesting happens. So interesting, in fact, that there are people who would call it exactly like that: attackers trying to exploit a bug you might not even know is there.
The Anatomy of a Format String Vulnerability
To understand what’s happening, you need to know how printf and its family of functions work internally.
When you call printf("Value: %d", x), the function:
- Parses the format string ("Value: %d")
- Looks for format specifiers (the %d part)
- For each specifier, pops the next argument from registers or stack (depending on the ABI)
- Formats and prints the result
The key insight: printf assumes the format string is trusted.
It doesn’t know - or care - how many arguments you actually passed. If the format string contains %x %x %x, it will happily read three values from the stack, whether you provided them or not.
In our vulnerable function, when user_input becomes the format string in printf(input), the attacker controls what printf will do.
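To make the "printf trusts the format string" point concrete, here is a minimal sketch of a toy formatter. The name mini_format and its behavior (only %d and %% are understood) are my own assumptions for illustration; this is not how any real libc is implemented. What it shows is that a variadic callee receives no argument count, so the format string alone decides how many arguments get fetched.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical toy formatter (illustration only, not a real libc routine).
 * It understands only %d and %%, writes into `out`, and returns how many
 * arguments the format string made it fetch. */
int mini_format(char *out, size_t cap, const char *fmt, ...)
{
    va_list ap;
    size_t len = 0;
    int fetched = 0;

    if (cap == 0)
        return 0;

    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0' && len + 1 < cap; p++) {
        if (p[0] == '%' && p[1] == 'd') {
            /* There is no argument count to consult: the specifier alone
             * decides that an int is fetched here. If the caller passed
             * fewer ints than there are %d's, this read is undefined
             * behavior, just as it is with printf. */
            int value = va_arg(ap, int);
            int n = snprintf(out + len, cap - len, "%d", value);
            if (n < 0)
                break;
            /* Advance by what was actually stored (snprintf may truncate). */
            len += ((size_t)n < cap - len) ? (size_t)n : cap - len - 1;
            fetched++;
            p++; /* skip the 'd' */
        } else if (p[0] == '%' && p[1] == '%') {
            out[len++] = '%';
            p++; /* skip the second '%' */
        } else {
            out[len++] = *p;
        }
    }
    va_end(ap);

    out[len] = '\0';
    return fetched;
}
```

Calling mini_format(buf, sizeof buf, "Value: %d, %d%%", 7, 42) fills buf with "Value: 7, 42%" and returns 2. Nothing in the function could detect a caller that passed only one integer; it would fetch a second one anyway.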
What Can Go Wrong?
1. Information Disclosure (Memory Leak)
An attacker can use %x or %p format specifiers to read stack memory. %x is traditionally used for stack walking, while %p often leaks addresses directly:
debug_log("%08x %08x %08x %08x %08x %08x %08x %08x %p");
This prints eight stack values in hexadecimal, followed by one more value formatted as a pointer. With enough %x specifiers, an attacker can:
- Leak the stack canary (useful if you also have a buffer overflow)
- Leak return addresses (defeating ASLR)
- Leak pointers to sensitive data
- Leak cryptographic keys or passwords stored in memory
2. Denial of Service (Crash)
The %s format specifier tells printf to treat the next stack value as a pointer to a string and dereference it:
debug_log("%s %s %s %s");
If those stack values aren’t valid pointers, the program crashes with a segmentation fault.
3. Arbitrary Memory Write
This is where it gets truly dangerous.
The %n format specifier is unusual: it writes instead of reading. It stores the number of bytes written so far into the address pointed to by the corresponding argument.
Normal usage:
int count;
printf("Hello%n", &count); // count now contains 5
In a format string vulnerability:
debug_log("AAAA%08x%08x%08x%n");
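Here, AAAA is part of the input, which ends up in a buffer on the stack. By adding the right number of %x specifiers, the attacker can “walk” up the stack until %n consumes those four bytes (AAAA, which is 0x41414141 in hexadecimal on a 32-bit target) as its pointer argument - and writes to that address. In debug_log, the 7-byte "Debug: " prefix shifts the payload, so a real exploit would likely need a byte of padding to line the address up with an argument slot.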
By controlling:
- What address gets written to (via the positioning of data on the stack)
- What value gets written (via the number of characters printed before %n)
An attacker can achieve arbitrary memory write, potentially leading to code execution.
Think of it as a stack buffer overflow, but with surgical precision. Instead of blindly overwriting memory (and possibly triggering a stack canary), you can “aim” at a specific address - like the return address on the stack - and write only that.
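The "what value gets written" part is easy to try out safely. The sketch below uses a hypothetical helper, count_after_padding, in which every argument is supplied and the format is a literal, so nothing here is exploitable. It only shows that a field width steers the number %n stores, which is the knob an attacker turns.

```c
#include <stdio.h>

/* Hypothetical helper (illustration only): returns the value %n stores
 * after printing "AAAA" plus one unsigned int padded to `width` characters.
 * Keep `width` well below the scratch buffer size. */
int count_after_padding(int width)
{
    char scratch[1024];
    int written = -1;

    /* "%*x" takes its field width from the argument list. All arguments
     * are really passed here, so the call is well-defined. */
    snprintf(scratch, sizeof scratch, "AAAA%*x%n", width, 1u, &written);
    return written;
}
```

With a width of 8 the helper returns 12 (four A's plus eight padded characters); with a width of 300 it returns 304. In an attack, the same trick picks the value, while the stack layout picks the address.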
Modern Defenses
Format string vulnerabilities were at their peak in the late 1990s and early 2000s. Since then, defenses have evolved:
SAST and Compiler Warnings
SAST tools flag format string issues automatically, often with high confidence.
Modern compilers (GCC, Clang) check format strings at compile time if they’re literals:
printf(user_input); // Warning: format not a string literal
printf("%s", user_input); // Safe, no warning
Enable -Wformat -Wformat-security to catch these issues.
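The compiler only knows about the standard printf family out of the box. For a project's own logging wrappers, GCC and Clang offer the format function attribute, which extends the same checks to them. A minimal sketch, assuming a hypothetical wrapper called log_format that writes into a caller-provided buffer:

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical logging wrapper (the name and signature are assumptions).
 * The attribute is a GCC/Clang extension: parameter 3 is the format
 * string, and the arguments to check against it start at parameter 4. */
__attribute__((format(printf, 3, 4)))
int log_format(char *out, size_t cap, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(out, cap, fmt, ap); /* forwards the checked arguments */
    va_end(ap);
    return n;
}
```

With the attribute in place and -Wformat -Wformat-security enabled, a call like log_format(buf, sizeof buf, user_input) should be flagged the same way printf(user_input) is, while log_format(buf, sizeof buf, "%s", user_input) compiles cleanly.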
Runtime Protections
Besides the compiler, Linux and libc have evolved too, and now offer additional protections:
- Compile with -D_FORTIFY_SOURCE=2: this adds runtime checks with modern libc implementations (it only takes effect when optimization is enabled). For example, glibc may abort at runtime if a format string containing %n is located in writable memory.
- RELRO (RELocation Read-Only): Makes the Global Offset Table (GOT) read-only after program startup, preventing attackers from overwriting function pointers via format string vulnerabilities.
- ASLR (Address Space Layout Randomization): Randomizes memory addresses (stack, heap, libraries), making it much harder for attackers to predict where to write or jump.
Language Design
Modern languages (Rust, Go, Python) handle string formatting differently: a formatting call cannot read or write arbitrary memory, which rules out this class of memory-corruption bug by design.
Why They Still Matter
Despite decades of awareness and tooling, format string bugs still appear in codebases where -Wformat-security isn’t enabled, or compiler warnings are ignored.
I’ve seen this predominantly with older toolchains, which are often used for embedded systems. Sometimes those still contain old compilers that don’t warn on format string issues. But I’ve also seen the warnings disabled to reduce “noise” - a red flag in any codebase.
With time, the attack surface shrinks, but it never quite reaches zero.
What To Do in Modern Engineering Teams
The fix is straightforward:
Never pass user-controlled data as a format string.
A safe version of our debug function from above might look like this:
void debug_log(const char *user_input) {
    char input[256];
    // use snprintf to avoid buffer overflow
    snprintf(input, sizeof(input), "Debug: %s", user_input);
    if (to_syslog) {
        syslog(LOG_INFO, "%s", input); // Fixed: format string is literal
    } else {
        printf("%s", input); // Fixed: format string is literal
    }
}
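If you want to convince yourself that the fix really neutralizes hostile input, it helps to separate the formatting from the output. The following is a sketch under that assumption; format_debug is a hypothetical helper, not part of the original function. It also reports truncation, which snprintf otherwise hides.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (illustration only): builds the debug line in `out`.
 * Returns true if the whole message fit, false if it was truncated or
 * formatting failed. */
bool format_debug(char *out, size_t cap, const char *user_input)
{
    /* user_input is only ever an argument, never the format string,
     * so any specifiers it contains are copied as plain text. */
    int n = snprintf(out, cap, "Debug: %s", user_input);
    return n >= 0 && (size_t)n < cap;
}
```

Passing "%08x %n %s" into a 64-byte buffer yields the literal text "Debug: %08x %n %s" and a return value of true. Passing ten digits into a 10-byte buffer yields "Debug: 01" and false, so the caller can decide how to handle the cut-off message.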
Broader Strategies
- Enable and don’t ignore compiler warnings. Use -Wformat -Wformat-security. I’ve pushed towards treating warnings as errors.
- Compile with -D_FORTIFY_SOURCE=2 (or =3) for runtime checks, see feature_test_macros(7) for reference.
- Static analysis in CI. Run SAST on every commit and don’t ignore the output.
- For new components, consider safer languages.
The Lesson
Format string vulnerabilities are a textbook example of implicit trust leading to disaster.
The printf family assumes the format string is benign. When that assumption breaks - when user input becomes the format string - a simple logging function transforms into an information leak, a denial-of-service vector, or a pathway to arbitrary code execution. It is a classic trust-boundary violation: untrusted data flows into an API that assumes trust.
In industry, the takeaway is clear:
When a function expects trusted input, enforce that trust at the boundary.
Don’t let a debug statement become an exploit primitive.
Source Code Example
For those interested in seeing this in action, there is a simplified demonstration program that shows the mechanics. Compile it yourself and see what happens.