Compare commits

4 commits: 8322ddba3b...1557b82c1d

| SHA1 |
|---|
| 1557b82c1d |
| 3633346d53 |
| 08f21c06fb |
| 7186905ad2 |

README.md (83 changed lines)
@@ -1,5 +1,84 @@
-# Load-time patcher
+# Flicker

Flicker is a universal load-time binary rewriter for native AMD64 Linux applications. It maps the
target executable into memory, performs a linear-scan disassembly, and applies patches using a
hierarchy of tactics, enabling instrumentation, debugging, and hook injection.

This approach lets Flicker maintain control over the process lifecycle, so it can handle statically
linked executables, dynamically linked executables (via interpreter loading), and system calls
(e.g., intercepting `readlink` and `clone`).

It aims for a middle ground: near-native execution speed combined with the flexibility of dynamic
instrumentation.

## Work In Progress

This project is currently in active development.

Already supported: statically linked executables, basic dynamically linked executables (via
`PT_INTERP` loading), and basic syscall interception.

Full `dlopen` support, JIT handling, signal handling, and a plugin system are pending.

## Build

Flicker uses the Zig build system. Ensure you have Zig 0.15.1 installed.

To build the release binary:

```bash
zig build -Doptimize=ReleaseSafe
```

To run the test suite (includes various static/dynamic executables):

```bash
zig build test
```

The compiled binary will be located at `zig-out/bin/flicker`.

## Usage

Flicker acts as a loader wrapper. Pass the target executable and its arguments directly to Flicker.

```bash
./flicker <executable> [args...]

# Example: Running 'ls' through Flicker
./zig-out/bin/flicker ls -la
```

## How it Works

For more information, see the [Project Overview](docs/project_overview.md) and the
[Use Cases](docs/use_cases.md).

### The Loader

Flicker does not use `LD_PRELOAD`. Instead, it maps the target ELF binary into memory itself. If the
binary is dynamically linked, Flicker parses the `PT_INTERP` header, locates the dynamic linker
(usually `ld-linux.so`), and maps that as well. It then rewrites the auxiliary vector (`AT_PHDR`,
`AT_ENTRY`, `AT_BASE`) on the stack to trick the C runtime into accepting the manually loaded
environment.
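
Here is a minimal sketch of the auxiliary-vector rewrite. It assumes the loader already has the vector as a slice of (tag, value) pairs. The kernel places these pairs on the initial stack after `argv` and `envp` and terminates them with `AT_NULL`. `AuxvEntry`, `LoadInfo` and `rewriteAuxv` are hypothetical names, not Flicker's code, and the tag values are the standard ELF ones. The sketch also updates `AT_PHNUM`, which the list above does not mention but which is read together with `AT_PHDR`; treat that as an assumption.

```zig
const std = @import("std");

// Auxiliary-vector tags as defined by the ELF/System V ABI (same values as <elf.h>).
const AT_NULL: usize = 0;
const AT_PHDR: usize = 3;
const AT_PHNUM: usize = 5;
const AT_BASE: usize = 7;
const AT_ENTRY: usize = 9;

/// One auxv entry as laid out on the initial process stack (hypothetical type).
const AuxvEntry = extern struct {
    tag: usize,
    value: usize,
};

/// What the loader mapped (hypothetical type for illustration).
const LoadInfo = struct {
    phdr: usize, // runtime address of the target's program headers
    phnum: usize, // number of program headers in the target (assumption, see above)
    entry: usize, // runtime entry point of the target executable
    interp_base: usize, // load base of the dynamic linker, 0 for static targets
};

/// Overwrites the entries that describe "the program" so they refer to the
/// manually mapped target instead of the loader binary the kernel started.
fn rewriteAuxv(auxv: []AuxvEntry, info: LoadInfo) void {
    for (auxv) |*entry| {
        switch (entry.tag) {
            AT_NULL => return, // end of the vector
            AT_PHDR => entry.value = info.phdr,
            AT_PHNUM => entry.value = info.phnum,
            AT_ENTRY => entry.value = info.entry,
            AT_BASE => entry.value = info.interp_base,
            else => {}, // e.g. AT_PAGESZ, AT_RANDOM: left as the kernel set them
        }
    }
}
```

For a dynamically linked target, control would go to the mapped interpreter first. The interpreter then reads `AT_ENTRY` to find the program's entry point, so that entry has to describe the target rather than Flicker.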

### Patching Engine

Before transferring control to the entry point, Flicker scans executable segments for instructions
that require instrumentation. It allocates trampolines: executable memory pages located within
±2 GB of the target instruction, so that a 32-bit relative jump can reach them.

To overwrite an instruction with a 5-byte jump (`jmp rel32`) without corrupting adjacent code or
breaking jump targets, Flicker uses a back-to-front scanning approach and a constraint solver to
find valid bytes for "instruction punning."
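
The reach constraint and the 5-byte encoding become concrete in a small sketch. `encodeJmpRel32` is a hypothetical helper. It assumes all five bytes at `from` may be overwritten, so it covers only the simple case and not punning.

```zig
const std = @import("std");

/// Encodes `jmp rel32` (opcode 0xE9 + little-endian 32-bit displacement) for a
/// jump placed at address `from` that should land on `to`. The CPU adds the
/// displacement to the address of the next instruction, i.e. `from + 5`.
/// Returns null when `to` lies outside the signed 32-bit (about ±2 GiB) reach.
fn encodeJmpRel32(from: u64, to: u64) ?[5]u8 {
    const disp: i128 = @as(i128, to) - (@as(i128, from) + 5);
    if (disp < std.math.minInt(i32) or disp > std.math.maxInt(i32)) return null;
    const disp32: i32 = @intCast(disp);
    var bytes: [5]u8 = undefined;
    bytes[0] = 0xE9;
    std.mem.writeInt(i32, bytes[1..5], disp32, .little);
    return bytes;
}
```

A null result means the trampoline has to be allocated closer to the instruction.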

### Syscall Interception

Flicker can replace `syscall` instructions with jumps to a custom handler, which either emulates the
syscall or modifies its arguments.

Special handling detects `clone` syscalls to ensure the child thread (which wakes up with a fresh
stack) does not crash when attempting to restore the parent's register state.

Path spoofing: Flicker intercepts `readlink` on `/proc/self/exe` to return the path of the target
binary rather than that of the Flicker loader.
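
Here is a sketch of the path-spoofing logic. It assumes the interceptor has already decoded the syscall arguments into a path and an output buffer; `spoofReadlink` is a hypothetical name. It follows `readlink` semantics: the result is not NUL-terminated and is silently truncated to the buffer size. For `readlinkat`, a relative path would also depend on `dirfd`, but the sketch only compares the absolute path.

```zig
const std = @import("std");

/// Hypothetical emulation of `readlink("/proc/self/exe", buf, bufsiz)`.
/// Returns null when the path is not `/proc/self/exe`, meaning the real
/// syscall should run unchanged. Otherwise copies the target's path into
/// `buf` (no NUL terminator, truncated to `buf.len`) and returns the
/// number of bytes written, as the kernel would.
fn spoofReadlink(path: []const u8, buf: []u8, target_path: []const u8) ?usize {
    if (!std.mem.eql(u8, path, "/proc/self/exe")) return null;
    const n = @min(buf.len, target_path.len);
    @memcpy(buf[0..n], target_path[0..n]);
    return n;
}
```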

## License

-Apache 2.0
+Apache License 2.0

docs/TODO.md (new file, 47 changed lines)
@@ -0,0 +1,47 @@

## General things

### Thread-locals

Right now we don't use any thread-local storage in Zig. This means the application is free to
decide what to do with the `fs` segment. If we need thread-locals in the future, we have to think
carefully about how to do it.

If `FSGSBASE` is available, we can swap the `fs` base quickly (`rdfsbase`/`wrfsbase`). If not, we
would need to fall back to `arch_prctl`, which is a lot slower. Fortunately, `FSGSBASE` is available
on Intel since Ivy Bridge (2012) and AMD since Zen 2 (Family 17h, 2019), and usable from user space
since Linux 5.9 (2020).
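
Whether the fast path is usable can be decided from the `AT_HWCAP2` entry of the auxiliary vector, which a loader sees anyway. Here is a minimal sketch, assuming the raw `AT_HWCAP2` value has already been read. `fsgsbaseUsable` is a hypothetical helper, and the constants come from the Linux UAPI headers.

```zig
const std = @import("std");

// Auxiliary-vector tag and x86 capability bit from the Linux UAPI headers
// (<elf.h> and <asm/hwcap2.h>).
const AT_HWCAP2: usize = 26;
const HWCAP2_FSGSBASE: usize = 1 << 1;

/// Returns true if user space may execute rdfsbase/wrfsbase.
/// A CPUID check alone is not enough: the instructions raise #UD unless the
/// kernel has enabled them, which Linux 5.9+ advertises in AT_HWCAP2.
fn fsgsbaseUsable(hwcap2: usize) bool {
    return (hwcap2 & HWCAP2_FSGSBASE) != 0;
}
```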

## Major things

- [x] `clone`: with and without stack switching
- [x] `clone3`: with and without stack switching
- [x] `fork`: likely nothing to be done here, but check again to be sure
- [x] `rt_sigreturn`: we can't use the normal `syscall` interception because we push something onto
  the stack, so `ucontext` isn't on top anymore.
- [x] `/proc/self/exe`: intercept calls to `readlink`/`readlinkat` with that as argument
- [ ] `auxv`: check if it is set up correctly and completely
- [ ] JIT support: intercept `mmap`, `mprotect` and `mremap` calls that make pages executable
- [ ] `SIGILL` patching fallback
- [ ] `vdso` handling

## Minor things

- [ ] Cleanup: when a JIT engine frees code, our trampolines become "zombies", so over time we leak
  memory and also reduce the patching percentage
- [ ] Ghost page edge case: in all patch strategies, if a range spans multiple pages and we `mmap`
  the first one but can't `mmap` the second one, we leave the first one mapped. It would be better
  to unmap the pages that were already mapped
- [ ] Re-entrancy for `patchRegion`
  - if a signal arrives while we are in that function and we need to patch something because of the
    signal, we deadlock
- [ ] Strict disassembly mode: currently we warn on disassembly errors; provide a flag to stop instead
- [ ] Separate stack for Flicker
  - when the application runs on a small stack (`sigaltstack`, goroutines) we might overflow it,
    especially in the `patchRegion` call
  - either one global stack shared by all threads (with a mutex) or a thread-local stack (though
    using `fs` has other problems)
- [ ] `exec`: option to persist across `exec` calls, useful for things like `make`
- [ ] `prctl`/`arch_prctl`: check if/what we need to intercept and change
- [ ] `seccomp`: check what we need to intercept and change
- [ ] `modify_ldt`: check what we need to intercept and change
- [ ] `set_tid_address`: check what we need to intercept and change
- [ ] Performance optimizations for patched code? Peephole optimization might be possible

docs/project_overview.md (new file, 115 changed lines)
@@ -0,0 +1,115 @@

# Project Flicker: Universal Load-Time Binary Rewriting

Flicker is a binary rewriting infrastructure designed for native AMD64 Linux applications. Its
primary objective is to enable universal instrumentation (the ability to patch any instruction) with
minimal performance overhead.

Current approaches to binary rewriting force a difficult trade-off between coverage, performance,
and complexity. Flicker addresses this by operating at load time, combining the transparency of
load-time injection with control-flow-agnostic patching techniques. This architecture is designed
to support statically linked executables, dynamically linked libraries, and Just-In-Time (JIT)
compiled code within a single unified framework.

## The Landscape of Binary Rewriting

To understand Flicker's position, it is helpful to look at the two dominant approaches: dynamic and
static rewriting.

Dynamic Binary Translation (DBT) tools, such as DynamoRIO or Pin, execute programs inside a virtual
machine-like environment. They disassemble and translate code blocks on the fly and run the
translated copies from a code cache. Because they see the instruction stream as it executes, they
handle JIT code and shared libraries natively. However, this flexibility incurs significant
overhead, often slowing execution by 20% to 50%, because the engine must translate code and
intercept control transfers at run time.

Static Binary Rewriting involves modifying the binary on disk before execution. While potentially
fast, this approach faces a problem that is undecidable in general: identifying all jump targets in
a stripped binary, since the halting problem reduces to it. If instructions are shifted to make room
for a patch, every jump that targets a shifted address breaks. Static tools often lift code to an
Intermediate Representation (IR) to manage this, but this adds complexity and brittleness.

## The Flicker Architecture: Load-Time Rewriting

Flicker pursues a third path: load-time binary rewriting. This occurs after the executable is mapped
into memory but before its entry point runs. By implementing a custom user-space loader, the system
gains full control over the process lifecycle without incurring the runtime overhead of a DBT
engine.

The key advantage of this approach is the ability to use `mmap` to allocate trampoline pages
directly near the target code. This removes the need to hijack binary sections to embed loader and
trampoline information, which is a common limitation of static rewriting tools.
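
How near is "near"? A `jmp rel32` reaches targets within a signed 32-bit offset of the end of the jump, so a trampoline page must start inside a window of roughly ±2 GiB around the patched instruction. The sketch below computes that window for page-aligned allocations. `trampolineWindow` and the 4 KiB page size are assumptions for illustration. A loader could then try `mmap` with `MAP_FIXED_NOREPLACE` (Linux 4.17+) at addresses inside the window until one succeeds. The kernel still refuses addresses below `vm.mmap_min_addr`.

```zig
const std = @import("std");

const page_size: i128 = 4096; // assumption: 4 KiB pages

/// Page-aligned start addresses [lo, hi] at which a trampoline page can be
/// mapped so that every byte of the page is reachable from a 5-byte
/// `jmp rel32` located at `insn` (rel32 is relative to `insn + 5`).
const Window = struct { lo: u64, hi: u64 };

fn trampolineWindow(insn: u64) Window {
    const next: i128 = @as(i128, insn) + 5;
    const min_target: i128 = @max(next + std.math.minInt(i32), 0);
    const max_target: i128 = next + std.math.maxInt(i32);
    // First page start at or above min_target.
    const lo = @divFloor(min_target + page_size - 1, page_size) * page_size;
    // Last page start whose final byte is still <= max_target.
    const hi = @divFloor(max_target - page_size + 1, page_size) * page_size;
    return .{ .lo = @intCast(lo), .hi = @intCast(hi) };
}
```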

### The Patching Mechanism

To avoid the shifting-address problem of static rewriting, Flicker adopts the methodology used by
E9Patch. The core invariant is that the size of the code section never changes, and instructions are
never moved unless evicted to a trampoline. This makes the patching process control-flow agnostic:
valid jump targets remain valid because addresses do not shift.

Flicker applies patches using a hierarchy of tactics ordered by invasiveness. If an instruction is
five bytes or larger, it is replaced with a standard 32-bit relative jump to a trampoline. If the
instruction is shorter than five bytes, the system attempts "Instruction Punning": the jump opcode
and the low-order bytes of its offset fit inside the original instruction, while the remaining
high-order offset bytes overlap the following instruction(s) and are left unchanged. The system
searches for low-order bytes that make the resulting target a usable trampoline address. If punning
fails, the system tries prepending redundant instruction prefixes to the jump, which shifts which
bytes of the following code the offset overlaps (Padded Jumps).
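
The effect of punning on the reachable targets can be sketched as follows. This is a simplified model. It assumes the instruction is 1 to 4 bytes long and that the bytes after it are never modified. `punningTargets` is a hypothetical name. A real implementation (E9Patch's, or Flicker's constraint solver) also has to account for prefixes, overlapping patches and which addresses can actually be mapped.

```zig
const std = @import("std");

const Range = struct { min: u64, max: u64 };

/// Simplified E9Patch-style instruction punning for a short instruction of
/// `len` bytes (1..4) at `addr`. The patch writes `E9` plus the low `len - 1`
/// bytes of rel32 over the instruction; the remaining high bytes of rel32 are
/// the unchanged bytes of the following code, passed in `next_bytes`.
/// Returns the contiguous range of jump targets reachable by choosing the
/// free low bytes (contiguous because the sign byte is always fixed).
fn punningTargets(addr: u64, len: usize, next_bytes: []const u8) Range {
    std.debug.assert(len >= 1 and len <= 4);
    const free = len - 1; // rel32 bytes we may choose
    const fixed = 4 - free; // rel32 bytes dictated by the following code
    std.debug.assert(next_bytes.len >= fixed);

    // rel32 is little-endian, so the fixed bytes form its high-order part.
    var high: u32 = 0;
    for (0..fixed) |i| {
        high |= @as(u32, next_bytes[i]) << @intCast(8 * (free + i));
    }
    const free_mask: u32 = (@as(u32, 1) << @intCast(8 * free)) - 1;
    const lo_rel: i32 = @bitCast(high); // all free bytes 0x00
    const hi_rel: i32 = @bitCast(high | free_mask); // all free bytes 0xFF
    const next_ip: i128 = @as(i128, addr) + 5; // rel32 counts from the end of the 5-byte jmp
    return .{ .min = @intCast(next_ip + lo_rel), .max = @intCast(next_ip + hi_rel) };
}
```

With a 2-byte instruction, only 256 targets remain, and the following code fixes most of each address. The range therefore has to be intersected with the addresses where a trampoline can actually be placed, which is why padded jumps and eviction exist as fallbacks.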

When these non-destructive methods fail, Flicker employs eviction strategies. "Successor Eviction"
moves the following instruction to a trampoline to create space for the patch. If that is
insufficient, "Neighbor Eviction" searches for a neighboring instruction up to 128 bytes away and
evicts it, creating a hole that can stage a jump to the trampoline; the patched instruction itself
then only needs room for a short jump into that hole. As a final fallback to guarantee 100%
coverage, the system can insert an invalid instruction that traps (raising `SIGILL`), though every
trapped execution then costs a round trip through the kernel and a signal handler.
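
The 128-byte neighborhood matches the reach of a 2-byte short jump (`jmp rel8`). That jump can move at most 128 bytes backward or 127 bytes forward from its own end. Here is a minimal sketch, with `encodeJmpRel8` as a hypothetical helper.

```zig
const std = @import("std");

/// Encodes `jmp rel8` (0xEB followed by a signed 8-bit displacement). The
/// displacement is relative to the end of the 2-byte jump, so reachable
/// targets lie in [from + 2 - 128, from + 2 + 127]. Returns null otherwise.
fn encodeJmpRel8(from: u64, to: u64) ?[2]u8 {
    const disp: i128 = @as(i128, to) - (@as(i128, from) + 2);
    if (disp < std.math.minInt(i8) or disp > std.math.maxInt(i8)) return null;
    const d8: i8 = @intCast(disp);
    const bytes: [2]u8 = .{ 0xEB, @bitCast(d8) };
    return bytes;
}
```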

### Universal Coverage via Induction

Flicker treats code discovery as an inductive problem, ensuring support for static executables,
dynamic libraries, and JIT code.

The base case is a statically linked executable. Flicker acts as the OS loader: it reads the ELF
headers, maps the segments, performs a linear scan of the executable sections, and applies patches
before jumping to the entry point. This relies on the assumption that modern compilers produce
tessellated code: instructions follow one another with no embedded data, so a linear scan stays
aligned with the real instruction boundaries.
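
The linear scan can be illustrated with a toy length decoder that stands in for a real disassembler. It knows only a handful of opcodes, and `toyLength` and `findSyscalls` are hypothetical names. The example shows why the scan walks instruction boundaries from a known start instead of searching bytes: the bytes `0F 05` inside the immediate of a `mov eax, imm32` are data, not a `syscall`. Unknown opcodes produce an error rather than a guess.

```zig
const std = @import("std");

/// Toy length decoder for a tiny subset of x86-64 (not a real disassembler).
/// Returns the instruction length, or null for unknown/truncated encodings.
fn toyLength(code: []const u8) ?usize {
    if (code.len == 0) return null;
    const len: usize = switch (code[0]) {
        0x90, 0xC3, 0xCC => 1, // nop, ret, int3
        0xB8...0xBF => 5, // mov r32, imm32
        0xE8, 0xE9 => 5, // call rel32, jmp rel32
        0x0F => if (code.len >= 2 and code[1] == 0x05) 2 else return null, // syscall
        else => return null,
    };
    return if (len <= code.len) len else null;
}

/// Linear sweep: decode instructions back to back from the start of the
/// segment and record the offsets of `syscall` (0F 05) instructions.
/// Stores at most `out.len` offsets but returns the total number found.
fn findSyscalls(code: []const u8, out: []usize) error{DecodeFailed}!usize {
    var off: usize = 0;
    var count: usize = 0;
    while (off < code.len) {
        const len = toyLength(code[off..]) orelse return error.DecodeFailed;
        if (len == 2 and code[off] == 0x0F and code[off + 1] == 0x05) {
            if (count < out.len) out[count] = off;
            count += 1;
        }
        off += len;
    }
    return count;
}
```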

The inductive step covers JIT code and dynamic libraries. On Linux, generating executable code
mostly follows a pattern: memory is mapped, code is written, and then `mprotect` is called to make
it executable. Flicker intercepts all `mprotect` and `mmap` calls. When a page transitions to
executable status, the system scans the buffer and applies patches before the kernel finalizes the
permissions.
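
The ordering described above can be expressed as a small decision function: patch while the pages are writable but not yet executable, then apply the requested protection. This is a sketch under assumptions. `Plan` and `planProtectionChange` are hypothetical, and the protection bits are the standard Linux values. The sketch does not track whether a region was already executable, so repeated `mprotect` calls would trigger rescans.

```zig
const std = @import("std");

// Linux protection bits (<sys/mman.h>).
const PROT_READ: u32 = 0x1;
const PROT_WRITE: u32 = 0x2;
const PROT_EXEC: u32 = 0x4;

/// Hypothetical decision an interposed mmap/mprotect would make before
/// forwarding the call to the kernel.
const Plan = struct {
    scan: bool, // disassemble and patch the region first
    prot_while_patching: u32, // protection applied while rewriting the code
    prot_final: u32, // protection the application requested
};

fn planProtectionChange(requested: u32) Plan {
    const becomes_exec = (requested & PROT_EXEC) != 0;
    return .{
        .scan = becomes_exec,
        // Writable but never executable while we rewrite the code.
        .prot_while_patching = if (becomes_exec) PROT_READ | PROT_WRITE else requested,
        .prot_final = requested,
    };
}
```

An interposed `mprotect` would apply `prot_while_patching`, scan and patch the region, and only then forward `prot_final`. Assuming a private mapping, a file-backed `mmap` that requests `PROT_EXEC` directly could follow the same plan: map with `prot_while_patching` first, then `mprotect` to `prot_final`.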

This logic extends recursively to dynamic libraries. Because the dynamic loader (`ld.so`) uses
`mmap` and `mprotect` to load libraries (such as libc or libGL), intercepting the loader's system
calls allows Flicker to automatically patch every library loaded, including those loaded manually
via `dlopen`.

## System Integration and Edge Cases

Binary rewriting at this level encounters specific OS behaviors that require precise handling to
avoid crashes.

### Thread Creation and Stack Switching

The `clone` syscall can create a thread with a fresh stack. If a patch intercepts `clone`, the
trampoline runs on the parent's stack. When `clone` returns, the child thread wakes up inside the
trampoline at the instruction following the syscall. The child then attempts to run the trampoline
epilogue to restore registers, but it does so on its new, empty stack, reads garbage data, and
crashes.

To resolve this, the trampoline checks the return value of `clone`, which is 0 in the child and the
child's thread ID in the parent. In the parent, execution proceeds normally. In the child, the
trampoline immediately jumps back to the original code, skipping stack restoration.
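
The check can be written as a tiny decision function. `CloneExit` and `afterClone` are hypothetical names. The `new_stack` condition is an addition of this sketch. Without a new stack, the child runs on a copy of the parent's stack, where the saved context still exists. For `clone3`, the equivalent input would be the `stack` field of `struct clone_args`. A negative return value means the call failed and no child exists, so it takes the parent path.

```zig
const std = @import("std");

/// What the trampoline epilogue should do after `clone`/`clone3` returns
/// (hypothetical names).
const CloneExit = enum {
    restore_context, // normal path: pop the saved registers, return to the patched code
    skip_restore, // child on a fresh stack: the saved context is not there
};

/// `ret` is the raw syscall result: 0 in the child, the child's TID in the
/// parent, or a negative errno if the call failed. `new_stack` is the stack
/// pointer passed to the syscall; 0 means the child keeps using a copy of
/// the parent's stack, where the saved context still is.
fn afterClone(ret: isize, new_stack: usize) CloneExit {
    if (ret == 0 and new_stack != 0) return .skip_restore;
    return .restore_context;
}
```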

### Signal Handling

When a signal handler returns, control passes to a restorer stub that calls `rt_sigreturn`, telling
the kernel to restore the CPU state from a `ucontext` structure saved on the stack. If a trampoline
moves the stack pointer to save context, `rt_sigreturn` is called with the modified stack pointer,
so the kernel looks for `ucontext` at the wrong address and corrupts the process state. Flicker
handles this by detecting `rt_sigreturn` and restoring the stack pointer to its exact
pre-trampoline value before executing the syscall.

### The vDSO and Concurrency

The virtual Dynamic Shared Object (vDSO) lets certain system calls (such as `clock_gettime`) run
entirely in user space. Flicker locates the vDSO via the `AT_SYSINFO_EHDR` auxiliary vector entry
and patches it like any other shared library.

Regarding concurrency, a race condition exists where one thread executes JIT code while another
modifies it. Flicker mitigates this by intercepting the `mprotect` call while the page is still
writable but not yet executable, patching the code safely before the kernel atomically updates the
permissions.

src/main.zig (13 changed lines)
@@ -376,6 +376,19 @@ test "nolibc_pie_fork" {
 // );
 // }

+test "nolibc_nopie_signal_handler" {
+    try testHelper(
+        &.{ flicker_path, getTestExePath("nolibc_nopie_signal_handler") },
+        "In signal handler\nSignal handled successfully\n",
+    );
+}
+test "nolibc_pie_signal_handler" {
+    try testHelper(
+        &.{ flicker_path, getTestExePath("nolibc_pie_signal_handler") },
+        "In signal handler\nSignal handled successfully\n",
+    );
+}
+
 fn testPrintArgs(comptime name: []const u8) !void {
     const exe_path = getTestExePath(name);
     const loader_argv: []const []const u8 = &.{ flicker_path, exe_path, "foo", "bar", "baz hi" };
@@ -64,7 +64,23 @@ export fn syscall_handler(ctx: *SavedContext) callconv(.c) void {
             return;
         },
         .rt_sigreturn => {
-            @panic("sigreturn is not supported yet");
+            // The kernel expects the stack pointer to point to the `ucontext` structure. But in our
+            // case `syscallEntry` pushed the `SavedContext` onto the stack.
+            // So we just need to reset the stack pointer to what it was before `syscallEntry` was
+            // called. The `SavedContext` includes the return address pushed by the trampoline, so
+            // the original stack pointer is exactly at the end of `SavedContext`.
+            const rsp_orig = @intFromPtr(ctx) + @sizeOf(SavedContext);
+
+            asm volatile (
+                \\ mov %[rsp], %%rsp
+                \\ syscall
+                \\ ud2
+                :
+                : [rsp] "r" (rsp_orig),
+                  [number] "{rax}" (ctx.rax),
+                : .{ .memory = true }
+            );
+            unreachable;
         },
         .execve, .execveat => |s| {
             // TODO: option to persist across new processes

src/test/signal_handler.zig (new file, 35 changed lines)
@@ -0,0 +1,35 @@
const std = @import("std");
const linux = std.os.linux;

var handled = false;

fn handler(sig: i32, _: *const linux.siginfo_t, _: ?*anyopaque) callconv(.c) void {
    if (sig == linux.SIG.USR1) {
        handled = true;
        const msg = "In signal handler\n";
        _ = linux.syscall3(.write, 1, @intFromPtr(msg.ptr), msg.len);
    }
}

pub fn main() !void {
    const act = linux.Sigaction{
        .handler = .{ .sigaction = handler },
        .mask = std.mem.zeroes(linux.sigset_t),
        .flags = linux.SA.SIGINFO | linux.SA.RESTART,
    };

    if (linux.sigaction(linux.SIG.USR1, &act, null) != 0) {
        return error.SigactionFailed;
    }

    _ = linux.kill(linux.getpid(), linux.SIG.USR1);

    if (handled) {
        const msg = "Signal handled successfully\n";
        _ = linux.syscall3(.write, 1, @intFromPtr(msg.ptr), msg.len);
    } else {
        const msg = "Signal NOT handled\n";
        _ = linux.syscall3(.write, 1, @intFromPtr(msg.ptr), msg.len);
        std.process.exit(1);
    }
}