Compare commits

17 Commits: 403301a06e...main

Commits: d25cf59380, d52cf8aaaf, eea0e6204d, 403fd6031b, de10ce58e2, 3d7532c906, 5d146140b9, 7161b6d1a2, 7eb5601eb6, 1557b82c1d, 3633346d53, 08f21c06fb, 7186905ad2, 8322ddba3b, 85a07116af, 0a282259e3, 33ce01d56d

README.md (83)

@@ -1,5 +1,84 @@
-# Load-time patcher
+# Flicker

Flicker is a universal load-time binary rewriter for native AMD64 Linux applications. It maps the target executable into memory, performs a linear-scan disassembly, and applies patches using a hierarchy of tactics, allowing for instrumentation, debugging, and hook injection.

This approach allows Flicker to maintain control over the process lifecycle, enabling it to handle statically linked executables, dynamically linked executables (via interpreter loading), and system calls (e.g., intercepting `readlink`, `clone`).
It offers a middle ground: native execution speed with the flexibility of dynamic instrumentation.

## Work In Progress

This project is in active development.

Already supported: statically linked executables, basic dynamically linked executables (via `PT_INTERP` loading), and basic syscall interception.
Full `dlopen` support, JIT handling, signal handling, and a plugin system are pending.

## Build

Flicker uses the Zig build system. Ensure you have Zig 0.15.1 installed.

To build the release binary:

```bash
zig build -Doptimize=ReleaseSafe
```

To run the test suite (which exercises various static and dynamic executables):

```bash
zig build test
```

The compiled binary will be located at `zig-out/bin/flicker`.
## Usage

Flicker acts as a loader wrapper. Pass the target executable and its arguments directly to Flicker:

```bash
./flicker <executable> [args...]

# Example: running 'ls' through Flicker
./zig-out/bin/flicker ls -la
```
## How it Works

For more information, see the [Project Overview](docs/project_overview.md) and the [Use Cases](docs/use_cases.md).

### The Loader

Flicker does not use `LD_PRELOAD`. Instead, it maps the target ELF binary into memory itself. If the binary is dynamically linked, Flicker parses the `PT_INTERP` header, locates the dynamic linker (usually `ld-linux.so`), and maps that as well. It then rewrites the auxiliary vector (`AT_PHDR`, `AT_ENTRY`, `AT_BASE`) on the stack to trick the C runtime into accepting the manually loaded environment.
### Patching Engine

Before transferring control to the entry point, Flicker scans executable segments for instructions that require instrumentation. It allocates "trampolines": executable memory pages located within ±2 GB of the target instruction.

To overwrite an instruction with a 5-byte jump (`jmp rel32`) without corrupting adjacent code or breaking jump targets, Flicker scans back-to-front and uses a constraint solver to find valid bytes for "instruction punning."
### Syscall Interception

Flicker can replace `syscall` opcodes with jumps to a custom handler. The handler either emulates the syscall logic or modifies its arguments.

Special handling detects `clone` syscalls to ensure the child thread (which wakes up with a fresh stack) does not crash when attempting to restore the parent's register state.

Path spoofing: Flicker intercepts `readlink` on `/proc/self/exe` to return the path of the target binary rather than the Flicker loader.
## License

-Apache 2.0
+Apache License 2.0

docs/TODO.md (new file, 52)

@@ -0,0 +1,52 @@
## General things

### Thread-locals

Right now we don't use any thread-local storage in Zig. This means the application can freely decide what to do with the `fs` segment. If we need thread-locals in the future, we have to think carefully about how to do it.

If `FSGSBASE` is available we can swap out the segment base very quickly. If not, we would need to fall back to `arch_prctl`, which is of course a lot slower. Fortunately, `FSGSBASE` has been available since Intel Ivy Bridge (2012), AMD Zen 2 / Family 17h (2019), and Linux 5.9 (2020).
## Major things

- [x] `clone`: with and without stack switching
- [x] `clone3`: with and without stack switching
- [x] `fork`: likely there is nothing to be done here, but check again just to be sure
- [x] `rt_sigreturn`: we can't use the normal `syscall` interception because we push something onto the stack, so `ucontext` isn't on top anymore
- [x] `/proc/self/exe`: intercept calls to `readlink`/`readlinkat` with that as the argument
- [x] `auxv`: check that it is set up correctly and completely
- [x] JIT support: intercept `mmap`, `mprotect`, and `mremap` calls that make pages executable
- [ ] `SIGILL` patching fallback
- [x] `vdso` handling
- [x] check why the libc tests are flaky
## Minor things

- [ ] Cleanup: when a JIT engine frees code, our trampolines become "zombies", so over time we leak memory and reduce the patching percentage
- [ ] Ghost-page edge case: in all patch strategies, if a range spans multiple pages and we `mmap` the first one but can't `mmap` the second, we leave the first one mapped. It would be better to unmap it
- [ ] Right now when patching we `mmap` a page and may not use it, but we still leave it mapped. This leaks memory. Fixing this correctly also fixes the ghost-page issue
- [ ] Re-entrancy for `patchRegion`
  - if a signal arrives while we are in that function and we need to patch something because of the signal, we will deadlock
- [ ] Strict disassembly mode: currently we warn on disassembly errors; provide a flag to stop instead
- [ ] Separate stack for Flicker
  - when the application runs with a small stack (`sigaltstack`, goroutines) we might overflow, especially during the `patchRegion` call
  - either one global stack for all to use (with a mutex) or a thread-local stack (though using `fs` has other problems)
- [ ] `exec`: option to persist across `exec` calls, useful for things like `make`
- [ ] `prctl`/`arch_prctl`: check if/what we need to intercept and change
- [ ] `seccomp`: check what we need to intercept and change
- [ ] `modify_ldt`: check what we need to intercept and change
- [ ] `set_tid_address`: check what we need to intercept and change
- [ ] performance optimizations for patched code? Peephole optimization might be possible
- [ ] maybe add a way to run something after the client has finished
  - could be useful for statistics, cleanup (if necessary), or notifying about suppressed warnings
docs/project_overview.md (new file, 115)

@@ -0,0 +1,115 @@

# Project Flicker: Universal Load-Time Binary Rewriting

Flicker is a binary rewriting infrastructure designed for native amd64 Linux applications. Its primary objective is to enable universal instrumentation (the ability to patch any instruction) with minimal performance overhead.

Current approaches to binary rewriting force a difficult trade-off between coverage, performance, and complexity. Flicker addresses this by operating at load time, combining the transparency of load-time injection with control-flow-agnostic patching techniques. This architecture supports statically linked executables, dynamically linked libraries, and Just-In-Time (JIT) compiled code within a single unified framework.
## The Landscape of Binary Rewriting

To understand Flicker's position, it is helpful to look at the two dominant approaches: dynamic and static rewriting.

Dynamic Binary Translation (DBT) tools, such as DynamoRIO or Pin, execute programs inside a virtual-machine-like environment. They act as interpreters that disassemble and translate code blocks on the fly. This lets them handle JIT code and shared libraries natively, because they see the instruction stream as it executes. However, this flexibility incurs significant overhead, often slowing execution by 20% to 50%, because the engine must constantly disassemble and translate code.

Static binary rewriting modifies the binary on disk before execution. While potentially fast, this approach faces the undecidability of disassembly: identifying all jump targets in a stripped binary is reducible to the halting problem. If an instruction is moved to insert a patch, existing jump targets break. Static tools often lift code to an Intermediate Representation (IR) to manage this, but that adds complexity and brittleness.
## The Flicker Architecture: Load-Time Rewriting

Flicker pursues a third path: load-time binary rewriting. This occurs after the executable is mapped into memory but before the entry point executes. By implementing a custom user-space loader, the system gains total control over the process lifecycle without incurring the runtime overhead of a DBT engine.

The key advantage of this approach is the ability to use `mmap` to allocate trampoline pages directly near the target code. This removes the need to hijack binary sections to embed loader and trampoline information, a common limitation of static rewriting tools.
### The Patching Mechanism

To solve the static-rewriting problem of shifting addresses, Flicker adopts the methodology used by E9Patch. The core invariant is that the size of the code section never changes, and instructions are never moved unless evicted to a trampoline. This makes the patching process control-flow agnostic: valid jump targets remain valid because addresses do not shift.

Flicker applies patches using a hierarchy of tactics ordered by invasiveness. Ideally, if an instruction is five bytes or larger, it is replaced with a standard 32-bit relative jump to a trampoline. If the instruction is smaller than five bytes, the system attempts "instruction punning", finding a jump offset that overlaps with the bytes of the following instructions to form a valid target. If punning fails, the system tries using instruction prefixes to shift the jump bytes ("padded jumps").
When these non-destructive methods fail, Flicker employs eviction strategies. "Successor eviction" moves the following instruction to a trampoline to create space for the patch. If that is insufficient, "neighbor eviction" searches for a neighboring instruction up to 128 bytes away, evicting it to create a hole that can stage a short jump to the trampoline. As a final fallback to guarantee 100% coverage, the system can insert an invalid instruction to trap execution, though this comes at a performance cost.
### Universal Coverage via Induction

Flicker treats code discovery as an inductive problem, ensuring support for static executables, dynamic libraries, and JIT code.

The base case is a statically linked executable. Flicker acts as the OS loader: it reads the ELF headers, maps the segments, performs a linear scan of the executable sections, and applies patches before jumping to the entry point. This relies on the assumption that modern compilers produce tessellated code with no gaps.

The inductive step covers JIT code and dynamic libraries. On Linux, generating executable code mostly follows a pattern: memory is mapped, code is written, and then `mprotect` is called to make it executable. Flicker intercepts all `mprotect` and `mmap` calls. When a page transitions to executable status, the system scans the buffer and applies patches before the kernel finalizes the permissions.
This logic extends recursively to dynamic libraries. Because the dynamic loader (`ld.so`) uses `mmap` and `mprotect` to load libraries (such as libc or libGL), intercepting the loader's system calls allows Flicker to automatically patch every library loaded, including those loaded manually via `dlopen`.

## System Integration and Edge Cases

Binary rewriting at this level encounters specific OS behaviors that require precise handling to avoid crashes.
### Thread Creation and Stack Switching

The `clone` syscall creates a thread with a fresh stack. If a patch intercepts `clone`, the trampoline runs on the parent's stack. When `clone` returns, the child thread wakes up inside the trampoline at the instruction following the syscall. The child then attempts to run the trampoline epilogue to restore registers, but it does so using its new, empty stack, reading garbage data and crashing.

To resolve this, the trampoline checks the return value. For the parent, execution proceeds normally. For the child, the trampoline immediately jumps back to the original code, skipping stack restoration.
### Signal Handling

When a signal handler returns, it calls `rt_sigreturn`, telling the kernel to restore the CPU state from a `ucontext` struct saved on the stack. If a trampoline has modified the stack pointer to save context, `rt_sigreturn` would be issued while the stack pointer is shifted; the kernel would then look for the `ucontext` at the wrong address, corrupting the process state. Flicker handles this by detecting `rt_sigreturn` and restoring the stack pointer to its exact pre-trampoline value before executing the syscall.
### The vDSO and Concurrency

The virtual Dynamic Shared Object (vDSO) allows fast syscalls in user space. Flicker locates the vDSO via the `AT_SYSINFO_EHDR` auxiliary vector entry and patches it like any other shared library. Regarding concurrency, a race condition exists when one thread executes JIT code while another modifies it. Flicker mitigates this by intercepting the `mprotect` call while the page is still writable but not yet executable, patching the code safely before the kernel atomically updates the permissions.
@@ -71,7 +71,7 @@ pub fn init() !void {
    mem.writeInt(
        u64,
        syscall_flicken_bytes[2..][0..8],
-       @intFromPtr(&syscalls.syscall_entry),
+       @intFromPtr(&syscalls.syscallEntry),
        .little,
    );
    flicken_templates.putAssumeCapacity("syscall", .{ .name = "syscall", .bytes = &syscall_flicken_bytes });
@@ -209,9 +209,13 @@ pub const Statistics = struct {
/// Scans a memory region for instructions that require patching and applies the patches
/// using a hierarchy of tactics (Direct/Punning -> Successor Eviction -> Neighbor Eviction).
///
/// The region is processed back-to-front to ensure that modifications (punning) only
/// constrain instructions that have already been processed or are locked.
/// NOTE: This function leaves the region as R|W and the caller is responsible for changing it
/// to the desired protection.
pub fn patchRegion(region: []align(page_size) u8) !void {
    log.info(
        "Patching region: 0x{x} - 0x{x}",
        .{ @intFromPtr(region.ptr), @intFromPtr(&region[region.len - 1]) },
    );
    // For now just do a coarse lock.
    // TODO: should we make this more fine-grained?
    mutex.lock();
@@ -296,8 +300,6 @@ pub fn patchRegion(region: []align(page_size) u8) !void {
    {
        // Apply patches.
        try posix.mprotect(region, posix.PROT.READ | posix.PROT.WRITE);
-       defer posix.mprotect(region, posix.PROT.READ | posix.PROT.EXEC) catch
-           @panic("patchRegion: mprotect back to R|X failed. Can't continue");

        var stats = Statistics.empty;
        // Used to track which bytes have been modified or used for constraints (punning),
@@ -854,7 +856,7 @@ fn ensureRangeWritable(
    const gop = try allocated_pages.getOrPut(gpa, page_addr);
    if (gop.found_existing) {
        const ptr: [*]align(page_size) u8 = @ptrFromInt(page_addr);
-       try posix.mprotect(ptr[0..page_addr], protection);
+       try posix.mprotect(ptr[0..page_size], protection);
    } else {
        const addr = posix.mmap(
            @ptrFromInt(page_addr),
@@ -6,10 +6,14 @@ const log = std.log.scoped(.disassembler);
const assert = std.debug.assert;

pub const InstructionIterator = struct {
+   /// Maximum number of warnings to print per iterator before suppressing.
+   pub var max_warnings: u64 = 3;
+
    decoder: zydis.ZydisDecoder,
    bytes: []const u8,
    instruction: zydis.ZydisDecodedInstruction,
    operands: [zydis.ZYDIS_MAX_OPERAND_COUNT]zydis.ZydisDecodedOperand,
+   warnings: usize = 0,

    pub fn init(bytes: []const u8) InstructionIterator {
        var decoder: zydis.ZydisDecoder = undefined;
@@ -38,27 +42,33 @@ pub const InstructionIterator = struct {
        var address: u64 = @intFromPtr(iterator.bytes.ptr);

        while (!zydis.ZYAN_SUCCESS(status)) {
-           // TODO: handle common padding bytes
-           switch (status) {
-               zydis.ZYDIS_STATUS_NO_MORE_DATA => {
-                   log.info("next: Got status: NO_MORE_DATA. Iterator completed.", .{});
-                   return null;
-               },
-               zydis.ZYDIS_STATUS_ILLEGAL_LOCK => log.warn("next: Got status: ILLEGAL_LOCK. " ++
-                   "Byte stepping, to find next valid instruction begin", .{}),
-               zydis.ZYDIS_STATUS_DECODING_ERROR => log.warn("next: Got status: DECODING_ERROR. " ++
-                   "Byte stepping, to find next valid instruction begin", .{}),
-               else => log.warn("next: Got unknown status: 0x{x}. Byte stepping, to find next " ++
-                   "valid instruction begin", .{status}),
-           }
+           if (status == zydis.ZYDIS_STATUS_NO_MORE_DATA) {
+               log.debug("next: Got status: NO_MORE_DATA. Iterator completed.", .{});
+               return null;
+           }
+
+           // TODO: handle common padding bytes
+           // TODO: add a flag to instead return an error
+           iterator.warnings += 1;
+           if (iterator.warnings <= max_warnings) {
+               const err_desc = switch (status) {
+                   zydis.ZYDIS_STATUS_ILLEGAL_LOCK => "ILLEGAL_LOCK",
+                   zydis.ZYDIS_STATUS_DECODING_ERROR => "DECODING_ERROR",
+                   zydis.ZYDIS_STATUS_INVALID_MAP => "INVALID_MAP",
+                   else => "UNKNOWN",
+               };
+               log.warn(
+                   "next: Got status: {s} (0x{x}). Byte stepping, for next instruction begin",
+                   .{ err_desc, status },
+               );
+               if (iterator.warnings == max_warnings) {
+                   log.warn("next: Suppressing further warnings for this disassembly.", .{});
+               }
+           }

            log.debug(
-               "next: instruction length: {}, address: 0x{x}, bytes: 0x{x}",
-               .{
-                   iterator.instruction.length,
-                   address,
-                   iterator.bytes[0..iterator.instruction.length],
-               },
+               "next: skipping byte at address: 0x{x}, byte: 0x{x}",
+               .{ address, iterator.bytes[0] },
            );

            iterator.bytes = iterator.bytes[1..];
src/main.zig (134)

@@ -71,6 +71,7 @@ pub fn main() !void {
    const base = try loadStaticElf(ehdr, &file_reader);
    const entry = ehdr.entry + if (ehdr.type == .DYN) base else 0;
    log.info("Executable loaded: base=0x{x}, entry=0x{x}", .{ base, entry });
+   try patchLoadedElf(base);

    // Check for dynamic linker
    var maybe_interp_base: ?usize = null;
@@ -102,13 +103,13 @@ pub fn main() !void {
            "Interpreter loaded: base=0x{x}, entry=0x{x}",
            .{ interp_base, maybe_interp_entry.? },
        );
        try patchLoadedElf(interp_base);
        interp.close();
    }

    var i: usize = 0;
    const auxv = std.os.linux.elf_aux_maybe.?;
    while (auxv[i].a_type != elf.AT_NULL) : (i += 1) {
        // TODO: look at other auxv types and check if we need to change them.
        auxv[i].a_un.a_val = switch (auxv[i].a_type) {
            elf.AT_PHDR => base + ehdr.phoff,
            elf.AT_PHENT => ehdr.phentsize,
@@ -116,6 +117,21 @@ pub fn main() !void {
            elf.AT_BASE => maybe_interp_base orelse auxv[i].a_un.a_val,
            elf.AT_ENTRY => entry,
            elf.AT_EXECFN => @intFromPtr(std.os.argv[arg_index]),
+           elf.AT_SYSINFO_EHDR => blk: {
+               log.info("Found vDSO at 0x{x}", .{auxv[i].a_un.a_val});
+               try patchLoadedElf(auxv[i].a_un.a_val);
+               break :blk auxv[i].a_un.a_val;
+           },
+           elf.AT_EXECFD => {
+               @panic("Got AT_EXECFD auxv value");
+               // TODO: handle AT_EXECFD, when needed
+               // The SysV ABI Specification says:
+               // > At process creation the system may pass control to an interpreter program. When
+               // > this happens, the system places either an entry of type AT_EXECFD or one of
+               // > type AT_PHDR in the auxiliary vector. The entry for type AT_EXECFD uses the
+               // > a_val member to contain a file descriptor open to read the application
+               // > program’s object file.
+           },
            else => auxv[i].a_un.a_val,
        };
    }
@@ -210,16 +226,45 @@ fn loadStaticElf(ehdr: elf.Header, file_reader: *std.fs.File.Reader) !usize {
            return UnfinishedReadError.UnfinishedRead;

        const protections = elfToMmapProt(phdr.p_flags);
-       if (protections & posix.PROT.EXEC > 0) {
-           log.info("Patching executable segment", .{});
-           try Patcher.patchRegion(ptr);
-       }
        try posix.mprotect(ptr, protections);
    }
    log.debug("loadElf returning base: 0x{x}", .{@intFromPtr(base.ptr)});
    return @intFromPtr(base.ptr);
}
fn patchLoadedElf(base: usize) !void {
    const ehdr = @as(*const elf.Ehdr, @ptrFromInt(base));
    if (!mem.eql(u8, ehdr.e_ident[0..4], elf.MAGIC)) return error.InvalidElfMagic;

    const phoff = ehdr.e_phoff;
    const phnum = ehdr.e_phnum;
    const phentsize = ehdr.e_phentsize;

    var i: usize = 0;
    while (i < phnum) : (i += 1) {
        const phdr_ptr = base + phoff + (i * phentsize);
        const phdr = @as(*const elf.Phdr, @ptrFromInt(phdr_ptr));

        if (phdr.p_type != elf.PT_LOAD) continue;
        if ((phdr.p_flags & elf.PF_X) == 0) continue;

        // Determine the VMA.
        // For ET_EXEC, p_vaddr is absolute.
        // For ET_DYN, p_vaddr is an offset from base.
        const vaddr = if (ehdr.e_type == elf.ET.DYN) base + phdr.p_vaddr else phdr.p_vaddr;
        const memsz = phdr.p_memsz;

        const page_start = mem.alignBackward(usize, vaddr, page_size);
        const page_end = mem.alignForward(usize, vaddr + memsz, page_size);
        const size = page_end - page_start;

        const region = @as([*]align(page_size) u8, @ptrFromInt(page_start))[0..size];

        try Patcher.patchRegion(region);
        try posix.mprotect(region, elfToMmapProt(phdr.p_flags));
    }
}

/// Converts ELF program header protection flags to mmap protection flags.
fn elfToMmapProt(elf_prot: u64) u32 {
    var result: u32 = posix.PROT.NONE;
@@ -288,10 +333,9 @@ test "nolibc_nopie_exit" {
test "nolibc_pie_exit" {
    try testHelper(&.{ flicker_path, getTestExePath("nolibc_pie_exit") }, "");
}
-// BUG: This one is flaky
-// test "libc_pie_exit" {
-//     try testHelper(&.{ flicker_path, getTestExePath("libc_pie_exit") }, "");
-// }
+test "libc_pie_exit" {
+    try testHelper(&.{ flicker_path, getTestExePath("libc_pie_exit") }, "");
+}
test "nolibc_nopie_helloWorld" {
    try testHelper(&.{ flicker_path, getTestExePath("nolibc_nopie_helloWorld") }, "Hello World!\n");

@@ -299,10 +343,9 @@ test "nolibc_nopie_helloWorld" {
test "nolibc_pie_helloWorld" {
    try testHelper(&.{ flicker_path, getTestExePath("nolibc_pie_helloWorld") }, "Hello World!\n");
}
-// BUG: This one is flaky
-// test "libc_pie_helloWorld" {
-//     try testHelper(&.{ flicker_path, getTestExePath("libc_pie_helloWorld") }, "Hello World!\n");
-// }
+test "libc_pie_helloWorld" {
+    try testHelper(&.{ flicker_path, getTestExePath("libc_pie_helloWorld") }, "Hello World!\n");
+}
test "nolibc_nopie_printArgs" {
    try testPrintArgs("nolibc_nopie_printArgs");

@@ -310,10 +353,9 @@ test "nolibc_nopie_printArgs" {
test "nolibc_pie_printArgs" {
    try testPrintArgs("nolibc_pie_printArgs");
}
-// BUG: This one is flaky
-// test "libc_pie_printArgs" {
-//     try testPrintArgs("libc_pie_printArgs");
-// }
+test "libc_pie_printArgs" {
+    try testPrintArgs("libc_pie_printArgs");
+}
test "nolibc_nopie_readlink" {
    try testReadlink("nolibc_nopie_readlink");

@@ -321,10 +363,9 @@ test "nolibc_nopie_readlink" {
test "nolibc_pie_readlink" {
    try testReadlink("nolibc_pie_readlink");
}
-// BUG: This one just outputs the path to the flicker executable and is likely also flaky
-// test "libc_pie_readlink" {
-//     try testReadlink("libc_pie_readlink");
-// }
+test "libc_pie_readlink" {
+    try testReadlink("libc_pie_readlink");
+}
test "nolibc_nopie_clone_raw" {
    try testHelper(

@@ -352,6 +393,57 @@ test "nolibc_pie_clone_no_new_stack" {
    );
}
test "nolibc_nopie_fork" {
    try testHelper(
        &.{ flicker_path, getTestExePath("nolibc_nopie_fork") },
        "Child: I'm alive!\nParent: Child died.\n",
    );
}
test "nolibc_pie_fork" {
    try testHelper(
        &.{ flicker_path, getTestExePath("nolibc_pie_fork") },
        "Child: I'm alive!\nParent: Child died.\n",
    );
}
test "libc_pie_fork" {
    try testHelper(
        &.{ flicker_path, getTestExePath("libc_pie_fork") },
        "Child: I'm alive!\nParent: Child died.\n",
    );
}

test "nolibc_nopie_signal_handler" {
    try testHelper(
        &.{ flicker_path, getTestExePath("nolibc_nopie_signal_handler") },
        "In signal handler\nSignal handled successfully\n",
    );
}
test "nolibc_pie_signal_handler" {
    try testHelper(
        &.{ flicker_path, getTestExePath("nolibc_pie_signal_handler") },
        "In signal handler\nSignal handled successfully\n",
    );
}

test "nolibc_nopie_vdso_clock" {
    try testHelper(
        &.{ flicker_path, getTestExePath("nolibc_nopie_vdso_clock") },
        "Time gotten\n",
    );
}
test "nolibc_pie_vdso_clock" {
    try testHelper(
        &.{ flicker_path, getTestExePath("nolibc_pie_vdso_clock") },
        "Time gotten\n",
    );
}
test "libc_pie_vdso_clock" {
    try testHelper(
        &.{ flicker_path, getTestExePath("libc_pie_vdso_clock") },
        "Time gotten\n",
    );
}

test "echo" {
    try testHelper(&.{ "echo", "Hello", "There" }, "Hello There\n");
}
src/syscalls.zig (299)

@@ -1,10 +1,15 @@
const std = @import("std");
const linux = std.os.linux;
const posix = std.posix;
const Patcher = @import("Patcher.zig");
const assert = std.debug.assert;

-/// Represents the stack layout pushed by `syscall_entry` before calling the handler.
-pub const UserRegs = extern struct {
+const page_size = std.heap.pageSize();
+
+const log = std.log.scoped(.syscalls);
+
+/// Represents the stack layout pushed by `syscallEntry` before calling the handler.
+pub const SavedContext = extern struct {
    padding: u64, // Result of `sub $8, %rsp` for alignment
    rflags: u64,
    rax: u64,
@@ -22,27 +27,28 @@ pub const SavedContext = extern struct {
    r13: u64,
    r14: u64,
    r15: u64,
-   /// This one isn't pushed on the stack by `syscall_entry`. It's pushed by the `call r11` to get
-   /// to the `syscall_entry`
+   /// Pushed automatically by the `call r11` instruction when entering `syscallEntry`.
+   /// Crucially, we copy this onto the child stack (if needed) because then we can just return at
+   /// the end of the child handler inside `handleClone`.
    return_address: u64,
};
|
||||
/// The main entry point for intercepted syscalls.
|
||||
///
|
||||
/// This function is called from `syscall_entry` with a pointer to the saved registers.
|
||||
/// It effectively emulates the syscall instruction while allowing for interception.
|
||||
export fn syscall_handler(regs: *UserRegs) callconv(.c) void {
|
||||
/// This function is called from `syscallEntry` with a pointer to the saved context.
|
||||
/// It dispatches specific syscalls to handlers or executes them directly.
|
||||
export fn syscall_handler(ctx: *SavedContext) callconv(.c) void {
|
||||
// TODO: Handle signals (masking) to prevent re-entrancy issues if we touch global state.
|
||||
|
||||
const sys: linux.SYS = @enumFromInt(regs.rax);
|
||||
const sys: linux.SYS = @enumFromInt(ctx.rax);
|
||||
|
||||
switch (sys) {
|
||||
.readlink => {
|
||||
// readlink(const char *path, char *buf, size_t bufsiz)
|
||||
const path_ptr = @as([*:0]const u8, @ptrFromInt(regs.rdi));
|
||||
const path_ptr = @as([*:0]const u8, @ptrFromInt(ctx.rdi));
|
||||
// TODO: handle relative paths with cwd
|
||||
if (isProcSelfExe(path_ptr)) {
|
||||
handleReadlink(regs.rsi, regs.rdx, regs);
|
||||
handleReadlink(ctx.rsi, ctx.rdx, ctx);
|
||||
return;
|
||||
}
|
||||
},
|
||||
@@ -52,63 +58,131 @@ export fn syscall_handler(regs: *UserRegs) callconv(.c) void {
            // TODO: handle relative paths with dirfd pointing to /proc/self
            // TODO: handle relative paths with dirfd == AT_FDCWD (like readlink)
            // TODO: handle empty pathname
-           const path_ptr = @as([*:0]const u8, @ptrFromInt(regs.rsi));
+           const path_ptr = @as([*:0]const u8, @ptrFromInt(ctx.rsi));
            if (isProcSelfExe(path_ptr)) {
-               handleReadlink(regs.rdx, regs.r10, regs);
+               handleReadlink(ctx.rdx, ctx.r10, ctx);
                return;
            }
        },
        .clone, .clone3 => {
-           handleClone(regs);
+           handleClone(ctx);
            return;
        },
        .fork, .vfork => {
            // fork/vfork duplicate the stack (or share it until exec), so the return path via
            // syscall_entry works fine.
        },
.rt_sigreturn => {
|
||||
@panic("sigreturn is not supported yet");
|
||||
// The kernel expects the stack pointer to point to the `ucontext` structure. But in our
|
||||
// case `syscallEntry` pushed the `SavedContext` onto the stack.
|
||||
// So we just need to reset the stack pointer to what it was before `syscallEntry` was
|
||||
// called. The `SavedContext` includes the return address pushed by the trampoline, so
|
||||
// the original stack pointer is exactly at the end of `SavedContext`.
|
||||
const rsp_orig = @intFromPtr(ctx) + @sizeOf(SavedContext);
|
||||
|
||||
asm volatile (
|
||||
\\ mov %[rsp], %%rsp
|
||||
\\ syscall
|
||||
:
|
||||
: [rsp] "r" (rsp_orig),
|
||||
[number] "{rax}" (ctx.rax),
|
||||
: .{ .memory = true });
|
||||
unreachable;
|
||||
},
        .mmap => {
            // mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset)

            const prot: u32 = @intCast(ctx.rdx);
            // Execute the syscall first to get the address (rax)
            ctx.rax = executeSyscall(ctx);
            const addr = ctx.rax;
            var len = ctx.rsi;
            const flags: linux.MAP = @bitCast(@as(u32, @intCast(ctx.r10)));
            const fd: linux.fd_t = @bitCast(@as(u32, @truncate(ctx.r8)));
            const offset = ctx.r9;

            const is_error = @as(i64, @bitCast(ctx.rax)) < 0;
            if (is_error) return;
            if ((prot & posix.PROT.EXEC) == 0) return;

            // If file-backed (not anonymous), clamp len to file size to avoid SIGBUS
            if (!flags.ANONYMOUS) {
                var stat: linux.Stat = undefined;
                if (0 == linux.fstat(fd, &stat) and linux.S.ISREG(stat.mode)) {
                    const file_size: u64 = @intCast(stat.size);
                    len = if (offset >= file_size) 0 else @min(len, file_size - offset);
                }
            }

            if (len <= 0) return;
            // mmap addresses are always page aligned
            const ptr = @as([*]align(page_size) u8, @ptrFromInt(addr));
            // Check if we can patch it
            Patcher.patchRegion(ptr[0..len]) catch |err| {
                std.log.warn("JIT Patching failed: {}", .{err});
            };

            // patchRegion leaves it as RW. We need to restore to requested prot.
            _ = linux.syscall3(.mprotect, addr, len, prot);
            return;
        },
        .mprotect => {
            // mprotect(void *addr, size_t len, int prot)
            // TODO: cleanup trampolines, when removing X
            const prot: u32 = @intCast(ctx.rdx);
            if ((prot & posix.PROT.EXEC) != 0) {
                const addr = ctx.rdi;
                const len = ctx.rsi;
                // mprotect requires addr to be page aligned.
                if (len > 0 and std.mem.isAligned(addr, page_size)) {
                    const ptr = @as([*]align(page_size) u8, @ptrFromInt(addr));
                    Patcher.patchRegion(ptr[0..len]) catch |err| {
                        std.log.warn("mprotect Patching failed: {}", .{err});
                    };
                    // patchRegion leaves it R|W.
                }
            }
            ctx.rax = executeSyscall(ctx);
            return;
        },
        .execve, .execveat => {
            // TODO: option to persist across new processes
            ctx.rax = executeSyscall(ctx);
            return;
        },
        .prctl, .arch_prctl, .set_tid_address => {
            // TODO: what do we need to handle from these?
            // process name
            // fs base(gs?)
            // thread id pointers
        },
        .munmap, .mremap => {
            // TODO: cleanup
            ctx.rax = executeSyscall(ctx);
            return;
        },
        else => {},
    }

    // Write result back to the saved RAX so it is restored to the application.
    ctx.rax = executeSyscall(ctx);
}
inline fn executeSyscall(ctx: *SavedContext) u64 {
    return linux.syscall6(
        @enumFromInt(ctx.rax),
        ctx.rdi,
        ctx.rsi,
        ctx.rdx,
        ctx.r10,
        ctx.r8,
        ctx.r9,
    );
}
/// This is the target of the `call r11` instruction in the syscall flicken.
pub fn syscallEntry() callconv(.naked) void {
    asm volatile (
        \\ # Save all GPRs that must be preserved or are arguments
        \\ push %r15
@@ -135,7 +209,7 @@ pub fn syscall_entry() callconv(.naked) void {
        \\ # Total misalign: 8 bytes. We need 16-byte alignment for 'call'.
        \\ sub $8, %rsp
        \\
        \\ # Pass pointer to ctx (current rsp) as 1st argument (rdi) and call handler.
        \\ mov %rsp, %rdi
        \\ call syscall_handler
        \\
@@ -175,14 +249,14 @@ fn isProcSelfExe(path: [*:0]const u8) bool {
    return path[i] == 0;
}

fn handleReadlink(buf_addr: u64, buf_size: u64, ctx: *SavedContext) void {
    const target = Patcher.target_exec_path;
    const len = @min(target.len, buf_size);
    const dest = @as([*]u8, @ptrFromInt(buf_addr));
    @memcpy(dest[0..len], target[0..len]);

    // readlink does not null-terminate if the buffer is full, it just returns length.
    ctx.rax = len;
}
const CloneArgs = extern struct {
@@ -199,44 +273,149 @@ const CloneArgs = extern struct {
    cgroup: u64,
};

/// Handles `clone` and `clone3` syscalls, which are used for thread and process creation.
///
/// **The Stack Switching Problem:**
/// When a thread is created, the caller provides a pointer to a new, empty stack (`child_stack`).
/// 1. The parent enters the kernel via `syscallEntry` (the trampoline).
/// 2. `syscallEntry` saves all registers and the return address onto the **parent's stack**.
/// 3. The kernel creates the child thread and switches its stack pointer (`RSP`) to `child_stack`.
/// 4. The child wakes up. If we simply let it return to `syscallEntry`, it would try to `pop`
///    registers from its `child_stack`. But that stack is empty! It would pop garbage and crash.
///
/// **The Solution:**
/// We manually replicate the parent's saved state onto the child's new stack *before* the syscall.
///
/// For that the following steps occur:
/// 1. We decode the arguments to determine if this is `clone` or `clone3` and locate the target
///    `child_stack`.
/// 2. If `child_stack` is 0 (e.g., `fork`), no stack switching occurs. The function simply executes
///    the syscall and handles the return value normally.
/// 3. Else we need to stack switch:
///    a. We calculate where `SavedContext` (registers + return addr) would sit on the top of the
///       *new* `child_stack`. We then `memcpy` the current `ctx` (from the parent's stack) to this
///       new location.
///    b. We set `rax = 0` in the *copied* context, so the child sees itself as the child.
///    c. We modify the syscall argument (the stack pointer passed to the kernel) to point to the
///       *start* of our copied context on the new stack, rather than the raw top. This ensures that
///       when the child wakes up, its `RSP` points exactly at the saved registers we just copied.
///    d. We execute the raw syscall inline.
///       - **Parent:** Returns from the syscall, updates `ctx.rax` with the Child PID, and returns
///         to the trampoline normally.
///       - **Child:** Wakes up on the new stack. It executes `postCloneChild`, restores all
///         registers from the *new* stack (popping the values we copied in step 3a), and finally
///         executes `ret`. This `ret` pops the `return_address` we copied, jumping directly back
///         to the user code, effectively bypassing the `syscallEntry` epilogue.
fn handleClone(ctx: *SavedContext) void {
    const sys: linux.syscalls.X64 = @enumFromInt(ctx.rax);
    var child_stack: u64 = 0;

    // Determine stack
    if (sys == .clone) {
        // clone(flags, stack, ...)
        child_stack = ctx.rsi;
    } else {
        // clone3(struct clone_args *args, size_t size)
        const args = @as(*const CloneArgs, @ptrFromInt(ctx.rdi));
        if (args.stack != 0) {
            child_stack = args.stack + args.stack_size;
        }
    }
    std.debug.print("child_stack: {x}\n", .{child_stack});

    // If no new stack, just execute (like fork)
    if (child_stack == 0) {
        ctx.rax = executeSyscall(ctx);
        if (ctx.rax == 0) {
            postCloneChild(ctx);
        } else {
            assert(ctx.rax > 0); // TODO: error handling
            postCloneParent(ctx);
        }
        return;
    }
    // Prepare child stack by copying SavedContext.
    // TODO: test alignment
    child_stack &= ~@as(u64, 0xf); // align to 16 bytes
    const child_ctx_addr = child_stack - @sizeOf(SavedContext);
    const child_ctx = @as(*SavedContext, @ptrFromInt(child_ctx_addr));
    child_ctx.* = ctx.*;
    child_ctx.rax = 0;

    // Prepare arguments for syscall
    var new_rsi = ctx.rsi;
    var new_rdi = ctx.rdi;
    var clone3_args_copy: CloneArgs = undefined;

    if (sys == .clone) {
        new_rsi = child_ctx_addr;
    } else {
        const args = @as(*const CloneArgs, @ptrFromInt(ctx.rdi));
        clone3_args_copy = args.*;
        clone3_args_copy.stack = child_ctx_addr;
        clone3_args_copy.stack_size = 0; // TODO:
        new_rdi = @intFromPtr(&clone3_args_copy);
    }

    // Execute clone/clone3 via inline assembly.
    // We handle the child path entirely in assembly to avoid stack frame issues.
    const ret = asm volatile (
        \\ syscall
        \\ test %rax, %rax
        \\ jnz 1f
        \\
        \\ # --- CHILD PATH ---
        \\ # We are now on the new stack and %rsp points to child_ctx_addr
        \\
        \\ # Run Child Hook
        \\ # Argument 1 (rdi): Pointer to SavedContext (which is current rsp)
        \\ mov %rsp, %rdi
        \\ call postCloneChild
        \\
        \\ # Restore Context
        \\ add $8, %rsp # Skip padding
        \\ popfq
        \\ pop %rax
        \\ pop %rbx
        \\ pop %rcx
        \\ pop %rdx
        \\ pop %rsi
        \\ pop %rdi
        \\ pop %rbp
        \\ pop %r8
        \\ pop %r9
        \\ pop %r10
        \\ pop %r11
        \\ pop %r12
        \\ pop %r13
        \\ pop %r14
        \\ pop %r15
        \\
        \\ # %rsp now points to `return_address` so we can just return.
        \\ ret
        \\
        \\ 1:
        \\ # --- PARENT PATH ---
        : [ret] "={rax}" (-> usize),
        : [number] "{rax}" (ctx.rax),
          [arg1] "{rdi}" (new_rdi),
          [arg2] "{rsi}" (new_rsi),
          [arg3] "{rdx}" (ctx.rdx),
          [arg4] "{r10}" (ctx.r10),
          [arg5] "{r8}" (ctx.r8),
          [arg6] "{r9}" (ctx.r9),
          [child_hook] "i" (postCloneChild),
        : .{ .rcx = true, .r11 = true, .memory = true });

    // Parent continues here
    ctx.rax = ret;
    postCloneParent(ctx);
}
export fn postCloneChild(ctx: *SavedContext) callconv(.c) void {
    _ = ctx;
}

fn postCloneParent(ctx: *SavedContext) void {
    _ = ctx;
}
58 src/test/clone_no_new_stack.zig Normal file
@@ -0,0 +1,58 @@
const std = @import("std");
const linux = std.os.linux;
const clone = linux.CLONE;

pub fn main() !void {
    // SIGCHLD: Send signal to parent on exit (required for waitpid)
    const flags = clone.FILES | clone.FS | linux.SIG.CHLD;

    const msg = "Child: Hello\n";
    const msg_len = msg.len;

    // We use inline assembly to perform the clone syscall and handle the child path completely to
    // avoid the compiler generating code that relies on the parent's stack frame in the child
    // process (where the stack is empty).
    const ret = asm volatile (
        \\ syscall
        \\ test %%rax, %%rax
        \\ jnz 1f
        \\
        \\ # Child Path
        \\ # Write to stdout
        \\ mov $1, %%rdi # fd = 1 (stdout)
        \\ mov %[msg], %%rsi # buffer
        \\ mov %[len], %%rdx # length
        \\ mov $1, %%rax # SYS_write
        \\ syscall
        \\
        \\ # Exit
        \\ mov $0, %%rdi # code = 0
        \\ mov $60, %%rax # SYS_exit
        \\ syscall
        \\
        \\ 1:
        \\ # Parent Path continues
        : [ret] "={rax}" (-> usize),
        : [number] "{rax}" (@intFromEnum(linux.syscalls.X64.clone)),
          [arg1] "{rdi}" (flags),
          [arg2] "{rsi}" (0),
          [arg3] "{rdx}" (0),
          [arg4] "{r10}" (0),
          [arg5] "{r8}" (0),
          [msg] "r" (msg.ptr),
          [len] "r" (msg_len),
        : .{ .rcx = true, .r11 = true, .memory = true });

    // Parent Process
    const child_pid: i64 = @bitCast(ret);
    if (child_pid < 0) {
        std.debug.print(
            "Parent: Clone failed with: {}\n",
            .{@as(linux.E, @enumFromInt(-child_pid))},
        );
        return;
    }

    var status: u32 = 0;
    // wait4 for the child to exit
    _ = linux.syscall4(.wait4, @as(usize, @intCast(child_pid)), @intFromPtr(&status), 0, 0);

    _ = linux.syscall3(.write, 1, @intFromPtr("Parent: Goodbye\n"), 16);
}
23 src/test/fork.zig Normal file
@@ -0,0 +1,23 @@
const std = @import("std");
const linux = std.os.linux;

pub fn main() !void {
    const ret = linux.syscall0(.fork);
    const pid: i32 = @intCast(ret);

    if (pid == 0) {
        // --- Child ---
        const msg = "Child: I'm alive!\n";
        _ = linux.syscall3(.write, 1, @intFromPtr(msg.ptr), msg.len);
        linux.exit(0);
    } else if (pid > 0) {
        // --- Parent ---
        var status: u32 = 0;
        _ = linux.syscall4(.wait4, @intCast(pid), @intFromPtr(&status), 0, 0);
        const msg = "Parent: Child died.\n";
        _ = linux.syscall3(.write, 1, @intFromPtr(msg.ptr), msg.len);
    } else {
        const msg = "Fork failed!\n";
        _ = linux.syscall3(.write, 1, @intFromPtr(msg.ptr), msg.len);
    }
}
35 src/test/signal_handler.zig Normal file
@@ -0,0 +1,35 @@
const std = @import("std");
const linux = std.os.linux;

var handled = false;

fn handler(sig: i32, _: *const linux.siginfo_t, _: ?*anyopaque) callconv(.c) void {
    if (sig == linux.SIG.USR1) {
        handled = true;
        const msg = "In signal handler\n";
        _ = linux.syscall3(.write, 1, @intFromPtr(msg.ptr), msg.len);
    }
}

pub fn main() !void {
    const act = linux.Sigaction{
        .handler = .{ .sigaction = handler },
        .mask = std.mem.zeroes(linux.sigset_t),
        .flags = linux.SA.SIGINFO | linux.SA.RESTART,
    };

    if (linux.sigaction(linux.SIG.USR1, &act, null) != 0) {
        return error.SigactionFailed;
    }

    _ = linux.kill(linux.getpid(), linux.SIG.USR1);

    if (handled) {
        const msg = "Signal handled successfully\n";
        _ = linux.syscall3(.write, 1, @intFromPtr(msg.ptr), msg.len);
    } else {
        const msg = "Signal NOT handled\n";
        _ = linux.syscall3(.write, 1, @intFromPtr(msg.ptr), msg.len);
        std.process.exit(1);
    }
}
8 src/test/vdso_clock.zig Normal file
@@ -0,0 +1,8 @@
const std = @import("std");

pub fn main() !void {
    _ = try std.posix.clock_gettime(std.posix.CLOCK.MONOTONIC);

    const msg = "Time gotten\n";
    _ = try std.posix.write(1, msg);
}