Use Cases for Flicker

Flicker's architecture (load-time binary rewriting without control-flow recovery) uniquely positions it to handle scenarios where source code is unavailable (legacy or commercial software) and performance is critical. Unlike Dynamic Binary Translation (DBT) tools such as Valgrind or QEMU, which incur high overhead from JIT compilation or emulation, Flicker patches the code in place so that it runs natively.

Below are possible use cases categorized by domain.

High Performance Computing (HPC) & Optimization

Approximate Computing and Mixed-Precision Analysis

Scientific simulations often default to double precision (64-bit) for safety, even when single (32-bit) or half (16-bit) precision would yield sufficiently accurate results with significantly higher performance. Rewriting massive legacy Fortran/C++ codebases just to test precision sensitivity is impractical, however.

Flicker could instrument floating-point instructions to perform "shadow execution", running each operation in both double and single precision and logging the divergence. Alternatively, it could mask the low-order mantissa bits of floating-point registers to simulate lower-precision hardware.

Unlike compiler-based approaches that change the whole binary, Flicker could apply these patches selectively to specific "hot" functions at load time, preserving accuracy in sensitive setup/solver phases while optimizing the bulk computation.
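
The shadow-execution and bit-masking ideas above can be illustrated outside of any binary rewriter. The following is a minimal Python sketch, not Flicker code: the helper names (`to_f32`, `truncate_mantissa`, `shadow_sum`) are hypothetical, and the summation loop merely stands in for an instrumented hot function.

```python
import struct

def to_f32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def truncate_mantissa(x: float, keep_bits: int) -> float:
    """Zero all but the top `keep_bits` of the 52-bit binary64 mantissa."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    mask = ~((1 << (52 - keep_bits)) - 1) & 0xFFFFFFFFFFFFFFFF
    return struct.unpack("<d", struct.pack("<Q", bits & mask))[0]

def shadow_sum(values):
    """Accumulate in double and (emulated) single precision side by side.

    Returns (double result, single result, relative divergence).
    """
    acc64 = 0.0
    acc32 = 0.0
    for v in values:
        acc64 += v
        # Round the operand and the result to binary32 after every operation.
        acc32 = to_f32(acc32 + to_f32(v))
    rel = abs(acc64 - acc32) / abs(acc64) if acc64 else 0.0
    return acc64, acc32, rel

if __name__ == "__main__":
    acc64, acc32, rel = shadow_sum([0.1] * 1000)
    print(f"double={acc64!r} single={acc32!r} relative divergence={rel:.3e}")
```

A real implementation would keep the shadow value in a spare vector register or a side table rather than in a second Python variable; the sketch only shows what would be logged.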

Profiling Memory Access Patterns (False Sharing)

In multi-threaded HPC applications, performance often degrades due to "False Sharing", where multiple threads modify independent variables that happen to reside on the same CPU cache line, causing cache thrashing.

Sampling profilers (like perf) provide statistical approximations but often miss precise interaction timings. Source-level instrumentation disrupts compiler optimizations.

Flicker could instrument memory store instructions (MOV etc.) to record effective addresses. By aggregating this data, it could generate heatmaps of cache line access density, precisely identifying false sharing or inefficient strided access patterns in optimized binaries.
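
As a rough illustration of the aggregation step, the sketch below groups recorded store addresses by cache line and flags lines that several threads write at disjoint addresses. The record format `(thread_id, address)`, the 64-byte line size, and the function name are assumptions made for this example; access widths are ignored for brevity.

```python
from collections import defaultdict

CACHE_LINE = 64  # bytes; assumed line size, typical for x86-64

def find_false_sharing(stores):
    """Flag cache lines written by >= 2 threads that never write the same address.

    `stores` is an iterable of (thread_id, address) records.
    Returns {line_base: {thread_id: sorted addresses}}.
    """
    lines = defaultdict(lambda: defaultdict(set))
    for tid, addr in stores:
        base = addr // CACHE_LINE * CACHE_LINE
        lines[base][tid].add(addr)

    suspects = {}
    for base, by_thread in lines.items():
        if len(by_thread) < 2:
            continue  # only one writer: no sharing at all
        all_addrs = [a for addrs in by_thread.values() for a in addrs]
        if len(all_addrs) == len(set(all_addrs)):
            # No address is written by two threads, so the line is shared
            # only by accident of layout: a false-sharing candidate.
            suspects[base] = {t: sorted(a) for t, a in by_thread.items()}
    return suspects

if __name__ == "__main__":
    trace = [(0, 0x1000), (1, 0x1008), (0, 0x2000), (1, 0x2000)]
    for base, writers in find_false_sharing(trace).items():
        print(hex(base), writers)
```

Lines where two threads write the same address are treated as true sharing and left out, since padding or realignment would not help there.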

Low-Overhead I/O Tracing

Parallel MPI jobs often inadvertently stress parallel filesystems (Lustre, GPFS) by performing excessive small writes or metadata operations.

Tools like strace stop the traced process at every syscall entry and exit and switch to the tracer, slowing the application down so much that the race conditions or I/O storms disappear (Heisenbugs).

By intercepting I/O syscalls (write, read, open, ...) inside the process memory, Flicker could aggregate I/O statistics (e.g., "Rank 7 performed 50,000 writes of 4 bytes") with negligible overhead, providing a lightweight alternative to strace for high-throughput jobs.
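
The in-process aggregation described above amounts to a few counters per rank. The following sketch shows one possible shape for them; the `IoStats` class, its fields, and the summary format are hypothetical, and `record` stands in for the hook that an intercepted syscall would call.

```python
from collections import Counter

def size_bucket(nbytes):
    """Round a transfer size up to the next power of two (0 stays 0)."""
    return 0 if nbytes <= 0 else 1 << (nbytes - 1).bit_length()

class IoStats:
    """Per-rank I/O counters updated from inside the traced process."""

    def __init__(self, rank):
        self.rank = rank
        self.calls = Counter()        # syscall name -> number of calls
        self.total_bytes = Counter()  # syscall name -> bytes transferred
        self.histogram = Counter()    # (syscall, size bucket) -> calls

    def record(self, syscall, nbytes):
        self.calls[syscall] += 1
        self.total_bytes[syscall] += nbytes
        self.histogram[(syscall, size_bucket(nbytes))] += 1

    def summary(self):
        return [
            f"rank {self.rank}: {n} {name} calls, "
            f"{self.total_bytes[name]} bytes, "
            f"mean {self.total_bytes[name] / n:.1f} B"
            for name, n in sorted(self.calls.items())
        ]

if __name__ == "__main__":
    stats = IoStats(rank=7)
    for _ in range(50_000):
        stats.record("write", 4)
    print("\n".join(stats.summary()))
```

Only the counters are touched on the hot path; formatting happens once at exit, which is what keeps this kind of tracing cheap compared with per-call logging.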

MPI Communication Profiling

HPC performance is often bound by network latency between nodes. Profiling tools like Vampir are heavy and costly.

Flicker could patch shared library exports (like MPI_Send or MPI_Recv) at load time. This allows lightweight logging of message sizes and latencies without recompiling the application or linking against special profiling libraries.
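
Conceptually, patching an export means wrapping it with code that measures the call and then forwards to the original. The Python sketch below mimics that with a decorator; `fake_send` is a hypothetical stand-in for MPI_Send, and the `profiled` helper and log format are assumptions of this example rather than anything Flicker provides.

```python
import time
from functools import wraps

call_log = []  # (function name, message size in bytes, latency in seconds)

def profiled(name, size_of):
    """Wrap a function so that each call logs its message size and latency."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                call_log.append((name, size_of(*args, **kwargs), elapsed))
        return wrapper
    return decorate

def fake_send(buf, dest):
    """Hypothetical stand-in for MPI_Send: pretends to send `buf` to `dest`."""
    return len(buf)

# "Patch" the export: callers now reach the wrapper instead of the original.
send = profiled("MPI_Send", lambda buf, dest: len(buf))(fake_send)

if __name__ == "__main__":
    send(b"abcd", 1)
    print(call_log[-1])
```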

Security and Hardening

Coverage-Guided Fuzzing (Closed Source)

Coverage-guided fuzzing needs feedback on which code paths each input executes. For closed-source software, researchers typically fall back on AFL's QEMU mode. QEMU translates instructions dynamically, resulting in slow execution speeds (often 2-10x slower than native).

Flicker could inject coverage instrumentation (updating a shared memory bitmap on branch targets) directly into the binary at load time. This would allow closed-source binaries to be fuzzed at near-native speeds, significantly increasing the number of test cases run per second.
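
The bitmap update itself is small. The sketch below follows the edge-coverage scheme described in AFL's documentation (index by current block ID XOR the shifted previous block ID); the `CoverageMap` class and the block IDs are illustrative assumptions, and a real rewriter would emit the equivalent few machine instructions at each branch target.

```python
MAP_SIZE = 1 << 16  # bitmap size in bytes; 64 KiB is AFL's default

class CoverageMap:
    """AFL-style edge coverage bitmap with 8-bit saturating-by-wrap counters."""

    def __init__(self):
        self.bitmap = bytearray(MAP_SIZE)
        self.prev = 0

    def on_block(self, block_id):
        """Code that would run at every instrumented branch target."""
        idx = (block_id ^ self.prev) % MAP_SIZE
        self.bitmap[idx] = (self.bitmap[idx] + 1) & 0xFF
        # Shifting keeps A->B distinct from B->A and A->A distinct from B->B.
        self.prev = block_id >> 1

    def edges_hit(self):
        return sum(1 for b in self.bitmap if b)

if __name__ == "__main__":
    cov = CoverageMap()
    for block in (0x1234, 0x5678, 0x1234):
        cov.on_block(block)
    print("distinct edges:", cov.edges_hit())
```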

Software Shadow Stacks

Return-Oriented Programming (ROP) attacks exploit buffer overflows to overwrite return addresses on the stack.

Hardware enforcement (Intel CET/AMD Shadow Stack) requires modern CPUs (Intel 11th Gen+, Zen 3+) and recent kernels (Linux 6.6+). Older systems remain vulnerable.

Flicker could instrument CALL and RET instructions to implement a Software Shadow Stack. On CALL, the return address is pushed to a secure, isolated stack region. On RET, the address on the stack is compared against the shadow stack. If they mismatch, the program terminates, preventing ROP chains.
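
The CALL/RET checks described above can be modelled with two stacks. This is a toy Python model, not Flicker code: the `Machine` class is hypothetical, and protecting the shadow region from the attacker (the hard part in practice) is outside its scope.

```python
class ShadowStackViolation(Exception):
    """Raised when a return address does not match its shadow copy."""

class Machine:
    def __init__(self):
        self.stack = []   # regular stack: attacker-writable in this model
        self.shadow = []  # shadow stack: assumed isolated from the attacker

    def call(self, return_addr):
        """Instrumented CALL: push the return address to both stacks."""
        self.stack.append(return_addr)
        self.shadow.append(return_addr)

    def ret(self):
        """Instrumented RET: compare both copies before returning."""
        addr = self.stack.pop()
        expected = self.shadow.pop()
        if addr != expected:
            raise ShadowStackViolation(
                f"return to {addr:#x}, expected {expected:#x}"
            )
        return addr

if __name__ == "__main__":
    m = Machine()
    m.call(0x401000)
    m.stack[-1] = 0xDEADBEEF  # simulate a buffer overflow
    try:
        m.ret()
    except ShadowStackViolation as exc:
        print("blocked:", exc)
```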

Binary-Only Address Sanitizer (ASan)

Memory safety errors (buffer overflows, use-after-free) in C/C++ are often found with ASan or Valgrind. ASan requires recompilation. Valgrind works on binaries but slows execution by 20x-50x, making it unusable for large datasets.

Flicker could intercept allocator calls (malloc/free) to poison "red zones" around memory and instrument memory access instructions to check these zones. This provides ASan-like capabilities for legacy binaries with significantly lower overhead than Valgrind.
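
To make the red-zone idea concrete, here is a simplified model of a poisoned heap. It uses one shadow byte per application byte for clarity, whereas ASan itself uses a more compact shadow encoding; the `ShadowHeap` class, the 16-byte red zone, and the bump allocator are assumptions of this sketch.

```python
REDZONE = 16  # bytes of poisoned padding around each allocation (assumed)

class AccessViolation(Exception):
    """Raised when an access touches poisoned memory."""

class ShadowHeap:
    def __init__(self, size):
        self.poisoned = bytearray([1]) * size  # 1 = not addressable
        self.next = 0
        self.sizes = {}

    def malloc(self, n):
        """Intercepted malloc: unpoison n bytes, leave red zones around them."""
        base = self.next + REDZONE
        end = base + n
        if end + REDZONE > len(self.poisoned):
            raise MemoryError("shadow heap exhausted")
        self.poisoned[base:end] = bytes(n)
        self.sizes[base] = n
        self.next = end  # the next block's left red zone follows directly
        return base

    def free(self, ptr):
        """Intercepted free: re-poison and never reuse, to catch use-after-free."""
        n = self.sizes.pop(ptr)
        self.poisoned[ptr:ptr + n] = b"\x01" * n

    def check(self, addr, width=1):
        """Check inserted before every instrumented load or store."""
        if addr < 0 or addr + width > len(self.poisoned):
            raise AccessViolation(f"out of range access at {addr:#x}")
        if any(self.poisoned[addr:addr + width]):
            raise AccessViolation(f"poisoned access at {addr:#x}")

if __name__ == "__main__":
    heap = ShadowHeap(256)
    p = heap.malloc(8)
    heap.check(p, 8)          # in bounds: passes silently
    try:
        heap.check(p + 8)     # one byte past the end: hits the red zone
    except AccessViolation as exc:
        print("detected:", exc)
```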

Systems and Maintenance

Hardware Feature Emulation (Forward Compatibility)

HPC clusters are often heterogeneous, with older nodes lacking newer instruction sets (e.g., AVX-512, AMX). A binary compiled for a newer architecture will crash with SIGILL on an older node as soon as it executes an unsupported instruction.

Flicker could detect these instructions and patch them to jump to a software emulation routine or a scalar fallback implementation. This allows binaries optimized for the latest hardware to run (albeit slower) on legacy nodes for testing or resource-filling purposes.

Fault Injection

To certify software for mission-critical environments, developers must verify how it handles hardware errors.

Flicker could instrument instructions to probabilistically flip bits in registers or memory ("bit-flip injection"), or intercept syscalls to return error codes (e.g., returning ENOSPC on write). It could also simulate malfunctioning or intermittent devices by corrupting buffers returned by read. This allows testing error recovery paths without physical hardware damage.
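
Both injection styles are easy to prototype. In the sketch below, `flip_bit` flips one bit of a 64-bit floating-point value and `FaultyWriter` makes a write call fail with ENOSPC at a configurable rate; both names are hypothetical, and the seeded random generator is an assumption added so that a failing run can be reproduced.

```python
import errno
import os
import random
import struct

def flip_bit(value: float, bit: int) -> float:
    """Flip bit `bit` (0 = least significant) of a binary64 value."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))[0]

class FaultyWriter:
    """Wrap a write function and fail a fraction of calls with ENOSPC."""

    def __init__(self, write, failure_rate, seed=0):
        self._write = write
        self._rate = failure_rate
        self._rng = random.Random(seed)  # seeded for reproducible campaigns

    def write(self, data):
        if self._rng.random() < self._rate:
            raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
        return self._write(data)

if __name__ == "__main__":
    print(flip_bit(1.0, 63))  # flipping the sign bit
    sink = []
    writer = FaultyWriter(lambda d: sink.append(d) or len(d), failure_rate=0.5)
    for chunk in (b"a", b"b", b"c", b"d"):
        try:
            writer.write(chunk)
        except OSError as exc:
            print("injected:", exc)
```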

Record/Replay Engine

Debugging non-deterministic bugs (race conditions) is difficult because they are hard to reproduce.

By intercepting all sources of non-determinism (syscalls, rdtsc, atomic instructions, signals), Flicker could record a trace of an execution. This trace can be replayed later to force the exact same execution path, allowing developers to debug the error state interactively.
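
The core of record/replay is that every non-deterministic result is logged once and then substituted on later runs. The sketch below shows that principle in miniature; `Recorder`, `Replayer`, and `program` are hypothetical names, and a clock read and a random number stand in for syscalls and rdtsc.

```python
import random
import time

class Recorder:
    """Run non-deterministic operations for real and log their results."""

    def __init__(self):
        self.trace = []

    def call(self, name, fn, *args):
        result = fn(*args)
        self.trace.append((name, result))
        return result

class Replayer:
    """Return logged results instead of running the operations again."""

    def __init__(self, trace):
        self._events = iter(trace)

    def call(self, name, fn, *args):
        recorded_name, result = next(self._events)
        if recorded_name != name:
            raise RuntimeError(
                f"replay diverged: expected {recorded_name}, got {name}"
            )
        return result

def program(env):
    """Toy program whose control flow depends on non-deterministic inputs."""
    timestamp = env.call("clock", time.time)
    draw = env.call("rand", random.random)
    return ("A" if draw < 0.5 else "B", timestamp)

if __name__ == "__main__":
    recorder = Recorder()
    first = program(recorder)
    second = program(Replayer(recorder.trace))
    print(first == second)
```

The name check in `Replayer.call` is a cheap divergence detector: if the replayed program asks for events in a different order than the recording, the replay stops instead of silently returning the wrong value.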