Friday, May 22, 2026

fapolicyd 1.5: Correctness First, Better Reports Second

   

The next fapolicyd release has a lot of internal work in it. It started with three simple objectives:

  • No silent visibility loss.
  • No silent policy replacement failure.
  • No hidden default-allow behavior.

fapolicyd is an application allowlisting daemon. It receives file access events from the kernel through fanotify, evaluates policy, and replies to the kernel. When something goes wrong in that path, the daemon should not quietly lose sight of events, quietly replace policy with a bad ruleset, or quietly allow executable content because policy had no opinion.

That is the first narrative of this release: fail-safe and correctness work. The second narrative is reporting. Once the important failure modes are made explicit, they need to show up in the right place. That led to separating status, metrics, and timing into different reports.

 

fapolicyd 1.5 makes important failure modes explicit and then reports them in the right place


Fail-safe and correctness work

One of the most important changes is the transactional rule reload. Previously, a reload could destroy the active rules before the replacement ruleset had fully loaded. If parsing failed after the old policy was gone, the daemon could be left with partial or empty policy. Since fapolicyd historically allows access when no rule has an opinion, this could become a fail-open condition.

The new approach is to build and validate the candidate ruleset separately. It owns the rules, attribute sets, syslog fields, proc-status mask, rule count, and rule identity. Only after the candidate is complete does it replace the active ruleset. A failed reload preserves the previous published policy.

That also means reload failures become useful operational data. The daemon can count rule reload failures and report them as failure-action metrics. Today those counters are observe-mode diagnostics. Later, high-security deployments can use the same named failure classes to decide whether a stronger action is needed.
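The build-then-publish idea can be sketched in a few lines. This is a minimal Python illustration, not fapolicyd's C implementation: the `Ruleset` and `PolicyHolder` types, the toy `parse_rule` function, and the assumption that the generation number increments on each successful reload are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ruleset:
    """An immutable, fully validated set of rules (hypothetical type)."""
    rules: tuple
    generation: int


@dataclass
class PolicyHolder:
    """Holds the published ruleset and a reload-failure counter."""
    active: Ruleset
    rule_reload_failures: int = 0

    def reload(self, lines, parse_rule):
        """Build a candidate off to the side; publish only if every line parses."""
        try:
            candidate = tuple(parse_rule(line) for line in lines)
        except ValueError:
            # The candidate is discarded; the active ruleset is untouched.
            self.rule_reload_failures += 1
            return False
        # Assumption: each successful publish bumps the generation by one.
        self.active = Ruleset(candidate, self.active.generation + 1)
        return True


def parse_rule(line):
    """Toy parser: a rule starts with 'allow' or 'deny'; anything else fails."""
    words = line.split()
    if not words or words[0] not in ("allow", "deny"):
        raise ValueError(f"bad rule: {line!r}")
    return line
```

The point of the shape is that the only assignment to `active` happens after the whole candidate exists, so a parse error halfway through can never leave a partial policy published.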

The parser side was tightened for the same reason. If a subject or object assignment fails while parsing a rule, that must be a parse failure. It cannot be ignored. A malformed multi-attribute rule should not accidentally become a broader rule because one invalid attribute was skipped while another valid attribute remained.
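To make the broadening problem concrete, here is a small Python sketch that contrasts a lenient parser with a strict one. The `name=value` token format and the attribute subset are simplifications chosen for illustration; they are not the real fapolicyd rule grammar or parser.

```python
KNOWN_SUBJECT_ATTRS = {"uid", "gid", "exe", "trust"}  # illustrative subset


def parse_subject_lenient(text):
    """The failure mode: invalid attributes are skipped, broadening the rule."""
    attrs = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        if sep and name in KNOWN_SUBJECT_ATTRS and value:
            attrs[name] = value
    return attrs


def parse_subject_strict(text):
    """Any attribute that fails to assign fails the whole rule."""
    attrs = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        if not sep or name not in KNOWN_SUBJECT_ATTRS or not value:
            raise ValueError(f"invalid subject attribute: {token!r}")
        attrs[name] = value
    return attrs
```

With the misspelled input `uid=0 exee=/usr/bin/bash`, the lenient version returns only the `uid` constraint, so the rule would apply to every program run by uid 0. The strict version rejects the rule outright.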

The fanotify path also got attention. FAN_Q_OVERFLOW is the kernel telling us that fanotify events were lost. This is not the same as a denied access. It is visibility loss. If the kernel queue overflows and the daemon does not make that visible, an administrator has no way to know that enforcement degraded.

So fapolicyd now detects kernel queue overflow, counts it, logs it with rate limiting, and exposes it through reporting. This counter is not just for this release. It is groundwork for later failure policy. If a future configuration wants to treat kernel event loss as a high-severity condition, the daemon has a specific event class to build on.

The daemon also now counts failed or short fanotify response writes. Every permission event needs exactly one response back to the kernel. If the response write fails, that is a correctness issue. Counting it gives us another named failure class that can be monitored now and acted on more strongly later.
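Counting every occurrence while logging only occasionally is a common pattern for this kind of event. The Python sketch below shows one way to do it; the class name, the five-second interval, and the message format are assumptions for illustration and do not describe the daemon's actual code.

```python
import time


class RateLimitedCounter:
    """Count every event, but emit a log line at most once per interval."""

    def __init__(self, min_interval_s=5.0, clock=time.monotonic):
        self.count = 0          # every event is counted
        self.logged = 0         # only some events are logged
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._last_log = None

    def record(self, message, log=print):
        self.count += 1
        now = self._clock()
        if self._last_log is None or now - self._last_log >= self._min_interval_s:
            self._last_log = now
            self.logged += 1
            log(f"{message} (total so far: {self.count})")
```

The counter stays exact even when the log is quiet, which is what lets a report show the real number of lost events after an overflow storm.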

The no-opinion allow path was another area that needed guardrails. fapolicyd does not try to deny every ordinary document open. Compatibility requires a default allow when policy has no opinion. But executable and programmatic content should not accidentally fall through because of incomplete rules, parser mistakes, missing file type classification, or internal errors.

The release adds counters that distinguish "Allowed by rule" from "Allowed by fallthrough". That keeps the compatibility behavior, but it stops hiding it. If a system has default-allow execute activity, the administrator can now see it and investigate.
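The distinction is easy to see in a toy first-match evaluator. This Python sketch is only an illustration: the rule representation (a predicate plus a decision), the event dictionary, and the counter names are hypothetical, not the daemon's data structures or metric names.

```python
from collections import Counter


def decide(rules, event, counters):
    """Return True to allow. First matching rule wins; no match means allow."""
    for matches, decision in rules:
        if matches(event):
            if decision == "allow":
                counters["allowed_by_rule"] += 1
                return True
            counters["denied_by_rule"] += 1
            return False
    # No rule had an opinion: keep the compatibility allow, but make it visible.
    counters["allowed_by_fallthrough"] += 1
    if event.get("perm") == "execute":
        counters["fallthrough_execute"] += 1
    return True
```

Both allow paths return the same answer to the kernel. Only the counters reveal that one of them was a policy decision and the other was the absence of one.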

This is also where the new rule linter fits.


$ fapolicyd-cli --check-rules --lint fapolicyd/src/tests/fixtures/rules-valid.rules
file is valid (3 rules)
Policy lint warning: executable events can fall through; no terminal broad execute deny found
Policy lint warning: %languages is not defined; programmatic ftype coverage cannot be checked
$ fapolicyd-cli --check-rules --lint --verbose
Rules file is valid (15 rules)
Policy lint found no warnings

The linter is intentionally modest. It is not a formal proof of the policy. It looks for policy shapes that are easy to get wrong: executable events that can fall through because there is no broad terminal execute deny, missing "%languages" coverage for programmatic opens, and broad open allows that can shadow programmatic-content denies.

That matters because fapolicyd policies are ordered. A rule that looks harmless in isolation can change meaning when it appears before another rule. The linter gives administrators a cheap check before loading policy. It also ties back to the default-allow metrics: if the linter warns about a default-allow gap, the metrics report can show whether the running system is actually using that gap.
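Two of those shape checks can be sketched over an already parsed, ordered rule list. This is a simplified Python illustration, not the linter's implementation: the rule dictionaries and the wording of the second warning are hypothetical, and real rules carry far more attributes than this.

```python
def lint(rules):
    """Return warnings for a list of rule dicts, evaluated in order.

    Each rule is a hypothetical dict with keys:
    decision, perm, subject, object.
    """
    warnings = []

    def is_broad(rule):
        return rule["subject"] == "all" and rule["object"] == "all"

    # Check 1: some broad deny must catch execute events nothing else matched.
    if not any(
        is_broad(r) and r["decision"].startswith("deny")
        and r["perm"] in ("execute", "any")
        for r in rules
    ):
        warnings.append(
            "executable events can fall through; "
            "no terminal broad execute deny found"
        )

    # Check 2: a broad open allow placed before a %languages deny shadows it.
    for i, r in enumerate(rules):
        if (is_broad(r) and r["decision"].startswith("allow")
                and r["perm"] in ("open", "any")):
            if any(
                later["decision"].startswith("deny")
                and "%languages" in later["object"]
                for later in rules[i + 1:]
            ):
                warnings.append(
                    "broad open allow shadows a later programmatic-content deny"
                )
                break
    return warnings
```

The second check depends entirely on order, which is why it cannot be done by looking at rules one at a time.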

If you use rules.d, fagenrules now validates the compiled rules file before installing it. That closes a practical hole. The assembled policy should be checked before it is written as the policy the daemon will later load.

Subject deferral

The subject cache work deserves a closer look because it fixes a subtle correctness problem.

fapolicyd tracks process startup state. During exec, it may see the executable, the runtime linker, libraries, interpreter activity, and other file opens that belong to the same subject. While that sequence is still being understood, the subject can be in a BUILDING state.

The subject cache is indexed by a slot. More than one process can hash to the same slot over time. Under heavy fork/exec pressure, a new process can arrive at a slot that is already occupied by a different process that is still BUILDING. If fapolicyd evicts the BUILDING subject too early, it loses the startup context for the process already underway. When that original process comes back with another access, fapolicyd may see a later part of startup without the earlier context. That can lead to false pattern decisions such as bad "ld_so" detection.

One idea is to collect ejected slots and try to restore them later. That sounds appealing, but it is a losing proposition.

New processes keep coming. There is no guarantee that the process we ejected will ever make another access. It might exit. It might segfault. It might be stopped by a tracer. If it does come back, there is no clean way to find the saved subject once the same slot has been used by another access request. You also have to decide who owns the saved file descriptors and how long to keep blocked permission events around. That turns the cache into a second, unbounded, hard-to-reason-about subject store.

The implemented design takes the opposite approach. Do not eject the old BUILDING subject just because a new subject arrived. Store the new access request in a bounded deferral array. This puts back pressure on the new access while the subject already underway finishes and gets out of the way.


Subject deferral applies backpressure to the new access instead of evicting an in-progress subject.

The array is fixed size. That is important. A fanotify permission event can hold a task waiting for a decision, and it owns a file descriptor until the daemon replies or closes it. Deferral is acceptable because it is bounded and observable. If the array fills, fapolicyd falls back to the historical eviction behavior and counts that fallback.
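A toy model helps show how the bounded array and the fallback interact. The Python sketch below is an assumption-laden simplification: slots are chosen by `pid % slots`, the class and counter names are invented, and it ignores file descriptors, tracing, and timeouts entirely.

```python
from collections import deque

BUILDING, READY = "BUILDING", "READY"


class SubjectCache:
    def __init__(self, slots, defer_size):
        self.slots = [None] * slots      # each entry is [pid, state] or None
        self.deferred = deque()          # pids waiting on a BUILDING slot
        self.defer_size = defer_size     # hard bound on waiting requests
        self.counters = {"deferred": 0, "max_depth": 0, "fallbacks": 0}

    def access(self, pid):
        idx = pid % len(self.slots)
        entry = self.slots[idx]
        if entry is not None and entry[0] == pid:
            return "hit"
        if entry is None or entry[1] != BUILDING:
            self.slots[idx] = [pid, BUILDING]
            return "installed"
        if len(self.deferred) < self.defer_size:
            # Back pressure: the newcomer waits, the BUILDING subject stays.
            self.deferred.append(pid)
            self.counters["deferred"] += 1
            self.counters["max_depth"] = max(
                self.counters["max_depth"], len(self.deferred))
            return "deferred"
        # Array full: fall back to the historical eviction and count it.
        self.counters["fallbacks"] += 1
        self.slots[idx] = [pid, BUILDING]
        return "evicted"

    def finish(self, pid):
        """Mark a subject as done building, then retry deferred requests."""
        idx = pid % len(self.slots)
        if self.slots[idx] is not None and self.slots[idx][0] == pid:
            self.slots[idx][1] = READY
        waiting, self.deferred = list(self.deferred), deque()
        return [(p, self.access(p)) for p in waiting]
```

Because `defer_size` is fixed, the worst case is a counted fallback to the old behavior rather than an unbounded pile of blocked permission events.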

There are two hard cases: traced subjects and stale subjects.

If a process is being traced, it may be stopped indefinitely. We do not know whether it will continue. If a traced process holds a BUILDING slot and another request needs that slot, waiting forever would be a denial-of-service bug. A malicious actor could try to fill slots with stopped BUILDING subjects and deadlock the system. The daemon now detects traced BUILDING subjects and can eject them when their slot is needed.

There is also a timeout. Even without tracing, a BUILDING subject that has not made progress can become stale. As a last resort, stale BUILDING slots are garbage collected so the system does not wait forever on a subject that may never complete startup.

The relevant counters are:

  • Subject deferred events
  • Subject defer max depth
  • Subject defer fallbacks
  • Early subject cache evictions
  • Subject BUILDING tracer evictions
  • Subject BUILDING stale evictions
  • Subject defer oldest age

These counters tell us whether the theory is working. A busy fork/exec workload may increase "Subject deferred events". That is expected. "Subject defer fallbacks" should not steadily climb during normal operation. Tracer and stale evictions should be rare and worth looking at.

The stress and timing article later in this series will show how to create subject-cache pressure deliberately and how to read these counters during a test.

Reporting philosophy

The second narrative is reporting. Before this work, the status report had become a mixed report. It included health, configuration, counters, cache statistics, and other runtime details. That makes it harder to know what a field means. Is it a current state value? A lifetime counter? A counter that should be reset? A health indicator?

The reports now have cleaner jobs.

"fapolicyd-cli --check-status" asks whether the daemon is healthy and configured as expected.


# fapolicyd-cli --check-status
Operating mode:
Permissive: false
Integrity: sha256
reset_strategy: manual
Timing collection mode: manual
Timing collection armed: false
Timing collection last start time: never
Timing collection last stop time: never
Ruleset generation: 1
Headline activity:
Allowed accesses: 8762
Denied accesses: 0
Resource configuration:
CPU cores: 32
q_size: 800
Subject defer array size: 256
Subject cache size: 4099
Object cache size: 16381
Trust database max pages: 18944
Resource utilization:
Trust database pages in use: 14284 (75%)
Subject slots in use: 207 (5%)
Object slots in use: 3334 (20%)
glibc arena (total memory) is: 25560 KiB, was: 1188 KiB
glibc uordblks (in use memory) is: 8348 KiB, was: 960 KiB
glibc fordblks (total free space) is: 17211 KiB, was: 227 KiB
Health indicators:
Kernel queue overflow: 0
Filesystem errors: 0
Filesystem error last status: none
Filesystem error last seen: never
Reply errors: 0
Subject defer fallbacks: 0
Early subject cache evictions: 0
Subject BUILDING tracer evictions: 0
Subject BUILDING stale evictions: 0
Subject defer oldest age: 0ns
Failure action queue_full (observe): 0
Failure action kernel_queue_overflow (observe): 0
Failure action worker_stall (observe): 0
Failure action rule_reload_failure (observe): 0
Failure action trust_reload_failure (observe): 0
Failure action response_write_failure (observe): 0
Failure action fanotify_filesystem_error (observe): 0
Watched mounts:
watching mount: /
watching mount: /dev/shm
watching mount: /run
watching mount: /run/credentials/systemd-journald.service
watching mount: /tmp
...

Status includes operating mode, integrity mode, reset_strategy, timing collection state, ruleset generation, headline allow and deny activity, resource configuration, resource utilization, health indicators, failure action counters, and watched mounts.

The important thing is that status is where you look for current condition. Is the daemon permissive? What ruleset generation is loaded? Are there kernel queue overflows? Are reply errors non-zero? Is there an old deferred subject? Are the watched mounts what you expect?

"fapolicyd-cli --check-metrics" asks what happened in the current counter window. Metrics includes decision outcome counters, Allowed by rule, Allowed by fallthrough, queue and deferral activity, subject and object cache effectiveness, rule hit counts, and subject/object attribute lookup counters.

Some information appears in both reports on purpose. Allowed and denied accesses are useful headline activity in status, but they are also counters in metrics. Early subject evictions are health signals in status, but they also describe the current metrics window. The difference is the question being asked. Status asks whether something needs attention. Metrics asks what moved during the measured interval.

The old cache and counter information did not disappear. It moved to the metrics report where it belongs. If you previously looked in status for cache hits, misses, evictions, rule hits, or detailed runtime counters, look in "--check-metrics".

Metrics can also be reset in controlled ways. The default reset_strategy is "never", which preserves lifetime counters. "manual" allows a privileged "fapolicyd-cli --reset-metrics" request to snapshot and reset runtime counters. "auto" is for interval reports that should each describe only the activity since the previous interval.
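The three strategies differ only in who is allowed to close a counter window. The following Python sketch models that; the `Metrics` class and its method names are hypothetical, and the choice to reject a manual reset under the other two strategies is an assumption made for the example rather than documented daemon behavior.

```python
class Metrics:
    def __init__(self, reset_strategy="never"):
        if reset_strategy not in ("never", "manual", "auto"):
            raise ValueError(reset_strategy)
        self.reset_strategy = reset_strategy
        self.counters = {}

    def bump(self, name, n=1):
        self.counters[name] = self.counters.get(name, 0) + n

    def _snapshot_and_reset(self):
        """Hand back the finished window and start a fresh one."""
        snapshot, self.counters = self.counters, {}
        return snapshot

    def manual_reset(self):
        # Assumption: only the "manual" strategy honors an explicit request.
        if self.reset_strategy != "manual":
            raise PermissionError("reset_strategy does not allow manual reset")
        return self._snapshot_and_reset()

    def interval_report(self):
        # "auto": each report covers only activity since the previous report.
        if self.reset_strategy == "auto":
            return self._snapshot_and_reset()
        return dict(self.counters)
```

Taking the snapshot and starting the new window in one step is what keeps an event from being counted in two windows or in none.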

Timing is separate from both status and metrics. Timing collection is disabled by default because it adds measurement work to the decision path. When enabled with "timing_collection=manual", an administrator can start and stop a bounded timing window:

fapolicyd-cli --timer-start

fapolicyd-cli --timer-stop

The daemon then writes to "/run/fapolicyd/fapolicyd.timing". This report answers where time went during that specific run: queue wait, event build, rule evaluation, MIME detection, hashing, trust database lookup, logging, audit metadata, and the fanotify response write.

I will cover metrics in the next article and timing plus the stress harness in the one after that. Those deserve separate treatment. This article is only the overview: the release first made important correctness failures explicit, then gave administrators better places to look for the evidence.

The upstream project is here: https://github.com/linux-application-whitelisting/fapolicyd.

If you are new to fapolicyd itself, the Red Hat documentation on blocking and allowing applications with fapolicyd is a useful starting point.

Wednesday, February 4, 2026

Introducing cap-audit

Applications often run as root because figuring out which Linux capabilities they actually need is difficult. You might know your web server needs to bind to port 80, which requires CAP_NET_BIND_SERVICE. But what about that database daemon? Or that monitoring agent? Trial and error gets old fast, and running everything as root is a big risk.

The libcap-ng project now includes cap-audit, a tool that traces applications to determine exactly which capabilities they require. Unlike static analysis tools that guess based on which syscalls appear in the binary, cap-audit hooks into the kernel's actual capability checking functions. When the kernel asks "does this process have CAP_NET_RAW?", cap-audit records it. This is ground truth - not guesswork.

How It Works

Cap-audit uses eBPF to hook the kernel's capability checking functions - cap_capable(), ns_capable(), and their variants. These are the functions the kernel calls every time it needs to verify a capability. When you trace an application, cap-audit forks your target program, registers its PID with the eBPF program, and then watches for kernel events.

The eBPF program filters events by PID right at the kprobe entry point. This is critical. Without filtering, it would capture thousands of capability checks per second from every process on the system. By filtering for just the target application and its children, overhead drops to less than 1% for the traced app and effectively zero for everything else.

Each capability check generates an event that includes which capability was checked, whether it was granted or denied, which syscall triggered it, and a user-space stack trace. These events stream through a ring buffer to the userspace program, which aggregates them and generates a report showing exactly what your application needs.
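The aggregation step is simple enough to sketch. This Python example is not cap-audit's code: the event tuple layout is invented, and the classification rule (any granted check makes a capability "required", checks that were only ever denied are reported separately) is an assumption about how such a report could be built.

```python
from collections import defaultdict


def summarize(events):
    """Aggregate (capability, granted, syscall) tuples into two groups."""
    stats = defaultdict(lambda: {"granted": 0, "denied": 0, "syscalls": set()})
    for cap, granted, syscall in events:
        entry = stats[cap]
        entry["granted" if granted else "denied"] += 1
        entry["syscalls"].add(syscall)
    # Assumption: one granted check is enough to call a capability required.
    required = {c: s for c, s in stats.items() if s["granted"] > 0}
    denied_only = {c: s for c, s in stats.items() if s["granted"] == 0}
    return required, denied_only
```

Feeding it one granted `net_raw` check, one granted and five denied `setpcap` checks, and one denied `setuid` check yields two required capabilities and one denied-only capability from eight events.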

 


Why System Context Matters

Cap-audit doesn't just tell you which capabilities were checked - it reads various sysctls to understand when those capabilities are actually required. For example, consider /proc/sys/kernel/yama/ptrace_scope. If it's set to 0, any process can ptrace any other process it could normally signal. But if it's 1 or higher, you need CAP_SYS_PTRACE. Same binary, different capability requirements depending on system configuration.

The tool gathers kernel.perf_event_paranoid, kernel.unprivileged_bpf_disabled, kernel.kptr_restrict, and several others. These aren't just informational - they directly affect which capabilities your application needs to run. A monitoring tool that reads /proc/kallsyms needs CAP_SYSLOG when kernel.kptr_restrict is 1, but not when it's 0. Cap-audit shows you both the capabilities that were actually checked and the system settings that made them necessary. 

This means the capability requirements cap-audit reports are specific to your kernel configuration. If you're deploying to containers or hardened systems with different sysctl values, you might need different capabilities. The report includes the system context so you can make informed decisions.
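If you want to compare the system context of two machines yourself, the same values can be read straight from /proc/sys. The Python sketch below is a hypothetical helper, not part of cap-audit; it maps dotted sysctl names to paths and, as a simplification, treats any ptrace_scope above 0 as requiring CAP_SYS_PTRACE.

```python
from pathlib import Path


def read_sysctl(name, root="/proc/sys"):
    """Return the sysctl value as a string, or None if it is not readable."""
    path = Path(root) / name.replace(".", "/")
    try:
        return path.read_text().strip()
    except OSError:
        return None


def system_context(names, root="/proc/sys"):
    """Collect several sysctls into one dictionary."""
    return {name: read_sysctl(name, root) for name in names}


def ptrace_needs_capability(context):
    """Simplified rule: a ptrace_scope above 0 means CAP_SYS_PTRACE is needed."""
    value = context.get("kernel.yama.ptrace_scope")
    return value is not None and int(value) > 0
```

Calling `system_context(["kernel.yama.ptrace_scope", "kernel.kptr_restrict"])` on the build host and on the deployment target is a quick way to spot settings that would change the capability list.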

You Must Exercise All Code Paths 

Here's something important to understand: cap-audit traces what your application actually does, not what it could theoretically do. If your application can set the system clock, which requires CAP_SYS_TIME, but you only trace it handling normal requests, you won't see that capability requirement.

Think of it like code coverage in testing. If you don't exercise a code path during tracing, its capability requirements won't appear in the report. For daemons, this means you need to trigger all administrative operations, error handling paths, and edge cases. For CLI tools, you need to use all major features and options.

Let's see it in action:

DEMO: Tracing a daemon
======================================================================
CAPABILITY ANALYSIS FOR: /usr/sbin/irqbalance (PID 8189)
======================================================================

SYSTEM CONTEXT:
----------------------------------------------------------------------
Kernel version: 6.18.7-100.fc42.x86_64
kernel.yama.ptrace_scope: 1
kernel.kptr_restrict: 1
kernel.dmesg_restrict: 1
kernel.modules_disabled: 0
kernel.perf_event_paranoid: 2
kernel.unprivileged_bpf_disabled: 2
net.core.bpf_jit_enable: 1
net.core.bpf_jit_harden: 1
net.core.bpf_jit_kallsyms: 1
vm.mmap_min_addr: 65536
fs.protected_hardlinks: 1
fs.protected_symlinks: 1
fs.suid_dumpable: 2

REQUIRED CAPABILITIES:
----------------------------------------------------------------------
setpcap (#8)
Checks: 43 granted, 0 denied
Reason: Used by prctl (syscall 157)

sys_admin (#21)
Checks: 34 granted, 0 denied
Reason: Used by clone (syscall 56)

CONDITIONAL CAPABILITIES:
----------------------------------------------------------------------
None

ATTEMPTED BUT DENIED:
----------------------------------------------------------------------
None

SUMMARY:
----------------------------------------------------------------------
Total capability checks: 77
Required capabilities: 2
Conditional capabilities: 0
Denied operations: 0

RECOMMENDATIONS:
----------------------------------------------------------------------
Programmatic solution (C with libcap-ng):
#include <cap-ng.h>
...
capng_clear(CAPNG_SELECT_BOTH);
capng_updatev(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, SETPCAP, SYS_ADMIN, -1);
if (capng_change_id(uid, gid, CAPNG_DROP_SUPP_GRP | CAPNG_CLEAR_BOUNDING))
    perror("capng_change_id");

For systemd service:
[Service]
User=<non-root-user>
Group=<non-root-group>
AmbientCapabilities=setpcap sys_admin
CapabilityBoundingSet=setpcap sys_admin

For file capabilities (via filecap):
filecap /path/to/binary setpcap sys_admin

For Docker/Podman:
docker run --user $(id -u):$(id -g) \
  --cap-drop=ALL \
  --cap-add=setpcap \
  --cap-add=sys_admin \
  your-image:tag

For Kubernetes:
securityContext:
  runAsUser: 1000
  runAsGroup: 1000
  capabilities:
    drop:
    - ALL
    add:
    - setpcap
    - sys_admin

For applications that use file-based capabilities (like /usr/bin/arping with cap_net_raw+ep), cap-audit sees when those capabilities are actually exercised. A binary might have five capabilities set on the file, but only uses three during normal operation. Cap-audit shows you what's actually needed.

DEMO: File capabilities
======================================================================
CAPABILITY ANALYSIS FOR: /usr/bin/arping (PID 7940)
======================================================================

SYSTEM CONTEXT:
----------------------------------------------------------------------
Kernel version: 6.18.7-100.fc42.x86_64
kernel.yama.ptrace_scope: 1
kernel.kptr_restrict: 1
kernel.dmesg_restrict: 1
kernel.modules_disabled: 0
kernel.perf_event_paranoid: 2
kernel.unprivileged_bpf_disabled: 2
net.core.bpf_jit_enable: 1
net.core.bpf_jit_harden: 1
net.core.bpf_jit_kallsyms: 1
vm.mmap_min_addr: 65536
fs.protected_hardlinks: 1
fs.protected_symlinks: 1
fs.suid_dumpable: 2

REQUIRED CAPABILITIES:
----------------------------------------------------------------------
setpcap (#8)
Checks: 1 granted, 5 denied
Reason: Used by capset (syscall 126)

net_raw (#13)
Checks: 1 granted, 0 denied
Reason: Used by socket (syscall 41)

CONDITIONAL CAPABILITIES:
----------------------------------------------------------------------
None

ATTEMPTED BUT DENIED:
----------------------------------------------------------------------
setuid (#7)
Attempts: 1 (all denied)
Impact: Application may have reduced functionality

SUMMARY:
----------------------------------------------------------------------
Total capability checks: 8
Required capabilities: 2
Conditional capabilities: 0
Denied operations: 1

RECOMMENDATIONS:
----------------------------------------------------------------------
Programmatic solution (C with libcap-ng):
#include <cap-ng.h>
...
capng_clear(CAPNG_SELECT_BOTH);
capng_updatev(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, SETPCAP, NET_RAW, -1);
if (capng_change_id(uid, gid, CAPNG_DROP_SUPP_GRP | CAPNG_CLEAR_BOUNDING))
    perror("capng_change_id");

For systemd service:
[Service]
User=<non-root-user>
Group=<non-root-group>
AmbientCapabilities=setpcap net_raw
CapabilityBoundingSet=setpcap net_raw

For file capabilities (via filecap):
filecap /path/to/binary setpcap net_raw

For Docker/Podman:
docker run --user $(id -u):$(id -g) \
  --cap-drop=ALL \
  --cap-add=setpcap \
  --cap-add=net_raw \
  your-image:tag

For Kubernetes:
securityContext:
  runAsUser: 1000
  runAsGroup: 1000
  capabilities:
    drop:
    - ALL
    add:
    - setpcap
    - net_raw

Reading the Report

The report breaks down into several sections. Required Capabilities shows capabilities that were successfully checked - these are capabilities your application actively used and will need to function. Each entry includes how many times it was checked and which syscall triggered it. "CAP_NET_BIND_SERVICE: Used by bind (syscall 49)" tells you exactly what's going on.

Conditional Capabilities shows requirements that depend on system configuration. You'll see entries like "CAP_SYS_PTRACE: Needed when kernel.yama.ptrace_scope > 0, Current value: 1 (capability needed)". This tells you the capability is required on your current system, but might not be on systems with different sysctl values.

Attempted But Denied shows capability checks that failed. These are interesting because they reveal functionality your application tried to use but couldn't. Sometimes this is fine - the application has a fallback path. Other times, it indicates reduced functionality. The report notes "Application may have reduced functionality" so you can investigate.

The recommendations section generates ready-to-use configurations for systemd, Docker, Kubernetes, and file-based capabilities. For systemd, you get AmbientCapabilities and CapabilityBoundingSet directives. For Docker, you get --cap-drop=ALL followed by specific --cap-add entries. For file capabilities, you get the filecap command with the exact capability set. These aren't just suggestions - they're the minimal set your application demonstrated it needs. It also produces a snippet of C or Python code that shows how to apply the same restrictions from inside the program.

Ground Truth, Not Guesswork

The key insight is that cap-audit hooks the actual kernel capability checking functions. When mount() checks CAP_SYS_ADMIN, cap-audit sees it. When bind() checks CAP_NET_BIND_SERVICE, cap-audit sees it. There's no parsing of source code, no heuristics based on syscall names, no guessing. The kernel's security subsystem itself is telling you what capabilities are being checked.

This is why the tool requires CAP_BPF and CAP_PERFMON to run - it's instrumenting kernel internals. But once set up, it gives you authoritative answers about capability requirements. If cap-audit says your application needs three capabilities, those are the three it checked during your trace. If it says your application doesn't need elevated capabilities at all, you can confidently run it as an unprivileged user.

I should note that both of the example traces above show programs that are not handling capabilities correctly. There is a '-v' command line option for verbose output. It shows that even though arping relies on file system based capabilities, it still calls capset when it shouldn't. Irqbalance constantly tries to set capabilities when it should do so once and be done. We'll dive into these in a future blog post.

Run your applications through cap-audit during development or security audits. Exercise all functionality, check the system context, and use the generated configurations to properly scope your capabilities. File issues on GitHub if you have a request or find something is off.

Thursday, September 16, 2021

Fuzzing annocheck with AFL++ Part 1

In the last article, we fuzzed annocheck with radamsa. This found several crashes. But annobin-9.94 has everything cleaned up that radamsa can find. We're not done fuzzing yet. We need to try a coverage-guided fuzzer like AFL++ to see if there are more problems.

The strategy for using AFL++ is to build annocheck as a static application. We will start fuzzing it and make some adjustments based on how the initial runs go. Then we will let it run for several hours to see what it finds.

The first step is to download a copy of annobin-9.94 from here. Next, install and build the source rpm. Then cd into the annobin-9.94 source directory. If you need to set up an rpm build environment, there are steps here.

Next, we build annocheck as follows:

export PATH=/home/builder/working/BUILD/repos/git-repos/AFLplusplus:/home/builder/working/BUILD/repos/git-repos/llvm-project/build/bin:$PATH
export AFL_PATH=/home/builder/working/BUILD/repos/git-repos/AFLplusplus
CC=afl-clang-lto CXX=afl-clang-lto++ RANLIB=llvm-ranlib AR=llvm-ar LD=afl-clang-lto++ NM="llvm-nm -g" ./configure --without-gcc_plugin --without-tests --without-docs --enable-static --with-gcc-plugin-dir=`gcc -print-file-name=plugin`

The configure script errors out like this:

configure: creating ./config.lt
config.lt: creating libtool
checking for GCC plugin headers... no
configure: error: GCC plugin headers not found; consider installing GCC plugin development package

After much digging around, I decided that since we are making a static app and not worried about the gcc plugins, we'll just fix the tests to think everything is OK. Apply the following patch:

diff -urp annobin-9.94.orig/config/gcc-plugin.m4 annobin-9.94/config/gcc-plugin.m4
--- annobin-9.94.orig/config/gcc-plugin.m4      2021-08-31 10:02:12.000000000 -0400
+++ annobin-9.94/config/gcc-plugin.m4   2021-09-15 16:31:49.946843182 -0400
@@ -129,7 +129,7 @@ int main () {}
 [gcc_plugin_headers=yes],
 [gcc_plugin_headers=no])

-if test x"$gcc_plugin_headers" = xyes; then
+if test x"$gcc_plugin_headers" = xyes -o x"$static_plugin" = xyes; then
   AC_MSG_RESULT([yes])
 else
   AC_MSG_RESULT([no])

Since we modified the m4 scripts, we need to regenerate the configure script. I used the autogen.sh file from the audit-userspace project as a convenience. Download it and do the following:

./autogen.sh
CC=afl-clang-lto CXX=afl-clang-lto++ RANLIB=llvm-ranlib AR=llvm-ar LD=afl-clang-lto++ NM="llvm-nm -g" ./configure --without-gcc_plugin --without-tests --without-docs --enable-static --with-gcc-plugin-dir=`gcc -print-file-name=plugin`

Now configure finishes correctly. We are ready to build it. The first step is to export the instrument variable and then run make:

export AFL_LLVM_INSTRUMENT=NATIVE
make

This fails as follows:

afl-cc ++3.14c by Michal Zalewski, Laszlo Szekeres, Marc Heuse - mode: LLVM-LTO-PCGUARD
error: unable to load plugin 'annobin': 'annobin: cannot open shared object file: No such file or directory'
make[1]: *** [Makefile:442: annocheck-annocheck.o] Error 1

After much digging around, I found that the cause of this failure is a plugin option in the annocheck_CFLAGS. Open annocheck/Makefile, go down to line 338, and remove -fplugin=annobin (the plugin named in the error message). Run make again, and this time it compiles annocheck.

Time to set up for fuzzing. We will use the same program that we created to fuzz using radamsa. Go check the last article if you need the recipe. Do the following:

cd annocheck
mkdir in
cp ~/test/hello in
mkdir -p /tmp/proj/out
ln -s /tmp/proj/out out
# Then we switch to root and do some housekeeping to let this
# run as fast as possible
su - root
service auditd stop
service systemd-journald stop
service abrtd stop
auditctl -D
auditctl -e 0
echo core >/proc/sys/kernel/core_pattern
cd /sys/devices/system/cpu
echo performance | tee cpu*/cpufreq/scaling_governor
exit

Now it's time to fuzz!

afl-fuzz -i in -o out ./annocheck @@

And we get the following:

There are plenty of guides that tell you what each of these items means. Please search the internet and find one if you are curious. My main goal here is to show you how to overcome the many obstacles on the way to the prize.

I'd like to point out two things. We are getting about 248 executions per second. That is kind of low. And we hit 2 crashes in 25 seconds. I hit Ctrl-C to exit so that we can take an initial look at the crashes. The directory structure of the out directory looks like this:

$ tree out
out
└── default
    ├── cmdline
    ├── crashes
    │   ├── id:000000,sig:06,src:000000,time:17826,op:havoc,rep:4
    │   ├── id:000001,sig:11,src:000000,time:23980,op:havoc,rep:4
    │   └── README.txt
    ├── fuzz_bitmap
    ├── fuzzer_setup
    ├── fuzzer_stats

The actual test cases are in the crashes directory. So, what do we actually do with these? The answer is that we build another copy of annocheck using the address sanitizer and pass these to the sanitized annocheck to see what happens.
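The crash file names in the listing above already carry useful triage information: the sig field is the signal number that killed the target. The Python sketch below is a hypothetical helper, not part of AFL++; it decodes the names and skips README.txt so you can see at a glance which inputs ended in an abort and which in a segfault.

```python
import signal
from pathlib import Path


def parse_crash_name(name):
    """Split 'id:000000,sig:06,...' into a dict of string fields."""
    return dict(part.split(":", 1) for part in name.split(","))


def list_crashes(crash_dir):
    """Yield (path, signal) for each crash file, skipping README.txt."""
    for path in sorted(Path(crash_dir).iterdir()):
        if not path.name.startswith("id:"):
            continue
        fields = parse_crash_name(path.name)
        # int("06") == 6, which the signal module maps to SIGABRT on Linux.
        yield path, signal.Signals(int(fields["sig"]))
```

Printing `sig.name` for each pair gives a quick summary, and the same loop is a natural place to hand each path to the sanitized annocheck with `subprocess.run`.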

The last article explains how to build a sanitized copy of annocheck. Just open the tarball in a different location so that you don't overwrite the AFL++ build of annocheck and compile it. I built it in a directory called annobin-9.94.test. You can do it anywhere, just correct the location.

Next, from the sanitized annocheck dir, we run:

$ ./annocheck /tmp/proj/out/default/crashes/id:000000,sig:06,src:000000,time:17826,op:havoc,rep:4
annocheck: Version 9.94.
Hardened: id:000000,sig:06,src:000000,time:17826,op:havoc,rep:4: FAIL: pie test because not built with '-Wl,-pie' (gcc/clang) or '-buildmode pie' (go)
Hardened: id:000000,sig:06,src:000000,time:17826,op:havoc,rep:4: FAIL: bind-now test because not linked with -Wl,-z,now
annocheck: annocheck.c:1250: find_symbol_in: Assertion `symndx == sym_hdr->sh_size / sym_hdr->sh_entsize' failed.
Aborted

Aborted? That stinks. Programs that call abort trick AFL++ into thinking this is a valid crash. This is because abort creates a core dump and that's exactly what AFL++ is looking for. The solution is to override the abort call. The program depends on abort stopping execution. Removing it altogether would mean the program keeps running in code paths the developer never intended. Since it's a false positive, we'll replace the call to abort with exit. So, let's go find it and recompile.

Grep'ing for abort doesn't find anything. But the error message says this occurs on line 1250 in annocheck.c. Opening that file finds an assert macro at that line. That means we need to override the assert macro with one that calls exit.

In annocheck/annocheck.h we find the include of assert.h around line 21. Comment that out using old style /*  */ syntax. Next, a little farther down after all the includes, put the new macro:

/* Added this so that we don't trap abort in AFL++ */
#define assert(e) ((void) ((e) ? 0 : __assert (#e, __FILE__, __LINE__)))
#define __assert(e, file, line) ((void)printf ("%s:%u: failed assertion `%s'\n", file, line, e), exit(1), 0)

Now save, run make clean, and then make. One other thing to note: annocheck makes temp directories to work with. When it aborts or crashes, it cannot clean those up. For now, remove them by hand:

rm -rf annocheck.data.*

Back to fuzzing. We really do not need the previous runs. You can clear them out with:

rm -rf out/default

Actually, let's take a quick detour before going back to fuzzing. If you recall from above, I mentioned that the number of executions per second is kind of low. Let's fix that before we start fuzzing.

AFL++ has a mode of operation called deferred instrumentation mode. Normally, every execution starts the application from scratch: the runtime linker resolves all symbols, configuration files are opened and parsed, and general initialization runs. In deferred mode, you insert code that marks the spot where all of that has completed, and AFL++ forks new copies from that spot. The placement matters. It needs to come after general startup and before the program reads the file with the fuzzing data in it.

So, if we open annocheck.c and locate the main function at line 1943, let's scan down looking for the place to put it. It checks some arguments, checks elf version, loads rpm's configuration (that takes some time), and then processes the command line arguments. This looks like the spot. Copy and paste this at line 1967 just above processing command line args:

#ifdef __AFL_HAVE_MANUAL_CONTROL
  __AFL_INIT();
#endif

Let's make one more change. Annocheck creates a temporary directory while processing a file. Let's put that over in /tmp so as not to wear out a real hard drive. Look for a function named create_tmpdir. At line 1296 it copies a file name template into a buffer. Let's prepend "/tmp/" to that string. If you look a little further down, it calls concat and rewrites the tempdir variable with the current working directory as the path root. We don't want that. So, comment it out and the following line which outputs where the tempdir is. You have to use /*  */ commenting on this program.

Save and exit. Rerun make. In another terminal, run the following script. It will periodically clear out the annocheck.data temp directories that aren't removed on a crash.

 while  true ; do sleep 30 ; rm -rf /tmp/annocheck.data.* ; done

And in the other terminal, from the annocheck directory, run this:

afl-fuzz -i in -o out ./annocheck @@

Now let's look at the executions per second:

Now we are getting about 4.5 times the speed. Using deferred instrumentation mode makes a huge improvement. The reason we want it to run faster is that the more executions per second, the more test cases it can try in the same amount of time. On very mature programs, it can take days of fuzzing to find even one bug.
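
If you would rather read the rate from disk than from the status screen, the fuzzer_stats file shown in the tree listing earlier holds it. This is a small sketch that assumes the file uses "key : value" lines with a key named execs_per_sec, which is how AFL++ writes it as far as I know. The function names are mine, and the second number in the usage line is only an illustrative value.

```shell
#!/bin/sh
# execs_per_sec STATS_FILE: print the execs_per_sec value from a fuzzer_stats file.
# Assumes "key : value" lines.
execs_per_sec() {
    awk -F: '$1 ~ /^execs_per_sec[ \t]*$/ { gsub(/[ \t]/, "", $2); print $2 }' "$1"
}

# speedup OLD NEW: ratio of two rates, to one decimal place.
speedup() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", new / old }'
}

# Example:
#   execs_per_sec out/default/fuzzer_stats
#   speedup 248 1116
```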

We'll stop here. You can let it run for a couple hours. You will probably have over a hundred "unique" crashes to choose from. In the next article, I'll go over how to sort through all that efficiently. If you do get a collection, remember to copy them to persistent disk storage. The /tmp dir goes away when you reboot the computer.
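
Until then, here is a rough sketch for replaying every saved crash through the sanitized build in one go. The function name replay_crashes is hypothetical, and the paths in the example are the ones used in this article. It only prints the exit status next to each test case name, so treat it as a first pass and not real triage.

```shell
#!/bin/sh
# replay_crashes CRASH_DIR TARGET
# Run TARGET once per AFL++ crash file and print "<exit status> <file name>".
replay_crashes() {
    crash_dir=$1
    target=$2
    for f in "$crash_dir"/id:*; do
        [ -f "$f" ] || continue      # skip the literal pattern when nothing matches
        rc=0
        "$target" "$f" >/dev/null 2>&1 || rc=$?
        echo "$rc $(basename "$f")"
    done
}

# Example, run from the sanitized annocheck directory:
#   replay_crashes /tmp/proj/out/default/crashes ./annocheck | sort -n
```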

Wednesday, September 8, 2021

Fuzzing annocheck with Radamsa

Recently I heard that annocheck was crashing when scanning some files. That gave me an idea - fuzz it! I got the latest code in rawhide, which at the time was 9.93, built it using rpmbuild, and cd'ed into its source directory. Then:

make clean
CFLAGS="-fsanitize=address,undefined,null,return,object-size,nonnull-attribute,returns-nonnull-attribute,bool,enum -ggdb -fno-sanitize-recover=signed-integer-overflow" ./configure --without-tests --without-docs
make -j `nproc`

This sets it up for the address sanitizer so that we can spot fuzzing induced problems. Running make fails unexpectedly:

In function ‘follow_debuglink’,
    inlined from ‘annocheck_walk_dwarf’ at annocheck.c:1092:18:
annocheck.c:776:7: error: ‘%s’ directive argument is null [-Werror=format-overflow=]
  776 |       einfo (VERBOSE2, "%s:  try: %s", data->filename, debugfile);      \
      |       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
annocheck.c:805:7: note: in expansion of macro ‘TRY_DEBUG’
  805 |       TRY_DEBUG ("%s", debug_file);
      |       ^~~~~~~~~
annocheck.c: In function ‘annocheck_walk_dwarf’:
annocheck.c:776:35: note: format string is defined here
  776 |       einfo (VERBOSE2, "%s:  try: %s", data->filename, debugfile);

This goes on and on. As I understand it, this is a bug in gcc and will be fixed in an upcoming release. However, what is stopping the build is the -Werror flag in the Makefile. So, you want to edit annocheck/Makefile and remove -Werror from it. Now running make will produce the binaries. Prior to working with annocheck, I wrote and compiled a little "Hello World" program in a test directory. In doing this, I left it unstripped. To prepare for fuzzing, I did this:

cd annocheck
mkdir in
cp ~/test/hello   in/test
mkdir -p /tmp/out
ln -s /tmp/out out

Then I used a script similar to the one discussed in the Fuzzing with Radamsa article from a couple days ago.

#!/bin/sh
LOG="in/test"
TLOG="out/test"

while true
do
        cat $LOG | radamsa > $TLOG
        ./annocheck $TLOG >/dev/null
        rc="$?"
        if [ "$rc" = "1" ] ; then
                exit 1
        fi
        rm -f $TLOG
        echo "==="
done

The basic idea is to put a seed program into the "in" directory. Radamsa mutates it and writes it to the "out" directory, which is a symlink to /tmp/out. As explained in the previous article, you want to do fuzzing writes to a tmpfs file system so that you don't wear out real hardware. Running the script found this on the first test case:
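
If you are not sure whether your scratch directory really is memory backed, here is a quick check. It is only a sketch and assumes GNU stat, whose -f mode reports on the file system instead of the file. The function name fs_type is made up.

```shell
#!/bin/sh
# fs_type PATH: print the file system type backing PATH (for example tmpfs or ext4).
# Relies on GNU stat; other stat implementations may differ.
fs_type() {
    stat -f -c %T "$1"
}

# Warn if the fuzzing scratch directory is not memory backed.
scratch=${1:-/tmp}
if [ "$(fs_type "$scratch")" != "tmpfs" ]; then
    echo "warning: $scratch is not tmpfs; fuzzing writes will hit real storage"
fi
```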

hardened.c:1081:7: runtime error: null pointer passed as argument 1, which is declared to never be null
AddressSanitizer:DEADLYSIGNAL
=================================================================
==48860==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7ffb239405bb bp 0x7fff6413a7d0 sp 0x7fff64139f60 T0)
==48860==The signal is caused by a READ memory access.
==48860==Hint: address points to the zero page.
    #0 0x7ffb239405bb in __interceptor_strcmp.part.0 (/lib64/libasan.so.6+0x8d5bb)
    #1 0x42b9eb in interesting_sec /home/builder/working/BUILD/annobin-9.93/annocheck/hardened.c:1081
    #2 0x42b9eb in interesting_sec /home/builder/working/BUILD/annobin-9.93/annocheck/hardened.c:1074
    #3 0x40dd23 in run_checkers /home/builder/working/BUILD/annobin-9.93/annocheck/annocheck.c:618
<snip>

When I re-run the script, it stops on this error a lot. But if you keep restarting it, eventually you may get this one:

=================================================================
==49841==ERROR: AddressSanitizer: heap-use-after-free on address 0x603000035d40 at pc 0x7fd46885170c bp 0x7ffcf350afc0 sp 0x7ffcf350a770
READ of size 1 at 0x603000035d40 thread T0
    #0 0x7fd46885170b in __interceptor_strcmp.part.0 (/lib64/libasan.so.6+0x8d70b)
    #1 0x42637d in check_for_gaps /home/builder/working/BUILD/annobin-9.93/annocheck/hardened.c:3709
    #2 0x42637d in finish /home/builder/working/BUILD/annobin-9.93/annocheck/hardened.c:3889
    #3 0x42637d in finish /home/builder/working/BUILD/annobin-9.93/annocheck/hardened.c:3844
    #4 0x40e3f6 in run_checkers /home/builder/working/BUILD/annobin-9.93/annocheck/annocheck.c:691
    #5 0x40e3f6 in process_elf /home/builder/working/BUILD/annobin-9.93/annocheck/annocheck.c:1517
    #6 0x40f690 in process_file /home/builder/working/BUILD/annobin-9.93/annocheck/annocheck.c:1732
    #7 0x408880 in process_files /home/builder/working/BUILD/annobin-9.93/annocheck/annocheck.c:1890
    #8 0x408880 in main /home/builder/working/BUILD/annobin-9.93/annocheck/annocheck.c:1982
    #9 0x7fd467b36b74 in __libc_start_main (/lib64/libc.so.6+0x27b74)
    #10 0x409dad in _start (/home/builder/working/BUILD/annobin-9.93/annocheck/annocheck+0x409dad)

0x603000035d40 is located 0 bytes inside of 25-byte region [0x603000035d40,0x603000035d59)
freed by thread T0 here:
    #0 0x7fd468872647 in free (/lib64/libasan.so.6+0xae647)
    #1 0x412cfb in annocheck_get_symbol_name_and_type /home/builder/working/BUILD/annobin-9.93/annocheck/annocheck.c:1463

previously allocated by thread T0 here:
    #0 0x7fd46881d967 in strdup (/lib64/libasan.so.6+0x59967)
    #1 0x412d25 in annocheck_get_symbol_name_and_type /home/builder/working/BUILD/annobin-9.93/annocheck/annocheck.c:1466

SUMMARY: AddressSanitizer: heap-use-after-free (/lib64/libasan.so.6+0x8d70b) in __interceptor_strcmp.part.0



This is a good one. Use after free can be exploitable under the right conditions. I reported these to the annobin developer. He replicated the results, fixed it up, and released 9.94 to Fedora the next day. I checked it and can confirm that Radamsa finds no other problems.

Note that if you actually wanted to fix the bug, the test case is in out/test. Just make the patch, recompile, and manually run the test case to confirm it's fixed. If you would like to preserve the test cases for later, remember that if you shut down the system they are gone unless you moved them from out to a more permanent location.
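
One way to do that is a small helper that copies the failing test case out of tmpfs before the script exits. This is just a sketch. The function name save_case and the destination directory are hypothetical, and naming the copy after its checksum is my own choice so that identical test cases collapse into one file.

```shell
#!/bin/sh
# save_case FILE DEST_DIR: copy a failing test case to persistent storage.
# Prints the path of the saved copy.
save_case() {
    src=$1
    dest_dir=$2
    mkdir -p "$dest_dir" || return 1
    # Name the copy after its checksum so duplicates collapse into one file.
    sum=$(cksum < "$src" | cut -d ' ' -f 1)
    cp "$src" "$dest_dir/case-$sum" && echo "$dest_dir/case-$sum"
}

# Example, placed just before the "exit 1" in the fuzzing loop:
#   save_case "$TLOG" "$HOME/fuzz-cases"
```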

Radamsa is a good first fuzzer to reach for when starting to fuzz a new program. It's simple to set up and get running. And it finds all the low hanging fruit. But to go deeper, you need a guided fuzzer like AFL++. And that is exactly what we'll do in a future blog post.

Tuesday, September 7, 2021

Checking application hardening with annocheck

Gcc and glibc have multiple mitigations that are available to prevent certain kinds of exploits. The redhat-rpm-macros package contains the flags that are passed into the build environment when a package is built. If you wanted to see this, look in /usr/lib/rpm/redhat/macros. There is a _hardened_build macro that is defined to a 1. That pulls in _hardened_cflags, which pulls in _hardening_cflags and more and more macros.

However, people write their own build systems. They sometimes override environment variables. Or maybe the spec file is written in a way that cflags cannot be injected. How can we check to see if the intended flags were applied?

One possibility is to use the checksec.sh program. If you google around, you can find it. I have put a copy into the security-assessor github tools. To use it, we can pass a file or a pid. For example:


./checksec.sh --file /usr/sbin/auditd
RELRO        STACK CANARY   FORTIFY SOURCE   PIE           FILE               PACKAGE
Full RELRO   Canary found   Fortify found    PIE enabled   /usr/sbin/auditd   audit-3.0.6


For common criteria, it calls out that applications should have stack smashing protection and ASLR. In the output of checksec.sh, this would be the stack canary and PIE columns. If PIE is not enabled then the application has some but not all parts randomized. Specifically the code segment doesn’t move around. However, making an application fully use ASLR causes a new layer of indirection to get added to applications. This becomes an attack point unless it’s made read only at application start-up. This is what the RELRO column is talking about. What we want is full RELRO so that we have full ASLR and complete symbol resolution so that all indirection is marked readonly. That leads to the question of how does checksec.sh determine that?

To detect whether stack smashing protection is enabled, we need to use readelf. What we can do is look in the symbol table. Stack smashing detection is done by placing a random number on the stack for each function call. On return, it's checked to see if it's changed. If it is, then it calls the internal function __stack_chk_fail(). On recent Fedora, the binutils were changed to shorten function names. To see them accurately, you need to use the ‘W’ argument. So, to check for stack smashing protection you would do this:

readelf -sW /usr/sbin/auditd | grep __stack_chk_fail


To check for ASLR, we need to examine the ELF headers. One field is called Type. This says what type of ELF file it is. It can be an executable, dynamic, core, or object file. The dynamic type means that it's a shared object or a library file. However, there is almost no difference between a shared object file and a program that is compiled with PIE. The only difference might be that the application has a main function, but so does libc. The check for PIE ASLR would look something like this:

readelf -hW /usr/sbin/auditd | grep 'Type:[[:space:]]*DYN'


The last item to check is whether we have full RELRO. All applications compiled on Fedora or RHEL automatically have partial RELRO. There was a patch applied to binutils that hardwires this. In order to have full RELRO, the program must be linked with the -z now linker flag (passed through gcc as -Wl,-z,now). The check for this is located in the dynamic section of the program. A test would look like this:

readelf -dW /usr/sbin/auditd | grep  'BIND_NOW'


Simple...right? Not so fast. Some of these tests are certain to give you a correct answer. For example, there is only one ELF header, so the PIE check can give you a reliable answer. However, what about the stack smashing protection? All we can tell is it’s enabled for at least one object file. We cannot tell if all object files were compiled with stack smashing protection. We also can’t tell if it’s regular, strong, or full protection. And that goes for other hardening flags such as stack clash or control flow integrity. If checksec.sh is all we have, then we are reduced to looking for the build logs and verifying that every file got every intended flag.
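
For reference, the three readelf checks can be rolled into one function. This is a sketch of the idea and not how checksec.sh is actually written. The function name check_elf and the output wording are mine, and the glibc symbol it looks for is __stack_chk_fail. It inherits every limitation just described.

```shell
#!/bin/sh
# check_elf FILE: rough checksec-style report built from three readelf checks.
check_elf() {
    f=$1
    # Stack protector: glibc's failure handler shows up in the symbol table.
    if readelf -sW "$f" 2>/dev/null | grep -q '__stack_chk_fail'; then
        echo "stack canary: found in at least one object"
    else
        echo "stack canary: no"
    fi
    # PIE: the ELF header type is DYN.
    if readelf -hW "$f" 2>/dev/null | grep -q 'Type:[[:space:]]*DYN'; then
        echo "pie: yes (or a shared library)"
    else
        echo "pie: no"
    fi
    # Full RELRO needs immediate binding recorded in the dynamic section.
    if readelf -dW "$f" 2>/dev/null | grep -q 'BIND_NOW'; then
        echo "bind now: yes"
    else
        echo "bind now: no"
    fi
}

# Example:
#   check_elf /usr/sbin/auditd
```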


A Better Mousetrap

This is why we have the annobin and annocheck programs. The annobin program is a gcc plugin that annotates build information in a notes section of each object file. The annocheck program can then read these note sections and reason about whether the build policy was faithfully carried out. To use it, all you need to do is pass it the full path to the program. It will check dozens of things about the application. To see these, pass the --verbose flag. But what if we just wanted to recreate the 3 checks that checksec.sh does? We can turn all tests off and then enable the ones we want like this:

# annocheck --verbose --skip-all --test-stack-prot --test-pie --test-bind-now /usr/sbin/auditd | grep -v info:
annocheck: Version 9.79.
Hardened: /usr/sbin/auditd: PASS: pie test
Hardened: /usr/sbin/auditd: PASS: bind-now test
Hardened: /usr/sbin/auditd: PASS: stack-prot test


Based on this, it’s possible to write a script and check all files for stack smashing protection like so:

#!/bin/sh
DIRS="/usr/lib64 /usr/lib /usr/bin /usr/sbin /usr/libexec"
FLAGS="--skip-all --ignore-gaps --test-stack-prot"
for d in $DIRS
do
        if [ ! -d $d ] ; then
                continue
        fi
        echo "Scanning files in $d..."
        for f in `/usr/bin/find $d -type f 2>/dev/null`
        do
                # Get just the elf executables
                testf=`echo $f | /usr/bin/file -n -f - 2>/dev/null | grep ELF`
                if [ x"$testf" != "x" ] ; then
                        # Get results dropping version and first 2 fields
                        res=`annocheck $FLAGS $f 2>/dev/null | grep -v '^annocheck:' | cut -d " " -f 3-`
                        if [ x"$res" != "x" ] ; then
                                echo "$f $res"
                        fi
                fi
        done
done

Saving this as check-ssp and running this on a fully patched Fedora 34 system gives the following results:

$ ./check-ssp | grep FAIL
/usr/lib64/libva.so.2.1100.0 FAIL: stack-prot test because only some functions protected
/usr/lib64/gimp/2.0/plug-ins/fourier FAIL: stack-prot test because stack protection deliberately disabled
/usr/lib64/ocaml/objinfo_helper FAIL: stack-prot test because stack protection deliberately disabled
/usr/lib64/libva-x11.so.2.1100.0 FAIL: stack-prot test because only some functions protected
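
When the list gets long, it helps to count failures by reason. Here is a small sketch that reads check-ssp output on stdin. The function name summarize_failures is made up, and it assumes the "FAIL: ... because <reason>" wording shown above.

```shell
#!/bin/sh
# summarize_failures: read check-ssp output on stdin and print "<count> <reason>"
# for each distinct FAIL reason.
summarize_failures() {
    grep 'FAIL:' |
        sed 's/.* because //' |
        sort | uniq -c |
        awk '{ n = $1; $1 = ""; sub(/^ +/, ""); print n, $0 }'
}

# Example:
#   ./check-ssp | summarize_failures
```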


There are a lot more failures in the video drivers. Hopefully there are no bugs there.  :-)


Conclusion
The annobin / annocheck programs allow us to verify that the intended compiler mitigations are present in all ELF files in the distro. It is a better check than the old way. There are times when functions don’t have stack variables or do anything that causes stack smashing protection to be enabled. Only by looking at the annotation from the build can we tell that the flags were passed in and the compiler chose not to need the __stack_chk_fail function. And without annocheck, there is otherwise no visibility into how much stack smashing protection is compiled in.

The annocheck program gives unprecedented visibility into the application hardening on your system. It can let you know if everything is good. It can also be used as a gating test when you build a package and intend to deploy it. It’s worth your time to know about. And it now has online documentation to help you fix programs.
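
As a rough idea of what a gating test could look like, here is a sketch that reads annocheck output on stdin and fails if any test reported FAIL. The function name gate and the ./mybinary path are hypothetical. It keys off the "FAIL:" text in the output shown earlier and makes no assumption about annocheck's own exit status.

```shell
#!/bin/sh
# gate: read annocheck output on stdin; succeed only if no test reported FAIL.
# Failing lines are echoed to stderr so they land in the build log.
gate() {
    report=$(cat)
    if printf '%s\n' "$report" | grep -q 'FAIL:'; then
        printf '%s\n' "$report" | grep 'FAIL:' >&2
        return 1
    fi
    return 0
}

# Example:
#   annocheck --skip-all --test-stack-prot --test-pie --test-bind-now ./mybinary | gate || exit 1
```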

Sunday, September 5, 2021

How to build AFL++ on Fedora 34

In the last article, I explained how to use Radamsa to fuzz applications. But what if you said, "I want to use a real fuzzer, one like AFL"? Well, OK then. When you say you want to use AFL, I think you really mean AFL++. This is the community supported version based on the original, but with a whole lot of new ideas to make it faster and more aggressive. I'm here to show you how to do it...but on Fedora, it's harder than it needs to be.

The thing about AFL++ is that you really want to use the clang-lto mode. To do that means you need the clang gold linker. And for whatever reason, Fedora doesn't ship it. No gold linker, no clang-lto mode. So, the first step of building AFL++ is to build clang from scratch. And unless you have one of those nice AMD 12 or 16 core CPU's, this will take a while.

Suppose you have a 4 core machine. That gives you 8 hyperthreads. Typically when you compile, you can do:


make -j $(nproc)


but a lot of time is spent doing IO. So, you can get a little more speed by doubling that. And that is exactly what we'll do in the instructions below.
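
If you want that doubling in a reusable form, here is a tiny sketch. The function name job_count is made up, and the getconf fallback is only there for systems without nproc.

```shell
#!/bin/sh
# job_count: print twice the number of online CPUs, for use with make -j.
job_count() {
    cpus=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
    echo $((cpus * 2))
}

# Example:
#   make -j "$(job_count)"
```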

Also, I wanted to go with a released version of llvm/clang instead of whatever's in the repo at the moment. So, I'll add the steps to get 12.0.1, which is the current release as of this writing. Building clang takes about an hour on a 4 core Xeon. See you on the other side.


cd working/BUILD/
git clone --depth 1 --branch llvmorg-12.0.1 https://github.com/llvm/llvm-project.git
cd llvm-project
mkdir build
cd build
cmake -G "Unix Makefiles" -DLLVM_ENABLE_PROJECTS='clang;clang-tools-extra;compiler-rt;libclc;libcxx;libcxxabi;libunwind;lld' -DCMAKE_BUILD_TYPE=Release -DLLVM_BINUTILS_INCDIR="/usr/include" ../llvm
make -j $(expr `nproc` \* 2) ENABLE_OPTIMIZED=1

export PATH="$HOME/working/BUILD/llvm-project/build/bin/:$PATH"


Hopefully you had something to do while that built. Anyways, on to doing the real job of making AFL++. This goes much faster.


git clone https://github.com/AFLplusplus/AFLplusplus.git
cd AFLplusplus
make -j $(expr `nproc` \* 2) source-only

export PATH=/home/builder/working/BUILD/AFLplusplus:/home/builder/working/BUILD/llvm-project/build/bin:$PATH
export AFL_PATH=/home/builder/working/BUILD/AFLplusplus


These last two updates of the environment are something you can put in your bashrc or as part of a script to setup for fuzzing. Also note that the first one includes the path to llvm-clang.

So, there you have it. We're all ready to fuzz a target. We'll start a fuzzing project in a future article to show how to fuzz a real program that people are using.

Saturday, September 4, 2021

Simple fuzzing with Radamsa

 We will start looking into improving programs by fuzzing them. A simple fuzzer that gives very good results is Radamsa. It is part of the Fedora distribution, so all you need to do is install it.

Radamsa is a file mutator. It takes a file as input and modifies it. So, if we wanted to fuzz the audit search utility, we would gather a sample log, mutate it, and run a search on the mutated log. This is easily scriptable. For example, consider the following shell script:


#!/bin/sh
LOG_DIR="/tmp"
LOG="$LOG_DIR/test.log"
MLOG="$LOG_DIR/stmp.log"
PDIR="/home/audit-3.0.5"
OPTIONS="--format csv --extra-keys --extra-labels --extra-obj2 --extra-time"
export ASAN_OPTIONS=detect_stack_use_after_return=true:strict_string_checks=true:detect_invalid_pointer_pairs=2

# Get fresh log data to test with
echo "Collecting logs..."
ausearch -if /var/log/audit/audit.log --start today --raw > $LOG
echo "Log collected, starting to fuzz..."

# Now fuzz the logs over and over
while true
do
        cat $LOG | radamsa > $MLOG
        date
        LD_LIBRARY_PATH="$PDIR"/auparse/.libs/:"$PDIR"/lib/.libs/  $PDIR/src/.libs/ausearch $OPTIONS -if $MLOG >/dev/null
        if [ "$?" != "0" ] ; then
                exit 1
        fi
        rm -f $MLOG
        echo "==="
done

The idea is to cause ausearch to choke on its input as it parses things. The thing is that many failures can happen without being visible outside the program. Glibc pads memory allocations out to an alignment boundary (16 bytes on x86_64), so if we blow past the buffer by 1 byte, it is not likely to cause a crash.

To detect these kinds of issues, we need to rebuild the audit software with gcc's address sanitizer. This build has all kinds of ways to look at the program to detect these overflows. What I would recommend is to download the source rpm and build it. If you need help setting up a build environment, I have instructions here.

Once it's done do the following:


cd audit-3.0.5
make clean
CFLAGS="-fsanitize=address,pointer-compare,pointer-subtract,unreachable,vla-bound,bounds,undefined,null,return,object-size,nonnull-attribute,returns-nonnull-attribute,bool,enum,builtin -ggdb -fno-sanitize-recover=signed-integer-overflow" ./configure --with-python=no --with-python3=yes --enable-gssapi-krb5=yes --with-arm --with-aarch64 --with-libcap-ng=yes --without-golang --enable-systemd --enable-experimental
make -j 8

Once this completes, you are ready to fuzz with the script above. If you do fuzz ausearch, you will probably find a couple direct memory leaks. If you look at the mutated log, you will probably see that it added 5 comm= fields. This will never happen in real life, so I have not patched ausearch to fix this.

You can apply this same fuzzing technique to other applications. Build it with the address sanitizer, get a sample input, adjust the script, and let it roll.
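
To make the "adjust the script" part a little easier, here is a generic version of the loop as a sketch. The function name fuzz_loop, the iteration limit, and the MUTATOR variable are my own additions for illustration. By default it pipes the seed through radamsa on stdin the same way the script above does.

```shell
#!/bin/sh
# fuzz_loop SEED OUT COUNT CMD...
# Mutate SEED into OUT with $MUTATOR (default: radamsa), then run CMD with OUT
# as its last argument. Stops at the first non-zero status and keeps OUT.
fuzz_loop() {
    seed=$1
    out=$2
    count=$3
    shift 3
    i=0
    while [ "$i" -lt "$count" ]; do
        ${MUTATOR:-radamsa} < "$seed" > "$out" || return 2
        if ! "$@" "$out" >/dev/null 2>&1; then
            echo "failure on iteration $i, test case kept in $out"
            return 1
        fi
        i=$((i + 1))
    done
    rm -f "$out"
    return 0
}

# Example, using the annocheck setup from a later article:
#   fuzz_loop in/test out/test 1000 ./annocheck
```

Capping the number of iterations is a deliberate choice here so that an unattended run ends on its own. Pass a large count if you want it to keep going.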