Security + Data Science: May 2026

The fapolicyd 1.5 release had a lot of internal work in it. This release started with a simple objective:

No silent visibility loss.
No silent policy replacement failure.
No hidden default-allow behavior.

fapolicyd is an application allowlisting daemon. It receives file access events from the kernel through fanotify, evaluates policy, and replies to the kernel. When something goes wrong in that path, the daemon should not quietly lose sight of events, quietly replace policy with a bad ruleset, or quietly allow executable content because policy had no opinion.

That is the first narrative of this release: fail-safe and correctness work. The second narrative is reporting. Once the important failure modes are made explicit, they need to show up in the right place. That led to separating status, metrics, and timing into different reports.

fapolicyd 1.5 makes important failure modes explicit and then reports them in the right place

Fail-safe and correctness work

One of the most important changes is the transactional rule reload. Previously, a reload could destroy the active rules before the replacement ruleset had fully loaded. If parsing failed after the old policy was gone, the daemon could be left with partial or empty policy. Since fapolicyd historically allows access when no rule has an opinion, this could become a fail-open condition.

The new approach is to build and validate the candidate ruleset separately. It owns the rules, attribute sets, syslog fields, proc-status mask, rule count, and rule identity. Only after the candidate is complete does it replace the active ruleset. A failed reload preserves the previous published policy.
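The build-then-publish idea can be sketched in a few lines. This is a minimal Python model, not fapolicyd's C implementation: the Ruleset and PolicyHolder classes, the toy parse_rule grammar, and the reload_failures counter name are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ruleset:
    """Immutable snapshot of a fully parsed policy (hypothetical model)."""
    rules: tuple
    generation: int


class ReloadError(Exception):
    pass


def parse_rule(line: str) -> tuple:
    # Toy grammar: "<decision> <subject> : <object>". Not the real rule syntax.
    decision, _, rest = line.partition(" ")
    if decision not in ("allow", "deny") or ":" not in rest:
        raise ReloadError(f"cannot parse rule: {line!r}")
    subject, _, obj = rest.partition(":")
    return (decision, subject.strip(), obj.strip())


class PolicyHolder:
    def __init__(self, initial: Ruleset):
        self.active = initial
        self.reload_failures = 0  # hypothetical counter name

    def reload(self, lines) -> bool:
        try:
            # Build the whole candidate before touching the active ruleset.
            candidate = Ruleset(
                rules=tuple(parse_rule(line) for line in lines),
                generation=self.active.generation + 1,
            )
        except ReloadError:
            # The previous policy stays published; the failure is only counted.
            self.reload_failures += 1
            return False
        self.active = candidate  # single publish step
        return True
```

The point of the shape is that there is no moment where the active policy is partially built: either the old ruleset or the complete new one is published.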

That also means reload failures become useful operational data. The daemon can count rule reload failures and report them as failure-action metrics. Today those counters are observe-mode diagnostics. Later, high-security deployments can use the same named failure classes to decide whether a stronger action is needed.

The parser side was tightened for the same reason. If a subject or object assignment fails while parsing a rule, that must be a parse failure. It cannot be ignored. A malformed multi-attribute rule should not accidentally become a broader rule because one invalid attribute was skipped while another valid attribute remained.
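To see why a skipped attribute is dangerous, here is a small sketch of a strict attribute parser. The function name, the exception type, and the reduced attribute list are assumptions for illustration; the real parser is in C and handles far more cases.

```python
KNOWN_ATTRS = {"uid", "gid", "exe", "trust", "ftype", "path"}  # illustrative subset


class RuleParseError(ValueError):
    pass


def parse_attributes(text: str) -> dict:
    """Parse 'key=value key=value'; any bad token fails the whole rule."""
    attrs = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if not sep or not value or key not in KNOWN_ATTRS:
            # Skipping the token would leave a rule with fewer constraints,
            # which matches more events than the author intended.
            raise RuleParseError(f"invalid attribute {token!r}")
        attrs[key] = value
    return attrs
```

With a lenient parser, "uid=0 trsut=1" would quietly become a rule that matches every uid 0 subject regardless of trust. The strict version rejects the rule instead.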

The fanotify path also got attention. FAN_Q_OVERFLOW is the kernel telling us that fanotify events were lost. This is not the same as a denied access. It is visibility loss. If the kernel queue overflows and the daemon does not make that visible, an administrator has no way to know that enforcement degraded.

So fapolicyd now detects kernel queue overflow, counts it, logs it with rate limiting, and exposes it through reporting. This counter is not just for this release. It is groundwork for later failure policy. If a future configuration wants to treat kernel event loss as a high-severity condition, the daemon has a specific event class to build on.
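The count-everything, log-sparingly pattern looks roughly like this. It is a sketch under assumptions: the class name, the interval parameter, and the message format are hypothetical, and the daemon's actual rate limit is not described here.

```python
import time


class RateLimitedCounter:
    def __init__(self, min_interval_s: float, clock=time.monotonic):
        self.count = 0    # every event, always
        self.logged = 0   # how many log lines were actually emitted
        self._min_interval = min_interval_s
        self._clock = clock
        self._last_log = None

    def record(self, message: str, log=print) -> None:
        self.count += 1
        now = self._clock()
        # Emit a log line only if enough time has passed since the last one.
        if self._last_log is None or now - self._last_log >= self._min_interval:
            self._last_log = now
            self.logged += 1
            log(f"{message} (total={self.count})")
```

The counter stays exact even when the log is quiet, so a report can show the real number of overflows while an overflow storm does not flood the log.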

The daemon also now counts failed or short fanotify response writes. Every permission event needs exactly one response back to the kernel. If the response write fails, that is a correctness issue. Counting it gives us another named failure class that can be monitored now and acted on more strongly later.

The no-opinion allow path was another area that needed guardrails. fapolicyd does not try to deny every ordinary document open. Compatibility requires a default allow when policy has no opinion. But executable and programmatic content should not accidentally fall through because of incomplete rules, parser mistakes, missing file type classification, or internal errors.

The release adds counters that distinguish "Allowed by rule" from "Allowed by fallthrough". That keeps the compatibility behavior, but it stops hiding it. If a system has default-allow execute activity, the administrator can now see it and investigate.
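The distinction between the two allow paths is easy to model. In this sketch the rule representation (a decision plus a match function) and the counter keys are assumptions; only the idea of counting a rule allow separately from a fallthrough allow comes from the release.

```python
from collections import Counter


def evaluate(rules, event, counters: Counter) -> str:
    """First matching rule wins; no match falls through to allow."""
    for decision, matches in rules:
        if matches(event):
            key = "allowed_by_rule" if decision == "allow" else "denied_by_rule"
            counters[key] += 1
            return decision
    # No rule had an opinion: keep the compatibility behavior, but count it.
    counters["allowed_by_fallthrough"] += 1
    return "allow"
```

For example, with one rule that allows trusted subjects and one that denies execute, an untrusted document open matches neither and lands in the fallthrough counter:

```python
rules = [
    ("allow", lambda e: e["trusted"]),
    ("deny", lambda e: e["perm"] == "execute"),
]
counters = Counter()
evaluate(rules, {"trusted": False, "perm": "open"}, counters)
```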

This is also where the new rule linter fits.

$ fapolicyd-cli --check-rules --lint fapolicyd/src/tests/fixtures/rules-valid.rules
file is valid (3 rules)
Policy lint warning: executable events can fall through; no terminal broad execute deny found
Policy lint warning: %languages is not defined; programmatic ftype coverage cannot be checked

$ fapolicyd-cli --check-rules --lint --verbose
Rules file is valid (15 rules)
Policy lint found no warnings

The linter is intentionally modest. It is not a formal proof of the policy. It looks for policy shapes that are easy to get wrong: executable events that can fall through because there is no broad terminal execute deny, missing "%languages" coverage for programmatic opens, and broad open allows that can shadow programmatic-content denies.

That matters because fapolicyd policies are ordered. A rule that looks harmless in isolation can change meaning when it appears before another rule. The linter gives administrators a cheap check before loading policy. It also ties back to the default-allow metrics: if the linter warns about a default-allow gap, the metrics report can show whether the running system is actually using that gap.
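Two of those shape checks can be approximated over a simplified rule model. This is only a sketch: rules are reduced to (decision, perm, subject, object) tuples, the matching is far cruder than the real linter, and the warning strings are shortened or invented for the example.

```python
def lint(rules):
    """rules: ordered list of (decision, perm, subject, object) tuples."""
    warnings = []

    # Check 1: is there any catch-all deny that covers execute?
    has_broad_exec_deny = any(
        decision.startswith("deny")
        and perm in ("execute", "any")
        and subj == "all"
        and obj == "all"
        for decision, perm, subj, obj in rules
    )
    if not has_broad_exec_deny:
        warnings.append("executable events can fall through")

    # Check 2: a catch-all open allow placed before an open deny wins first.
    for i, (decision, perm, subj, obj) in enumerate(rules):
        broad_allow = (
            decision.startswith("allow")
            and perm in ("open", "any")
            and subj == "all"
            and obj == "all"
        )
        if broad_allow and any(
            later[0].startswith("deny") and later[1] in ("open", "any")
            for later in rules[i + 1:]
        ):
            warnings.append(f"rule {i + 1} shadows later open denies")
    return warnings
```

The second check is the ordering problem in miniature: the deny rule is perfectly valid on its own, and it only becomes dead weight because of what sits above it.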

If you use rules.d, fagenrules now validates the compiled rules file before installing it. That closes a practical hole. The assembled policy should be checked before it is written as the policy the daemon will later load.

Subject deferral

The subject cache work deserves a closer look because it fixes a subtle correctness problem.

fapolicyd tracks process startup state. During exec, it may see the executable, the runtime linker, libraries, interpreter activity, and other file opens that belong to the same subject. While that sequence is still being understood, the subject can be in a BUILDING state.

The subject cache is indexed by a slot. More than one process can hash to the same slot over time. Under heavy fork/exec pressure, a new process can arrive at a slot that is already occupied by a different process that is still BUILDING. If fapolicyd evicts the BUILDING subject too early, it loses the startup context for the process already underway. When that original process comes back with another access, fapolicyd may see a later part of startup without the earlier context. That can lead to false pattern decisions such as bad "ld_so" detection.

One idea is to collect ejected slots and try to restore them later. That sounds appealing, but it is a losing proposition.

New processes keep coming. There is no guarantee that the process we ejected will ever make another access. It might exit. It might segfault. It might be stopped by a tracer. If it does come back, there is no clean way to find the saved subject once the same slot has been used by another access request. You also have to decide who owns the saved file descriptors and how long to keep blocked permission events around. That turns the cache into a second, unbounded, hard-to-reason-about subject store.

The implemented design takes the opposite approach. Do not eject the old BUILDING subject just because a new subject arrived. Store the new access request in a bounded deferral array. This puts backpressure on the new access while the subject already underway finishes and gets out of the way.

Subject deferral applies backpressure to the new access instead of evicting an in-progress subject.

The array is fixed size. That is important. A fanotify permission event can hold a task waiting for a decision, and it owns a file descriptor until the daemon replies or closes it. Deferral is acceptable because it is bounded and observable. If the array fills, fapolicyd falls back to the historical eviction behavior and counts that fallback.
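Here is a toy model of that decision. It is an assumption-heavy sketch: the slot function is a plain modulo on the pid, subjects are (pid, state) pairs, and the stats keys only loosely mirror the real counter names.

```python
from collections import deque

BUILDING, READY = "BUILDING", "READY"


class SubjectCache:
    def __init__(self, slots: int, defer_capacity: int):
        self.slots = [None] * slots  # each entry is (pid, state) or None
        self.deferred = deque()      # bounded by defer_capacity
        self.defer_capacity = defer_capacity
        self.stats = {"deferred": 0, "max_depth": 0, "fallbacks": 0}

    def access(self, pid: int) -> str:
        idx = pid % len(self.slots)
        entry = self.slots[idx]
        if entry is not None and entry[0] == pid:
            return "hit"
        if entry is None or entry[1] != BUILDING:
            # Empty slot or a finished subject: replacing it loses nothing.
            self.slots[idx] = (pid, BUILDING)
            return "installed"
        # The slot is held by a different subject that is still BUILDING.
        if len(self.deferred) < self.defer_capacity:
            self.deferred.append(pid)
            self.stats["deferred"] += 1
            self.stats["max_depth"] = max(self.stats["max_depth"], len(self.deferred))
            return "deferred"
        # Deferral array is full: fall back to eviction, and count it.
        self.stats["fallbacks"] += 1
        self.slots[idx] = (pid, BUILDING)
        return "evicted"

    def finish_building(self, pid: int) -> list:
        idx = pid % len(self.slots)
        if self.slots[idx] is not None and self.slots[idx][0] == pid:
            self.slots[idx] = (pid, READY)
        # Deferred requests for this slot can now be retried, oldest first.
        ready = [p for p in self.deferred if p % len(self.slots) == idx]
        self.deferred = deque(p for p in self.deferred if p % len(self.slots) != idx)
        return ready
```

The fixed capacity is what keeps the worst case bounded: the number of waiting requests can never exceed the array size, and every time the bound is hit it shows up as a fallback.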

There are two hard cases: traced subjects and stale subjects.

If a process is being traced, it may be stopped indefinitely. We do not know whether it will continue. If a traced process holds a BUILDING slot and another request needs that slot, waiting forever would be a denial-of-service bug. A malicious actor could try to fill slots with stopped BUILDING subjects and deadlock the system. The daemon now detects traced BUILDING subjects and can eject them when their slot is needed.

There is also a timeout. Even without tracing, a BUILDING subject that has not made progress can become stale. As a last resort, stale BUILDING slots are garbage collected so the system does not wait forever on a subject that may never complete startup.
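The two escape hatches can be expressed as one small decision function. On Linux, the TracerPid field of /proc/PID/status is non-zero when a process is being traced; whether fapolicyd reads exactly that field is my assumption here, and the function names and the timeout parameter are hypothetical.

```python
def tracer_pid(status_text: str) -> int:
    """Return the TracerPid value from /proc/<pid>/status text, or 0."""
    for line in status_text.splitlines():
        if line.startswith("TracerPid:"):
            return int(line.split()[1])
    return 0


def building_eviction_reason(status_text, last_progress_ns, now_ns, stale_after_ns):
    """Why a BUILDING subject may be evicted, or None if it should be kept."""
    if tracer_pid(status_text) != 0:
        return "tracer"  # may be stopped indefinitely, do not wait on it
    if now_ns - last_progress_ns >= stale_after_ns:
        return "stale"   # last resort: no progress within the timeout
    return None
```

A subject that is neither traced nor stale keeps its slot, and the new request is deferred instead.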

The relevant counters are:

Subject deferred events
Subject defer max depth
Subject defer fallbacks
Early subject cache evictions
Subject BUILDING tracer evictions
Subject BUILDING stale evictions
Subject defer oldest age

These counters tell us whether the theory is working. A busy fork/exec workload may increase "Subject deferred events". That is expected. "Subject defer fallbacks" should not steadily climb during normal operation. Tracer and stale evictions should be rare, and each one is worth looking at.

The stress and timing article later in this series will show how to create subject-cache pressure deliberately and how to read these counters during a test.

Reporting philosophy

The second narrative is reporting. Before this work, the status report had become a mixed report. It included health, configuration, counters, cache statistics, and other runtime details. That makes it harder to know what a field means. Is it a current state value? A lifetime counter? A counter that should be reset? A health indicator?

The reports now have cleaner jobs.

"fapolicyd-cli --check-status" asks whether the daemon is healthy and configured as expected.

# fapolicyd-cli --check-status
Operating mode:
Permissive: false
Integrity: sha256
reset_strategy: manual
Timing collection mode: manual
Timing collection armed: false
Timing collection last start time: never
Timing collection last stop time: never
Ruleset generation: 1

Headline activity:
Allowed accesses: 8762
Denied accesses: 0

Resource configuration:
CPU cores: 32
q_size: 800
Subject defer array size: 256
Subject cache size: 4099
Object cache size: 16381
Trust database max pages: 18944

Resource utilization:
Trust database pages in use: 14284 (75%)
Subject slots in use: 207 (5%)
Object slots in use: 3334 (20%)
glibc arena (total memory) is: 25560 KiB, was: 1188 KiB
glibc uordblks (in use memory) is: 8348 KiB, was: 960 KiB
glibc fordblks (total free space) is: 17211 KiB, was: 227 KiB

Health indicators:
Kernel queue overflow: 0
Filesystem errors: 0
Filesystem error last status: none
Filesystem error last seen: never
Reply errors: 0
Subject defer fallbacks: 0
Early subject cache evictions: 0
Subject BUILDING tracer evictions: 0
Subject BUILDING stale evictions: 0
Subject defer oldest age: 0ns
Failure action queue_full (observe): 0
Failure action kernel_queue_overflow (observe): 0
Failure action worker_stall (observe): 0
Failure action rule_reload_failure (observe): 0
Failure action trust_reload_failure (observe): 0
Failure action response_write_failure (observe): 0
Failure action fanotify_filesystem_error (observe): 0

Watched mounts:
watching mount: /
watching mount: /dev/shm
watching mount: /run
watching mount: /run/credentials/systemd-journald.service
watching mount: /tmp
...

Status includes operating mode, integrity mode, reset_strategy, timing collection state, ruleset generation, headline allow and deny activity, resource configuration, resource utilization, health indicators, failure action counters, and watched mounts.

The important thing is that status is where you look for current condition. Is the daemon permissive? What ruleset generation is loaded? Are there kernel queue overflows? Are reply errors non-zero? Is there an old deferred subject? Are the watched mounts what you expect?
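Because the health indicators are plain "name: value" lines, a monitoring check can be very small. The sketch below assumes the layout shown in the sample output above (a "Health indicators:" heading, then one indicator per line, ended by a blank line); the function name is hypothetical and the output format may change between versions.

```python
import re


def nonzero_health(status_text: str) -> dict:
    """Return health indicators whose value starts with a non-zero number."""
    flagged = {}
    in_health = False
    for raw in status_text.splitlines():
        line = raw.strip()
        if not line:
            in_health = False  # a blank line ends the section
            continue
        if line == "Health indicators:":
            in_health = True
            continue
        if in_health:
            key, _, value = line.rpartition(": ")
            number = re.match(r"\d+", value)
            # Values such as "none" or "never" have no leading digits.
            if number and int(number.group()) != 0:
                flagged[key] = value
    return flagged
```

In practice the input would be the captured output of "fapolicyd-cli --check-status". An empty result means every numeric health indicator is zero.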

"fapolicyd-cli --check-metrics" asks what happened in the current counter window. Metrics includes decision outcome counters, "Allowed by rule", "Allowed by fallthrough", queue and deferral activity, subject and object cache effectiveness, rule hit counts, and subject/object attribute lookup counters.

Some information appears in both reports on purpose. Allowed and denied accesses are useful headline activity in status, but they are also counters in metrics. Early subject evictions are health signals in status, but they also describe the current metrics window. The difference is the question being asked. Status asks whether something needs attention. Metrics asks what moved during the measured interval.

The old cache and counter information did not disappear. It moved to the metrics report where it belongs. If you previously looked in status for cache hits, misses, evictions, rule hits, or detailed runtime counters, look in "--check-metrics".

Metrics can also be reset in controlled ways. The default reset_strategy is "never", which preserves lifetime counters. Setting it to "manual" allows a privileged "fapolicyd-cli --reset-metrics" request to snapshot and reset runtime counters. The "auto" setting is for interval reports that should each describe only the activity since the previous interval.
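The three strategies differ only in who is allowed to close a counter window. A rough model follows; the class and method names are hypothetical, and refusing a manual reset under the other strategies is my reading of the description, not something I have confirmed against the daemon.

```python
from collections import Counter


class Metrics:
    def __init__(self, reset_strategy: str = "never"):
        if reset_strategy not in ("never", "manual", "auto"):
            raise ValueError(f"unknown reset_strategy {reset_strategy!r}")
        self.reset_strategy = reset_strategy
        self.counters = Counter()

    def bump(self, name: str, n: int = 1) -> None:
        self.counters[name] += n

    def _snapshot_and_reset(self) -> dict:
        snapshot = dict(self.counters)
        self.counters.clear()
        return snapshot

    def manual_reset(self) -> dict:
        # Assumption: only the "manual" strategy accepts an explicit reset.
        if self.reset_strategy != "manual":
            raise PermissionError("reset_strategy does not permit a manual reset")
        return self._snapshot_and_reset()

    def interval_report(self) -> dict:
        # "auto": each report covers only activity since the previous one.
        if self.reset_strategy == "auto":
            return self._snapshot_and_reset()
        return dict(self.counters)
```

Taking the snapshot before clearing is what makes the reset safe to use in reports: the numbers that were reset are the numbers that were returned.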

Timing is separate from both status and metrics. Timing collection is disabled by default because it adds measurement work to the decision path. When enabled with "timing_collection=manual", an administrator can start and stop a bounded timing window:

fapolicyd-cli --timer-start

fapolicyd-cli --timer-stop

The daemon then writes to "/run/fapolicyd/fapolicyd.timing". This report answers where time went during that specific run: queue wait, event build, rule evaluation, MIME detection, hashing, trust database lookup, logging, audit metadata, and the fanotify response write.
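The reason timing is opt-in is visible even in a toy version. In this sketch (the class, its methods, and the stage names passed in are all illustrative, not the daemon's internals), a disarmed window skips the clock reads entirely, so the decision path pays nothing when nobody is measuring.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class TimingWindow:
    def __init__(self, clock=time.perf_counter_ns):
        self.armed = False
        self.total_ns = defaultdict(int)
        self._clock = clock

    def start(self) -> None:
        self.total_ns.clear()  # each window starts from zero
        self.armed = True

    def stop(self) -> dict:
        self.armed = False
        return dict(self.total_ns)

    @contextmanager
    def stage(self, name: str):
        if not self.armed:
            yield  # disarmed: no clock reads on the decision path
            return
        begin = self._clock()
        try:
            yield
        finally:
            self.total_ns[name] += self._clock() - begin
```

A decision path would wrap each step, for example "with window.stage('hashing'): ...", and the report from stop() gives the accumulated nanoseconds per stage for that window only.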

I will cover metrics in the next article and timing plus the stress harness in the one after that. Those deserve separate treatment. This article is only the overview: the release first made important correctness failures explicit, then gave administrators better places to look for the evidence.

The upstream project is here: https://github.com/linux-application-whitelisting/fapolicyd.

If you are new to fapolicyd itself, the Red Hat documentation on blocking and allowing applications with fapolicyd is a useful starting point.

Friday, May 22, 2026

fapolicyd 1.5: Correctness First, Better Reports Second
