Sunday, May 24, 2026

Timing fapolicyd Decisions

The previous article looked at fapolicyd metrics. Metrics tell us what happened in a counter window. This article is about timing. Timing answers a different question: where did the daemon spend time while making decisions?

This is not something that should always be enabled. fapolicyd sits in the path of file opens and execs. Calling clock_gettime() around hot-path operations on every permission event has a cost. That is why timing collection is off by default and why the new model is manual. You arm timing, run a workload, stop timing, and read the report.

Before looking at the command line, it helps to understand what a decision is. The timing report is much easier to read if you know the path an access request takes through the daemon.

Anatomy of a Decision

At a high level, a fapolicyd decision has three phases:

event_build
evaluation
response
event_build turns the raw fanotify event into the internal data fapolicyd can reason about. evaluation walks the rules and computes the policy result. response records the outcome, prepares any logging or audit data, and writes the final decision back to the kernel.

How a fapolicyd access decision moves through the daemon.


 The timing report uses names that look like this:

phase:operation:child
For example, evaluation:mime_detection:libmagic_fallback means libmagic fallback happened while the daemon was in the evaluation phase. response:trust_db_lookup:read means trust database read work was triggered while building the response.

The stages are not a strict accounting tree. Some operations are nested. Some are lazy and only happen if a rule or log format asks for the data. Child rows do not have to add up exactly to parent rows. Treat them as measured gates in the decision path, not as a perfect flame graph.

Event Build

The first phase is event_build. The kernel gives fapolicyd a fanotify event. That event has a permission file descriptor, a process id, and a mask describing the requested access. fapolicyd has to turn that into an internal event with enough subject and object identity to evaluate policy.

The main timed gates are:

event_build:cache_flush
event_build:proc_fingerprint
event_build:fd_stat
event_build:cache_flush is measured when the daemon has to flush object cache state on the decision path. Cache flushes are not the normal cost of every decision. If this shows up prominently, something caused cached object state to be invalidated during the timing window.

event_build:proc_fingerprint reads enough process information to identify the subject and detect stale cache entries. This is the first process identity gate. On fork/exec-heavy systems, this can be a visible cost because every short lived command creates more process identity work.

event_build:fd_stat stats the fanotify file descriptor to identify the object being accessed. This gives fapolicyd stable file identity data. If this is expensive, the cost is in object fingerprinting before policy evaluation even starts.

The output of event_build is not the whole answer. It is the starting point. Some subject and object attributes are still lazy. They are computed only if a rule, syslog format, debug output, or audit response needs them.

Evaluation

The second phase is evaluation. This is where fapolicyd evaluates the active policy rules against the event.

The top-level evaluation gates are:

evaluation:lock_wait
evaluation:fd_path_resolution
evaluation:proc_detail_lookup
evaluation:lock_wait is time spent waiting for the rule lock before policy evaluation can proceed. In the current single-decision-thread design, this is usually not expected to dominate. It becomes more interesting as the code moves toward more read-side concurrency. If it is high now, rule reload or other policy maintenance activity may have overlapped the timing run.

evaluation:fd_path_resolution is path resolution for the object file descriptor. Not every rule needs a path, but rules that match on path or dir can force this work. A workload that opens many distinct files can make this more visible.

evaluation:proc_detail_lookup is the on-demand process detail gate. The initial fingerprint is collected in event_build, but rules may ask for more: auid, session id, executable path, process status fields, or other procfs details. If this row is high, the policy or output format is asking for process details that were not already cached.

MIME detection is a nested set of gates:
evaluation:mime_detection:fast_classification
evaluation:mime_detection:gather_elf
evaluation:mime_detection:libmagic_fallback
fapolicyd tries cheap classification first. The fast path recognizes common file types without using full libmagic. gather_elf is the ELF and script header scan used to identify executable and interpreter-related details. libmagic_fallback is the expensive fallback when the faster checks cannot classify the object.

If evaluation:mime_detection:libmagic_fallback is high, the daemon is often leaving the fast path. That can be caused by a workload full of files that are hard to classify cheaply, or by rules that ask for ftype on many objects.

Integrity and hashing gates are:

evaluation:hash_ima:total
evaluation:hash_sha:total
hash_ima is IMA digest collection from the security.ima extended attribute. hash_sha is SHA-style file digest collection. These are workload and policy dependent. If integrity mode or FILE_HASH rules force hashing of large files, these rows can dominate a timing run. This is one reason timing has to be a bounded diagnostic tool: hashing can be attacker-controlled cost if policy asks for it on large inputs.

Trust database lookup is also split:
evaluation:trust_db_lookup:lock_wait
evaluation:trust_db_lookup:read
lock_wait is time waiting for trust database update coordination before the read can proceed. read is the LMDB read-side work and related lookup logic. If read is high, trust lookup itself is material. If lock_wait is high, the decision path is waiting behind trust database maintenance or contention.

The evaluation phase decides whether a rule has an opinion. If a rule returns allow or deny, that is the decision source. If no rule has an opinion, fapolicyd preserves historical compatibility by allowing the access through the fallthrough path. The metrics report counts that separately as Allowed by fallthrough.

Response

The third phase is response. This is the work after rule evaluation has produced an outcome. It includes decision bookkeeping, optional syslog or debug formatting, audit response metadata, and the write back to the kernel.

The response phase has some gates that look familiar:

response:mime_detection:fast_classification
response:mime_detection:gather_elf
response:mime_detection:libmagic_fallback
response:trust_db_lookup:lock_wait
response:trust_db_lookup:read
response:fanotify_write
This is the part that can be confusing at first. Why can MIME detection or trust database lookup appear in both evaluation and response?

The answer is lazy attributes. fapolicyd does not compute every possible attribute up front. That would waste time. Instead, attributes are looked up when something asks for them. During evaluation, policy rules may ask for ftype, trust, path, hash, or process details. During response, syslog, debug, or audit formatting may ask for some of the same attributes so the daemon can explain the decision.

So the same helper can be charged to different phases depending on who caused the lookup.

If evaluation:mime_detection:* is high, policy evaluation needed file type information. If response:mime_detection:* is high, reporting or audit output needed file type information after the decision had already been made. The same logic applies to trust database lookup. Evaluation-side trust lookup is policy cost. Response-side trust lookup is reporting or audit cost.

This distinction is important. If a timing report says response-side MIME detection dominates, making policy rules simpler may not help. The cost may be coming from debug output or a detailed syslog_format. If evaluation-side MIME detection dominates, then the policy is probably asking for file type data in the hot path.

response:fanotify_write is the final write of the permission decision back to the kernel. A high value here means the daemon did the policy work and then spent measurable time completing the kernel response. That is not rule cost. It is response-path cost.

Using the Timer

Now the report structure makes more sense. The timer creates a bounded window around a workload so the daemon can aggregate timing data for those gates.

Enable manual timing in fapolicyd.conf:

timing_collection=manual
After changing the configuration, reload or restart fapolicyd so the running daemon has the setting active. Then check status:

fapolicyd-cli --check-status | head

Look for:
Timing collection mode: manual
Timing collection armed: false
If the mode is still off, the daemon will ignore timing requests. This is intentional. A system should not accidentally start collecting timing data just because someone ran the CLI.

Manual timing creates a bounded measurement window around a workload.

The CLI has two names for the commands. The documentation uses --timing-start and --timing-stop. There are also --timer-start and --timer-stop aliases. I like the timer spelling because it reads naturally, so I will use that here.

To start timing, run the following command:

sudo fapolicyd-cli --timer-start

Now run something that causes fapolicyd decisions. For a tiny smoke test, this can be as boring as:

for i in $(seq 1 100); do /usr/bin/id >/dev/null; done
Then to stop timing, run the following command:

sudo fapolicyd-cli --timer-stop

The stop command prints the timing report. The daemon writes the report to:

/run/fapolicyd/fapolicyd.timing
The report is aggregate data. It does not store one record per decision. Each stage keeps a count, total time, maximum time, and latency buckets. That keeps memory bounded even if the timing window covers a large number of decisions. Here is a sample report:

Mode: manual
Timing run: 2026-05-24 14:54:07 -0400 to 2026-05-24 14:55:13 -0400
Duration: 0:01:06
Workers: 1
Max queue depth: 399
Decisions: 576,495
Throughput: 8717.3 decisions/sec (wall clock)
Active decision rate: 9596.4 decisions/sec

TL;DR:
  - Queueing pressure reached max depth 399 of 800 (49.9%), p95 wait <=100ms, max wait 160 ms.

Overall decision latency:
  avg 104 us, max 2.81 ms
  p50 bucket <=100us, p95 bucket <=500us, p99 bucket <=500us
  <=50us 1.4%, <=100us 72.4%, <=500us 100.0%, <=1ms 100.0%, >10ms 0.0%

Queueing:
  avg wait: 27.2 ms
  max wait: 160 ms
  p95 bucket: <=100ms
  total queued time: 15707 s
  max queue depth: 399

Decision phase timing:
Phase                 Calls  Calls/Dec      Total        Avg        Max   p95 bucket   Notes
event_build         576,495       1.00     3.17 s    5.50 us     516 us       <=50us   
evaluation          576,495       1.00     1.66 s    2.89 us    2.80 ms       <=50us   
response            576,495       1.00     55.0 s    95.5 us    1.44 ms      <=500us   

Lazy helper attribution:
  Helper timings are attributed to the active logical driver: evaluation or response.
  Combined totals are evaluation + response.

Lazy helper attribution by driver:
Helper                             Eval total  Response total     Combined Response %
mime_detection:total                  11.1 ms            0 ns      11.1 ms        0.0%
mime_detection:fast_classification     510 us            0 ns       510 us        0.0%
mime_detection:gather_elf              186 us            0 ns       186 us        0.0%
mime_detection:libmagic_fallback      10.4 ms            0 ns      10.4 ms        0.0%
trust_db_lookup:total                  120 ms            0 ns       120 ms        0.0%
trust_db_lookup:read                   109 ms            0 ns       109 ms        0.0%
trust_db_lookup:lock_wait             2.43 ms            0 ns      2.43 ms        0.0%

Combined lazy helper attribution:
Helper path                                       Calls  Calls/Dec      Total   Avg/call  Amort/Dec        Max   p95 bucket
mime_detection:total                                564       0.00    11.1 ms    19.6 us      19 ns    2.54 ms       <=10us
mime_detection:fast_classification                  564       0.00     510 us     903 ns       0 ns    11.3 us       <=10us
mime_detection:gather_elf                            32       0.00     186 us    5.82 us       0 ns    10.9 us       <=10us
mime_detection:libmagic_fallback                      9       0.00    10.4 ms    1.16 ms      18 ns    2.53 ms        <=5ms
trust_db_lookup:total                            66,016       0.11     120 ms    1.81 us     207 ns    2.80 ms        <=5us
trust_db_lookup:read                             66,016       0.11     109 ms    1.65 us     189 ns    2.80 ms        <=5us
trust_db_lookup:lock_wait                        66,016       0.11    2.43 ms      36 ns       4 ns    25.7 us        <=1us
hash_sha:total                                       22       0.00    10.6 ms     481 us      18 ns    2.79 ms        <=5ms
proc_detail_lookup                               99,060       0.17     1.29 s    13.0 us    2.24 us     415 us       <=50us

Derived observations:
  Queueing showed moderate bursts: max queue depth 399 of 800 (49.9%), p95 wait <=100ms, max wait 160 ms.
  libmagic fallback is the biggest MIME contributor: 1.6% of MIME calls, 94.0% of MIME time, 18 ns amortized per decision.
  hash_sha is rare but expensive: 0.0% of decisions, 481 us avg when called, 18 ns amortized per decision.

Detailed stage timing, sorted by total time:
Stage                                                 Calls      Calls/Dec      Total        Avg        Max   p95 bucket
time_in_queue:total                                 576,495           1.00    15707 s    27.2 ms     160 ms      <=100ms
decision:total                                      576,495           1.00     60.1 s     104 us    2.81 ms      <=500us
response:total                                      576,495           1.00     55.0 s    95.5 us    1.44 ms      <=500us
response:fanotify_write                             576,495           1.00     54.0 s    93.7 us    1.44 ms      <=500us
event_build:total                                   576,495           1.00     3.17 s    5.50 us     516 us       <=50us
event_build:proc_fingerprint                        576,495           1.00     1.92 s    3.32 us     250 us       <=10us
evaluation:total                                    576,495           1.00     1.66 s    2.89 us    2.80 ms       <=50us
evaluation:proc_detail_lookup                        99,060           0.17     1.29 s    13.0 us     415 us       <=50us
event_build:fd_stat                                 576,495           1.00     733 ms    1.27 us     512 us        <=5us
evaluation:trust_db_lookup:total                     66,016           0.11     120 ms    1.81 us    2.80 ms        <=5us
evaluation:trust_db_lookup:read                      66,016           0.11     109 ms    1.65 us    2.80 ms        <=5us
evaluation:lock_wait                                576,495           1.00    24.8 ms      43 ns    42.4 us        <=1us
response:audit_metadata:total                       576,495           1.00    12.5 ms      21 ns    46.9 us        <=1us
evaluation:mime_detection:total                         564           0.00    11.1 ms    19.6 us    2.54 ms       <=10us
evaluation:hash_sha:total                                22           0.00    10.6 ms     481 us    2.79 ms        <=5ms
evaluation:mime_detection:libmagic_fallback               9           0.00    10.4 ms    1.16 ms    2.53 ms        <=5ms
evaluation:fd_path_resolution                           588           0.00    6.73 ms    11.4 us     164 us       <=50us
evaluation:trust_db_lookup:lock_wait                 66,016           0.11    2.43 ms      36 ns    25.7 us        <=1us
evaluation:mime_detection:fast_classification           564           0.00     510 us     903 ns    11.3 us       <=10us
evaluation:mime_detection:gather_elf                     32           0.00     186 us    5.82 us    10.9 us       <=10us

Stage tail summary:
  time_in_queue:total: >10ms 564,343/97.9%, >25ms 235,276/40.8%, >50ms 39,149/6.8%, >100ms 2,230/0.4%

Not observed:
  event_build:cache_flush, response:mime_detection:total, response:mime_detection:fast_classification, response:mime_detection:gather_elf, response:mime_detection:libmagic_fallback, evaluation:hash_ima:total, response:trust_db_lookup:total, response:trust_db_lookup:lock_wait, response:trust_db_lookup:read, response:syslog_debug_format:total

Notes:
  Largest queued-time contributor: time_in_queue:total (15707 s)
  Largest helper contributor: proc_detail_lookup (1.29 s)
  Largest decision phase contributor: response (55.0 s)
  Slowest observed row by max: time_in_queue:total (160 ms)

 

Interpreting the Report

The full timing report can look intimidating, but the reading order is simple:

1. Run summary
2. TL;DR
3. Overall decision latency
4. Queueing
5. Decision phase timing
6. Lazy helper attribution
7. Detailed stage timing
The run summary tells you whether the run is meaningful. Decisions should be non-zero. If it is zero, the workload did not generate timed daemon decisions. Max queue depth tells you  whether requests backed up while timing was armed.

The report has two rates. Throughput is decisions per wall-clock second while timing was armed. Active decision rate is based on accumulated decision:total worker time. If wall-clock throughput is low but active decision rate is high, the workload may have been bursty or idle part of the time. If both are low, decision work is expensive.

The TL;DR section is not magic. It is a compact set of observations derived from the same data later in the report. It may point out helper dominance, response formatting cost, queue health, or that no dominant findings were observed.

Overall decision latency is the end-to-end decision worker latency after an event has been dequeued. It includes event build, policy evaluation, optional logging or audit preparation, response selection, and the fanotify response write. Queue wait is reported separately.

Before the next section, let's talk about the p95 notation. The p95 bucket is the latency bucket that contains the 95th percentile observation. In plain terms: about 95% of the measured calls completed at or below that bucket, and about 5% were slower. If the report says a stage has p95 <=500us, then nearly all calls to that stage were fast. If it says p95 >10ms, then the slow path was not just one rare outlier; it happened often enough to affect the top 5% of calls.

It is significant because averages can hide tail latency, and maximums can overreact to one unusual event. The p95 bucket sits between them. It tells you whether slowness is recurring enough to matter. 

This section has average latency, maximum latency, p95 bucket, and tail buckets. The p95 bucket is usually more useful than the maximum. A high maximum with a low p95 means rare outliers. A high p95 means the slow path is common.

Queueing tells you how long events waited in fapolicyd's internal userspace queue before the decision worker started processing them. If queue wait is high and max queue depth is close to q_size, the daemon is being fed events faster than it can answer them.

Decision phase timing is the first section I would use for diagnosis:

event_build:total
evaluation:total
response:total
If event build dominates, look at process and object identity work. If evaluation dominates, look at rule traversal and evaluation-side helpers. If response dominates, look at logging, audit metadata, response-side helpers, and the kernel write.

Lazy helper attribution explains whether expensive helper work was caused by evaluation or response. This is where the duplicate-looking MIME and trust DB rows become useful. The same helper has very different tuning implications depending on whether policy needed it or output formatting needed it.

The detailed stage table is the final view. It is sorted by total measured time. Use it to find the main cost centers:
High event_build:proc_fingerprint              -> process identity lookup cost
High event_build:fd_stat                       -> object fingerprinting cost
High evaluation:lock_wait                      -> rule lock wait
High evaluation:total                          -> rule traversal or matching cost
High evaluation:mime_detection:libmagic_fallback -> expensive file type fallback
High evaluation:hash_sha:total                 -> SHA hashing cost
High evaluation:hash_ima:total                 -> IMA digest cost
High evaluation:trust_db_lookup:read           -> trust database read cost
High response:mime_detection:*                 -> reporting needed ftype data
High response:trust_db_lookup:read             -> reporting or audit needed trust data
High response:fanotify_write                   -> kernel response write cost
Do not expect every row to appear in every report. If a stage is not observed, the workload did not trigger that measured operation during the timing window. That can be good news. For example, no libmagic fallback means fast MIME classification handled the files observed in the run.

The best way to use timing is to make one change at a time. Change the workload, a rule, an integrity setting, or output format, then run another bounded timing window. Compare the same sections. If the dominant cost moves, you learned something.

The next article will cover the stress harness. The timer tells us where the daemon spent time. The stress tool gives us repeatable workloads that can move specific counters and timing gates on purpose.