The previous article looked at fapolicyd metrics. Metrics tell us what happened in a counter window. This article is about timing. Timing answers a different question: where did the daemon spend time while making decisions?
This is not something that should always be enabled. fapolicyd sits in the path of file opens and execs. Calling clock_gettime() around hot-path operations on every permission event has a cost. That is why timing collection is off by default and why the new model is manual. You arm timing, run a workload, stop timing, and read the report.
Before looking at the command line, it helps to understand what a decision is. The timing report is much easier to read if you know the path an access request takes through the daemon.
Anatomy of a Decision
At a high level, a fapolicyd decision has three phases:
event_buildevent_build turns the raw fanotify event into the internal data fapolicyd can reason about. evaluation walks the rules and computes the policy result. response records the outcome, prepares any logging or audit data, and writes the final decision back to the kernel.
evaluation
response
![]() |
| How a fapolicyd access decision moves through the daemon. |
The timing report uses names that look like this:
phase:operation:childFor example, evaluation:mime_detection:libmagic_fallback means libmagic fallback happened while the daemon was in the evaluation phase. response:trust_db_lookup:read means trust database read work was triggered while building the response.
The stages are not a strict accounting tree. Some operations are nested. Some are lazy and only happen if a rule or log format asks for the data. Child rows do not have to add up exactly to parent rows. Treat them as measured gates in the decision path, not as a perfect flame graph.
Event Build
The first phase is event_build. The kernel gives fapolicyd a fanotify event. That event has a permission file descriptor, a process id, and a mask describing the requested access. fapolicyd has to turn that into an internal event with enough subject and object identity to evaluate policy.
The main timed gates are:
event_build:cache_flushevent_build:cache_flush is measured when the daemon has to flush object cache state on the decision path. Cache flushes are not the normal cost of every decision. If this shows up prominently, something caused cached object state to be invalidated during the timing window.
event_build:proc_fingerprint
event_build:fd_stat
event_build:proc_fingerprint reads enough process information to identify the subject and detect stale cache entries. This is the first process identity gate. On fork/exec-heavy systems, this can be a visible cost because every short lived command creates more process identity work.
event_build:fd_stat stats the fanotify file descriptor to identify the object being accessed. This gives fapolicyd stable file identity data. If this is expensive, the cost is in object fingerprinting before policy evaluation even starts.
The output of event_build is not the whole answer. It is the starting point. Some subject and object attributes are still lazy. They are computed only if a rule, syslog format, debug output, or audit response needs them.
Evaluation
The second phase is evaluation. This is where fapolicyd evaluates the active policy rules against the event.
The top-level evaluation gates are:
evaluation:lock_waitevaluation:lock_wait is time spent waiting for the rule lock before policy evaluation can proceed. In the current single-decision-thread design, this is usually not expected to dominate. It becomes more interesting as the code moves toward more read-side concurrency. If it is high now, rule reload or other policy maintenance activity may have overlapped the timing run.
evaluation:fd_path_resolution
evaluation:proc_detail_lookup
evaluation:fd_path_resolution is path resolution for the object file descriptor. Not every rule needs a path, but rules that match on path or dir can force this work. A workload that opens many distinct files can make this more visible.
evaluation:proc_detail_lookup is the on-demand process detail gate. The initial fingerprint is collected in event_build, but rules may ask for more: auid, session id, executable path, process status fields, or other procfs details. If this row is high, the policy or output format is asking for process details that were not already cached.
MIME detection is a nested set of gates:
evaluation:mime_detection:fast_classificationfapolicyd tries cheap classification first. The fast path recognizes common file types without using full libmagic. gather_elf is the ELF and script header scan used to identify executable and interpreter-related details. libmagic_fallback is the expensive fallback when the faster checks cannot classify the object.
evaluation:mime_detection:gather_elf
evaluation:mime_detection:libmagic_fallback
If evaluation:mime_detection:libmagic_fallback is high, the daemon is often leaving the fast path. That can be caused by a workload full of files that are hard to classify cheaply, or by rules that ask for ftype on many objects.
Integrity and hashing gates are:
evaluation:hash_ima:totalhash_ima is IMA digest collection from the security.ima extended attribute. hash_sha is SHA-style file digest collection. These are workload and policy dependent. If integrity mode or FILE_HASH rules force hashing of large files, these rows can dominate a timing run. This is one reason timing has to be a bounded diagnostic tool: hashing can be attacker-controlled cost if policy asks for it on large inputs.
evaluation:hash_sha:total
Trust database lookup is also split:
evaluation:trust_db_lookup:lock_waitlock_wait is time waiting for trust database update coordination before the read can proceed. read is the LMDB read-side work and related lookup logic. If read is high, trust lookup itself is material. If lock_wait is high, the decision path is waiting behind trust database maintenance or contention.
evaluation:trust_db_lookup:read
The evaluation phase decides whether a rule has an opinion. If a rule returns allow or deny, that is the decision source. If no rule has an opinion, fapolicyd preserves historical compatibility by allowing the access through the fallthrough path. The metrics report counts that separately as Allowed by fallthrough.
Response
The third phase is response. This is the work after rule evaluation has produced an outcome. It includes decision bookkeeping, optional syslog or debug formatting, audit response metadata, and the write back to the kernel.
The response phase has some gates that look familiar:
response:mime_detection:fast_classificationThis is the part that can be confusing at first. Why can MIME detection or trust database lookup appear in both evaluation and response?
response:mime_detection:gather_elf
response:mime_detection:libmagic_fallback
response:trust_db_lookup:lock_wait
response:trust_db_lookup:read
response:fanotify_write
The answer is lazy attributes. fapolicyd does not compute every possible attribute up front. That would waste time. Instead, attributes are looked up when something asks for them. During evaluation, policy rules may ask for ftype, trust, path, hash, or process details. During response, syslog, debug, or audit formatting may ask for some of the same attributes so the daemon can explain the decision.
So the same helper can be charged to different phases depending on who caused the lookup.
If evaluation:mime_detection:* is high, policy evaluation needed file type information. If response:mime_detection:* is high, reporting or audit output needed file type information after the decision had already been made. The same logic applies to trust database lookup. Evaluation-side trust lookup is policy cost. Response-side trust lookup is reporting or audit cost.
This distinction is important. If a timing report says response-side MIME detection dominates, making policy rules simpler may not help. The cost may be coming from debug output or a detailed syslog_format. If evaluation-side MIME detection dominates, then the policy is probably asking for file type data in the hot path.
response:fanotify_write is the final write of the permission decision back to the kernel. A high value here means the daemon did the policy work and then spent measurable time completing the kernel response. That is not rule cost. It is response-path cost.
Using the Timer
Now the report structure makes more sense. The timer creates a bounded window around a workload so the daemon can aggregate timing data for those gates.
Enable manual timing in fapolicyd.conf:
timing_collection=manualAfter changing the configuration, reload or restart fapolicyd so the running daemon has the setting active. Then check status:
fapolicyd-cli --check-status | head
Look for:
Timing collection mode: manualIf the mode is still off, the daemon will ignore timing requests. This is intentional. A system should not accidentally start collecting timing data just because someone ran the CLI.
Timing collection armed: false
![]() |
| Manual timing creates a bounded measurement window around a workload. |
The CLI has two names for the commands. The documentation uses --timing-start and --timing-stop. There are also --timer-start and --timer-stop aliases. I like the timer spelling because it reads naturally, so I will use that here.
To start timing, run the following command:
sudo fapolicyd-cli --timer-start
Now run something that causes fapolicyd decisions. For a tiny smoke test, this can be as boring as:
for i in $(seq 1 100); do /usr/bin/id >/dev/null; doneThen to stop timing, run the following command:
sudo fapolicyd-cli --timer-stop
The stop command prints the timing report. The daemon writes the report to:
/run/fapolicyd/fapolicyd.timingThe report is aggregate data. It does not store one record per decision. Each stage keeps a count, total time, maximum time, and latency buckets. That keeps memory bounded even if the timing window covers a large number of decisions. Here is a sample report:
Mode: manual
Timing run: 2026-05-24 14:54:07 -0400 to 2026-05-24 14:55:13 -0400
Duration: 0:01:06
Workers: 1
Max queue depth: 399
Decisions: 576,495
Throughput: 8717.3 decisions/sec (wall clock)
Active decision rate: 9596.4 decisions/sec
TL;DR:
- Queueing pressure reached max depth 399 of 800 (49.9%), p95 wait <=100ms, max wait 160 ms.
Overall decision latency:
avg 104 us, max 2.81 ms
p50 bucket <=100us, p95 bucket <=500us, p99 bucket <=500us
<=50us 1.4%, <=100us 72.4%, <=500us 100.0%, <=1ms 100.0%, >10ms 0.0%
Queueing:
avg wait: 27.2 ms
max wait: 160 ms
p95 bucket: <=100ms
total queued time: 15707 s
max queue depth: 399
Decision phase timing:
Phase Calls Calls/Dec Total Avg Max p95 bucket Notes
event_build 576,495 1.00 3.17 s 5.50 us 516 us <=50us
evaluation 576,495 1.00 1.66 s 2.89 us 2.80 ms <=50us
response 576,495 1.00 55.0 s 95.5 us 1.44 ms <=500us
Lazy helper attribution:
Helper timings are attributed to the active logical driver: evaluation or response.
Combined totals are evaluation + response.
Lazy helper attribution by driver:
Helper Eval total Response total Combined Response %
mime_detection:total 11.1 ms 0 ns 11.1 ms 0.0%
mime_detection:fast_classification 510 us 0 ns 510 us 0.0%
mime_detection:gather_elf 186 us 0 ns 186 us 0.0%
mime_detection:libmagic_fallback 10.4 ms 0 ns 10.4 ms 0.0%
trust_db_lookup:total 120 ms 0 ns 120 ms 0.0%
trust_db_lookup:read 109 ms 0 ns 109 ms 0.0%
trust_db_lookup:lock_wait 2.43 ms 0 ns 2.43 ms 0.0%
Combined lazy helper attribution:
Helper path Calls Calls/Dec Total Avg/call Amort/Dec Max p95 bucket
mime_detection:total 564 0.00 11.1 ms 19.6 us 19 ns 2.54 ms <=10us
mime_detection:fast_classification 564 0.00 510 us 903 ns 0 ns 11.3 us <=10us
mime_detection:gather_elf 32 0.00 186 us 5.82 us 0 ns 10.9 us <=10us
mime_detection:libmagic_fallback 9 0.00 10.4 ms 1.16 ms 18 ns 2.53 ms <=5ms
trust_db_lookup:total 66,016 0.11 120 ms 1.81 us 207 ns 2.80 ms <=5us
trust_db_lookup:read 66,016 0.11 109 ms 1.65 us 189 ns 2.80 ms <=5us
trust_db_lookup:lock_wait 66,016 0.11 2.43 ms 36 ns 4 ns 25.7 us <=1us
hash_sha:total 22 0.00 10.6 ms 481 us 18 ns 2.79 ms <=5ms
proc_detail_lookup 99,060 0.17 1.29 s 13.0 us 2.24 us 415 us <=50us
Derived observations:
Queueing showed moderate bursts: max queue depth 399 of 800 (49.9%), p95 wait <=100ms, max wait 160 ms.
libmagic fallback is the biggest MIME contributor: 1.6% of MIME calls, 94.0% of MIME time, 18 ns amortized per decision.
hash_sha is rare but expensive: 0.0% of decisions, 481 us avg when called, 18 ns amortized per decision.
Detailed stage timing, sorted by total time:
Stage Calls Calls/Dec Total Avg Max p95 bucket
time_in_queue:total 576,495 1.00 15707 s 27.2 ms 160 ms <=100ms
decision:total 576,495 1.00 60.1 s 104 us 2.81 ms <=500us
response:total 576,495 1.00 55.0 s 95.5 us 1.44 ms <=500us
response:fanotify_write 576,495 1.00 54.0 s 93.7 us 1.44 ms <=500us
event_build:total 576,495 1.00 3.17 s 5.50 us 516 us <=50us
event_build:proc_fingerprint 576,495 1.00 1.92 s 3.32 us 250 us <=10us
evaluation:total 576,495 1.00 1.66 s 2.89 us 2.80 ms <=50us
evaluation:proc_detail_lookup 99,060 0.17 1.29 s 13.0 us 415 us <=50us
event_build:fd_stat 576,495 1.00 733 ms 1.27 us 512 us <=5us
evaluation:trust_db_lookup:total 66,016 0.11 120 ms 1.81 us 2.80 ms <=5us
evaluation:trust_db_lookup:read 66,016 0.11 109 ms 1.65 us 2.80 ms <=5us
evaluation:lock_wait 576,495 1.00 24.8 ms 43 ns 42.4 us <=1us
response:audit_metadata:total 576,495 1.00 12.5 ms 21 ns 46.9 us <=1us
evaluation:mime_detection:total 564 0.00 11.1 ms 19.6 us 2.54 ms <=10us
evaluation:hash_sha:total 22 0.00 10.6 ms 481 us 2.79 ms <=5ms
evaluation:mime_detection:libmagic_fallback 9 0.00 10.4 ms 1.16 ms 2.53 ms <=5ms
evaluation:fd_path_resolution 588 0.00 6.73 ms 11.4 us 164 us <=50us
evaluation:trust_db_lookup:lock_wait 66,016 0.11 2.43 ms 36 ns 25.7 us <=1us
evaluation:mime_detection:fast_classification 564 0.00 510 us 903 ns 11.3 us <=10us
evaluation:mime_detection:gather_elf 32 0.00 186 us 5.82 us 10.9 us <=10us
Stage tail summary:
time_in_queue:total: >10ms 564,343/97.9%, >25ms 235,276/40.8%, >50ms 39,149/6.8%, >100ms 2,230/0.4%
Not observed:
event_build:cache_flush, response:mime_detection:total, response:mime_detection:fast_classification, response:mime_detection:gather_elf, response:mime_detection:libmagic_fallback, evaluation:hash_ima:total, response:trust_db_lookup:total, response:trust_db_lookup:lock_wait, response:trust_db_lookup:read, response:syslog_debug_format:total
Notes:
Largest queued-time contributor: time_in_queue:total (15707 s)
Largest helper contributor: proc_detail_lookup (1.29 s)
Largest decision phase contributor: response (55.0 s)
Slowest observed row by max: time_in_queue:total (160 ms)
Interpreting the Report
The full timing report can look intimidating, but the reading order is simple:
1. Run summaryThe run summary tells you whether the run is meaningful. Decisions should be non-zero. If it is zero, the workload did not generate timed daemon decisions. Max queue depth tells you whether requests backed up while timing was armed.
2. TL;DR
3. Overall decision latency
4. Queueing
5. Decision phase timing
6. Lazy helper attribution
7. Detailed stage timing
The report has two rates. Throughput is decisions per wall-clock second while timing was armed. Active decision rate is based on accumulated decision:total worker time. If wall-clock throughput is low but active decision rate is high, the workload may have been bursty or idle part of the time. If both are low, decision work is expensive.
The TL;DR section is not magic. It is a compact set of observations derived from the same data later in the report. It may point out helper dominance, response formatting cost, queue health, or that no dominant findings were observed.
Overall decision latency is the end-to-end decision worker latency after an event has been dequeued. It includes event build, policy evaluation, optional logging or audit preparation, response selection, and the fanotify response write. Queue wait is reported separately.
Before the next section, let's talk about the p95 notation. The p95 bucket is the latency bucket that contains the 95th percentile observation. In plain terms: about 95% of the measured calls completed at or below that bucket, and about 5% were slower. If the report says a stage has p95 <=500us, then nearly all calls to that stage were fast. If it says p95 >10ms, then the slow path was not just one rare outlier; it happened often enough to affect the top 5% of calls.
It is significant because averages can hide tail latency, and maximums can overreact to one unusual event. The p95 bucket sits between them. It tells you whether slowness is recurring enough to matter.
This section has average latency, maximum latency, p95 bucket, and tail buckets. The p95 bucket is usually more useful than the maximum. A high maximum with a low p95 means rare outliers. A high p95 means the slow path is common.
Queueing tells you how long events waited in fapolicyd's internal userspace queue before the decision worker started processing them. If queue wait is high and max queue depth is close to q_size, the daemon is being fed events faster than it can answer them.
Decision phase timing is the first section I would use for diagnosis:
event_build:totalIf event build dominates, look at process and object identity work. If evaluation dominates, look at rule traversal and evaluation-side helpers. If response dominates, look at logging, audit metadata, response-side helpers, and the kernel write.
evaluation:total
response:total
Lazy helper attribution explains whether expensive helper work was caused by evaluation or response. This is where the duplicate-looking MIME and trust DB rows become useful. The same helper has very different tuning implications depending on whether policy needed it or output formatting needed it.
The detailed stage table is the final view. It is sorted by total measured time. Use it to find the main cost centers:
High event_build:proc_fingerprint -> process identity lookup costDo not expect every row to appear in every report. If a stage is not observed, the workload did not trigger that measured operation during the timing window. That can be good news. For example, no libmagic fallback means fast MIME classification handled the files observed in the run.
High event_build:fd_stat -> object fingerprinting cost
High evaluation:lock_wait -> rule lock wait
High evaluation:total -> rule traversal or matching cost
High evaluation:mime_detection:libmagic_fallback -> expensive file type fallback
High evaluation:hash_sha:total -> SHA hashing cost
High evaluation:hash_ima:total -> IMA digest cost
High evaluation:trust_db_lookup:read -> trust database read cost
High response:mime_detection:* -> reporting needed ftype data
High response:trust_db_lookup:read -> reporting or audit needed trust data
High response:fanotify_write -> kernel response write cost
The best way to use timing is to make one change at a time. Change the workload, a rule, an integrity setting, or output format, then run another bounded timing window. Compare the same sections. If the dominant cost moves, you learned something.
The next article will cover the stress harness. The timer tells us where the daemon spent time. The stress tool gives us repeatable workloads that can move specific counters and timing gates on purpose.

