As noted, recent changes to OpenBSD TCP handling[1] may improve performance.
On a 4-core machine I see between 12% and 22% improvement with 10 parallel TCP streams. When testing with only a single TCP stream, throughput increases between 38% and 100%.
I'm not sure that translates directly to better PF performance, and four cores is hardly remarkable these days, but it might be typical of a small low-power router?
Would be interesting if someone had a recent benchmark comparison of OpenBSD 7.8 PF vs. FreeBSD's latest.
[1] https://undeadly.org/cgi?action=article;sid=20250508122430
Can confirm. Lots of performance improvements lately in OpenBSD. Our load balancers basically doubled their throughput after updating from 7.6 to 7.7.
That particular change improves throughput for traffic received locally. That said, over the past few years there's been a ton of work on unlocking the network layer generally to support more parallelism.
For a firewall I guess the critical question is the degree of parallelism supported by OpenBSD's PF stack, especially as it relates to common features like connection statefulness, NAT, etc.