Right - I mean, what you're describing makes sense, but it doesn't sound like what they're describing. Their benchmarks run on an EC2 instance, and the post's author is here saying that they run on virtualized hardware. Plus, they run on top of a file system. None of that screams "direct DMA from our buffers" to me.
I'm not saying it's impossible, but people who want to lean on hardware guarantees for extra performance typically control more of the stack.