The whole “zero overhead” thing is IMO a red herring. I care about a few things: stability across versions and languages, space efficiency (sometimes) and performance. I do not care about “overhead” — performance trumps overhead every time.
Your deserializer is probably running on a CPU, and that CPU probably has a very fast L1 cache and might be targeted by a compiler that can do scalar replacement of aggregates and such. A non-zero-overhead deserializer can run very quickly and result in the output being streamed efficiently from its source and ending up hot in L1 in a useful format. A zero-overhead deserializer might do messy reads in a bad order without streaming hints and run much slower.
And then to get very very large records, as in the OP, where getting a good on-disk layout may require thought. And, frequently, the right layout isn’t even array-of-structs, which is why there are so many tools designed to query column stores like Parquet efficiently.
Serdes time can be significant. There are use cases for the zero copy formats even though they use more space. Likewise bit-packed asn1 is often slower than byte-aligned.