Hacker News

amluto today at 5:37 AM (5 replies)

I can’t entirely tell what the article’s point is. It seems to be trying to say that many languages can mmap bytes, but:

> (as far as I'm aware) C is the only language that lets you specify a binary format and just use it.

I assume they mean:

    struct foo { fields; };
    struct foo *data = mmap(…);

And yes, C is one of relatively few languages that let you do this without complaint, because it’s a terrible idea. And C doesn’t even let you specify a binary format: it lets you write a struct that will correspond to a binary format in accordance with the C ABI on your particular system.
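
To make the ABI point concrete, here is a minimal sketch in C. The `record` struct, the 5-byte layout, and `record_decode` are all hypothetical, invented for illustration. It spells out an on-disk layout byte by byte instead of relying on whatever padding and byte order the compiler picks for the struct:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record, for illustration only. The compiler may insert
   padding after `kind`, so sizeof(struct record) is usually larger than
   the 5 bytes of actual data, and the in-memory byte order of `value`
   follows the host CPU. */
struct record {
    uint8_t  kind;
    uint32_t value;
};

/* Assumed on-disk layout, chosen for this sketch:
     byte 0      kind
     bytes 1..4  value, little-endian, no padding */
enum { RECORD_WIRE_SIZE = 5 };

/* Decode one record from raw bytes. Returns 0 on success, -1 if the
   buffer is too short. The result does not depend on host byte order,
   struct padding, or the alignment of `buf`. */
int record_decode(const uint8_t *buf, size_t len, struct record *out)
{
    assert(buf != NULL && out != NULL);
    if (len < RECORD_WIRE_SIZE)
        return -1;
    out->kind  = buf[0];
    out->value = (uint32_t)buf[1]
               | (uint32_t)buf[2] << 8
               | (uint32_t)buf[3] << 16
               | (uint32_t)buf[4] << 24;
    return 0;
}
```

Casting the mapped bytes straight to `struct record *` would instead read `value` at whatever offset and byte order the local ABI dictates, which is the behavior the comment above objects to.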

If you want to access a file containing a bunch of records using mmap, and you want a well defined format and good performance, then use something actually intended for the purpose. Cap’n Proto and FlatBuffers are fast but often produce rather large output; protobuf and its ilk are more space efficient and very widely supported; Parquet and Feather can have excellent performance and space efficiency if you use them for their intended purposes. And everything needs to deal with the fact that, if you carelessly access mmapped data that is modified while you read it in any C-like language, you get UB.


Replies

gopalv today at 6:09 AM

> correspond to a binary format in accordance with the C ABI on your particular system.

We're so deep in this hole that people are fixing this on a CPU with silicon.

The Graviton team made a little-endian version of ARM just to allow lazy code like this to migrate away from Intel chips without having to rewrite struct unpacking (& also IBM with the ppc64le).

Early in my career, I spent a lot of my time reading Java bytecode into little endian to match all the bytecode interpreter enums I had & completely hating how 0xCAFEBABE would literally say BE BA FE CA (jokingly referred to as "be bull shit") in a (gdb) x view.

dvt today at 5:59 AM

Had the same thought. Also confused at the backhanded compliment that pickle got:

> Just look at Python's pickle: it's a completely insecure serialization format. Loading a file can cause code execution even if you just wanted some numbers... but still very widely used because it fits the mix-code-and-data model of python.

Like, are they saying it's bad? Are they saying it's good? I don't even get it. While I was reading the post, I was thinking about pickle the whole time (and how terrible that idea is, too).

pjmlp today at 8:18 AM

Yeah, and as you well put it, it isn't even some snowflake feature only possible in C.

The myth that it was a gift from the gods, doing stuff nothing else can, persists.

And even in the languages that don't, it isn't as if a tiny Assembly thunk is the end of the world to write, but apparently at the sight of a plain mov people run for the hills nowadays.

socalgal2 today at 7:46 AM

It's not a terrible idea. It has its uses. You just have to know when to use it and when not to use it.

For example, to have fast load times and zero temp memory overhead I've used that for several games. Other than changing a few offsets to pointers the data is used directly. I don't have to worry about incompatibilities. Either I'm shipping for a single platform or there's a different build for each platform, including the data. There's a version in the first few bytes just so during dev we don't try to load old format files with new struct defs. But otherwise, it's great for getting fast load times.
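
A rough sketch in C of what such a loader might look like. The `level` struct, `LEVEL_VERSION`, and `level_load` are hypothetical names, not taken from any real engine. It checks a version field in the first few bytes, then patches a stored offset into a pointer in place:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical format version; bump it whenever the struct changes. */
#define LEVEL_VERSION 3u

/* Hypothetical asset header. In the file, `name` holds a byte offset
   from the start of the blob; after loading it is a real pointer. */
struct level {
    uint32_t version;
    uint32_t enemy_count;
    union {
        uint64_t    offset;  /* as stored in the file */
        const char *ptr;     /* after level_load() */
    } name;
};

/* Validate and fix up a blob in place. `blob` must be writable and
   suitably aligned for struct level. Returns NULL for a short buffer,
   a version mismatch, or an offset that does not point at a
   NUL-terminated string inside the blob. */
struct level *level_load(void *blob, size_t len)
{
    assert(blob != NULL);
    if (len < sizeof(struct level))
        return NULL;
    struct level *lv = blob;
    if (lv->version != LEVEL_VERSION)
        return NULL;                 /* old file, new struct defs */
    uint64_t off = lv->name.offset;
    if (off < sizeof(struct level) || off >= len)
        return NULL;
    const char *name = (const char *)blob + off;
    if (memchr(name, '\0', len - (size_t)off) == NULL)
        return NULL;                 /* string runs off the end */
    lv->name.ptr = name;             /* offset becomes a pointer */
    return lv;
}
```

This only works when the tool that wrote the blob and the code that loads it agree on the struct layout, which matches the single-platform or per-platform-build situation described above.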

Negitivefrags today at 5:59 AM

Why is it such a terrible idea?

No need to add complexity, dependencies, and reduced performance by using these libraries.
