logoalt Hacker News

VorpalWaytoday at 11:33 AM1 replyview on HN

> A fairer comparison would be to stream the file in C++ as well and maintain internal state for the count.

Wouldn't memory mapping the data in Python be the more fair comparison? If the language doesn't support that, then this seems to absolutely be a fair comparison.

> For most people that would be the first/naive approach as well when they programmed something like this I think.

I disagree, my mind immediately goes to mmap when I have to deal with a single file that I have to read in it's entirety. I think the non-obvious solution here is rather io-uring (which I would expect to be faster if dealing with lots of small files, as you can load them async concurrently from the file system).


Replies

dgb23today at 1:36 PM

I'd make the bet that "most people" (who can program) would not think of mmap, but either about streaming or would even just load the whole thing into memory.

Ask a bunch of coding agents and they will give you these two versions, which means it's likely that the LLMs have seen these way more often than the mmap version. Both Opus and GPT even pushed back when I asked for mmap, both said it would "add complexity".

show 1 reply