Hacker News

drob518 yesterday at 6:56 PM

From the article:

> I was later surprised all the real world find implementations I examined use tree-walk interpreters instead.

I’m not sure why this would be surprising. The runtime of find is dominated by disk IOPS: the cost of interpreting find's conditions is swamped by the cost of reading directory entries and metadata from disk. So, keep it simple and just use a tree-walk interpreter.
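To make "tree-walk interpreter" concrete, here is a minimal Python sketch. It assumes a tiny, hypothetical AST covering only a subset of find's expression language (-name, -type, -a, -o, !). The class names and `evaluate` are illustrative and not taken from any real find implementation. Each directory entry is checked by recursively walking the parsed tree, and `and`/`or` short-circuit the same way find's -a and -o do.

```python
import os.path
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Union

# Hypothetical AST nodes for a small subset of find's expression language.
@dataclass
class Name:          # -name PATTERN (glob matched against the basename)
    pattern: str

@dataclass
class Type:          # -type KIND ('f' = regular file, 'd' = directory, ...)
    kind: str

@dataclass
class And:           # EXPR -a EXPR
    left: "Node"
    right: "Node"

@dataclass
class Or:            # EXPR -o EXPR
    left: "Node"
    right: "Node"

@dataclass
class Not:           # ! EXPR
    operand: "Node"

Node = Union[Name, Type, And, Or, Not]

@dataclass
class Entry:         # hypothetical stand-in for one directory entry visited during the walk
    path: str
    kind: str

def evaluate(node: Node, entry: Entry) -> bool:
    """Tree-walk interpreter: recursively evaluate `node` for one entry."""
    if isinstance(node, Name):
        return fnmatchcase(os.path.basename(entry.path), node.pattern)
    if isinstance(node, Type):
        return entry.kind == node.kind
    if isinstance(node, And):
        # Python's `and` short-circuits, like find's -a
        return evaluate(node.left, entry) and evaluate(node.right, entry)
    if isinstance(node, Or):
        # Python's `or` short-circuits, like find's -o
        return evaluate(node.left, entry) or evaluate(node.right, entry)
    if isinstance(node, Not):
        return not evaluate(node.operand, entry)
    raise TypeError(f"unknown node: {node!r}")

# Roughly: find . -type f \( -name '*.c' -o -name '*.h' \)
expr = And(Type("f"), Or(Name("*.c"), Name("*.h")))
print(evaluate(expr, Entry("src/main.c", "f")))  # True
print(evaluate(expr, Entry("src/README", "f")))  # False
```

The whole evaluator is one recursive function over the parsed tree, with no separate compilation step. Any per-node dispatch overhead is paid once per visited entry, which is small next to a disk read.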


Replies

adrian_b today at 10:43 AM

The assumption that "find" performance is dominated by disk IOPS is not generally valid.

For instance, I normally compile big software projects in RAM disks (Linux tmpfs), because I typically use computers with no less than 64 GB of DRAM.

Such big software projects may contain very large numbers of files and subdirectories, and their build scripts may use "find".

In such a case there are no SSD or HDD I/O operations; everything is done in main memory, so the intrinsic performance of "find" may matter.

Someone yesterday at 8:27 PM

Is it truly simpler to do that? A separate “command line to byte codes” module would be way easier to test than one that also does the work, including making any necessary syscalls.

Also, decreasing CPU usage may not speed up find (much), but it would leave more CPU time for other processes.
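A rough sketch of the separate "command line to byte codes" module this comment suggests, assuming a small subset of find's syntax (-name, -type, !, -a, -o, parentheses) and the POSIX precedence of ! over -a (also implied by juxtaposition) over -o. The `Compiler` class, the `run` function, and the instruction names (NAME, TYPE, NOT, JUMP_IF_FALSE, JUMP_IF_TRUE) are all hypothetical and not taken from any real find. Short-circuiting is compiled into conditional jumps over a single result register.

```python
import os.path
from fnmatch import fnmatchcase

class Compiler:
    """Hypothetical compiler: find-style tokens -> flat instruction list.
    Precedence, tightest first: !, then -a (also implied), then -o."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0
        self.code = []

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise ValueError("unexpected end of expression")
        self.pos += 1
        return tok

    def compile(self):
        self.parse_or()
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r}")
        return self.code

    def parse_or(self):
        self.parse_and()
        while self.peek() == "-o":
            self.take()
            jump = len(self.code)
            self.code.append(["JUMP_IF_TRUE", None])  # target patched below
            self.parse_and()
            self.code[jump][1] = len(self.code)

    def parse_and(self):
        self.parse_not()
        while self.peek() not in (None, "-o", ")"):
            if self.peek() == "-a":  # explicit -a; otherwise it is implied
                self.take()
            jump = len(self.code)
            self.code.append(["JUMP_IF_FALSE", None])  # target patched below
            self.parse_not()
            self.code[jump][1] = len(self.code)

    def parse_not(self):
        if self.peek() == "!":
            self.take()
            self.parse_not()
            self.code.append(["NOT"])
        else:
            self.parse_primary()

    def parse_primary(self):
        tok = self.take()
        if tok == "(":
            self.parse_or()
            if self.take() != ")":
                raise ValueError("expected ')'")
        elif tok in ("-name", "-type"):
            # "-name" -> "NAME", "-type" -> "TYPE"; the operand is the next token
            self.code.append([tok[1:].upper(), self.take()])
        else:
            raise ValueError(f"unsupported token {tok!r}")

def run(code, path, kind):
    """Execute compiled code for one entry; `acc` holds the latest test result."""
    acc, pc = True, 0
    while pc < len(code):
        op = code[pc]
        if op[0] == "NAME":
            acc = fnmatchcase(os.path.basename(path), op[1])
        elif op[0] == "TYPE":
            acc = kind == op[1]
        elif op[0] == "NOT":
            acc = not acc
        elif (op[0] == "JUMP_IF_FALSE" and not acc) or (op[0] == "JUMP_IF_TRUE" and acc):
            pc = op[1]  # short-circuit: skip the rest of the -a / -o operand
            continue
        pc += 1
    return acc
```

The testability argument shows up directly: `Compiler(["-name", "*.c", "-o", "-name", "*.h"]).compile()` returns `[['NAME', '*.c'], ['JUMP_IF_TRUE', 3], ['NAME', '*.h']]`, which a unit test can compare against without any syscalls. `run` can then be tested separately on synthetic paths.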

chubot yesterday at 7:48 PM

Yeah, that's basically what was discussed here: https://lobste.rs/s/xz6fwz/unix_find_expressions_compiled_by...

And then I pointed to this article on databases: https://notes.eatonphil.com/2023-09-21-how-do-databases-exec...

Even MySQL, DuckDB, and CockroachDB apparently use tree-walking to evaluate expressions, not bytecode!

Probably for the same reason: many parts are dominated by I/O, so the optimization work goes elsewhere.

And MySQL is a super-mature codebase.
