Hi, I am one of the maintainers of GNU Coreutils. Thanks for the article; it covers some interesting topics. In the little Rust that I have used, I have felt that it is far too easy to write TOCTOU races using std::fs. I hope the standard library gets an API similar to openat eventually.
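To make that concrete, here is the kind of path-based check-then-use that std::fs makes easy to write. This is a sketch of my own, not code from the article or either implementation, and the function and file name are made up; the point is only that the check and the use resolve the path independently, which is the gap an openat-style, fd-relative API would close:

```rust
use std::fs;
use std::io::{self, Read};

// The check and the use both resolve `path` separately, so the file can
// be swapped for a symlink in between -- a classic TOCTOU shape.
fn read_if_not_symlink(path: &str) -> io::Result<String> {
    let meta = fs::symlink_metadata(path)?; // time of check
    if meta.file_type().is_symlink() {
        return Err(io::Error::new(io::ErrorKind::Other, "refusing to follow a symlink"));
    }
    // Another process can replace `path` with a symlink right here,
    // and File::open below will happily follow it.
    let mut contents = String::new();
    fs::File::open(path)?.read_to_string(&mut contents)?; // time of use
    Ok(contents)
}

fn main() {
    match read_if_not_symlink("example.txt") {
        Ok(text) => print!("{text}"),
        Err(err) => eprintln!("error: {err}"),
    }
}
```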
I just want to mention that I disagree with the section titled "Rule: Resolve Paths Before Comparing Them". Generally, it is better to call fstat on both files and compare st_dev and st_ino, which, to be fair, the article does mention. A side effect that seems to be considered less often is the performance impact. Here is an example in practice:
$ mkdir -p $(yes a/ | head -n $((32 * 1024)) | tr -d '\n')
$ while cd $(yes a/ | head -n 1024 | tr -d '\n'); do :; done 2>/dev/null
$ echo a > file
$ time cp file copy
real 0m0.010s
user 0m0.002s
sys 0m0.003s
$ time uu_cp file copy
real 0m12.857s
user 0m0.064s
sys 0m12.702s
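For reference, the fstat-based comparison looks roughly like this (a sketch only; the `same_file` helper is mine, not code from GNU or uutils):

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::MetadataExt;

// Decide whether two open files are the same object by comparing device
// and inode numbers. File::metadata() is an fstat on the descriptor, so
// no path has to be resolved again.
fn same_file(a: &File, b: &File) -> io::Result<bool> {
    let (ma, mb) = (a.metadata()?, b.metadata()?);
    Ok(ma.dev() == mb.dev() && ma.ino() == mb.ino())
}

fn main() -> io::Result<()> {
    let src = File::open("file")?;
    let dst = File::open("copy")?;
    println!("same file: {}", same_file(&src, &dst)?);
    Ok(())
}
```

Since everything happens on already-open descriptors, nothing has to walk the 32k-deep path a second time.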
I know people are very unlikely to create directory trees that deep in real life. However, GNU software tends to work very hard to avoid arbitrary limits [1]. Also, the article says "The Rust rewrite has shipped zero of these [memory safety bugs], over a comparable window of activity." The larger point still stands, but that claim is not true [2]. :)
[1] https://www.gnu.org/prep/standards/standards.html#Semantics
[2] https://github.com/advisories/GHSA-w9vv-q986-vj7x
First of all, thank you for presenting a succinct take on this viewpoint from the other side of the fence from where I stand.
So how can I learn from this? (I am asking very aggressively, especially for Internet writing, to make the contrast unmistakable, and contrast helps with perceiving differences and mistakes. You also don’t owe me any of your time or mental bandwidth whatsoever.)
So here goes:
Question 1:
How come "speed", "performance", race conditions and st_ino keep getting brought up?
Speed (latency), physically writing things out to storage (sequentially, atomically (ACID), on all of HDD, NVMe, SSD, ODD, FDD, and tape, "Haskell monad", event horizons, the finite speed of light and information, whatever), as well as race conditions all seem to boil down to the same thing. For reliable systems like accounting, the path seems to be ACID or the highway. And "unreliable" systems forget fast enough that computers don’t seem to really make a difference there.
Question 2:
Does throughput really matter more than latency in everyday application?
Question 3 (explanation first, this time):
The focus on inode numbers is at least understandable given the history of C, Unix-like operating systems, and GNU Coreutils.
What about this basic example? Just make a USB thumb drive "work" for storing files (ignoring NAND flash decay and USB), without getting tripped up by libc IO buffering, fflush, kernel buffering (Hurd, if you prefer it over Linux or FreeBSD), or more than one application running on a multi-core and/or time-sliced system (to really weed out single-core CPUs running only a single user-land binary with blocking IO).
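In Rust terms, the layers I have in mind would look roughly like this (my own sketch, with a made-up mount point, assuming the userspace-buffer / kernel-buffer split is the part that matters):

```rust
use std::fs::File;
use std::io::{BufWriter, Write};

// Two flush steps for two buffering layers: the userspace buffer
// (the libc/fflush analogue) and the kernel page cache (fsync).
fn main() -> std::io::Result<()> {
    let file = File::create("/mnt/usb/data.txt")?;
    let mut writer = BufWriter::new(file);

    writer.write_all(b"hello\n")?; // may still sit in the userspace buffer
    writer.flush()?;               // userspace buffer -> kernel, via write(2)
    writer.get_ref().sync_all()?;  // ask the kernel to push it to the device, via fsync(2)
    Ok(())
}
```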
Sorry, complete noob here. Why didn't you just cd into $(yes a/ | head -n $((32 * 1024)) | tr -d '\n')? Why do you need to use the while loop for cd?
EDIT: got it. -bash: cd: a/a/a/....../a/a/: File name too long
To be fair, the Vec::set_len bug in Rust was in 2021. And even then, it had to be annotated as `unsafe`. It was then deprecated and a linter check was added: https://github.com/rust-lang/rust-clippy/issues/7681
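For context, the shape that check targets looks roughly like this (my sketch, not code from uutils):

```rust
fn main() {
    let n = 1024;

    // The flagged pattern looks roughly like this:
    //
    //     let mut buf: Vec<u8> = Vec::with_capacity(n);
    //     unsafe { buf.set_len(n); } // bytes 0..n are uninitialized: unsound
    //
    // Note that set_len cannot even be called outside an `unsafe` block.
    // The safe way to get a length-n, zeroed buffer:
    let buf = vec![0u8; n];
    assert_eq!(buf.len(), n);
}
```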
Probably a dumb question, but is GNU Coreutils interested in, or planning on, doing its own Rust rewrite?