The first common 32-bit system was Win 95, which required 4 MB of RAM (not GB!). The 4-byte prefix would be considered extremely wasteful in those times - maybe not for a single string, but any time there is a list of strings involved, such as a constants list. (As a point of reference, Turbo Pascal's default strings still had a 1-byte length field.)
Plus, C-style strings allow a lot of optimizations - if you have a mutable buffer with data, you can make strings out of it with zero copies and zero allocations. strtok(3) is an example of such an approach, but I've implemented plenty of similar parsers back in the day. INI, CSV, JSON, XML - query file size, allocate buffer once, read it into the buffer, drop some NULs into strategic positions, maybe shuffle some bytes around for that rare escape case, and you have a whole bunch of C strings, ready to use, and with no length limits.
Compared to this, Pascal strings would be incredibly painful to use... So you query the file size, allocate, read it, and then what? A 1-byte length is too short, and for a 2+ byte length you need a secondary buffer to copy each string into. And how big should this buffer be? Are you going to be dynamically resizing it or wasting some space?
And sure, _today_ I no longer write code like that. I don't mind dropping std::string into my code - it's just a meg or so of libraries and 3x overhead for short strings, and that's nothing these days. But back when those conventions were established, it was really, really important.
> query file size, allocate buffer once, read it into the buffer, drop some NULL's into strategic positions, maybe shuffle some bytes around for that rare escape case, and you have a whole bunch of C strings, ready to use, and with no length limits.
I have also done this, but I would argue that, even at the time, the design was very poor. A much, much better solution would have been wide pointers - pass around the length of the string separately from the pointer, much like string_view or Rust's &str. Then you could skip the NULL-writing part.
Maybe C strings made sense on even older machines which had severely limited registers - if you have an accumulator and one register usable as a pointer, you want to minimize the number of variables involved in a computation.
> zero copy and zero allocations
This is a bit of a red herring, because when you actually read the strings back out, you still have to scan each one to find its length - zero copy, zero allocation, but linear cost per length lookup instead of constant.
> query file size, allocate buffer once, read it into the buffer, drop some NULL's into strategic positions, maybe shuffle some bytes around for that rare escape case, and you have a whole bunch of C strings, ready to use, and with no length limits.
I write parsers in a very different way—I keep the file buffer around as read-only until the end of the pipeline, prepare string views into the buffer, and pipe those along to the next step.
Besides my DA/Algo classes in college, I've never used C seriously. And you know, it's semantics like this that really make me go WTF lol...
From strtok man page... "The first time that strtok() is called, str should be specified; subsequent calls, wishing to obtain further tokens from the same string, should pass a null pointer instead."
Really?? A null pointer? This is valid code:
#include <string.h>

char str[] = "C is fucking weird, ok? I said it, sue me.";
char *result = strtok(str, ",");  /* "C is fucking weird" */
char *res = strtok(NULL, ",");    /* " ok? I said it" */
Why is that ok?
> First common 32 bit system was Win 95
We're just going to ignore Amigas and all the Unix workstations?