I've always wondered about the motivations of the various string routines in C - every one of them seems to have some huge caveat which makes them useless.
After years I now think it's essential to have a library which records at least how much memory is allocated to a string along with the pointer.
Something like this: https://github.com/msteinert/bstring
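The core idea is just to carry the length and capacity around with the data pointer. A minimal sketch of that shape (not bstring's actual API, just an illustration):

    #include <stdlib.h>
    #include <string.h>

    struct lstring {
        char  *data;  /* kept NUL-terminated for interop with plain C APIs */
        size_t len;   /* bytes currently in use, excluding the NUL */
        size_t cap;   /* bytes allocated for data */
    };

    /* Create an lstring by copying a plain C string; returns -1 on OOM. */
    static int lstring_init(struct lstring *s, const char *src) {
        s->len = strlen(src);
        s->cap = s->len + 1;
        s->data = malloc(s->cap);
        if (s->data == NULL)
            return -1;
        memcpy(s->data, src, s->cap);  /* copies the trailing NUL too */
        return 0;
    }

    static void lstring_free(struct lstring *s) {
        free(s->data);
        s->data = NULL;
        s->len = s->cap = 0;
    }

With that in place, concatenation and truncation can check cap instead of trusting the caller to have sized the destination correctly.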
Yes, not having a length along with the string was a mistake. It dates from an era when every byte was precious and spending two bytes on a length instead of one on a terminator felt like a significant loss.
I have long wondered how terrible it would have been to have some sort of "varint" at the beginning instead of a hard-coded number of bytes, but I don't have enough experience with that generation to have a good feel for it.
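For what it's worth, a LEB128-style prefix (7 bits of length per byte, high bit as a continuation flag) would cost one byte for strings under 128 characters and grow only as needed. A hypothetical decoder, just to make the idea concrete:

    #include <stddef.h>

    /* Decode a varint length prefix from buf; returns the number of
     * prefix bytes consumed, or 0 if the value won't fit in size_t. */
    static size_t varint_decode(const unsigned char *buf, size_t *out_len) {
        size_t value = 0;
        for (size_t i = 0; i < sizeof(size_t); i++) {
            value |= (size_t)(buf[i] & 0x7F) << (7 * i);
            if ((buf[i] & 0x80) == 0) {   /* high bit clear: last prefix byte */
                *out_len = value;
                return i + 1;
            }
        }
        return 0;
    }

The obvious downside for that era is that you can no longer read the length with a single fixed-size load, and growing a string in place may mean resizing the prefix.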
>every one of them seems to have some huge caveat which makes them useless
They were added to C before enough of the people designing it understood the consequences they would bring. Another fundamentally broken design choice is array-to-pointer decay in function signatures instead of having fat pointer types.
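To make the decay point concrete: the array size written in a parameter declaration is purely decorative, so the callee has no idea how big the buffer really is. A sketch of the difference (the struct span type here is hypothetical, just a hand-rolled fat pointer):

    #include <stdio.h>

    /* Despite the array syntax, buf decays to a plain char *, so
     * sizeof(buf) is the size of a pointer, not 32. */
    void takes_array(char buf[32]) {
        printf("%zu\n", sizeof(buf));   /* pointer size, e.g. 8 */
    }

    /* A hand-rolled "fat pointer": the length travels with the pointer. */
    struct span {
        char  *ptr;
        size_t len;
    };

    void takes_span(struct span s) {
        printf("%zu\n", s.len);         /* 32, as intended */
    }

    int main(void) {
        char buf[32];
        takes_array(buf);
        takes_span((struct span){ buf, sizeof(buf) });
        return 0;
    }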
It's from a time before computer viruses, no?
But also, all of this book-keeping takes up extra time and space, which is a trade-off easily made nowadays.
strncpy is fairly easy: it's a special-purpose function for copying a C string into a fixed-width field, as typically used in old C applications for on-disk formats. E.g. you might have a char username[20] field which can contain up to 20 characters, with unused characters filled with NULs. That's what strncpy is for. The destination argument should always be a fixed-size char array.
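Something along these lines (the struct and field names are made up, but it's the typical shape):

    #include <string.h>

    /* Hypothetical fixed-width on-disk record: username holds up to 20
     * characters and is NOT NUL-terminated if all 20 are used. */
    struct record {
        char username[20];
        /* ... other fixed-width fields ... */
    };

    void set_username(struct record *r, const char *name) {
        /* strncpy pads the remainder with NULs, which is exactly what a
         * fixed-width field wants; a 20-character name fills the field
         * completely, with no terminator. */
        strncpy(r->username, name, sizeof(r->username));
    }

The trouble only starts when people reach for it as a general-purpose "safe strcpy".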
A couple years ago we got a new manual page courtesy of Alejandro Colomar just about this: https://man.archlinux.org/man/string_copying.7.en
Yet software developed in C, with all of the foibles of its string routines, has been sold and running for years, with trillions of USD in total sales.
A library that records how much memory is allocated to a string along with the pointer isn't a necessity.
Most people who write C professionally are completely used to it, although that footgun (and all of the others) is always there lurking.
You'd generally just see code like this:-
char hostname[20];
...
strncpy( hostname, input, 20 );
hostname[19]=0;
The problem obviously comes if you forget the line to NUL that last byte AND you have an input that is greater than 19 characters long. (It's also very easy to get this wrong; I almost wrote `hostname[20]=0;` the first time round.)
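One way to avoid that footgun entirely (assuming input is itself NUL-terminated) is to let snprintf do the truncation, since it always writes the terminator when the buffer size is non-zero:

    #include <stdio.h>

    void set_hostname(char *hostname, size_t size, const char *input) {
        /* Truncates silently if input doesn't fit, but hostname is
         * always NUL-terminated (for size > 0). */
        snprintf(hostname, size, "%s", input);
    }

    /* called as: set_hostname(hostname, sizeof(hostname), input); */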
I remember debugging a problem 20+ years ago on a customer site with some software that used Sybase Open/Server and was crashing on startup. The underlying TDS communications protocol (https://www.freetds.org/tds.html) had a fixed 30-byte field for the hostname, and the customer had a particularly long FQDN that was being copied in without any check on its length. An easy fix once identified.
Back then, though, the consequences of a buffer overrun were usually just a mild annoyance like a random crash, or something like the Morris worm. Nowadays such a buffer overrun is deadly serious, as it can easily lead to data exfiltration, RCE and/or complete compromise.
Heartbleed and Mongobleed had nothing to do with C string functions. They were both caused by trusting user supplied payload lengths. (C string functions are still a huge source of problems though.)
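Grossly simplified, the bug class looks something like this (illustrative only, not actual OpenSSL code):

    #include <string.h>

    void handle_heartbeat(unsigned char *reply,
                          const unsigned char *payload,
                          size_t claimed_len,  /* length field from the packet */
                          size_t actual_len)   /* bytes actually received */
    {
        /* BUG: nothing checks claimed_len <= actual_len. If the attacker
         * claims a bigger payload than was sent, memcpy reads past the
         * buffer and leaks adjacent heap memory back in the reply. */
        memcpy(reply, payload, claimed_len);
        (void)actual_len;  /* the check that should exist but doesn't */
    }

No str* function in sight; the length was right there, it just wasn't validated.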
> I've always wondered about the motivations of the various string routines in C
This idiom (strncpy followed by manually NUL-terminating the last byte) exists because strncpy was invented for copying file names that got stored in 14-byte arrays, zero-terminated only if space permitted (https://stackoverflow.com/a/1454071).
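Roughly the shape it was built for (cf. the old V7 Unix directory entry; this is a sketch from memory, so treat the details as approximate):

    #include <string.h>

    #define DIRSIZ 14

    struct old_direct {
        unsigned short d_ino;
        char           d_name[DIRSIZ];  /* NUL-terminated only if shorter than 14 */
    };

    void set_name(struct old_direct *d, const char *name) {
        /* Copy-then-pad is exactly the on-disk format: NUL padding for
         * short names, no terminator at all for 14-character names. */
        strncpy(d->d_name, name, DIRSIZ);
    }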