I've always wondered about the motivations of the various string routines in C - every one of them seems to have some huge caveat which makes them useless.
After years I now think it's essential to have a library which records at least how much memory is allocated to a string along with the pointer.
Something like this: https://github.com/msteinert/bstring
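The core idea is just to carry the length and capacity around with the data pointer. A minimal sketch of that shape (not bstring's actual API, just an illustration):

    #include <stdlib.h>
    #include <string.h>

    struct lstring {
        char  *data;  /* kept NUL-terminated for interop with plain C APIs */
        size_t len;   /* bytes currently in use, excluding the NUL */
        size_t cap;   /* bytes allocated for data */
    };

    /* Create an lstring by copying a plain C string; returns -1 on OOM. */
    static int lstring_init(struct lstring *s, const char *src) {
        s->len = strlen(src);
        s->cap = s->len + 1;
        s->data = malloc(s->cap);
        if (s->data == NULL)
            return -1;
        memcpy(s->data, src, s->cap);  /* copies the trailing NUL too */
        return 0;
    }

    static void lstring_free(struct lstring *s) {
        free(s->data);
        s->data = NULL;
        s->len = s->cap = 0;
    }

With that in place, concatenation and truncation can check cap instead of trusting the caller to have sized the destination correctly.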
Yes, not having a length along with the string was a mistake. It dates from an era when every byte was precious and spending two bytes on a length instead of one on a terminator felt like a significant loss.
I have long wondered how terrible it would have been to have some sort of "varint" at the beginning instead of a hard-coded number of bytes, but I don't have enough experience with that generation to have a good feel for it.
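For what it's worth, a LEB128-style prefix (7 bits of length per byte, high bit as a continuation flag) would cost one byte for strings under 128 characters and grow only as needed. A hypothetical decoder, just to make the idea concrete:

    #include <stddef.h>

    /* Decode a varint length prefix from buf; returns the number of
     * prefix bytes consumed, or 0 if the value won't fit in size_t. */
    static size_t varint_decode(const unsigned char *buf, size_t *out_len) {
        size_t value = 0;
        for (size_t i = 0; i < sizeof(size_t); i++) {
            value |= (size_t)(buf[i] & 0x7F) << (7 * i);
            if ((buf[i] & 0x80) == 0) {   /* high bit clear: last prefix byte */
                *out_len = value;
                return i + 1;
            }
        }
        return 0;
    }

The obvious downside for that era is that you can no longer read the length with a single fixed-size load, and growing a string in place may mean resizing the prefix.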
>every one of them seems to have some huge caveat which makes them useless
They were added to C before enough of the people designing it understood the consequences they would bring. Another fundamentally broken design choice is array-to-pointer decay in function signatures instead of having fat pointer types.
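To make the decay point concrete: the array size written in a parameter declaration is purely decorative, so the callee has no idea how big the buffer really is. A sketch of the difference (the struct span type here is hypothetical, just a hand-rolled fat pointer):

    #include <stdio.h>

    /* Despite the array syntax, buf decays to a plain char *, so
     * sizeof(buf) is the size of a pointer, not 32. */
    void takes_array(char buf[32]) {
        printf("%zu\n", sizeof(buf));   /* pointer size, e.g. 8 */
    }

    /* A hand-rolled "fat pointer": the length travels with the pointer. */
    struct span {
        char  *ptr;
        size_t len;
    };

    void takes_span(struct span s) {
        printf("%zu\n", s.len);         /* 32, as intended */
    }

    int main(void) {
        char buf[32];
        takes_array(buf);
        takes_span((struct span){ buf, sizeof(buf) });
        return 0;
    }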
It's from a time before computer viruses, no?
But also, all of this book-keeping takes up extra time and space, which is a trade-off easily made nowadays.
strncpy is fairly easy: it's a special-purpose function for copying a C string into a fixed-width field, as typically used in old C applications for on-disk formats. E.g. you might have a char username[20] field which can contain up to 20 characters, with unused characters filled with NULs. That's what strncpy is for. The destination argument should always be a fixed-size char array.
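Something along these lines (the struct and field names are made up, but it's the typical shape):

    #include <string.h>

    /* Hypothetical fixed-width on-disk record: username holds up to 20
     * characters and is NOT NUL-terminated if all 20 are used. */
    struct record {
        char username[20];
        /* ... other fixed-width fields ... */
    };

    void set_username(struct record *r, const char *name) {
        /* strncpy pads the remainder with NULs, which is exactly what a
         * fixed-width field wants; a 20-character name fills the field
         * completely, with no terminator. */
        strncpy(r->username, name, sizeof(r->username));
    }

The trouble only starts when people reach for it as a general-purpose "safe strcpy".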
A couple years ago we got a new manual page courtesy of Alejandro Colomar just about this: https://man.archlinux.org/man/string_copying.7.en
Yet software developed in C, with all of the foibles of its string routines, has been sold and running for years, with trillions of USD in total sales.
A library that records how much memory is allocated to a string along with the pointer isn't a necessity.
Most people who write C professionally are completely used to it, although that footgun (and all of the others) is always there lurking.
You'd generally just see code like this:-
char hostname[20];
...
strncpy( hostname, input, 20 );
hostname[19]=0;
The problem obviously comes if you forget the line to NUL that last byte AND you have an input that is greater than 19 characters long. (It's also very easy to get this wrong; I almost wrote `hostname[20]=0;` the first time round.)
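One way to avoid that footgun entirely (assuming input is itself NUL-terminated) is to let snprintf do the truncation, since it always writes the terminator when the buffer size is non-zero:

    #include <stdio.h>

    void set_hostname(char *hostname, size_t size, const char *input) {
        /* Truncates silently if input doesn't fit, but hostname is
         * always NUL-terminated (for size > 0). */
        snprintf(hostname, size, "%s", input);
    }

    /* called as: set_hostname(hostname, sizeof(hostname), input); */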
I remember debugging a problem 20+ years ago on a customer site with some software that used Sybase Open/Server and was crashing on startup. The underlying TDS communications protocol (https://www.freetds.org/tds.html) had a fixed 30-byte field for the hostname, and the customer had a particularly long FQDN that was being copied in without any check on its length. An easy fix once identified.
Back then, though, the consequences of a buffer overrun were usually just a mild annoyance like a random crash, or something like the Morris worm. Nowadays such a buffer overrun is deadly serious, as it can easily lead to data exfiltration, RCE and/or complete compromise.
Heartbleed and Mongobleed had nothing to do with C string functions. They were both caused by trusting user supplied payload lengths. (C string functions are still a huge source of problems though.)
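Grossly simplified, the bug class looks something like this (illustrative only, not actual OpenSSL code):

    #include <string.h>

    void handle_heartbeat(unsigned char *reply,
                          const unsigned char *payload,
                          size_t claimed_len,  /* length field from the packet */
                          size_t actual_len)   /* bytes actually received */
    {
        /* BUG: nothing checks claimed_len <= actual_len. If the attacker
         * claims a bigger payload than was sent, memcpy reads past the
         * buffer and leaks adjacent heap memory back in the reply. */
        memcpy(reply, payload, claimed_len);
        (void)actual_len;  /* the check that should exist but doesn't */
    }

No str* function in sight; the length was right there, it just wasn't validated.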
> I've always wondered about the motivations of the various string routines in C
This idiom (strncpy followed by manually NUL-terminating the last byte) exists because strncpy was invented for copying file names that got stored in 14-byte arrays, zero-terminated only if space permitted (https://stackoverflow.com/a/1454071).
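Roughly the shape it was built for (cf. the old V7 Unix directory entry; this is a sketch from memory, so treat the details as approximate):

    #include <string.h>

    #define DIRSIZ 14

    struct old_direct {
        unsigned short d_ino;
        char           d_name[DIRSIZ];  /* NUL-terminated only if shorter than 14 */
    };

    void set_name(struct old_direct *d, const char *name) {
        /* Copy-then-pad is exactly the on-disk format: NUL padding for
         * short names, no terminator at all for 14-character names. */
        strncpy(d->d_name, name, DIRSIZ);
    }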