Yet software developed in C, with all of the foibles of its string routines, has been sold and runni...

alexfoo • yesterday at 4:32 PM • 5 replies • view on HN

Yet software developed in C, with all of the foibles of its string routines, has been sold and running for years with trillions of USD is total sales.

A library that records how much memory is allocated to a string along with the pointer isn't a necessity.

Most people who write in C professionally are completely used to it although the footgun is (and all of the others are) always there lurking.

You'd generally just see code like this:-

    char hostname[20];
    ...
    strncpy( hostname, input, 20 );
    hostname[19]=0;

The problem obviously comes if you forget the line to NUL that last byte AND you have a input that is greater than 19 characters long.

(It's also very easy to get this wrong, I almost wrote `hostname[20]=0;` first time round.)

I remember debugging a problem 20+ years ago on a customer site with some software that used Sybase Open/Server that was crashing on startup. The underlying TDS communications protocol (https://www.freetds.org/tds.html) had a fixed 30 byte field for the hostname and the customer had a particularly long FQDN that was being copied in without any checks on its length. An easy fix once identified.

Back then though the consequences of a buffer overrun were usually just a mild annoyance like a random crash or something like the Morris worm. Nowadays such a buffer overrun is deadly serious as it can easily lead to data exfiltration, an RCE and/or a complete compromise.

Heartbleed and Mongobleed had nothing to do with C string functions. They were both caused by trusting user supplied payload lengths. (C string functions are still a huge source of problems though.)

Replies

__float • yesterday at 4:49 PM

> Yet software developed in C, with all of the foibles of its string routines, has been sold and running for years with trillions of USD is total sales.

This doesn't seem very relevant. The same can be said of countless other bad APIs: see years of bad PHP, tons of memory safety bugs in C, and things that have surely led to significant sums of money lost.

> It's also very easy to get this wrong, I almost wrote `hostname[20]=0;` first time round.

Why would you do this separately every single time, then?

The problem with bad APIs is that even the best programmers will occasionally make a mistake, and you should use interfaces (or...languages!) that prevent it from happening in the first place.

The fact we've gotten as far as we have with C does not mean this is a defensible API.

➕ show 1 reply

t43562 • yesterday at 5:40 PM

I learned C in about 1989/1990 and have used it a lot since then. I have worked on a fair amount of rotten commercial C code, sold at a high price, in which every millimeter of extra functionality was bought with sweat and blood. I once spent a month finding a memory corruption issue that happened every 2 weeks with a completely different stack trace which, in the end, required a 1-line fix.

The effort was usually out of proportion with the achievement.

I crashed my own computer a lot before I got Linux. Do you remember far pointers? :-( In those days millions of dollars were made by operating systems without memory protection that couldn't address more than 640k of memory. One accepted that programs sometimes crashed the whole computer - about once a week on average.

Despite considering myself an acceptable programmer I still make mistakes in C quite easily and I use valgrind or the sanitizers quite heavily to save myself from them. I think the proliferation of other languages is the result of all this.

In spite of this I find C elegant and I think 90% of my errors are in string handling so therefore if it had a decent string handling library it would be enormously better. I don't really think pure ASCIIZ strings are so marvelous or so fast that we have to accept their bullshit.

➕ show 1 reply

throw0101c • yesterday at 11:56 PM

> The underlying TDS communications protocol (https://www.freetds.org/tds.html) had a fixed 30 byte field for the hostname and the customer had a particularly long FQDN that was being copied in without any checks on its length. An easy fix once identified.

I had to file a bug with a vendor because their hostname handling had a similar issue: I think it was 64 max.

There was some pushback about if it was "really" a problem, so I ended up quoting the relevant RFCs to argue that they were not compliant with Internet standards, and eventually they fixed the issue.

saghm • yesterday at 11:05 PM

> Yet software developed in C, with all of the foibles of its string routines, has been sold and running for years with trillions of USD is total sales.

Even with the premise that sales of software is a good metric for analyzing design of the language (which I think is arguable at best), we don't know that even more money might have been made with better strings in C. You coming justify pretty much anything with that argument. MongoDB (which indicentally is on C++ and presumably makes plenty of use of std::string) made millions of dollars despite having the bug you mention, so why bother fixing it?

➕ show 1 reply

bluecalm • yesterday at 7:12 PM

>>(It's also very easy to get this wrong, I almost wrote `hostname[20]=0;` first time round.)

Impossible to get wrong with a modern compiler that will warn you on that or LSP that will scream the moment you type ; and hit enter/esc.

alt Hacker News

Replies