logoalt Hacker News

weinzierltoday at 9:20 AM10 repliesview on HN

"For nearly thirty years, notepad.exe was the gold standard for a "dumb" utility which was a simple, win32-backed buffer for strings that did exactly one thing...display text."

Well, except that this did not prevent it from having embarrassing bugs. Google "Bush hid the facts" for an example. I'm serious, you won't be disappointed.

I think complexity is relative. At the time of the "Bush hid the facts" bug, nailing down Unicode and text encodings was still considered rocket science. Now this is a solved problem and we have other battles we fight.


Replies

usrbinbashtoday at 11:44 AM

As funny as the "Bush hid the facts" bug may be, there is a world of difference between an embarassing mistake by a function that guesses the text encoding wrong, and a goddamn remote code execution with an 8.8 score

> and we have other battles we fight.

Except no, we don't. notepad.exe was DONE SOFTWARE. It was feature complete. It didn't have to change. This is not a battle that needed fighting, this was hitting a brick wall with ones fist for no good reason, and then complaining about the resulting pain.

show 4 replies
dspilletttoday at 11:14 AM

> nailing down Unicode and text encodings was still considered rocket science. Now this is a solved problem

I wish…

Detecting text encoding is only easy if all you need to contend with is UTF16-with-BOM, UTF8-with-BOM, UTF8-without-BOM, and plain ASCII (which is effectively also UTF8). As soon as you might see UTF16 or UCS without a BOM, or 8-bit codepages other than plain ASCII (many apps/libs assume that these are always CP1252, a superset of the printable characters of ISO-8859-1, which may not be the case), things are not fully deterministic.

Thankfully UTF8 has largely won out over the many 8-bit encodings, but that leaves the interesting case of UTF8-with-BOM. The standard recommends against using it, that plain UTF8 is the way to go, but to get Excel to correctly load a UTF8 encoded CSV or similar you must include the BOM (otherwise it assumes CP 1252 and characters above 127 are corrupted). But… some apps/libs are completely unaware that UTF8-with-BOM is a thing at all so they load such files with the first column header corrupted.

Source: we have clients pushing & pulling (or having us push/pull) data back & forth in various CSV formats, and we see some oddities in what we receive and what we are expected to send more regularly than you might think. The real fun comes when something at the client's end processes text badly (multiple steps with more than one of them incorrectly reading UTF8 as CP1252, for example) before we get hold of it, and we have to convince them that what they have sent is non-deterministically corrupt and we can't reliably fix it on the receiving end…

show 2 replies
bszatoday at 11:37 AM

There is a difference between a bug you laugh at and walk away and a bug a scammer laughs at as he walks away with your money.

When I open something in Notepad, I don't expect it to be a possible attack vector for installing ransomware on my machine. I expect it to be text. It being displayed incorrectly is supposed to be the worst thing that could happen. There should be no reason to make Notepad capable of recognizing links, let alone opening them. Save that crap for VS Code or some other app I already know not to trust.

reyqntoday at 10:03 AM

Embarrassing bugs are not RCEs. Also the industry should be more mature now, not less. But move fast and break things, I guess...

show 1 reply
nuancebydefaulttoday at 10:34 AM

To be honest, the 'bush hid the facts' bug was funny and was not really a vulnerability that could be exploited, unless... you understood Chinese and the alternative text would manage to pursuade you to do something harmful.

In fact, those were the good days, when a mere affair with your secretary would be enough to jeopardize your career. The pendulum couldn't have swung more since.

show 1 reply
g947otoday at 9:53 AM

I am pretty sure it's possible to fix that entire category of bugs without introducing RCE vulnerabilities.

show 1 reply
jama211today at 9:41 AM

Fascinating reading about that bug, thanks for sharing

croestoday at 10:07 AM

> Now this is a solved problem

Is that so? I ran pretty often in problems with programs having trouble with non-ANSI characters

direwolf20today at 9:46 AM

It's not solved, we just don't have to guess the encoding any more because it's always UTF-8.