For context, this is the Lars Ingebrigtsen who wrote the manual for Gnus[0], a common Emacs package for reading email and Usenet. It’s clever, funny, and wildly informative. Lars has probably forgotten more about email parsing than 99% of us here will ever have learned.
The manual itself says[1]:
> Often when I read the manual, I think that we should take a collection up to have Lars psycho-analysed.
0: https://www.gnu.org/software/emacs/manual/html_mono/gnus.htm...
The most interesting thing to me wasn't the equals signs, which I knew are from quoted-printable, but the fact that when an equals sign appears, a letter that should have been preceding or following it is missing. It's as if an off-by-one error has occurred, where instead of getting rid of the equals sign, it's gotten rid of part of the actual text. Perhaps the CRLF/LF thing is part of it.
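To make the off-by-one idea concrete, here is a minimal Python sketch. The `buggy_join_soft_breaks` function is hypothetical, not the code anyone actually ran: it assumes a decoder that, on seeing `=` right before a line break, keeps the `=` and blindly skips the next two bytes as if they were always CR LF. Once CRLF has been converted to LF, the second skipped byte is a real letter.

```python
import quopri


def buggy_join_soft_breaks(data: bytes) -> bytes:
    """Hypothetical buggy decoder (an assumption for illustration only).

    On '=' followed by a line break it keeps the '=' and drops the next
    two bytes, assuming they are always CR LF.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        out.append(data[i])
        if data[i:i + 1] == b"=" and data[i + 1:i + 2] in (b"\r", b"\n"):
            i += 3  # the '=' was already copied; skip two more bytes
        else:
            i += 1
    return bytes(out)


# "country" split by a quoted-printable soft line break
crlf_wire = b"the co=\r\nuntry"
lf_only = crlf_wire.replace(b"\r\n", b"\n")  # the CRLF -> LF conversion

with_crlf = buggy_join_soft_breaks(crlf_wire)  # b"the co=untry"
with_lf = buggy_join_soft_breaks(lf_only)      # b"the co=ntry"
correct = quopri.decodestring(lf_only)         # b"the country"
```

With CRLF input the sketch only leaves a stray `=`; with LF-only input it also eats the `u`, which looks a lot like the artifact described above. The standard-library `quopri` decoder handles the soft break correctly.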
> We see that that’s a quite a long line. Mail servers don’t like that
Why do mail servers care about how long a line is? Why don't they just let the client reading the mail worry about wrapping the lines?
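For reference, the limits are written into the specs: RFC 5322 caps a line at 998 characters excluding the CRLF, SMTP (RFC 5321) allows 1000 octets per text line including the CRLF, and quoted-printable (RFC 2045) keeps encoded lines to 76 characters. Here is a small Python sketch of how quoted-printable gets a long paragraph under that limit with soft line breaks. The sample text and the `longest_line` helper are made up for illustration.

```python
import quopri

MAX_QP_LINE = 76  # encoded line limit from RFC 2045


def longest_line(data: bytes) -> int:
    """Length of the longest line, not counting the line terminator."""
    return max(len(line) for line in data.splitlines())


# One long paragraph on a single line, as many mail clients produce it
long_paragraph = b" ".join([b"word"] * 60)

# The encoder inserts "=" + newline (a soft line break) to keep lines short
encoded = quopri.encodestring(long_paragraph)

# A correct decoder removes the soft breaks and restores the single line
decoded = quopri.decodestring(encoded)
```

The soft breaks are purely a transport detail: after decoding, the paragraph is one line again and the client is free to wrap it however it likes.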
I thought the article would be about the various meanings of operators like = == === .=. <== ==> <<== ==>> (==) => =~=
I'm just wondering why this problem shows up now. Why do lots of people suddenly post their old emails with a defective QP decoder?
> For some reason or other, people have been posting a lot of excerpts from old emails on Twitter over the last few days.
At the risk of having missed the latest meme or social media drama: does anyone know what this "some reason or other" is?
Edit: Question answered.
I wrote my own email archiving software. The hardest part was dealing with all the weird edge cases in my 20+ year collection of .eml files. For being so simple conceptually, email is surprisingly complicated.
> So what’s happened here? Well, whoever collected these emails first converted from CRLF (i.e., “Windows” line ending coding) to “NL” (i.e., “Unix” line ending coding). This is pretty normal if you want to deal with email. But you then have one byte fewer:
I think there is a second possible conclusion, which is that the transformation happened historically. Everyone assumes these emails are an exact dump from Gmail, but isn't it possible that Epstein was syncing emails from Gmail to a third party mail server?
Since the Stack Overflow post details the exact situation in 2011, I think we should be open to the idea that we're seeing data collected from a secondary mail server, not Gmail directly.
Do we have anything to discount this?
(If I'm not mistaken, I think you can also see the "=" issue simply by applying the Quoted-Printable encoding twice, not just by mishandling the line-endings, which also makes me think two mail servers. It also explains why the "=" symbol is retained.)
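The double-encoding idea can be tried out with Python's `quopri` module. This is only a sketch under the assumption that a second server re-encoded an already encoded body; the sample text is made up. After a single decoding pass, the soft-break `=` signs from the first encoding are still there.

```python
import quopri

# A single long line, which forces the encoder to add soft line breaks
original = b" ".join([b"example"] * 20)

once = quopri.encodestring(original)  # contains "=\n" soft breaks
twice = quopri.encodestring(once)     # each "=" from the first pass becomes "=3D"

# A reader that decodes only once gets the first encoding back,
# so an "=" is left sitting at every soft break
decoded_once = quopri.decodestring(twice)
decoded_twice = quopri.decodestring(decoded_once)
```

Note that this version keeps the `=` but does not drop any letters, so on its own it would not explain the missing characters.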
(The title of the blog reminded me of the late Bob Pease [1] who had the signature, "What's all this XXX stuff, anyhow?" [2] where XXX might be "noise gain", "capacitor leakage"…)
CRLF vs LF strikes again. Partly at least.
I wonder why even have a max line length limit in the first place? I.e. is this for a technical reason or just display related?
https://web.archive.org/web/20260203094902/https://lars.inge...
Did the site get the HN kiss of death?
Fun how the archive.today article near the top has this exact issue
I love how HN always floats up the answers to questions that were in my mind, without occupying my mind.
I, too, was reading about the new Epstein files, wondering what text artifact was causing things to look like that.
My main takeaway from this article is that I want to know what happened to the modified pigs with non-cloven hoofs
cat title | sed 's/anyway/in email/'
would save a click for those already familiar with =20 etc.
Great. Can't wait for equal signs to be the next (((whatever this is))). Maybe it's a secret code. j/k
On a side note: There are actually products marketed as kosher bacon (it's usually beef or turkey). And secular Jews frequently make jokes like this about our kosher bros who aren't allowed to eat the real stuff for some dumb reason like it has too many toes.
"It’s a fascinating case of 'Abstraction Leak'.
We’ve become so accustomed to modern libraries handling encoding transparently that when raw data surfaces (like in these dumps), we often lack the 'Digital Archeology' skills to recognize basic Quoted-Printable.
These artifacts (=20, =3D) are effectively fossils of the transport layer. It’s a stark reminder that underneath our modern AI/React/JSON world, the internet is still largely held together by 7-bit ASCII constraints and protocols from the 1980s.
Could be worsened by inaccurate optical character recognition in some cases.
Back in those days optical scanners were still used.
People posting Excel formulae?
Rock dots? You mean diacritics? Yeah someone invented them: the ancient Greeks, idiöt.
The real punchline is that this is a perfect example of "just enough knowledge to be dangerous." Whoever processed these emails knew enough to know emails aren't plain text, but not enough to know that quoted-printable decoding isn't something you hand-roll with find-and-replace. It's the same class of bug as manually parsing HTML with regex: it works right up until it doesn't, and then you get congressional evidence full of mystery equals signs.
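For contrast, here is roughly what the non-hand-rolled route looks like in Python, using only the standard-library `email` package. The message bytes are a made-up sample, not taken from the released files, and this is a sketch rather than a claim about what tooling should have been used.

```python
from email import message_from_bytes, policy

# Made-up sample message with a quoted-printable body
raw = (
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"caf=C3=A9 =3D coffee, and a long line that was soft-wra=\r\n"
    b"pped by the sender\r\n"
)

# policy.default gives an EmailMessage, whose get_content() undoes the
# transfer encoding and then decodes the bytes using the declared charset
msg = message_from_bytes(raw, policy=policy.default)
text = msg.get_content()
```

The parser deals with `=XX` escapes, soft line breaks, and the charset in one step, which is exactly the part that find-and-replace gets wrong.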