Hijacking for a random concern:
I love JSON, but one of the technical problems we've run into with JSON is that the spec doesn't cover all of the special characters.
I actually noticed it when reading Douglas Crockford's 2018 book, "How JavaScript Works". The mistake is on page 22.9 where it states that there are 32 control characters. There are not 32 control characters. There are 33 7-bit ASCII control characters and 65 Unicode control characters. When thinking in terms of ASCII, everyone always remembers the first 32 and forgets the 33rd, `del`. I then went back and noticed that it was also wrong in the RFC and subsequent revisions. (JSON is defined to be UTF-8 and is thus Unicode.)
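For anyone who wants to check the counts rather than take my word for it, here is a quick sketch using Python's standard `unicodedata` module. It assumes "control character" means Unicode general category `Cc`; the variable names are just for illustration.

```python
import unicodedata

# Every code point whose Unicode general category is Cc (control).
controls = [
    cp for cp in range(0x110000)
    if unicodedata.category(chr(cp)) == "Cc"
]

# Split into the 7-bit ASCII range and everything above it.
ascii_controls = [cp for cp in controls if cp < 0x80]
c1_controls = [cp for cp in controls if cp >= 0x80]

print(len(ascii_controls))  # 33: U+0000..U+001F plus U+007F (del)
print(len(c1_controls))     # 32: U+0080..U+009F
print(len(controls))        # 65 in total
```

The 33rd ASCII entry in that list is U+007F, which is the one the "32 control characters" count leaves out.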
Below is an RFC errata report just to point out the error for others.
Errata ID: 7673 Date Reported: 2023-10-11
Section 7 says:
The representation of strings is similar to conventions used in the C family of programming languages. A string begins and ends with quotation marks. All Unicode characters may be placed within the quotation marks, except for the characters that MUST be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).
It should say:
The representation of strings is similar to conventions used in the C family of programming languages. A string begins and ends with quotation marks. All Unicode characters may be placed within the quotation marks, except for the characters that MUST be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F, U+007F, and U+0080 through U+009F).
Notes:
There are 33 7-bit control characters, but the JSON RFC lists only 32, omitting the last control character in the 7-bit ASCII range, 'del'. However, JSON is not limited to 7-bit ASCII; it is Unicode. Unicode has 65 control characters in total: the 33 above plus an additional 32 from U+0080 through U+009F. The section that currently reads "U+0000 through U+001F" should include these additional control characters, reading as "U+0000 through U+001F, U+007F, and U+0080 through U+009F".
---
I've chosen `del` to be my favorite control character since so many engineers forget it. Someone needs to remember that poor little guy.
The errata seems like a mistake.
Makes more sense to drop the term "control character" and leave the specification of which characters are not allowed as-is.
The cat's already out of the bag on this one. Changing the characters now will create a lot of invalid JSON in the world, with more being generated all the time.
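To make that concrete, here is a rough sketch with Python's standard `json` module, which follows the grammar as currently written: a raw `del` inside a string is accepted and can be emitted, while a raw U+001F is rejected. `PROPOSED_FORBIDDEN` is a hypothetical regex I made up to stand in for the range proposed in the errata; it is not part of any library.

```python
import json
import re

# Raw (unescaped) DEL inside a JSON string: accepted under the current grammar.
doc_with_del = '"a\x7fb"'
parsed = json.loads(doc_with_del)

# Raw U+001F inside a string is already forbidden, so the parser rejects it.
try:
    json.loads('"a\x1fb"')
    rejected_1f = False
except json.JSONDecodeError:
    rejected_1f = True

# With ensure_ascii=False the encoder writes DEL out unescaped.
emitted = json.dumps("a\x7fb", ensure_ascii=False)

# Hypothetical stand-in for the errata's proposed must-escape range.
PROPOSED_FORBIDDEN = re.compile(r"[\x00-\x1f\x7f-\x9f]")
would_become_invalid = bool(PROPOSED_FORBIDDEN.search(emitted))

print(repr(parsed), rejected_1f, would_become_invalid)
```

So a document that parses fine today, and that a stock encoder will happily produce, would turn invalid under the proposed wording.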
Not to mention that it was set to be 127 so that it would be 8 holes punched out on paper tape, so you could use it to correct a paper tape by backspacing the tape by one position and hitting del.