Great write-up. Do most modern languages handle invalid surrogates gracefully, or is it still a "good luck" situation depending on the runtime?
The language handled it fine. It will generally just show replacement characters (�) for combos that don't map to anything.
It was really `encodeURIComponent` that didn't handle it gracefully.
If you just type this into the console (surrogate pair for cowboy smiley face emoji), you see it encodes it ("%F0%9F%A4%A0"):
encodeURIComponent("\uD83E\uDD20")
If you give it an invalid surrogate pair, it will throw an actual error:
encodeURIComponent("\uDD20\uD83E")
Modern string libraries largely use UTF-8 [0], and surrogates, regardless of whether they’re paired, are invalid in UTF-8. So, in a modern string library, as built in to most modern languages, you will not encounter surrogates except when translating between encodings.
[0] But everyone disagrees as to what indexing a string means, so you need to make an actual choice if you want anything involving indexing to match across languages.