/// Encode a UTF-8 codepoint. /// […] //...

deathanatos • 01/18/2025 • 0 replies • view on HN

  /// Encode a UTF-8 codepoint.
  /// […]
  /// Returns a length of zero for invalid codepoints (surrogates and out-of-bounds values);
  /// it's up to the caller to turn that into U+FFFD, or return an error.

It's not a "UTF-8 codepoint", that's horridly mangling the terminology. Code points are just code points.

The input to a UTF-8 encode is a scalar value, not a code point, and encoding a scalar value is infallible. What doubly kills me is that Rust has a dedicated type for scalar values. (`char`.)

(In languages with non-[USV]-strings…, Python raises an exception, JS emits garbage.)

alt Hacker News