logoalt Hacker News

bbminneryesterday at 3:52 PM2 repliesview on HN

I was really confused about the case folding, this page explained the motivation well https://jean.abou-samra.fr/blog/unicode-misconceptions

""" Continuing with the previous example of “ß”, one has lowercase("ss") != lowercase("ß") but uppercase("ss") == uppercase("ß"). Conversely, for legacy reasons (compatibility with encodings predating Unicode), there exists a Kelvin sign “K”, which is distinct from the Latin uppercase letter “K”, but also lowercases to the normal Latin lowercase letter “k”, so that uppercase("K") != uppercase("K") but lowercase("K") == lowercase("K").

The correct way is to use Unicode case folding, a form of normalization designed specifically for case-insensitive comparisons. Both casefold("ß") == casefold("ss") and casefold("K") == casefold("K") are true. Case folding usually yields the same result as lowercasing, but not always (e.g., “ß” lowercases to itself but case-folds to “ss”). """

One question I have is why have Kelvin sign that is distinct from Latin K and other indistinguishable symbols? To make quantified machine readable (oh, this is not a 100K license plate or money amount, but a temperature)? Or to make it easier for specialized software to display it in correct placed/units?


Replies

bee_rideryesterday at 4:03 PM

They seem to have (if I understand correctly) degree-Celsius and degree-Fahrenheit symbols. So maybe Kelvin is included for consistency, and it just happens to look identical to Latin K?

IMO the confusing bit is giving it a lower case. It is a symbol that happens to look like an upper case, not an actual letter…

show 1 reply
bawolffyesterday at 8:57 PM

> One question I have is why have Kelvin sign that is distinct from Latin K and other indistinguishable symbols?

Unicode has the goal of being a 1:1 mapping for all other character encodings. Usually weird things like this is so there can be a 1:1 reversible mapping to some ancient character encoding.