Hacker News

thinking_cactus · today at 4:50 AM · 2 replies

My contribution: largest-order-first (big endian) makes sense in real life because people tend to make quick judgements in unreliable situations. For example, take the announcement that you're receiving $132,551. You wouldn't want to hear something like "Hello! You have been awarded one and fifty and five hundred and... and one hundred thousand dollars!"; you want to hear "You have been awarded one hundred and thirty-two thousand and ... dollars!" The largest sums change decisions dramatically, so it makes sense that they come first.

On computers, however, we basically always use exact arithmetic and exact, fixed logic, where learning the higher-order digits first doesn't help (we're not making approximations or decisions based on incomplete information). In fact, for mathematical reasons, in the exact case it's usually better to compute and use the lowest bits first (e.g. in the sum and multiplication algorithms I am familiar with). [note1]
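To illustrate the point about exact arithmetic preferring low-order digits first, here's a minimal sketch (the function name is my own invention) of schoolbook addition on little-endian digit lists. Carries propagate from the lowest digit upward, so the low-order digits are the ones an adder needs first:

```python
def add_le(a, b):
    """Add two non-negative numbers given as little-endian digit lists.

    The carry flows from the lowest digit upward, which is why
    hardware adders and bignum libraries consume low digits first.
    """
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        da = a[i] if i < len(a) else 0
        db = b[i] if i < len(b) else 0
        carry, digit = divmod(da + db + carry, 10)
        out.append(digit)
    if carry:
        out.append(carry)
    return out

# 47 + 85 = 132; little-endian: [7, 4] + [5, 8] -> [2, 3, 1]
```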

Overall I'm slightly surprised that no automatic/universal translation methods for the most common languages have been made, although I guess there may be some significant difficulties or outright impossibilities (for example, if you send a bunch of bits/bytes outside, there's no general way to predict the endianness they should be in). I suspect LLMs will make this task much easier (absent a more traditional universal translation algorithm).

[note1] Also, the time required to receive all bits of, say, a 64-bit number, as opposed to just the first k bits, tends to be a negligible or even zero difference, both in human terms (receiving data over a network) and in machine terms (receiving data over a bus; optimizing an algorithm that uses numbers in complicated ways; etc.), which again differs from human communication and thought.
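The "no general way to predict the endianness" point is easy to demonstrate with the standard-library struct module: the same four bytes decode to two different integers depending on which byte order you assume, and nothing in the bytes themselves tells you which is right:

```python
import struct

payload = b'\x01\x00\x00\x00'          # four raw bytes off the wire

le = struct.unpack('<I', payload)[0]   # read as little-endian uint32
be = struct.unpack('>I', payload)[0]   # read as big-endian uint32

# le == 1, be == 16777216 -- the bytes alone can't tell you which was meant
```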


Replies

Joker_vD · today at 6:07 AM

> My contribution: largest-order-first (big endian) makes sense in real life because people tend to make quick judgements in unreliable situations. [...]

And yet in Arabic, numbers are written in order from the least to the most significant digit, even though they are not really pronounced that way from the hundreds up: "1234" is read essentially as "one thousand two hundred four-and-thirty", the same way German does it. And yes, the digit order looks the same as in e.g. English, but Arabic is written right to left, so a reader encounters the least significant digit first. So, no, it's absolutely fine to write numbers little-endian even in a language that pronounces them the big-endian or even mixed-endian way.

Veserv · today at 6:02 AM

There are plenty of ways for language to be better now that we know far more about arithmetic than when number words were created.

"One Five Five Two Three One" is 6 words and 6 syllables long, whereas "One Hundred and Thirty Two Thousand" is 6 words and 9 syllables long and conveys less information. Even shortening it to "One Hundred Thirty Two Thousand" still takes 5 words and 8 syllables while conveying less information.
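A tiny sketch of the encoding being counted here (function and table names are mine, purely illustrative): speak a number lowest digit first, one word per digit, with no positional separator words at all:

```python
WORDS = ['Zero', 'One', 'Two', 'Three', 'Four',
         'Five', 'Six', 'Seven', 'Eight', 'Nine']

def le_digit_words(n):
    """Render a non-negative integer lowest-digit-first, one word per digit."""
    if n == 0:
        return ['Zero']
    out = []
    while n:
        n, d = divmod(n, 10)   # peel off the lowest digit each step
        out.append(WORDS[d])
    return out

# 132551 -> ['One', 'Five', 'Five', 'Two', 'Three', 'One']  (6 words)
```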

You can also easily convey high-order digits first by using an unambiguous "and/add" construction: "Thousand Two Three One Add One Five Five". You have now conveyed the three high-order digits in 4 words, 5 syllables. You also convey the full number in 8 words, 9 syllables, in contrast to "One Hundred Thirty Two Thousand One Hundred Fifty Five", which is 9 words, 14 syllables.

You could go even further and express things in pseudo-scientific notation, which would be even more general and close to as efficient. "Zero E Three (10^3) Two Three One" is 6 words, 7 syllables, but no longer requires unique separator words like "Thousand", "Million", "Billion", etc. This shows even greater efficiency if you are conveying "One Hundred Thirty Thousand", which would be something more like "Zero E Four (10^4) Three One", since the scientific-notation digit position description is highly uniform.

This might seem somewhat arbitrary, like changing the order for its own sake. However, the advantage of little-endian description is that it is non-contextual. When you say the number "One" it literally always means the one's place "One". If you wish to speak of a different positional "One" you would prefix it with the position, e.g. "Zero E Three (10^3) One". In contrast, in the normal way of speaking numbers, "One" could mean any positional one. Are you saying "One Hundred", "One Thousand", "One Hundred Million"? You need to wait for subsequent words to know what "One" is being said. Transcription must fundamentally buffer a significant fraction of the word stream to disambiguate.
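The non-contextual property can be made concrete with a sketch of a streaming decoder (names are mine, not from the comment): each incoming word commits immediately, because its place value is known from its position alone, never from later words — no lookahead buffer needed:

```python
DIGITS = {'Zero': 0, 'One': 1, 'Two': 2, 'Three': 3, 'Four': 4,
          'Five': 5, 'Six': 6, 'Seven': 7, 'Eight': 8, 'Nine': 9}

def parse_le_stream(words):
    """Fold a little-endian digit-word stream into an integer, one word at a time.

    No buffering: the place value of each word is fixed the moment it
    arrives, unlike "One Hundred..." where "One" stays ambiguous.
    """
    value, place = 0, 1
    for w in words:
        value += DIGITS[w] * place
        place *= 10
    return value

# ['One', 'Five', 'Five', 'Two', 'Three', 'One'] -> 132551
```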

It also results in the hilariously duplicative "One Hundred Thirty Two Thousand One Hundred Fifty Five", which has positional signifiers for basically every word: "One Hundred Thir-ty Thousand One Hundred Fif-ty Five". Fully 8 of the 14 syllables are used for positional disambiguation to reduce necessary lookahead. "And/Add" constructions get you that for a fraction of the word and syllable count. They allow arbitrary chunking since you can separate digit streams on any boundary. It also reinforces the fact that numbers are just composites of their components, which may help with numeracy.

Little endian is actually just better in every respect, except for compatibility and familiarity, if we use our modern, robust knowledge of arithmetic to formulate the grammar rules.