Hacker News

Ferret7446, today at 1:39 AM (3 replies)

Text is just bytes, and bytes are just text. I assume this is talking about human readable ASCII specifically.

I think the obsession with text comes down to two factors: conflating binary data with closed standards, and poor tooling support. Text implies a baseline level of acceptable mediocrity for both. Consider a CSV file with millions of base64-encoded columns and no column labels. That would really not be any friendlier than a binary file with an openly documented format and a suitable editing tool, e.g. SQLite.
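
To make that comparison concrete, here is a minimal sketch that stores the same records both ways. The record contents, the table name `readings`, and the helper names are hypothetical, chosen only for illustration. It suggests that the headerless base64 CSV is "text" yet tells a reader nothing, while the SQLite file can be asked for its own column names.

```python
import base64
import csv
import io
import sqlite3

# Hypothetical records: (sensor name, raw payload bytes).
records = [("sensor_a", b"\x00\x01\x02"), ("sensor_b", b"\xff\xfe\xfd")]


def to_headerless_csv(rows):
    """Write rows as CSV with every field base64-encoded and no header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for name, payload in rows:
        writer.writerow([
            base64.b64encode(name.encode("utf-8")).decode("ascii"),
            base64.b64encode(payload).decode("ascii"),
        ])
    return buf.getvalue()


def to_sqlite(rows):
    """Store the same rows in an in-memory SQLite database with named, typed columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor TEXT NOT NULL, payload BLOB NOT NULL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    conn.commit()
    return conn


def column_names(conn, table):
    """Ask the database itself which columns a table has."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


csv_text = to_headerless_csv(records)
conn = to_sqlite(records)

print(csv_text)                        # opaque base64, no labels
print(column_names(conn, "readings"))  # the schema travels with the data
```

The CSV opens in any editor, but a reader still needs out-of-band knowledge to interpret it. The binary file needs a tool, but that tool can recover the structure without any extra documentation.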

Maybe a lack of fundamental technical skills is another culprit, but binary files really aren't that scary.


Replies

bigstrat2003, today at 2:37 AM

> Text is just bytes, and bytes are just text. I assume this is talking about human readable ASCII specifically.

Text is human readable writing (not necessarily ASCII). It is most certainly not just any old bytes the way you are saying.

energy123, today at 4:45 AM

Text is bytes accompanied by a major constraint on which sequences of bytes are permitted (a useful compression into principal axes that emerged over thousands of years of language evolution), along with a natural connection to human semantics that comes from universal adoption of the standard (allowing correlations to be modelled).
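
As a rough illustration of the "constraint on permitted sequences" point, here is a small sketch that checks byte strings for UTF-8 validity. It assumes UTF-8 as the encoding, and the helper names are hypothetical. Encoding-level validity is only the weakest layer of the constraint described above (natural language restricts the space far more), but it already rules out most byte sequences.

```python
from itertools import product


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def count_valid_two_byte_sequences() -> int:
    """Count how many of the 65,536 possible two-byte strings are valid UTF-8."""
    return sum(
        is_valid_utf8(bytes(pair)) for pair in product(range(256), repeat=2)
    )


print(is_valid_utf8("plain text".encode("utf-8")))  # True
print(is_valid_utf8(b"\xff\xfe\x00\x80"))           # False: 0xFF never appears in UTF-8

valid = count_valid_two_byte_sequences()
print(f"{valid} of {256 * 256} two-byte sequences are valid UTF-8")
```

Even at two bytes, only a minority of sequences pass, and the fraction shrinks as the length grows.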

Text is like a complexity funnel (analogous to a tokenizer) that everyone shares. Its utility is derived from its compression and its standardization.

If everyone used binary data with their own custom interpretation schema, it might work better for that narrow vertical, but it would not have the same utility for LLMs.

xpe, today at 1:10 PM

> Maybe a lack of fundamental technical skills is another culprit, but binary files really aren't that scary.

Indeed, there is a galactic civilization centered around binary communication: https://memory-alpha.fandom.com/wiki/Bynar