The reason Linear A is so difficult is that the total surviving corpus of Linear A text is ~7500 characters, spread out over ~1500 inscriptions.
If you have a 4k screen, you can fit all surviving Linear A text on your screen at once, in a 14pt font.
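For anyone who wants to sanity-check the screen claim, here is a rough back-of-the-envelope sketch. The screen resolution (3840x2160), 96 DPI, square glyph cells, and 1.2x line spacing are all my assumptions, not figures from the post; only the ~7500 and ~1500 counts come from above.

```python
# Rough capacity check: do ~7,500 glyphs fit on one 4K screen at 14pt?
# Assumptions (not from the thread): 3840x2160 pixels, 96 DPI,
# roughly square glyph cells, 1.2x line spacing.

CORPUS_CHARS = 7500   # approximate figure quoted in the thread
INSCRIPTIONS = 1500   # approximate figure quoted in the thread

SCREEN_W_PX, SCREEN_H_PX = 3840, 2160
DPI = 96
FONT_PT = 14
LINE_SPACING = 1.2


def screen_capacity(width_px, height_px, font_pt, dpi=DPI, line_spacing=LINE_SPACING):
    """Return how many square glyph cells fit on the screen."""
    glyph_px = font_pt * dpi / 72                 # 1 pt = 1/72 inch
    chars_per_line = int(width_px // glyph_px)    # whole glyphs per row
    lines = int(height_px // (glyph_px * line_spacing))  # whole rows
    return chars_per_line * lines


capacity = screen_capacity(SCREEN_W_PX, SCREEN_H_PX, FONT_PT)
avg_per_inscription = CORPUS_CHARS / INSCRIPTIONS
fits = capacity >= CORPUS_CHARS

print(f"screen capacity: {capacity} glyphs")
print(f"corpus fits on one screen: {fits}")
print(f"average glyphs per inscription: {avg_per_inscription}")
```

Under those assumptions the screen holds well over twice the corpus, and the average inscription is only about five characters long, which is a big part of why any proposed reading is so hard to test.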
A lot of loonies make this claim, but Tom's work is credible enough that it's being reviewed by linguistics experts at Rutgers and Cambridge. Additional validation: his approach produces results. He's translated over 300 words, which has never been done before, and his solution actually solves some problems in Linear B. Tom is an AI engineer, and Claude Code was key to his work. Disclosures: I know Tom socially, and I wrote the post at the link.
This is very exciting. Congrats to Tom on the accomplishment.
To be clear, this is an attempt at a decipherment. This is not proven, and we shouldn't consider Linear A to be "solved" until experts in the field have reviewed the work. In fact, it probably shouldn't be considered "proof" unless some more Linear A writings are uncovered and these are congruent with the method proposed. All that can be said for certain at this point is that this is an interesting conjecture.
But this is a story worth following. This could be the real deal. More research and validation should follow and we should have a better idea in the next few weeks or months whether Linear A has really been solved. At the very least, this is an interesting attempt, and optimistically, it could yield real insight into Minoan culture. Kudos.
Isn't a big problem with Linear A that there are so few symbols you can "solve" it relatively straightforwardly with no way to tell if it's correct or not?
Interesting writeup. Would be nice to have a couple of images of Linear A/B scripts to visualize. Looking on Google, they're very daunting!
I wonder how you would even know if you have “cracked” it, given the corpus is so small?
If confirmed this is really cool and impressive work.
Honestly curious how many years before it can be one-shotted in a coding harness with Fable.next by someone who’s not a linguistics expert.
Develop, test, and rank hypotheses about the phonetic values, morphology, grammar, and possible language family of Linear A using the full available corpus. Do not assume any decipherment is correct. Treat all candidate readings as hypotheses to be scored…”
I wonder if LLMs trained specifically for this purpose can perform well with "forgotten languages".
I know I'm simplifying a lot, but isn't all this deciphering just some kind of pattern matching?
Can I get his decipher-forgotten-ancient-text skill? I want to try my hand at the Voynich Manuscript
crossing my fingers for this guy.
However, nawaya or whatever examples are around it are not part of the Hebrew language.
A lot of the comments in this thread are disappointing. Rather than celebrating an achievement (whether or not it is validated yet), many of you seem to want to put him down, or make it seem like Claude did all the work.
Claiming that Claude did all the work is patently ridiculous. Claude is a tool, like any other. The corpus of Linear A is ~7500 characters across ~1500 inscriptions and Claude, no matter how smart, doesn't just solve that on its own.
What a shame.
Is this extendible to a generalizable approach to translate any language pair (without a translation map or translation dataset)?
would like to hear more about Tom's learning/education path in ML/AI.
relevant xkcd: https://xkcd.com/2151/
Sorry but I don’t recognize this as being an achievement by an amateur. This dude had no chance in hell until we trained a model to use his time to suss it out.
As an amateur who's been fascinated by this puzzle himself, I will add some context that might be relevant in assessing the plausibility of this claim:
- The "Libation Formula", which the author used as the base for his translations, is the most studied piece of writing in Linear A, because it's the only recurring phrase (with grammatical variation) that we have. The corpus is extremely fragmentary, with just a handful of instances of longer text (and even then, the texts are the length of an average sentence in English). The majority of documents available to us are lists (of inventory, personnel, offerings or something of this sort). The longer texts make use of punctuation marks, likely put in between words. This gives us a non-trivial vocabulary, which still does not match that of any known language.
- With such fragmentary remaining material, we cannot be sure that a) all the texts we call "Linear A" are written in the same language, and b) the recognizable words are not abbreviations, for example.
- The author made an assumption that Linear A symbols which have counterparts in Linear B should have the same phonetic values. This gives us an already known glyph that represented "NA". "Duplicate" glyphs are only found in the P-series, and are assumed to represent syllables which were distinguished by the Linear A language, but not by Greek - such as aspirated/unaspirated P. There is a glyph that stands for "NWA" in Linear B, but instances of it have been found in Linear A as well.
- There are countless words with no known etymology in Ancient Greek, assumed to originate from a substrate language or languages spoken in the area at the time Greeks migrated to their present-day homeland. The language of Linear A would be a likely candidate for such a substrate. If Linear A were a Semitic language, then we should already be able to establish Semitic etymologies for those words as they appear in Greek. Of course, it could also be the case that these words came from another language which did not adopt writing, or whose writing did not survive to our times.