Hacker News

_verandaguy, yesterday at 5:16 PM (6 replies)

This isn't really a reasonable approach, is it?

The original prompts aren't provided, nor is the original context; even then, you can't really treat a stochastic system like an LLM as a major component in reproducibility.


Replies

peterfirefly, yesterday at 9:20 PM

I think I caught this guy's reddit posts on the subject. Someone was playing around with statistical analyses of a big Linear A corpus + some other corpora. There was an extremely clear signal that Linear A seemed to be much more similar to one other corpus than to the others. This was the first time I've ever heard of something that might* have been a good hint for decipherment. There's a Dutch professor emeritus (in linguistics) who claims it is Hurrian-Urartian and he's been posting youtube videos about his "decipherment" but he didn't seem too convincing to me.

Claude helped write code to read and parse the corpora and to do some fairly basic statistical analysis along the lines of "which Linear A symbols most often occur together" and "if we use known Linear B sound values, which of the other corpora most often have vowel similarities with the Linear A corpus".
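For a sense of how basic the "which symbols most often occur together" analysis can be, here is a minimal sketch. It is not the code from the Reddit posts, which I haven't seen; the function name `cooccurrence_counts` and the toy corpus are made up for illustration, and a real Linear A corpus would need proper parsing first.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(inscriptions):
    """Count how often each unordered pair of distinct symbols shares an inscription."""
    counts = Counter()
    for symbols in inscriptions:
        # Deduplicate and sort so each pair is counted once per inscription,
        # always under the same (a, b) key.
        for a, b in combinations(sorted(set(symbols)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical placeholder data: each inner list stands for one inscription,
# each letter for one sign. These are not real Linear A signs.
toy_corpus = [
    ["A", "B", "C"],
    ["A", "B"],
    ["B", "C", "D"],
]

counts = cooccurrence_counts(toy_corpus)
top_pairs = counts.most_common(3)
```

Counting per inscription rather than per adjacent position is a design choice for this sketch; an adjacency-based count would be just as easy to write.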

You can write that code yourself or you can ask an LLM to write it for you. The provenience of the code isn't important.

*) He later deleted some of them, I think. What was still there on reddit a few weeks ago had dead links to a web site of his with statistical tables and I believe also code.

ben_w, yesterday at 5:32 PM

> even then, you can't really treat a stochastic system like an LLM as a major component in reproducibility.

If you had the other things, being "stochastic" is not even remotely a show-stopper. Stochastic processes abound and are the reason the mathematics of statistics was developed in the first place, ultimately allowing us to create such things as LLMs.

When all the relevant steps get published, I absolutely expect a lot of people to (attempt to) reproduce this work even though LLMs are stochastic.

Kosturdistan, yesterday at 6:22 PM

Claude Code was used to organize the material and to run simulations. The simulations were to determine the likelihood that the text was Semitic versus Tom having simply gotten lucky. Tom has assigned probabilities to each of the syllables he has proposed sound values for.
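To illustrate the general idea of a "Semitic vs. got lucky" simulation, here is a rough sketch of a permutation test. This is not Tom's actual procedure, which isn't described here; the names `count_matches` and `chance_p_value`, the toy signs, and the toy lexicon are all hypothetical. It shuffles the proposed sound values among the signs and asks how often a random assignment matches a word list at least as well as the proposed one.

```python
import random


def count_matches(words, mapping, lexicon):
    """Number of words whose transliteration under `mapping` appears in `lexicon`."""
    return sum("".join(mapping[sign] for sign in word) in lexicon for word in words)


def chance_p_value(words, mapping, lexicon, trials=2000, seed=0):
    """Estimate how often a shuffled sign-to-sound assignment does at least as well."""
    rng = random.Random(seed)  # fixed seed so the run can be repeated exactly
    signs = list(mapping)
    values = [mapping[sign] for sign in signs]
    observed = count_matches(words, mapping, lexicon)
    hits = 0
    for _ in range(trials):
        shuffled = values[:]
        rng.shuffle(shuffled)
        if count_matches(words, dict(zip(signs, shuffled)), lexicon) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)


# Hypothetical toy data, not real signs, readings, or vocabulary.
proposed = {"1": "ka", "2": "ta", "3": "ma", "4": "la"}
words = [("1", "2"), ("3", "4"), ("2", "3")]
lexicon = {"kata", "mala", "tama"}

p = chance_p_value(words, proposed, lexicon)
```

A small value of `p` would suggest the proposed readings fit the lexicon better than chance; a real analysis would also have to account for how the lexicon and the readings were chosen.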

fragmede, yesterday at 5:32 PM

Sure it is. We're humans, not robots (well, I think I am, and I presume you are as well, but for all we know, we could be living in a simulation), so if the non-deterministic system decides to generate code that calls the variable foo one day and bar the next, as long as the code still does what's being asked of it, why do I care that the non-deterministic system chose a different variable name when run on Tuesday? There's the computer science definition of determinism and the engineering result of "does it work", which are at odds. It's like the halting problem. It's undecidable in the general case, but give some C code with a loop that won't terminate to Claude, and it'll call that out as not halting.

iwontberude, yesterday at 5:33 PM

Actually, it is, because Claude did the work, and being a layperson isn't really that high of a bar.

TeMPOraL, yesterday at 6:08 PM

> stochastic system

Every day when you lower your butt onto your chair, you trust a stochastic system enough to assume you'll rest on the chair safely and not spontaneously phase through, which would lead to a rather gory and painful terminal experience.

Physics at macro scale is stochastic, which is a good reminder that stochastic != uniformly random. Expected distributions matter.
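A quick way to see the "expected distributions matter" point is a toy simulation, which is only an illustration and not a physics model. The helper `mean_of_flips` is a made-up name. Each individual coin flip is random, yet the average of many flips barely moves from run to run.

```python
import random


def mean_of_flips(n, rng):
    """Fraction of heads in n fair coin flips."""
    return sum(rng.random() < 0.5 for _ in range(n)) / n


rng = random.Random(42)  # fixed seed so the experiment is repeatable

# Repeat the experiment 100 times at two very different sample sizes.
small = [mean_of_flips(10, rng) for _ in range(100)]
large = [mean_of_flips(10_000, rng) for _ in range(100)]

# Spread between the most extreme outcomes seen at each sample size.
spread_small = max(small) - min(small)
spread_large = max(large) - min(large)
```

The spread for 10,000 flips comes out far narrower than for 10 flips, which is the same reason an aggregate of an astronomically large number of particles behaves predictably.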
