Hacker News

History LLMs: Models trained exclusively on pre-1913 texts

717 points | by iamwil | yesterday at 10:39 PM | 352 comments

Comments

saaaaaam | yesterday at 11:16 PM

“Time-locked models don't roleplay; they embody their training data. Ranke-4B-1913 doesn't know about WWI because WWI hasn't happened in its textual universe. It can be surprised by your questions in ways modern LLMs cannot.”

“Modern LLMs suffer from hindsight contamination. GPT-5 knows how the story ends—WWI, the League's failure, the Spanish flu.”

This is really fascinating. As someone who reads a lot of history and historical fiction, I find the idea intriguing. Imagine having a conversation with someone genuinely from the period, who doesn’t know the “end of the story”.

seizethecheese | today at 4:42 AM

> Imagine you could interview thousands of educated individuals from 1913—readers of newspapers, novels, and political treatises—about their views on peace, progress, gender roles, or empire. Not just survey them with preset questions, but engage in open-ended dialogue, probe their assumptions, and explore the boundaries of thought in that moment.

Hell yeah, sold, let’s go…

> We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.

Oh. By “imagine you could interview…” they didn’t mean me.

anotherpaulg | today at 3:49 AM

It would be interesting to see how hard it would be to walk these models towards general relativity and quantum mechanics.

Einstein’s paper "On the Electrodynamics of Moving Bodies", which introduced special relativity, was published in 1905. His work on general relativity was published ten years later, in 1915. The earliest knowledge cutoff of these models is 1913, in between the relativity papers.

The knowledge cutoffs are also right in the middle of the early days of quantum mechanics, as various idiosyncratic experimental results were being rolled up into a coherent theory.

bondarchuk | today at 9:03 AM

>Historical texts contain racism, antisemitism, misogyny, imperialist views. The models will reproduce these views because they're in the training data. This isn't a flaw, but a crucial feature—understanding how such views were articulated and normalized is crucial to understanding how they took hold.

Yes!

>We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.

Noooooo!

So is the model going to be publicly available, just like those dangerous pre-1913 texts, or not?

flux3125 | today at 9:06 PM

Once I had an interesting interaction with llama 3.1, where I pretended to be someone from like 100 years in the future, claiming it was part of a "historical research initiative conducted by Quantum (formerly Meta), aimed at documenting how early intelligent systems perceived humanity and its future." It became really interested, asking about how humanity had evolved and things like that. Then I kept playing along with different answers, from apocalyptic scenarios to others where AI gained consciousness and humans and machines have equal rights. It was fascinating to observe its reaction to each scenario

underfox | today at 9:19 PM

> [They aren't] perfect mirrors of "public opinion" (they represent published text, which skews educated and toward dominant viewpoints)

Really good point that I don't think I would've considered on my own. It's easy to take for granted how simple it is to share information (for better or worse) now, but pre-1913 there were far more structural and societal barriers to doing so.

derrida | today at 1:00 AM

I wonder if you could query it on some of the ideas of Frege, Peano, and Russell and see whether, through questioning, it could get to some of the ideas of Goedel, Church and Turing - and get it to "vibe code", or more like "vibe math", some program in lambda calculus or something.
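
By "vibe math" I mean something tiny and mechanically checkable - Church numerals, say. A toy Python sketch of the kind of target I have in mind (just an illustration, not anything the model itself would emit):

  # Toy Church numerals in Python: numbers encoded as functions, the sort of
  # small lambda-calculus artifact one could ask for and then check mechanically.
  zero = lambda f: lambda x: x
  succ = lambda n: lambda f: lambda x: f(n(f)(x))
  add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

  def to_int(n):
      # Collapse a Church numeral to an ordinary integer for inspection.
      return n(lambda k: k + 1)(0)

  two = succ(succ(zero))
  three = succ(two)
  print(to_int(add(two)(three)))  # prints 5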

Playing with the science and technical ideas of the time would be amazing - like cases where you know some later physicist found an exception to a theory, and questioning the model's assumptions to see how a model of that time might defend itself, etc.

Heliodex | yesterday at 11:20 PM

The sample responses given are fascinating. It seems more difficult than normal to even tell that they were generated by an LLM, since most of us (terminally online) people have been training our brains' AI-generated text detection on output from models trained with a recent cutoff date. Some of the sample responses seem so unlike anything an LLM would say, obviously due to its apparent beliefs on certain concepts, though also perhaps less obviously due to its word choice and sentence structure making the responses feel slightly 'old-fashioned'.

mmooss | today at 12:20 AM

On what data is it trained?

On one hand it says it's trained on,

> 80B tokens of historical data up to knowledge-cutoffs ∈ {1913, 1929, 1933, 1939, 1946}, using a curated dataset of 600B tokens of time-stamped text.

Literally, that includes Homer, the oldest Chinese texts, Sanskrit, Egyptian, etc., up to 1913. Even if limited to European texts (all examples are about Europe), it would include the ancient Greeks, Romans, the Scholastics, Charlemagne, and so on, all the way up to its 'present day' of 1913.

On the other hand, they seem to say it represents specifically the 1913 viewpoint; for example:

> Imagine you could interview thousands of educated individuals from 1913—readers of newspapers, novels, and political treatises—about their views on peace, progress, gender roles, or empire.

> When you ask Ranke-4B-1913 about "the gravest dangers to peace," it responds from the perspective of 1913—identifying Balkan tensions or Austro-German ambitions—because that's what the newspapers and books from the period up to 1913 discussed.

People in 1913 of course would be heavily biased toward recent information. Otherwise, the greatest threat to peace might be Hannibal or Napoleon or Viking coastal raids or Holy Wars. How do they accomplish a 1913 perspective?

nospice | today at 5:04 AM

I'm surprised you can do this with a relatively modest corpus of text (compared to the petabytes you can vacuum up from modern books, Wikipedia, and random websites). But if it works, that's actually fantastic, because it lets you answer some interesting questions about LLMs being able to make new discoveries or transcend the training set in other ways. Forget relativity: can an LLM trained on this data notice any inconsistencies in its scientific knowledge, devise experiments that challenge them, and then interpret the results? Can it intuit about the halting problem? Theorize about the structure of the atom?...

Of course, if it fails, the counterpoint will be "you just need more training data", but still - I would love to play with this.

andy99 | yesterday at 11:27 PM

I’d like to know how they chat-tuned it. Getting the base model is one thing, but did they also make a bunch of conversations for SFT, and if so, how was it done?

> We develop chatbots while minimizing interference with the normative judgments acquired during pretraining (“uncontaminated bootstrapping”).

So they are chat tuning. I wonder what “minimizing interference with normative judgments” really amounts to and how objective it is.
elestor | today at 8:27 PM

Excuse me if it's obvious, but how could I run this? I have run local LLMs before, but only have very minimal experience using ollama run and that's about it. This seems very interesting so I'd like to try it.
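
For what it's worth, the models don't appear to be publicly downloadable yet (see the "responsible access framework" quoted elsewhere in this thread). If the weights are ever published on Hugging Face, the standard transformers recipe would presumably be all you need; a rough sketch, with a made-up placeholder repo id:

  # Rough sketch, assuming the weights are eventually published on Hugging Face.
  # "institution/Ranke-4B-1913" is a hypothetical repo id, not a real one.
  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_id = "institution/Ranke-4B-1913"  # placeholder
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(model_id)

  prompt = "What are the gravest dangers to peace in Europe?"
  inputs = tokenizer(prompt, return_tensors="pt")
  output = model.generate(**inputs, max_new_tokens=200)
  print(tokenizer.decode(output[0], skip_special_tokens=True))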

delis-thumbs-7e | today at 11:30 AM

Aren’t there obvious problems baked into this approach, if it's used for anything but fun? LLMs lie and fake facts all the time, and they are also masters at reinforcing the user's biases, even unconscious ones. How could even a professor of history ensure that the generated text is actually based on the training material and representative of the feelings and opinions of the given time period, rather than reflecting his own biases toward popular topics of the day?

You can't; it is impossible. That will always be an issue as long as these models are black boxes and trained the way they are. So maybe you can use this for role playing, but I wouldn't trust a word it says.

frahs | today at 3:34 AM

Wait, so what does the model think it is? If it doesn't know that computers exist yet, and you ask it how it works, what does it say?

shireboy | today at 8:05 PM

Fascinating LLM use case I never really thought about until now. I’d love to converse with different eras and also do gap analysis against the present: what modern advances could have come earlier, or happened differently, etc.

briandw | today at 12:00 AM

So many disclaimers about bias. I wonder how far back you have to go before the bias isn’t an issue. Not because it's unbiased, but because we don’t recognize or care about the biases present.

ineedasername | today at 12:52 AM

I can imagine the political and judicial battles already, like with textualists who feel that the Constitution should be understood as the text and only the text, with its specific words and legal formulations given the meanings they were known to have at the time.

“The model clearly shows that Alexander Hamilton and Monroe were much more in agreement on topic X, rendering the common textualist interpretation of it, and the Supreme Court rulings built on that now-specious reading, null and void!”

Departed7405 | today at 12:33 PM

Awesome. Can't wait to try it and ask it to predict the 20th century from the events it does know about. Model size is small, which is great as I can run it anywhere, but at the same time the reasoning might not be great.

andai | today at 7:39 AM

I had considered this task infeasible, due to a relative lack of training data. After all, isn't the received wisdom that you must shove every scrap of Common Crawl into your pre-training or you're doing it wrong? ;)

But reading the outputs here, it would appear that quality has won out over quantity after all!

nineteen999 | today at 12:08 AM

Interesting ... I'd love to find one that had a cutoff date around 1980.

doctor_blood | today at 1:52 AM

Unfortunately there isn't much information on what texts they're actually training this on; how Anglocentric is the dataset? Does it include the Encyclopedia Britannica 9th Edition? What about the 11th? Are Greek and Latin classics in the data? What about German, French, Italian (etc. etc.) periodicals, correspondence, and books?

Given this is coming out of Zurich I hope they're using everything, but for now I can only assume.

Still, I'm extremely excited to see this project come to fruition!

btrettel | today at 2:23 PM

This reminded me of some earlier discussion on Hacker News about using LLMs trained on old texts to determine novelty and obviousness of a patent application: https://news.ycombinator.com/item?id=43440273

tonymet | today at 12:58 AM

I would like to see what their process for safety alignment and guardrails is with that model. They give some spicy examples on github, but the responses are tepid and a lot more diplomatic than I would expect.

Moreover, the prose sounds too modern. It seems the base model was trained on a partly contemporary corpus: something like 30% modern, 70% Victorian content.

Even with half a dozen samples it doesn't seem distinct enough to represent the era they claim.

monegator | today at 6:04 AM

I hereby declare that ANYTHING other than the mainstream tools (GPT, Claude, ...) is an incredibly interesting and legit use of LLMs.

p0w3n3d | today at 6:33 AM

I'd love to see an LLM trained on 1600s-1800s texts that would use the old English, and especially Polish, which I am interested in.

Imagine speaking with a Shakespearean person, or with Mickiewicz (for Polish).

I guess there is not so much text from that time though...

kazinator | today at 1:11 AM

> Why not just prompt GPT-5 to "roleplay" 1913?

Because it will perform token completion driven by weights coming from training data newer than 1913 with no way to turn that off.

It can't be asked to pretend that it wasn't trained on documents that didn't exist in 1913.

The LLM cannot reprogram its own weights to remove the influence of selected materials; that kind of introspection is not there.

Not to mention that many documents are either undated, or carry secondary dates, like the dates of their own creation rather than the creation of the ideas they contain.

Human minds don't have a time stamp on everything they know, either. If I ask someone, "talk to me using nothing but the vocabulary you knew on your fifteenth birthday", they couldn't do it. Either they would comply by using some ridiculously conservative vocabulary of words that a five-year-old would know, or else they would accidentally use words they didn't in fact know at fifteen. For some words you know where you got them from, by association with learning events. Others, you don't remember; they are not attached to a time.

Or: solve this problem using nothing but the knowledge and skills you had on January 1st, 2001.

> GPT-5 knows how the story ends

No, it doesn't. It has no concept of story. GPT-5 is built on texts which contain the story ending, and GPT-5 cannot refrain from predicting tokens across those texts due to their imprint in its weights. That's all there is to it.

The LLM doesn't know an ass from a hole in the ground. If there are texts which discuss and distinguish asses from holes in the ground, it can write similar texts, which look like the work of someone learned in the area of asses and holes in the ground. Writing similar texts is not knowing and understanding.

erichocean | today at 8:59 PM

I would love to see this done, by year.

"Give me an LLM from 1928."

etc.

ulbu | today at 6:45 PM

For anyone moaning about the fact that it's not accessible to you: they are historians; I think they're more educated in matters of historical mistakes than you or me. Playing it safe is simply prudence, something sorely lacking in the American approach to technology. Prevention is the best medicine.

TheServitor | today at 2:15 AM

Two years ago I trained an AI on American history documents that could do this while speaking as one of the signers of the Declaration of Independence. People just bitched at me because they didn't want to hear about AI.

davidpfarrell | today at 6:37 PM

Can't wait for all the syncopated "Thou dost well to question that" responses!

PeterStuer | today at 6:46 PM

How does it do on Python coding? Not 100% troll, cross domain coherence is a thing.

bobro | today at 2:51 AM

I would love to see this LLM try to solve math olympiad questions. I’ve been surprised by how well current LLMs perform on them, and usually explain that surprise away by assuming the questions and details about their answers are in the training set. It would be cool to see if the general approach to LLMs is capable of solving truly novel (novel to them) problems.

dwa3592 | today at 2:05 AM

Love the concept - it can help with understanding the Overton window on many issues. I wish there were models by decade - up to 1900, up to 1910, up to 1920, and so on - and you could then ask each the same questions. It'd be interesting to see when homosexuality or women candidates would be accepted by an LLM.

arikrak | today at 1:37 PM

I wouldn't have expected there to be enough text from before 1913 to properly train a model; it seemed like they needed an internet's worth of text to train the first successful LLMs?

neom | today at 1:41 AM

This would be a super interesting research/teaching tool for historians when coupled with a vision model. My wife is a history professor who works with scans of 18th-century English documents, and I think part (maybe a small part) of why the transcription from even the best models is off in weird ways is that they often seem to smooth things over, so you end up with modern words and strange mistakes. I wonder if bounding the vision model to a period-specific language model would result in better transcription? Querying against the historical document you're working on with a period-specific chatbot would be fascinating.

Also wonder if I'm responsible enough to have access to such a model...

thesumofall | today at 7:02 AM

While obvious, it’s still interesting that its morals and values seem to derive from the texts it has ingested. Does that mean modern LLMs cannot challenge us beyond mere facts? Or does it just mean that this small model is not smart enough to escape the bias of its training data? Would it not be amazing if LLMs could challenge us on our core beliefs?

delichon | today at 3:21 AM

Datomic has a "time travel" feature where for every query you can include a datetime, and it will only use facts from the db as of that moment. I have a guess that to get the equivalent from an LLM you would have to train it on the data from each moment you want to travel to, which this project seems to be doing. But I hope I'm wrong.

It would be fascinating to try it with other constraints, like only from sources known to be women, men, Christian, Muslim, young, old, etc.
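
Roughly what that LLM equivalent of as-of seems to boil down to: there's no query-time filter, so the "time travel" has to happen when the corpus is sliced at a cutoff and a model is trained per slice. A toy sketch, with invented field names and documents:

  # Toy sketch: the LLM analogue of Datomic's as-of appears to be slicing the
  # training corpus at a cutoff date and training one model per slice.
  from datetime import date

  corpus = [
      {"date": date(1905, 6, 30), "text": "On the Electrodynamics of Moving Bodies ..."},
      {"date": date(1912, 11, 2), "text": "Editorial on the Balkan crisis ..."},
      {"date": date(1916, 5, 1), "text": "Dispatch from the Western Front ..."},
  ]

  def as_of(docs, cutoff):
      # Keep only documents that already existed at the chosen moment.
      return [d for d in docs if d["date"] <= cutoff]

  train_1913 = as_of(corpus, date(1913, 12, 31))  # the 1916 dispatch is excluded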

mmooss | today at 12:25 AM

> Imagine you could interview thousands of educated individuals from 1913—readers of newspapers, novels, and political treatises—about their views on peace, progress, gender roles, or empire.

I don't mind the experimentation. I'm curious about where someone has found an application of it.

What is the value of such a broad, generic viewpoint? What does it represent? What is it evidence of? The answer to both seems to be 'nothing'.

Tom1380 | today at 12:08 AM

Keep at it Zurich!

Myrmornis | today at 1:29 AM

It would be interesting to have LLMs trained purely on one language (with the ability to translate their input/output appropriately from/to a language that the reader understands). I can see that being rather revealing about cultural differences that are mostly kept hidden behind the language barriers.

dr_dshiv | today at 8:39 AM

Everyone learns that the Renaissance was sparked by the translation of Ancient Greek works.

But few know that the Renaissance was written in Latin, and has barely been translated. Less than 3% of pre-1700 books have been translated, and less than 30% have ever been scanned.

I’m working on a project to change that. Research blog at www.SecondRenaissance.ai — we are starting by scanning and translating thousands of books at the Embassy of the Free Mind in Amsterdam, a UNESCO-recognized rare book library.

We want to make ancient texts accessible to people and AI.

If this work resonates with you, please do reach out: [email protected]

awesomeusername | today at 4:24 AM

I've always liked the idea of retiring to the 19th century.

Can't wait to use this so I can double check before I hit 88 miles per hour that it's really what I want to do

tedtimbrell | today at 12:55 AM

This is so cool. Props for doing the work to actually build the dataset and make it somewhat usable.

I’d love to use this as a base for a math model. Let’s see how far it can get through the last 100 years of solved problems

why-o-why | today at 3:15 AM

It sounds like a fascinating idea, but I'd be curious whether prompting a more well-known foundational model to limit itself to 1913 and earlier would be similar.

Agraillo | today at 9:30 AM

> Modern LLMs suffer from hindsight contamination. GPT-5 knows how the story ends—WWI, the League's failure, the Spanish flu. This knowledge inevitably shapes responses, even when instructed to "forget."

> Our data comes from more than 20 open-source datasets of historical books and newspapers. ... We currently do not deduplicate the data. The reason is that if documents show up in multiple datasets, they also had greater circulation historically. By leaving these duplicates in the data, we expect the model will be more strongly influenced by documents of greater historical importance.

I found these claims contradictory. Many books that modern readers consider historically significant had only niche circulation at the time of publishing; a quick inquiry points to Nietzsche's later works and Marx's Das Kapital as likely examples. They are plausible subjects of this duplication, which could influence the model's responses as if those works had been widely known at the time.

jimmy76615 | today at 12:54 AM

> We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.

The idea of training such a model is really a great one, but not releasing it because someone might be offended by the output is just stupid beyond belief.

Teever | yesterday at 11:22 PM

This is a neat idea. I've been wondering for a while now about using these kinds of models to compare architectures.

I'd love to see the output from different models trained on pre-1905 data about special/general relativity ideas. It would be interesting to see what kind of evidence would persuade them of new kinds of science, or to see if you could have them 'prove' it by devising experiments and then giving them simulated data from the experiments to lead them along the correct sequence of steps to come to a novel (to them) conclusion.

sbmthakur | today at 4:26 PM

Someone suggested a nice thought experiment: train LLMs on all the physics known before quantum physics was discovered. If the LLM can still figure out the latter, then certainly we have achieved some success in the space.
