Hacker News

sejje · last Thursday at 4:14 PM

I bet from now on they'll only train on internet snapshots taken before LLMs.

Additional non-internet training material will probably be human-created, or at least human-curated.


Replies

pc86 · last Thursday at 5:21 PM

This only makes sense if the percentage of LLM hallucinations is much higher than the percentage of things written online that are flat-out wrong (and it definitely isn't).

sosodev · last Thursday at 4:25 PM

Nope. Pretraining runs have been moving forward with internet snapshots that include plenty of LLM content.
