I am not a professional statistician (only a BSc dropout), so I can't claim the expertise required to evaluate the claim here: that double descent eliminates overfitting in LLMs.
That said, I see red flags here. This is an extraordinary claim, and extraordinary claims require extraordinary evidence. My actual degree (not the dropped-out one) is in psychology, and I used statistics heavily during it, but it is only a BSc, so again, I cannot claim expertise there either. Still, this claim, and the abstracts I scanned in various papers while trying to evaluate it, ring alarm bells all over. I don't trust it. It is precisely the kind of thing we were warned about when we were taught scientific thinking.
In contrast, this political activist provided an example (an anecdote, if you will) showing how easy it was for an actual scientist to poison LLMs with a made-up symptom. That looks like overfitting to me. Those two Medium blog posts very much feel like errors in the data set which the models are all too happy to output as if they had been inferred.
EDIT: I just watched that video, and I actually believe the claims in it; I just don't believe your claim. If we assume the video is correct, your errors will only manifest as fewer hallucinations. Note that in the demonstration, the higher-parameter regression model traversed every single datapoint in the sample, while an optimal model with fewer parameters had a better fit than the overfitted ones. This suggests that trillions of parameters indeed make a model quite vulnerable to poisoning.
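The regression demonstration described above is easy to reproduce in a few lines. This is my own illustrative sketch (the data and degrees are made up, not taken from the video): a high-degree polynomial with as many coefficients as there are training points passes through every noisy sample, while a lower-degree fit does worse on the training set but tracks the underlying function better.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression task: 10 noisy samples of a smooth function.
x_train = np.linspace(-1, 1, 10)
y_train = np.sin(np.pi * x_train) + rng.normal(0, 0.2, size=x_train.shape)
x_test = np.linspace(-1, 1, 200)
y_test = np.sin(np.pi * x_test)

def fit_and_eval(degree):
    # Least-squares polynomial fit of the given degree.
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

# Degree 9 with 10 points interpolates: training error is essentially zero,
# i.e. the fit "traverses every single datapoint", noise included.
tr9, te9 = fit_and_eval(9)
# Degree 3 cannot interpolate the noise, so its training error is higher,
# but it typically tracks the underlying sine far better between the points.
tr3, te3 = fit_and_eval(3)
print(f"degree 9: train={tr9:.2e}, test={te9:.3f}")
print(f"degree 3: train={tr3:.2e}, test={te3:.3f}")
```

The interpolating fit memorizing the noise is exactly the classical-regime overfitting the video shows; the double-descent claim is about what happens when you keep increasing capacity far beyond this point.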
Almost certainly those weren't even in the training data. They showed up too soon; LLMs are retrained only every 6-12 months.
Instead, the LLM did a web search for 'bixonimania' and summarized the top results. This is not an example of training data poisoning.
>This is an extraordinary claim, and extraordinary claims require extraordinary evidence.
Well, I don't know what to tell you; double descent is widely accepted in ML at this point. Neural networks routinely have more parameters than training examples, and yet still generalize quite well.
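For what it's worth, the "second descent" is reproducible in a toy setting. The sketch below is my own made-up setup (not from any particular paper): minimum-norm least squares on random features. At the interpolation threshold (as many parameters as samples) test error typically spikes, and well past it the same interpolating estimator generalizes better again, despite still fitting the training set exactly.

```python
import numpy as np

def minnorm_fit(n_features, seed, n_train=50, n_signal=10, noise=0.5):
    """Minimum-norm least-squares fit with n_features inputs on n_train samples.
    Only the first n_signal features carry signal; returns (train_mse, test_mse)."""
    rng = np.random.default_rng(seed)
    w_true = np.zeros(n_features)
    w_true[:n_signal] = 1.0
    X_tr = rng.normal(size=(n_train, n_features))
    y_tr = X_tr @ w_true + rng.normal(0.0, noise, size=n_train)
    X_te = rng.normal(size=(2000, n_features))
    y_te = X_te @ w_true
    # pinv gives the minimum-norm solution; with n_features >= n_train it
    # interpolates the training data exactly.
    w_hat = np.linalg.pinv(X_tr) @ y_tr
    train_mse = np.mean((X_tr @ w_hat - y_tr) ** 2)
    test_mse = np.mean((X_te @ w_hat - y_te) ** 2)
    return train_mse, test_mse

# Median over a few seeds: test error is worst right at the interpolation
# threshold (n_features == n_train), and comes back down far past it.
at_threshold = np.median([minnorm_fit(50, s)[1] for s in range(10)])
overparam = np.median([minnorm_fit(2000, s)[1] for s in range(10)])
train_over = minnorm_fit(2000, 0)[0]
print(f"test MSE at threshold: {at_threshold:.1f}")
print(f"test MSE overparameterized: {overparam:.1f} (train MSE {train_over:.1e})")
```

Note this says nothing about poisoned data: an overparameterized model that generalizes well is generalizing from whatever it was given, false claims included.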
That said, even a model that does not overfit can still repeat false information if the training data contains false information. It's not magic.