I agree we shouldn't undersell or underestimate the complexity involved, but when LLMs start contributing significant ideas to scientists and mathematicians, it's time to recognize that whatever tricks biology uses (humans, octopuses, ...) may still be of interest and of value, but they no longer look like the unique, magical missing ingredients that were sought after for so long.
From this point on it's all about efficiencies:
modeling efficiency: how do we best fit the elephant? With Bézier curves, rational polynomials, ...?
memory bandwidth / training efficiency: when building coincidence statistics, say bigrams, is it really necessary to update the weights for all concepts? A co-occurrence of two concepts should just increase the predicted probability of the bigram that was just observed, and then decrease a global coefficient used to scale all predicted probabilities. That is, observing a baobab tree + an elephant in the same image/sentence/... should not change the relative probabilities of observing french fries + milkshake versus bicycle + windmill. This suggests that different architectures should be possible with much lower training costs, by only updating the weights of the concepts observed in the last bigram.
and so on with all other kinds of efficiencies.
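
To make the bigram point concrete, here is a minimal sketch in plain Python of the counting version of that idea. It is not an LLM architecture, just a sparse count table. The class and method names (`SparseBigramModel`, `observe`, `prob`) are made up for illustration, and treating a bigram as an unordered pair is my assumption. The "global coefficient" is taken to be 1/total, so it shrinks whenever the total count grows.

```python
from collections import Counter
from fractions import Fraction


class SparseBigramModel:
    """Hypothetical count-based bigram model with one global scale factor."""

    def __init__(self):
        self.counts = Counter()  # one entry per bigram seen so far
        self.total = 0           # global normalizer; the scale is 1 / total

    def observe(self, a, b):
        # Only two numbers are written per observation: the count of the
        # observed bigram and the global total. No other entry is touched.
        key = tuple(sorted((a, b)))  # assumption: bigrams are unordered pairs
        self.counts[key] += 1
        self.total += 1

    def prob(self, a, b):
        # Predicted probability = bigram count scaled by the global coefficient.
        if self.total == 0:
            return Fraction(0)
        return Fraction(self.counts[tuple(sorted((a, b)))], self.total)


model = SparseBigramModel()
for _ in range(3):
    model.observe("french fries", "milkshake")
model.observe("bicycle", "windmill")

ratio_before = model.prob("french fries", "milkshake") / model.prob("bicycle", "windmill")
model.observe("baobab", "elephant")
ratio_after = model.prob("french fries", "milkshake") / model.prob("bicycle", "windmill")

# Both absolute probabilities dropped, but their ratio is unchanged.
print(ratio_before, ratio_after)
```

The absolute probabilities of the unrelated pairs shrink because the shared scale shrinks, but their ratio stays the same, and each observation costs two writes regardless of how many concepts the table holds. Whether a learned, dense model can be arranged to get the same property is the open question.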