"Finally, maybe this is controversial but ultimately progress in science is bottlenecked by real-world experiments."
I feel like this has been the overwhelming consensus around these halls? I can't count the number of HN comments I've nodded along to around the idea that real-world experimentation will become the bottleneck.
This shows just how completely detached from reality this whole "takeoff" narrative is. It's utterly baffling that someone would consider it "controversial" that understanding the world requires *observing the world*.
The hallmark example of this is life extension. There's a not insignificant fraction of very powerful, very wealthy people who think that their machine god is going to read all of reddit and somehow cogitate its way to a cure for ageing. But how would we know if it works? Seriously, how else do we know if our AGI's life extension therapy is working besides just fucking waiting and seeing if people still die? Each iteration will take years (if not decades) just to test.
No expert, more a hobbyist, but my understanding is that most serious people with longer timelines believe "embodiment" training data, i.e. data from robots operating in the world, is what they need to make the next step change in the growth of these things.
How best to get masses of data from robots operating in the real world is debated. One camp bets on Sim2Real: if you can create a physically sound enough sim, you can train your robots in a virtual world much more easily than in ours. See Eureka, or DrEureka? I forget the main paper. A robot hand spinning a pen, the Boston Dynamics dog balancing on a rolling yoga ball. After a billion robots train for a million "years" in your virtual world, just transfer the "brain" to a physical robot.
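For the curious, the usual trick that makes Sim2Real transfer work is domain randomization: re-sample the simulator's physics every episode so the policy can't overfit to one "world". A toy sketch of just that loop (all parameter names, ranges, and the "reward" here are made up for illustration, not from any actual paper):

```python
# Toy sketch of domain randomization for Sim2Real.
# Every episode runs in a simulator whose physics are re-sampled,
# so whatever you learn has to work across many plausible worlds,
# including (hopefully) the real one.
import random

def sample_sim_params(rng: random.Random) -> dict:
    """Randomize physical properties the real robot won't match exactly.
    Names and ranges are illustrative placeholders."""
    return {
        "mass_kg": rng.uniform(0.8, 1.2),        # +/-20% around nominal
        "friction": rng.uniform(0.5, 1.5),
        "motor_latency_s": rng.uniform(0.0, 0.05),
        "sensor_noise_std": rng.uniform(0.0, 0.02),
    }

def run_episode(params: dict, rng: random.Random) -> float:
    """Stand-in for a physics rollout: returns a toy 'reward'.
    A real setup would step a simulator (MuJoCo, Isaac Gym, ...) here."""
    noise = rng.gauss(0.0, params["sensor_noise_std"])
    return 1.0 / params["mass_kg"] - params["motor_latency_s"] + noise

def train(n_episodes: int, seed: int = 0) -> float:
    """Average toy reward over many randomized worlds."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_episodes):
        params = sample_sim_params(rng)   # a new "world" every episode
        total += run_episode(params, rng)
    return total / n_episodes

if __name__ == "__main__":
    print(f"mean toy reward over 1000 randomized episodes: {train(1000):.3f}")
```

The real versions swap the toy rollout for a GPU-parallelized physics engine, which is where the "million years of training" framing comes from.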
Jim Fan of Nvidia is one to follow there. Then there are tele-operation believers, and mass-deployment-and-iterate believers (Musk's "self driving" rollout). There's also research, iirc, betting that video games and video interpretation will be able to confer some of that data from operating in the real world, similar to how it's said transformers exploited the implicit structure of language to learn from unclean data: even properly ordered text has meaning embedded in its relative positional values.
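That last bit about positional values refers to positional encodings; the original transformer's sinusoidal scheme injects exactly that structure, so relative offsets between tokens are recoverable from the vectors themselves. A minimal pure-Python version (standard formula, but the function name is mine):

```python
# Sinusoidal positional encoding from the original transformer paper:
# each dimension pair oscillates at a different frequency, so a token's
# position (and relative offsets between tokens) is baked into its vector.
import math

def sinusoidal_pe(position: int, d_model: int) -> list[float]:
    """Return the d_model-dimensional encoding for one position."""
    pe = []
    for i in range(0, d_model, 2):
        angle = position / (10000 ** (i / d_model))
        pe.append(math.sin(angle))  # even dimension
        pe.append(math.cos(angle))  # odd dimension
    return pe[:d_model]             # trim if d_model is odd

if __name__ == "__main__":
    print(sinusoidal_pe(0, 4))   # position 0 -> [0.0, 1.0, 0.0, 1.0]
```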
Just my summary of what I've seen from researchers who agree that scaling text and training time is old news. I mostly see them trying to figure out how to scale "embodied" AI data collection, or to derive a VLA (vision-language-action) model in fancy ways (bigger training sets of robotic behavior around a standard robot form factor, maybe?). All types of avenues, but yes, most serious people recognize the need for "embodied" data - at least the ones I've read.