Scaling Karpathy's Autoresearch: What Happens When the Agent Gets a GPU Cluster

47 points • by hopechong • today at 4:55 PM • 18 comments

Comments

I feel like most of this recent Autoresearch trend boils down to reinventing hyperparameter tuning. Is Bayesian optimization still the SOTA when you have a small cluster? I was last doing this kind of work about 3 years ago and haven't kept up since.

Also, shout-out to SkyPilot! It's been a huge help for going multi-cloud with our training and inference jobs (getting GPUs is still a nightmare...)!

zhwu • today at 6:00 PM

The most surprising part: the agent had access to both H100s and H200s. Without being told, it noticed H200s scored better and started screening ideas on H100s, then promoting winners to H200s for validation. That strategy emerged entirely on its own.
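The screen-then-promote strategy described above works like a two-stage screening scheme, similar in spirit to a single round of successive halving: score every idea on the cheaper tier, then re-run only the best few on the stronger tier. Below is a minimal Python sketch of that pattern, not the agent's actual code. The function names `screen_then_promote` and `toy_evaluate`, the idea names, and the loss numbers are all hypothetical, and the sketch assumes lower scores are better (as with a validation loss).

```python
from typing import Callable, Dict, List, Tuple


def screen_then_promote(
    ideas: List[str],
    evaluate: Callable[[str, str], float],
    top_k: int = 2,
    screen_tier: str = "H100",
    validate_tier: str = "H200",
) -> List[Tuple[str, float]]:
    """Hypothetical two-stage screen: cheap pass on all ideas, expensive pass on winners.

    Assumes lower scores are better (e.g. a validation loss).
    """
    if top_k <= 0:
        raise ValueError("top_k must be positive")
    # Stage 1: score every candidate on the screening tier.
    screened: Dict[str, float] = {idea: evaluate(idea, screen_tier) for idea in ideas}
    # Keep only the top_k candidates with the lowest screening loss.
    winners = sorted(screened, key=lambda idea: screened[idea])[:top_k]
    # Stage 2: re-run only the winners on the validation tier.
    validated = [(idea, evaluate(idea, validate_tier)) for idea in winners]
    # Return the validated results, best (lowest loss) first.
    return sorted(validated, key=lambda pair: pair[1])


# Hypothetical per-idea losses, for illustration only.
BASE_LOSS = {"wider_mlp": 1.02, "lr_warmup": 0.98, "rope_tweak": 1.10, "qk_norm": 0.97}


def toy_evaluate(idea: str, tier: str) -> float:
    # Hypothetical: pretend the faster tier fits more training into the same
    # time budget, so it reports a slightly lower loss for every idea.
    bonus = 0.03 if tier == "H200" else 0.0
    return BASE_LOSS[idea] - bonus


if __name__ == "__main__":
    ranked = screen_then_promote(list(BASE_LOSS), toy_evaluate, top_k=2)
    print([idea for idea, _ in ranked])  # prints ['qk_norm', 'lr_warmup']
```

The point of the design is cost: all four ideas run on the screening tier, but only two ever touch the validation tier, so the scarcer hardware is spent on candidates that have already shown promise.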

fabmilo • today at 7:01 PM

I am fascinated by this example of using AI to improve AI. I won a small prize using this technique on Helion kernels at a PyTorch hackathon in SF.

The next steps are:
- Give the agent the whole deep learning research literature and do tree search over the various ideas that have been proposed in the past.
- Have some distributed notepad that any of these agents can read and improve upon.

ipsum2 • today at 6:25 PM

A cluster is 2 nodes? That's technically true, but not very exciting.

covi • today at 6:00 PM

This feels like the chimpanzee with a power drill. An agent is honestly just brute-force search, but guided.
