Hacker News

visarga · yesterday at 6:27 AM

It's not about achieving AGI as a final product; it's about building a perpetual learning machine fueled by real-time human interaction. I call it the human-AI experience flywheel.

People bring problems to the LLM, the LLM produces some text, people use it and later return to iterate. That iteration functions as feedback on the LLM's earlier responses. If you judge an AI response by the next 20 or more rounds of interaction, you can gauge whether it was useful. The labs can create RLHF data this way, using hindsight or extra context from the same user's other conversations on the same topic. It works because users try the LLM's ideas in reality and bring the outcomes back to the model, or they simply recall from personal experience whether that approach would work. The system isn't just built to be right; it's built to be correctable by the user base, at scale.
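A rough sketch of what that hindsight scoring could look like; the keyword heuristics, the 20-turn window, and the data shapes are made-up assumptions for illustration, not anything the labs have published:

    # Score each assistant reply by what the user does in the following
    # turns, then emit (context, response, reward) triples for a reward model.
    from dataclasses import dataclass

    POSITIVE = ("thanks", "that worked", "perfect")      # assumed signals
    NEGATIVE = ("that's wrong", "doesn't work", "still failing")

    @dataclass
    class Turn:
        role: str   # "user" or "assistant"
        text: str

    def hindsight_reward(turns, i, window=20):
        """Crude reward for the assistant turn at index i, judged by the
        next `window` turns of the same conversation."""
        score = 0.0
        for t in turns[i + 1 : i + 1 + window]:
            if t.role != "user":
                continue
            lowered = t.text.lower()
            score += sum(kw in lowered for kw in POSITIVE)
            score -= sum(kw in lowered for kw in NEGATIVE)
        return score

    def to_rlhf_examples(turns):
        """Turn one chat log into (context, response, reward) triples."""
        examples = []
        for i, t in enumerate(turns):
            if t.role == "assistant":
                context = " ".join(x.text for x in turns[:i])
                examples.append((context, t.text, hindsight_reward(turns, i)))
        return examples

    log = [
        Turn("user", "My build fails with a linker error."),
        Turn("assistant", "Try adding -lm to the link flags."),
        Turn("user", "Thanks, that worked."),
    ]
    print(to_rlhf_examples(log))   # the assistant turn gets a positive reward

In practice you'd want a learned judge rather than keyword matching, but the point is that the label comes from later turns, not from a rater looking at the response in isolation.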

OpenAI has 500M users; if each generates 1000 tokens/day, that's 0.5T interactive tokens/day. The chat logs dwarf the original training set in size and are very diverse, targeted to our interests, and mixed with feedback. They are also "on policy" for the LLM, meaning they contain corrections to mistakes the LLM actually made, not generic information like a web scrape.
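Quick back-of-the-envelope with those numbers (the per-user rate is just an assumed average):

    users = 500_000_000              # 500M users, figure from above
    tokens_per_user_per_day = 1_000  # assumed average
    daily_tokens = users * tokens_per_user_per_day
    print(f"{daily_tokens:.1e} interactive tokens/day")  # 5.0e+11, i.e. 0.5T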

You're right that LLMs eventually might not even need to crawl the web; they have the whole of society dumping data into their open mouths. That never happened with web search engines; only social networks managed it in the past. But social networks are filled with our culture wars and self-conscious posing, while the chat room is an environment where we don't need to signal our group alignment.

Web scraping gives you humanity's external productions: what we chose to publish. But conversational logs capture our thinking process, our mistakes, our iterative refinements. Google learned what we wanted to find, but LLMs learn how we think through problems.


Replies

FuckButtons · yesterday at 6:52 AM

I see where you're coming from, but I think teasing out something that looks like a clear objective function, one that generalizes to improved intelligence, from LLM interaction logs is going to be hellishly difficult. Consider that most of the best LLM pre-training comes from being very, very judicious with the training data. Selecting the right corpus of LLM interaction logs and then defining an objective function that correctly models... what? Being helpful? That sounds far harder than just working from scratch with RLHF.
