Hacker News

Lichtso, last Saturday at 9:09 PM (3 replies)

> Why bother developing chatbots

Maybe it is the reverse? It is not them offering a product; it is the users offering their interaction data, which might be harvested for further training of the real deal, which is not the product. Think about it: they (companies like OpenAI) have created a broad and diverse user base which, without a second thought, feeds them up-to-date info about everything happening in the world, down to individual lives and even people's inner thoughts. No one in the history of mankind ever had such a holistic view, almost god's eye. That is certainly something a superintelligence would be interested in. They may have achieved it already, and we may be seeing one of its strategies play out. I'm not saying they have, but this observation is compatible with it and does nothing to indicate that they haven't.


Replies

blibble, last Saturday at 10:25 PM

> No one in the history of mankind ever had such a holistic view, almost god's eye.

I distinctly remember search engines 30 years ago having a "live searches" page (with optional "include adult searches" mode)

visarga, yesterday at 6:27 AM

It's not about achieving AGI as a final product, it's about building a perpetual learning machine fueled by real-time human interaction. I call it the human-AI experience flywheel.

People bring problems to the LLM, the LLM produces some text, and people use it and later return to iterate. This iteration functions as feedback on the LLM's earlier responses. If you judge an AI response by the next 20 or more rounds of interaction, you can gauge whether it was useful. They can create RLHF data this way, using hindsight or extra context from the same user's other conversations on the same topic. That works because users try the LLM's ideas in reality and bring the outcomes back to the model, or they simply recall from personal experience whether that approach would work. The system isn't just built to be right; it's built to be correctable by the user base, at scale.
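To make the hindsight idea concrete, here is a minimal sketch, not anything OpenAI is known to run. It scores an assistant turn by looking at the user turns that follow it. The `Turn` type, the cue phrases, and the `hindsight_score` function are all hypothetical names invented for illustration; a real pipeline would presumably use a learned judge rather than keyword matching.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str  # "user" or "assistant"
    text: str


# Hypothetical cue phrases, chosen only for this illustration.
POSITIVE_CUES = ("that worked", "thanks", "fixed")
NEGATIVE_CUES = ("didn't work", "still broken", "wrong")


def hindsight_score(turns: list[Turn], index: int, horizon: int = 20) -> int:
    """Score the assistant turn at `index` from up to `horizon` later turns."""
    if turns[index].role != "assistant":
        raise ValueError("index must point at an assistant turn")
    score = 0
    for turn in turns[index + 1 : index + 1 + horizon]:
        if turn.role != "user":
            continue
        text = turn.text.lower()
        # Check negative cues first so "didn't work" is never read as praise.
        if any(cue in text for cue in NEGATIVE_CUES):
            score -= 1
        elif any(cue in text for cue in POSITIVE_CUES):
            score += 1
    return score


chat = [
    Turn("user", "My script crashes on startup."),
    Turn("assistant", "Try upgrading the library."),
    Turn("user", "That didn't work, same crash."),
    Turn("assistant", "Then pin the older version instead."),
    Turn("user", "That worked, thanks!"),
]

print(hindsight_score(chat, 1, horizon=1))  # -1: the first suggestion failed
print(hindsight_score(chat, 3))             # 1: the second suggestion succeeded
```

Note that a long horizon also credits the first answer with the later success (`hindsight_score(chat, 1)` is 0 here), so attributing feedback to the right response is the hard part.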

OpenAI has 500M users; if each generates 1,000 tokens/day, that means 0.5T interactive tokens/day. The chat logs dwarf the original training set in size and are very diverse, targeted to our interests, and mixed with feedback. They are also "on policy" for the LLM, meaning they contain corrections to mistakes the LLM itself made, not generic information like a web scrape.
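For anyone who wants to check the arithmetic, a quick back-of-envelope sketch. Both inputs are the round numbers from this comment, not measured values, and the yearly figure simply assumes the daily rate holds for 365 days.

```python
# Inputs are the comment's own round numbers, not measured values.
users = 500_000_000              # 500M users
tokens_per_user_per_day = 1_000  # assumed average per user

tokens_per_day = users * tokens_per_user_per_day
tokens_per_year = tokens_per_day * 365  # assumes a constant daily rate


def in_trillions(n: int) -> float:
    """Convert a raw token count to trillions of tokens."""
    return n / 1e12


print(f"{in_trillions(tokens_per_day):.1f}T tokens/day")    # 0.5T tokens/day
print(f"{in_trillions(tokens_per_year):.1f}T tokens/year")  # 182.5T tokens/year
```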

You're right that LLMs eventually might not even need to crawl the web; they have the whole of society dumping data into their open mouths. That did not happen with web search engines; only social networks did that in the past. But social networks are filled with our culture wars and self-conscious posing, while the chat room is an environment where we don't need to signal our group alignment.

Web scraping gives you humanity's external productions - what we chose to publish. But conversational logs capture our thinking process, our mistakes, our iterative refinements. Google learned what we wanted to find, but LLMs learn how we think through problems.

ysofunny, last Saturday at 10:24 PM

that possibility makes me feel weird about paying a subscription... they should pay me!

or the best models should be free to use. if it's free to use then I think I can live with it