Hacker News

red75prime, today at 8:55 AM

Does it generalize, though? What can a bag-of-words metaphor say about a question like "How many reinforcement learning training examples does an LLM need to significantly improve its performance on mathematical questions?"