Hacker News

winwang | yesterday at 11:34 PM

Interestingly, I find that the models generalize decently well as long as the "training" (more analogous to that for humans) fits in a (small enough) context. That is to say, "in-context learning" seems good enough for real use.

But of course, that's not quite "long term"


Replies

fc417fc802 | today at 2:02 AM

Given that models don't currently learn as they go, isn't that exactly what this benchmark is testing? If the model needs either to have been explicitly trained in a similar environment or to have a human manually input a carefully crafted prompt, then it isn't general. The latter case is a human tuning a powerful tool.

If it can add the necessary bits to its own prompt while working on the benchmark then it's generalizing.
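The loop the reply describes can be sketched as an agent that, on failure, writes a note into its own prompt so later attempts benefit. Everything here is a stand-in assumption: `run_episode` is a toy that succeeds only when the prompt already contains the task's hint, in place of a real model call against a real benchmark.

```python
# Toy sketch of a self-augmenting prompt loop. run_episode is a
# hypothetical stand-in for an LLM attempting a benchmark task; it
# "succeeds" only if the prompt already carries the task's hint.

def run_episode(prompt: str, task: str) -> bool:
    return f"hint:{task}" in prompt

def self_augmenting_agent(tasks, max_attempts=2):
    prompt = "You are solving benchmark tasks."
    results = {}
    for task in tasks:
        for _ in range(max_attempts):
            if run_episode(prompt, task):
                results[task] = True
                break
            # On failure, the agent adds "the necessary bits" to its
            # own prompt, rather than a human crafting them.
            prompt += f"\nLesson learned -- hint:{task}"
        else:
            results[task] = False
    return results, prompt

results, final_prompt = self_augmenting_agent(["navigate", "craft"])
```

Under the reply's framing, the generalization being tested lives in the append step: the model, not a human, decides what goes into the prompt.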