Start front-loading the models with 5k, 10k, 50k, 100k tokens of messy, quasi-related context, and then run the benchmarks.
These models are ridiculously powerful with a blank slate. It's when they get loaded down with all the necessary (and inevitably unnecessary) context to complete the task that they really start to crumble and fold.
We need benchmarks that can distinguish between continuous learning and long-context extrapolation.
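A minimal sketch of what that padding harness could look like, in Python. Everything here is an assumption for illustration: `run_model`, `score_answer`, the task format, and the distractor corpus are hypothetical placeholders, and the 4-chars-per-token heuristic is just a rough stand-in for a real tokenizer.

```python
import random

# Token budgets of quasi-related filler to prepend before each benchmark question.
CONTEXT_BUDGETS = [5_000, 10_000, 50_000, 100_000]

def approx_token_count(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)

def build_filler(corpus: list[str], budget: int, rng: random.Random) -> str:
    # Sample quasi-related passages from the distractor corpus until the budget is hit.
    filler, used = [], 0
    while used < budget:
        passage = rng.choice(corpus)
        filler.append(passage)
        used += approx_token_count(passage)
    return "\n\n".join(filler)

def run_benchmark(tasks, corpus, run_model, score_answer, seed=0):
    # tasks: list of {"question": ..., "reference": ...} dicts (hypothetical format)
    # run_model(prompt) -> answer string; score_answer(answer, reference) -> float
    rng = random.Random(seed)
    results = {}
    for budget in [0] + CONTEXT_BUDGETS:  # budget 0 = blank-slate baseline
        scores = []
        for task in tasks:
            filler = build_filler(corpus, budget, rng) if budget else ""
            prompt = f"{filler}\n\n{task['question']}" if filler else task["question"]
            answer = run_model(prompt)
            scores.append(score_answer(answer, task["reference"]))
        results[budget] = sum(scores) / len(scores)
    return results  # average score at each padding level, baseline included
```

Comparing the blank-slate baseline against the padded runs is what separates "the model knows this" from "the model can still find it under 100k tokens of noise."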