Curious about the baseline choice. modded-nanogpt was optimized for wall-clock speed, not data efficiency, so it seems like an unusual reference point for this kind of benchmark. Why not vanilla NanoGPT?
Modded-nanogpt is also much more data efficient than vanilla NanoGPT, even if some of its individual optimizations trade higher throughput for worse data efficiency.