
OliverGuy, today at 7:27 AM

Edit: it looks like the paper does:

TPUv5e with 16 tensor cores for 2 days for the 200M param model.

Claude reckons this is about 60 hours on an 8xA100 rig, so very accessible compared to LLMs for smaller labs.
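
For anyone wanting to sanity-check the 60-hour figure, here is a rough back-of-the-envelope sketch. It assumes "16 tensor cores" means 16 TPU v5e chips (one TensorCore per chip), and it uses peak bf16 throughput numbers that are my assumptions, not from the paper: roughly 197 TFLOPS per v5e chip and 312 TFLOPS dense per A100. It also assumes both setups hit the same fraction of peak, which ignores memory, interconnect and software differences, so treat the result as a ballpark only.

```python
# Rough peak-throughput conversion between accelerators.
# All hardware figures below are assumptions, not values from the paper.
TPU_V5E_PEAK_TFLOPS = 197.0  # assumed bf16 peak per TPU v5e chip
A100_PEAK_TFLOPS = 312.0     # assumed dense bf16 peak per A100


def equivalent_hours(src_chips, src_hours, src_tflops, dst_chips, dst_tflops):
    """Scale wall-clock time by the ratio of total peak throughput.

    Assumes both systems reach the same utilisation, which is optimistic.
    """
    total_work = src_chips * src_hours * src_tflops  # TFLOPS-hours at peak
    return total_work / (dst_chips * dst_tflops)


# 16 chips for 2 days (48 hours), moved to a hypothetical 8xA100 node.
hours = equivalent_hours(16, 48.0, TPU_V5E_PEAK_TFLOPS, 8, A100_PEAK_TFLOPS)
print(f"{hours:.1f} hours on 8xA100")
```

Under those assumptions this lands at around 60 hours, which suggests the estimate is a simple peak-FLOPs ratio rather than anything measured.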