
taupi · yesterday at 9:11 PM

Glad you found it interesting!

What you described is the goal of Attainable SOL, but using GPU utilization as the metric rather than throughput. We're answering "for a given model and workload, have you optimized this well enough?", where "optimized" includes hyperparameter tuning. So if someone hasn't tuned batch size, parallelism, or other knobs well for their workload, the gap between their current utilization and the Attainable SOL is what tells them there's still room to improve.
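To make the headroom idea concrete, here's a minimal sketch of the comparison we're describing. The names (optimization_headroom, attainable_sol, measured_utilization) and the numbers are illustrative, not our actual API or a real measurement:

    # Illustrative only -- hypothetical names and numbers, not the real Attainable SOL code.
    def optimization_headroom(measured_utilization: float, attainable_sol: float) -> float:
        """Fraction of realistically attainable utilization still left on the table."""
        return max(0.0, attainable_sol - measured_utilization)

    # e.g. a workload measured at 38% compute utilization whose
    # architecture-specific ceiling is estimated at 55%:
    headroom = optimization_headroom(measured_utilization=0.38, attainable_sol=0.55)
    print(f"Roughly {headroom:.0%} of utilization left to recover via tuning")

If that gap is near zero, further hyperparameter tuning isn't going to buy you much; if it's large, the knobs are worth revisiting.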

We're motivated by the fact that reaching 100% Compute SOL is impossible -- no model can run at the hardware's theoretical maximum -- but we want to provide a realistic target for optimization. And we've noticed that different model architectures have different realistic ceilings. For example, MoE models run at much lower utilization due to their sparsity. We don't expect you to retrain an MoE model in order to get higher utilization, and no amount of hyperparameter tuning can bring you close to 100%, so the Attainable SOL should be lower for that model.
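One intuition for why the MoE ceiling sits lower (a simplified sketch under a uniform-routing assumption, not our actual model): routing splits each batch across experts, so every expert's GEMM is much smaller than the dense model's single matmul and typically runs further from the hardware's peak.

    # Hypothetical illustration of per-expert work under uniform top-k routing.
    def per_expert_tokens(batch_tokens: int, active_experts: int, total_experts: int) -> float:
        """Average tokens each expert processes per step, assuming uniform routing."""
        return batch_tokens * active_experts / total_experts

    # e.g. 8192 tokens with top-2 routing over 64 experts -> ~256 tokens per expert,
    # a far smaller GEMM than the dense model's 8192-token matmul.
    print(per_expert_tokens(batch_tokens=8192, active_experts=2, total_experts=64))

No tuning knob changes that routing structure, which is why the attainable ceiling has to be architecture-aware rather than a single fixed number.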