logoalt Hacker News

ray__today at 4:06 PM1 replyview on HN

This is a cool idea—I know from snooping on sumbit scripts and node utilization on the HPC that I use at my institution that most submissions leave some compute on the table (and many of them are egregiously bad). I'd probably vote in favor of sending every submitted sbatch script through an LLM (at least for everyone else, I'd would prefer tuning my own usage myself :) ).

Presumably the underlying model here is also an LLM? To what degree is it "fine-tuned", or is it just given a set of tools to build a good picture of cluster usage?


Replies

ismaeel_bashirtoday at 5:27 PM

Nope :) the core model isn’t an LLM. It’s a custom architecture built from the ground up. We natively accept multimodal inputs such as source code, submission scripts and hardware topologies. The LLMs in the post are the baselines we beat.

This is also why fine-tuning matters for us. We train a cluster-specific model that gets better as more jobs run on your cluster, because the same code behaves differently on different topology. An LLM reasons about code/script in a vacuum with no native sense of how your nodes actually perform

show 1 reply