The unit economics seem pretty rough though. You're locking up 8xH100s for the compute of ~32B active parameters. I guess memory is the bottleneck but hard to see how the margins work on that.
Yes, it only makes sense economically if you have batching over many users.
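To put rough numbers on that, here is a back-of-envelope sketch, not a real serving model. It assumes each decode step streams the ~32B active parameters once no matter how many requests are in the batch, and that step time is whichever is larger: the time to read the weights or the time to do the math. The bandwidth, FLOPS, bytes-per-parameter and price defaults are illustrative placeholders, not measured or quoted figures, and the function name is made up for this example. It ignores KV cache traffic, interconnect overhead, and the fact that in an MoE different tokens in a batch hit different experts.

```python
def cost_per_million_tokens(
    batch_size,
    active_params=32e9,        # from the comment above: ~32B active parameters
    bytes_per_param=1.0,       # assumption: 8-bit weights
    num_gpus=8,                # from the comment above: 8 GPUs per replica
    mem_bw_per_gpu=3.0e12,     # assumption: ~3 TB/s memory bandwidth per GPU
    flops_per_gpu=1.0e15,      # assumption: ~1 PFLOP/s usable compute per GPU
    dollars_per_gpu_hour=2.0,  # assumption: placeholder rental price
):
    """Rough dollars per 1M generated tokens at a given decode batch size."""
    # Time to stream the active weights once per decode step.
    weight_bytes = active_params * bytes_per_param
    mem_time = weight_bytes / (num_gpus * mem_bw_per_gpu)

    # Time for the matmul work: ~2 FLOPs per active parameter per token.
    compute_time = batch_size * 2 * active_params / (num_gpus * flops_per_gpu)

    # The step takes as long as the slower of the two.
    step_time = max(mem_time, compute_time)
    tokens_per_sec = batch_size / step_time

    dollars_per_sec = num_gpus * dollars_per_gpu_hour / 3600
    return dollars_per_sec / tokens_per_sec * 1e6


if __name__ == "__main__":
    for b in (1, 8, 64, 256, 1024):
        print(f"batch={b:5d}  ${cost_per_million_tokens(b):.4f} per 1M tokens")
```

Under these assumptions the cost per token falls roughly in proportion to batch size while the step is memory-bound, then flattens once compute becomes the limit. That is the sense in which the hardware only pays for itself when many users share each decode step.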