Hacker News

plutodev today at 3:30 AM

[flagged]


Replies

oefrha today at 7:48 AM

You’re subtly pushing the same product in basically every one of your comments. If these are good-faith comments, please edit out the product name; it’s unnecessary, and doing it as a green account just makes people consider you a spammer. Establish yourself first.

kouteiheika today at 6:45 AM

> On the infra side, training a 1.5B model in ~4 hours on 8×H100 is impressive.

It's hard to compare without more details about the training process and the dataset, but is it? Genuine question, because I had the opposite impression. For example, I recently did a full finetuning run on a 3B model, chewing through a 146k-entry dataset (116k of the entries have reasoning traces, so they're not short) in 7 hours on a single RTX 6000.
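
To make the comparison concrete, here is a back-of-envelope sketch in Python. Only the model sizes, GPU counts, and wall-clock times come from the comments above; the token counts are hypothetical assumptions, and an H100 has far more raw compute than an RTX 6000, so this frames the question rather than settling it.

    def gpu_hours(num_gpus: int, hours: float) -> float:
        return num_gpus * hours

    def train_flops(params: float, tokens: float) -> float:
        # Standard ~6 * N * D estimate for dense transformer training.
        return 6.0 * params * tokens

    # Run A: 1.5B model, 8x H100, ~4 hours. Token count not stated;
    # assume a hypothetical 2B tokens purely for illustration.
    a_hours = gpu_hours(8, 4.0)                  # 32 GPU-hours
    a_flops = train_flops(1.5e9, 2e9)            # ~1.8e19 FLOPs (assumed tokens)

    # Run B: 3B full finetune, 1x RTX 6000, ~7 hours, 146k examples.
    # Assume a hypothetical ~2k tokens per example => ~0.29B tokens.
    b_hours = gpu_hours(1, 7.0)                  # 7 GPU-hours
    b_flops = train_flops(3e9, 146_000 * 2_000)  # ~5.3e18 FLOPs (assumed tokens)

    print(f"Run A: {a_flops / a_hours:.1e} FLOPs per GPU-hour")  # ~5.6e17
    print(f"Run B: {b_flops / b_hours:.1e} FLOPs per GPU-hour")  # ~7.5e17

Under these assumed token counts the two runs land in the same ballpark per GPU-hour, which is why the "impressive" claim is hard to judge without knowing the dataset size and sequence lengths.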
