INTELLECT–1: Launching the First Decentralized Training of a 10B Parameter Model

111 points • by jasondavies • 10/11/2024 • 36 comments

Comments

PoignardAzur • 10/12/2024

A lot of comments are sneering at various aspects of this press release, and yeah, there's some cringeworthy stuff.

But the technical aspects are pretty cool:

- Fault-tolerant training where nodes can be added and removed mid-run without interrupting the other nodes.

- Sending quantized gradients during the synchronization phase.

- (In the OpenDiLoCo article) Async synchronization.

They're also mentioning potential trustless systems where everyone can contribute compute, which would make this a truly decentralized open platform. Overall it'll be pretty interesting to see where this goes!
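For readers unfamiliar with the synchronization phase mentioned above, here is a minimal sketch of a DiLoCo-style training round. It assumes a toy quadratic loss per worker and plain SGD for both the inner and outer optimizer (the published DiLoCo recipe uses AdamW inside and Nesterov momentum outside). The names `local_steps`, `outer_round`, and `worker_targets` are hypothetical and not taken from the project's code. Each worker trains locally for many steps and only then sends a "pseudo-gradient" (the difference between the shared weights and its local weights), so communication happens once per round rather than once per step.

```python
import numpy as np

def local_steps(params, target, lr, num_steps):
    """Inner loop: plain SGD on the toy loss 0.5 * ||params - target||^2."""
    p = params.copy()
    for _ in range(num_steps):
        grad = p - target          # gradient of the toy quadratic loss
        p -= lr * grad
    return p

def outer_round(global_params, worker_targets, inner_lr, inner_steps, outer_lr):
    """One synchronization round: average pseudo-gradients, take an outer step."""
    pseudo_grads = []
    for target in worker_targets:  # each entry stands in for one worker's data
        local = local_steps(global_params, target, inner_lr, inner_steps)
        pseudo_grads.append(global_params - local)  # what the worker would send
    avg = np.mean(pseudo_grads, axis=0)  # averaged over whichever workers reported
    return global_params - outer_lr * avg

rng = np.random.default_rng(0)
worker_targets = [rng.normal(size=4) for _ in range(3)]
optimum = np.mean(worker_targets, axis=0)  # minimizer of the summed toy losses

params = np.zeros(4)
for _ in range(20):
    params = outer_round(params, worker_targets,
                         inner_lr=0.1, inner_steps=10, outer_lr=1.0)

final_distance = float(np.linalg.norm(params - optimum))
print(final_distance)
```

Because the average is taken over whatever list of workers is passed in, a round in this sketch still completes if that list shrinks or grows between rounds. Whether the real system handles membership changes this way is not something the thread confirms.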

oefrha • 10/12/2024

Well I don’t have 8xH100s, but if I did, I probably wouldn’t donate them to a VC-funded company. Remember “Open”AI?

https://pitchbook.com/profiles/company/588977-92

ukuina • 10/12/2024

> Decentralized training of INTELLECT-1 currently requires 8x H100 SXM5 GPUs.

So, your garden-variety $0.5M desktop PC, then.

Cool, cool.

[1] https://viperatech.com/shop/nvidia-dgx-h100-p4387-system-640...

ikeashark • 10/12/2024

me: Oh cool, a project like Folding@Home but for AI compute, maybe I'll contribute as we-

> Decentralized training of INTELLECT-1 currently requires 8x H100 SXM5 GPUs.

me: and for that reason, I'm out

Also, they state that they will later add the ability to contribute your own compute, but how will they solve the problem of having to back-propagate across all of the remote nodes contributing to the project without egregiously slow training?

macrolime • 10/12/2024

Not exactly what I would call decentralized training. More like distributed through multiple data centers.

Decentralized training would be when you can use consumer GPUs. That's not likely to work with backpropagation directly, though it might with one of the backpropagation-approximating algorithms.

m3kw9 • 10/12/2024

But I can already train with 30 different vendors distributed across the US, so why do I need a “decentralized” training system? Decentralized inference makes more sense, as that is where things can be censored.

dmitrygr • 10/11/2024

> solve decentralized training step-by-step to ensure AGI will be open-source, transparent, and accessible

One hell of an uncited leap from "we're multiplying a lot of numbers" to "AGI", as if it is a given

mountainriver • 10/12/2024

This is cool work. I’ve been watching the slow evolution of this space for a couple of years, and it feels like a good way to ensure AI is owned by and accessible to everyone.

openrisk • 10/12/2024

For some purposes a decentrally trained, open source LLM could be just fine? E.g. you want a stochastic parrot that is trained on a large, general purpose corpus of genuine public domain / creative commons content. Having such a tool widely available is still a quantum leap versus Lorem Ipsum. Up to a point you can take your time. There is no manic race to capitalize on any hype. "slow open AI" instead of "fast closed AGI". Helpfully, the nature of the target corpus does not change every day. You can imagine, e.g., annual revisions, trained and rolled out leisurely. Both costs and benefits get widely distributed.

James_K • 10/12/2024

My initial reaction was quite negative, but having thought it through, I can see the logic in this. Having open models is better than closed models. That said, this page seems like a joke. Someone drank a little too much AI-koolaid methinks.

not_a_dane • 10/12/2024

Decentralised but very high entry barrier.

nickpsecurity • 10/12/2024

The main benefit of this type of decentralization seems to be minimizing the node cost. One can rent the cheapest nodes to use in the system. Even the temporary instances can be replaced with others. It’s also easy for system owners to donate time.

So, mostly cost reduction mixed with some cloud and vendor diversity.

pizza • 10/12/2024

So just spitballing here but this is likely a souped-up reverse engineered DisTrO [0] under the hood, right? Or could it be something else?

[0] https://www.youtube.com/watch?v=eLMJoCSjFbs

mt_ • 10/12/2024

> We quantize the pseudo-gradients to int8, reducing communication requirements by 400x.

Can someone explain whether this reduces model quality overall?
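To make the quoted claim concrete, here is a minimal sketch of what quantizing a pseudo-gradient to int8 could look like. It assumes symmetric per-tensor quantization with a single float scale; the thread does not say which scheme the project actually uses, and `quantize_int8` and `dequantize_int8` are hypothetical helper names. Going from float32 to int8 alone shrinks the payload by 4x, so a larger figure such as the quoted 400x would have to include other savings, for example synchronizing only every so many steps (an assumption here, not something stated in the thread).

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization: int8 values plus one float scale."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid dividing by zero
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Recover an approximate float32 tensor on the receiving side."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
# Stand-in for a pseudo-gradient; real ones would be model-sized tensors.
pseudo_grad = rng.normal(0.0, 1e-3, size=10_000).astype(np.float32)

q, scale = quantize_int8(pseudo_grad)
restored = dequantize_int8(q, scale)

bytes_ratio = pseudo_grad.nbytes / q.nbytes             # payload shrink factor
max_err = float(np.max(np.abs(restored - pseudo_grad)))  # worst-case rounding error
print(bytes_ratio, max_err, scale)
```

The per-element error in this scheme is bounded by about half the scale, i.e. it grows with the largest value in the tensor. Whether that much noise hurts final model quality is an empirical question this sketch cannot answer.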

monkeydust • 10/12/2024

Yea, come back when you can do this on BOINC.

saulrh • 10/12/2024

> Prime Intellect

Ah, yes, Prime Intellect, the AGI that went foom and genocided the universe because it was commanded to preserve human civilization without regard for human values. A strong contender for the least evil hostile superintelligence in fiction. What a wonderful thing to name your AI startup after. What's next, creating the Torment Nexus?

(my position on the book as a whole is more complex, but... really? Really?)
