> The Olivia system is an HPE Cray Supercomputing EX system, with 448 GPUs and 64,512 CPU cores.

solenoid0937 • today at 8:39 PM • 7 replies

Training a sovereign LLM with this meager hardware, as opposed to a LoRA on some open-source model, seems like a huge mistake and a potential red flag.

There is no way these people have the resources to train a fully fledged LLM, so claiming that is their goal makes me think they don't intend for the LLM to be useful.

Which begs the question, whose money are they wasting - and why?

Replies

vslira • today at 9:10 PM

It may not be useful to anyone outside, but it's possible that one of the goals is institutional learning (that is, embedding the knowledge of how to build LLMs in an organization).

Even though it's nominally the national library behind this, they were probably chosen (as per the article) because they legally own and can use all Norwegian material for this purpose. I'd guess researchers from related entities like unis will be involved in the process.

speedgoose • today at 9:19 PM

They have successfully made PoC finetunes before, so the next step is training fully fledged LLMs.

I don’t think they’re aiming for anything worthwhile. The finetunes were incredibly broken. I’m guessing it’s more about having the method to do it. I’m not convinced it’s super useful, but I’m not one to decide who gets to do what with the research funds.

One finetune I tried did make fun of humans expressing their feelings in the chat. Often.

One other finetune did hallucinate that it was a doctor and my baby had terrible diseases, every time I just wrote "hei" (with a generic neutral system prompt that likely triggered this behaviour though).

I think Olivia is big enough for what it’s used for. In my opinion it’s better to stay up to date and not waste too much money on hardware at the moment.

manquer • today at 9:46 PM

> this meager hardware

> they wasting - and why?

i18n language models are not an area frontier labs are focusing a ton of resources on? (Certainly not Norwegian.)

The corpus of content in Norwegian may not require very large clusters; or even if it does, this is the best the library could do, and it is certainly more than anyone else is investing in Norwegian models.

SOTA models do not have access to the quality of content that the national library does? The article mentions licensing deals with newspapers specifically, and the library has access to its own content archive.

English and Norwegian are not that closely related (they sit in different branches of the Germanic family), so perhaps LoRA is not the best approach?

I am curious if there is published research on how well localization works with LoRA depending on how far off the target language grammar/vocabulary is from English.

Projects like this typically have more than one objective: they are not only about building a SOTA project but also about building/training foundational local talent, similar to universities launching satellites.

gunalx • today at 9:10 PM

The largest problem is available training data actually.

They have already done experiments with different sub-10B models, with both fine-tuning and training fully from scratch. And last I checked, the from-scratch models captured the language better.

kristjansson • today at 9:00 PM

DeepSeek claims to have trained on something like 2k H800s; this is ~0.5k GH200s … it’s not nothing. Sure, they’re not going to _serve_ it at scale, but that’s not the point?

Also the line between “finetuning a base model” and “man this is a real good initialization” gets pretty blurry at scale.

Altogether a pretty presumptuous take.

sgt • today at 8:45 PM

That's what they have access to right now. I am sure that will change in the future as the project progresses.

What do you suggest, that they stop and wait until they have the right HW?

otabdeveloper4 • today at 8:50 PM

> meager hardware

Qwen was made on a cluster about that size.

And this is before anybody ever thought about optimizing the training process. (Currently it's just PyTorch analyst-as-coder slop, with extremely overprovisioned quantizations, etc.)
