Hacker News

Norway's 2 petabytes of Huawei flash storage and LLM training

71 points by rbanffy today at 7:37 PM | 40 comments

Comments

solenoid0937 today at 8:39 PM

> The Olivia system is an HPE Cray Supercomputing EX system, with 448 GPUs and 64,512 CPU cores.

Training a sovereign LLM with this meager hardware, as opposed to a LoRA on some open-source model, seems like a huge mistake and a potential red flag.

There is no way these people have the resources to train a fully fledged LLM, so claiming that is their goal makes me think they don't intend for the LLM to be useful.

Which raises the question: whose money are they wasting, and why?

TrackerFF today at 8:44 PM

I'm Norwegian, and I use the national library almost every day for searching through texts. They truly have one of the best user interfaces (and feature sets) for searching through massive amounts of text.

dzhiurgis today at 9:50 PM

That's about 350 MB per capita. Humans can produce 2-6 kB per hour, so that's roughly 13 years of non-stop typing. I wonder where it all comes from. I guess it's websites that aren't compressed or extracted.
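
As a rough check of the arithmetic above, here is a minimal sketch. It assumes decimal units (2 PB = 2 × 10^15 bytes) and a population of about 5.6 million for Norway; neither figure is stated in the thread, and the names `bytes_per_capita` and `years_of_typing` are only illustrative.

```python
# Back-of-the-envelope check of the per-capita estimate.
# Assumptions (not from the thread): decimal petabytes, ~5.6 million people.
TOTAL_BYTES = 2 * 10**15      # 2 PB, from the headline
POPULATION = 5_600_000        # assumed approximate population of Norway
HOURS_PER_YEAR = 24 * 365     # "non-stop" typing, no breaks

bytes_per_capita = TOTAL_BYTES / POPULATION


def years_of_typing(rate_bytes_per_hour: float) -> float:
    """Years of continuous typing needed to produce one person's share."""
    return bytes_per_capita / rate_bytes_per_hour / HOURS_PER_YEAR


print(f"{bytes_per_capita / 1e6:.0f} MB per capita")
for rate in (2_000, 3_000, 6_000):  # spans the 2-6 kB/hour range quoted above
    print(f"{rate} B/hour -> {years_of_typing(rate):.1f} years")
```

Under these assumptions the 13-year figure matches a typing rate of about 3 kB per hour; the ends of the 2-6 kB range give longer and shorter spans respectively.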

timmg today at 9:14 PM

I wonder if instead (or in parallel), Norway should build a set of training data and share it (for free) with all the model builders.

Seems like making the frontier models know the Norwegian language and culture is a better (or additional!) way to reach the end they are going for here.

kvam today at 8:40 PM

As a Norwegian this sounds like a mistake. Who will use this LLM? Where? For what? The underlying data could be made more easily searchable and digestible for agents in general if the goal is better knowledge of Norwegian culture.

Levitz today at 8:37 PM

> As Husnes put it; Norway is a small country solving a problem every non-English-speaking nation will face: how do you build AI that reflects your language, your culture and your history? AI needs custodians, not just builders.

I'm afraid the answer is, mostly you don't.

Such a thing requires strong political will that, at least in my environment, seems basically impossible to align.

The costs are prohibitive, but beyond that, the type of person who cares about local representation like that is either completely fine with letting foreign companies implement it (after all, you can use ChatGPT in Basque if you want to) or is against the idea of AI altogether.

arjie today at 8:53 PM

This can’t be right. 2 PB of flash is like $200k. It’s within reach of many individuals. Then again I guess you don’t need that much storage so maybe it is.
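
To make the estimate above concrete, here is a minimal sketch of the price per terabyte it implies. The $200k figure is the commenter's own guess, not a verified price, and the helper `cost_at` is purely illustrative. It covers raw capacity only and assumes decimal terabytes.

```python
# Implied price per terabyte from the estimate above.
# Both inputs come from the comment; the dollar figure is unverified.
CAPACITY_TB = 2_000         # 2 PB expressed in decimal terabytes
QUOTED_COST_USD = 200_000   # the commenter's rough estimate

implied_usd_per_tb = QUOTED_COST_USD / CAPACITY_TB
print(f"Implied price: ${implied_usd_per_tb:.0f} per TB")


def cost_at(usd_per_tb: float, capacity_tb: float = CAPACITY_TB) -> float:
    """Raw media cost for a given capacity at a hypothetical price per TB."""
    return usd_per_tb * capacity_tb
```

Calling `cost_at` with a different price per terabyte shows how sensitive the total is to that single assumption.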

dalemhurley today at 8:44 PM

How about that, they actually asked for permission to use data and the companies said yes.

ipsum2 today at 8:33 PM

This is how much storage the average r/datahoarder user has in their basement. Fewer than 100 hard drives.

Den_VR today at 8:31 PM

> He asserted that any country with its own language that did not have a sovereign LLM trained in that language was at a disadvantage as a globally trained, English-speaking LLM would not know about that country’s history, news and culture that was described in the local language.

I don’t know that this is true. But whatever sounds true enough and gets funding seems to be what flies these days.

hank808 today at 9:36 PM

Ehhh. None of this sounds right. Translation problems, maybe. A lack of technical understanding, maybe... I don't know. Probably not news.

jauntywundrkind today at 8:28 PM

384-core CPU cluster? 2 petabytes?

Dell just launched a 2U that fits almost 10 petabytes in it. It's probably not 384-core capable, but that is very doable right now: Epyc chips go up to 192 cores each! https://www.techradar.com/pro/dell-launches-record-shatterin...

7e today at 8:24 PM

2 PB? They will not come close to training on that amount. Maybe years from now.

kreyenborgi today at 8:38 PM

Ad for Huawei?