
marcosdumay · yesterday at 4:31 PM

Yeah, make the network deeper.

When all you have is a hammer... It makes a lot of sense that a transformation layer that makes the tokens more semantically relevant would help optimize the entire network after it and increase the effective size of the context window. And one of the main immediate obstacles stopping those models from being intelligent is context window size.

On the other hand, the current models already cost something on the order of a median country's GDP to train, and they are nowhere close to that in value. The saying that "if brute force didn't solve your problem, you didn't apply enough force" is meant to be heard as a joke.


Replies

jagraff · yesterday at 5:27 PM

I think the median country GDP is something like $100 billion:

https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(PPP)

Models are expensive, but they're not that expensive.

whiplash451 · yesterday at 5:33 PM

I get your point, but do we have evidence behind "something on the order of a median country's GDP to train"?

Is this really true?

helloplanets · today at 7:04 AM

> the current models already cost something on the order of a median country's GDP to train

This is just blatantly false.

> According to AI Index estimates, the training costs of state-of-the-art AI models have reached unprecedented levels. For example, OpenAI’s GPT-4 used an estimated $78 million worth of compute to train, while Google’s Gemini Ultra cost $191 million for compute.

https://hai.stanford.edu/ai-index/2024-ai-index-report

No need to even open up the actual report to find that. Just scroll down the page to read the 'key takeaways'.
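For scale, a rough back-of-envelope using the ~$100 billion median-GDP figure from the list linked above: $100 billion / $191 million ≈ 520, and $100 billion / $78 million ≈ 1,280. So the original claim overstates training costs by roughly three orders of magnitude.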