> The AMD Ryzen AI Max+ 395 uses unified memory, like Apple, so nearly all of it is addressable by the GPU.

This is why they went with the "laptop" CPU. While it's slightly slower than dedicated GPU memory, it allows you to run the big models at decent token speeds.
I guess that depends on your definition of "decent". For the smaller models that can run on a 16/24/32 GB Nvidia card, the chip is anywhere between 3x and 10x slower than, say, a 4080 Super or a 3090, both of which are relatively cheap used.

The biggest limitations are memory bandwidth, which caps token generation speed, and the lack of CUDA, which means a longer time to first token on theoretically similar hardware specifications.

Any model bigger than what fits in 32 GB of VRAM is, in my opinion, currently unusable on "consumer" hardware. Perhaps a tinybox with 144 GB of VRAM and close to 6 TB/s of memory bandwidth will get you a nice experience on consumer-grade hardware, but it's quite the investment (and power draw).
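A rough sanity check on the bandwidth point: during decode, every weight gets streamed through memory once per generated token, so tok/s is capped at bandwidth divided by model size. A minimal sketch, assuming approximate spec-sheet bandwidth figures (~256 GB/s for the 395's LPDDR5X, ~936 GB/s for a 3090):

```python
# Back-of-envelope decode ceiling: generating one token streams every
# weight through memory once, so tok/s <= bandwidth / model size.
# Bandwidth values below are approximate spec-sheet numbers, not measurements.
BANDWIDTH_GBS = {
    "Ryzen AI Max+ 395 (LPDDR5X)": 256,
    "RTX 3090 (GDDR6X)": 936,
}

for model_gb in (8, 40):  # e.g. an 8B model at q8, a 70B model at ~4-bit
    for device, bw in BANDWIDTH_GBS.items():
        print(f"{model_gb} GB model on {device}: <= {bw / model_gb:.1f} tok/s")
```

Of course, the 40 GB model doesn't fit in a 3090's 24 GB at all, which is exactly the unified-memory trade-off: the 395 can hold it, just slowly, with a ceiling around 6 tok/s before any software overhead.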
unified and soldered :(
I understand it's faster, but still...

Did they at least do an internal PSU if they went the Apple way, or does it come with a power brick twice the size of the case?
Edit: wait. They do have an internal PSU! Goodness!
I'm curious how something like CUDIMM memory would perform under the same workloads.
I currently avoid machines with soldered memory, but if the memory could be replaced while keeping similar performance, that would change things.
> it allows you to run the big models, at decent token speeds
Without CUDA, since it's an AMD GPU. That's a big caveat depending on the tools you want to use.
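One wrinkle worth knowing: PyTorch's ROCm builds expose the AMD GPU through the torch.cuda namespace, so a script that only checks torch.cuda.is_available() can look fine and still fail later on CUDA-only dependencies. A quick way to see what a given install actually targets (a sketch, assuming a reasonably recent PyTorch build):

```python
import torch

# On ROCm builds of PyTorch, the HIP backend masquerades as "cuda",
# so torch.cuda.is_available() can be True on an AMD GPU.
print("accelerator available:", torch.cuda.is_available())
print("hip build:", torch.version.hip)    # None on CUDA-only builds
print("cuda build:", torch.version.cuda)  # None on ROCm builds
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```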
Geerling benchmarked LLM performance on the Framework Desktop, and the results look pretty lackluster to me. First, the software stack seems really immature: he couldn't get ROCm or the NPU working. When he finally got the iGPU working with Vulkan, it could only generate 5 tok/sec with Llama 3.1 70B (a 40 GB model). That's intolerably slow for anything interactive like coding or chatting, but I suppose that's a matter of opinion.
https://github.com/geerlingguy/ollama-benchmark/issues/21
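For anyone wanting to reproduce the Vulkan numbers from Python, llama.cpp's Vulkan backend is also reachable through llama-cpp-python. A minimal sketch, assuming a Vulkan-enabled build and a local GGUF file (the model path is a placeholder):

```python
# Requires a Vulkan-enabled build of llama-cpp-python, roughly:
#   CMAKE_ARGS="-DGGML_VULKAN=on" pip install llama-cpp-python
# (the exact CMake flag has changed across llama.cpp versions)
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-3.1-70b-instruct-q4_k_m.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers to the iGPU
    n_ctx=4096,
)

out = llm("Explain unified memory in one sentence.", max_tokens=128)
print(out["choices"][0]["text"])
```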