logoalt Hacker News

ericdtoday at 1:48 PM2 repliesview on HN

You probably want to try renting some time on a dedicated box with roughly the specs you’re considering and running the open models for a bit to see if you would actually use them before dropping a lot on local hardware. A 128 gig MacBook Pro isn’t going to get you an amazing model, and certainly not amazing speed. GLM 5.2 wants something like 350+ gigs at fp4 iirc.


Replies

zackifytoday at 3:23 PM

I ran glm 5.2 on rented 8x h200 it could only do 2x concurrency at a cost of $40 an hour. It felt great but dang I wish it was cheaper... It needs 750 at fp8

traceroute66today at 2:50 PM

> You probably want to try renting some time on a dedicated box with roughly the specs you’re considering and running the open models

You don't even need to go that far. For example, with Exoscale Dedicated Inference[1] you just point it at the Hugging Face for the model and quantisation you want to test and it automagically spits out an OpenAI-compatible API endpoint.

[1] https://www.exoscale.com/ai-cloud-infrastructure/dedicated-i...

(I have no relationship with Exoscale, this particular product just crossed my radar recently)

show 1 reply