Hacker News

xaskasdf today at 1:11 AM

Yeah, actually I wanted to see if this was possible at all. I managed to get around 3000 tokens/s on a PS2 with classic transformers. The Emotion Engine is capable of 32-bit addressing, but it has like 32 GB of RAM. So I ran into the question of why it was that fast when I couldn't get that speed even with small models, and the deal is that the data goes straight from memory to the GPU. That's the main difference from how a regular computer does inference: there, the GPU has to request the data through the CPU every time. As I mentioned too, on professional cards you can avoid these problems naturally, since they have instructions precisely for this, but sadly I don't have 30k bucks to spare on a GPU :(
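For context on what a figure like "3000 tokens/s" means in practice, here is a minimal sketch of how such a throughput number is typically measured. Everything here is hypothetical: `generate_token` is a stand-in for one decode step of whatever model is being run, not the poster's actual code.

```python
import time

def generate_token():
    # Placeholder for a single forward pass / decode step of the model.
    # In a real benchmark this would be the actual inference call.
    return 0

def measure_throughput(n_tokens=3000):
    """Time n_tokens decode steps and return tokens per second."""
    start = time.perf_counter()
    for _ in range(n_tokens):
        generate_token()
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

print(f"{measure_throughput():.0f} tokens/s")
```

Note that a number like this is dominated by where the weights and activations live: if each step has to pull data through the CPU rather than reading it directly, the per-token latency (and thus the tokens/s figure) changes dramatically, which is the point the comment is making.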


Replies

derstander today at 1:32 AM

*32MB of RAM (plus 4MB of video RAM and a little sound and IOP memory).

anoncow today at 4:14 AM

3000 tokens per sec on 32 MB of RAM?