c7b • yesterday at 10:04 PM

Regarding the first: parallel requests to the same loaded model seem to work pretty well. I'm still trying to find time to look into it more myself, but this may already be within reach for local models.

Replies

colechristensen • today at 3:25 AM

Sure, it's possible, but you'd start to use it much more and in more advanced ways. For example, "thinking hard" could consist of spawning a dozen different inferences from the same cached point and then picking the best one.
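A rough sketch of that "spawn a dozen, pick the best" idea, in Python. Everything here is a stand-in: `generate` is a hypothetical placeholder for one sampled completion from the shared prefix (a real setup would call the local inference server and rely on it to reuse the cached prefix), and `score` is a hypothetical ranking function, not any particular library's API.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def generate(prefix: str, seed: int) -> str:
    """Hypothetical stand-in for one sampled completion from a shared prefix."""
    rng = random.Random(seed)  # a different seed gives a different sample
    words = ["alpha", "beta", "gamma", "delta"]
    tail = " ".join(rng.choice(words) for _ in range(rng.randint(1, 8)))
    return prefix + " " + tail


def score(candidate: str) -> float:
    """Hypothetical scorer (assumption): longer is better. Swap in a real judge."""
    return float(len(candidate))


def best_of_n(prefix: str, n: int = 12):
    """Fan out n samples from the same prefix in parallel and keep the top one."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(lambda seed: generate(prefix, seed), range(n)))
    return max(candidates, key=score), candidates


if __name__ == "__main__":
    best, candidates = best_of_n("Q: why is the sky blue?", n=12)
    print(len(candidates), "candidates; best:", best)
```

The threads only model the fan-out. Whether the twelve requests actually share a cached prefix depends on the serving stack, which the sketch does not assume anything about.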
