Hacker News

mannyv, today at 3:46 PM

The software has real software engineers working on it instead of researchers.

Remember when people were arguing about whether to use mmap? What a ridiculous argument.

At some point someone will figure out how to tile the weights and the memory requirements will drop again.


Replies

snovv_crash, today at 3:57 PM

The real improvement will come when software engineers get into the training loop. Then we can have MoE models with cache-friendly expert utilisation, and maybe even learned prefetching of the experts that will be needed next.
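To make the prefetching idea concrete, here is a toy sketch, not anything a real runtime does as far as this thread shows. It assumes a small LRU cache of expert weights in fast memory and a predictor that guesses which experts the next step will route to. `ExpertCache`, `run`, the `load_expert` callback and the routing trace format are all hypothetical names made up for illustration, and the "learned" predictor is stood in for by a plain function.

```python
from collections import OrderedDict


class ExpertCache:
    """Toy LRU cache of expert weights held in fast memory (capacity >= 1)."""

    def __init__(self, capacity, load_expert):
        self.capacity = capacity
        self.load_expert = load_expert  # hypothetical slow loader (disk or host RAM)
        self.resident = OrderedDict()   # expert_id -> weights, least recently used first
        self.hits = 0
        self.misses = 0

    def _insert(self, expert_id):
        self.resident[expert_id] = self.load_expert(expert_id)
        while len(self.resident) > self.capacity:
            self.resident.popitem(last=False)  # evict the least recently used expert

    def prefetch(self, predicted_ids):
        """Load experts the predictor expects to be needed next."""
        for expert_id in predicted_ids:
            if expert_id in self.resident:
                self.resident.move_to_end(expert_id)
            else:
                self._insert(expert_id)

    def get(self, expert_id):
        """Return weights, counting whether the access was a hit or a miss."""
        if expert_id in self.resident:
            self.hits += 1
            self.resident.move_to_end(expert_id)
        else:
            self.misses += 1
            self._insert(expert_id)
        return self.resident[expert_id]


def run(trace, capacity, predictor=None):
    """Replay a routing trace; predictor(step) returns expert ids to prefetch."""
    cache = ExpertCache(capacity, load_expert=lambda i: f"weights-{i}")
    for step, expert_ids in enumerate(trace):
        if predictor is not None:
            cache.prefetch(predictor(step))
        for expert_id in expert_ids:
            cache.get(expert_id)
    return cache.hits, cache.misses
```

Replaying a trace that alternates between two pairs of experts with room for only two shows the point: plain LRU misses on every access, while a perfect predictor turns every access into a hit. A learned predictor would land somewhere in between, and the cache-friendly part would be training the router so that traces thrash less in the first place.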
