Hacker News

behaviors, yesterday at 11:31 PM

A model framework for an in-house suite of models.

From dataset harvesting, to training intricacies on CUDA/ROCm, to fun HIP kernels, then full circle to inference testing, all built around consumer hardware (the challenge). I'm using this as a "how it works" deep dive, which teaches me more about the how than endless papers will. It's a MoE, and I'm slowly running a human loop: research, build, correct, research.