Compiling models to megakernels

31 points • by jafioti • yesterday at 5:12 AM • 16 comments

Comments

So if I'm understanding correctly, you decompose kernels into their per_sm_workload, then you figure out the per_sm_data_dependency, and then you can schedule sm_workloads from the next kernel to start running as soon as their data dependency is satisfied, without waiting for the other SMs from the previous kernel to finish.

In this case, are you strictly fusing predefined kernels or are you also optimizing them? Is this complementary to your earlier work on search-based compilers?
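To make the scheduling idea in this comment concrete, here is a toy Python simulation. It is only a sketch of the commenter's reading, not the article's actual implementation: the `Tile` type, the `simulate` function, the durations, and the greedy "earliest free SM" policy are all hypothetical and chosen for illustration. It compares a kernel-wide barrier against starting each next-kernel tile as soon as its own dependencies have finished.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tile:
    """One per-SM unit of work (hypothetical model, not a real GPU API)."""
    name: str
    kernel: int        # index of the kernel this tile belongs to
    duration: int      # abstract time units
    deps: tuple = ()   # names of tiles whose output this tile reads

def simulate(tiles, num_sms, barrier):
    """Greedy list scheduling; returns the total finish time (makespan).

    Assumes `tiles` is listed kernel by kernel, in dependency order.
    barrier=True : a tile waits for every tile of the previous kernel.
    barrier=False: a tile waits only for the tiles named in `deps`.
    """
    sm_free = [0] * num_sms   # time at which each SM becomes idle
    finish = {}               # tile name -> finish time
    kernel_done = {}          # kernel index -> latest finish time so far
    for t in tiles:
        if barrier:
            ready = kernel_done.get(t.kernel - 1, 0)
        else:
            ready = max((finish[d] for d in t.deps), default=0)
        # Pick the SM that becomes idle first (ties go to the lowest index).
        sm = min(range(num_sms), key=lambda i: sm_free[i])
        start = max(sm_free[sm], ready)
        finish[t.name] = start + t.duration
        sm_free[sm] = finish[t.name]
        kernel_done[t.kernel] = max(kernel_done.get(t.kernel, 0), finish[t.name])
    return max(finish.values())

# Kernel 0 has one fast tile and one straggler; most of kernel 1
# only needs the fast tile's output.
TILES = [
    Tile("a0", 0, 1),
    Tile("a1", 0, 4),
    Tile("b0", 1, 2, deps=("a0",)),
    Tile("b1", 1, 2, deps=("a0",)),
    Tile("b2", 1, 2, deps=("a0",)),
    Tile("b3", 1, 1, deps=("a1",)),
]

with_barrier = simulate(TILES, num_sms=2, barrier=True)      # 8 time units
fine_grained = simulate(TILES, num_sms=2, barrier=False)     # 6 time units
```

In this toy setup the barrier version leaves one SM idle while the straggler `a1` finishes, whereas the dependency-driven version fills that gap with `b0` and `b1`. How a real megakernel tracks readiness on the device is not described in this thread, so the simulation says nothing about that part.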

measurablefunc • today at 7:24 AM

There are only 4 optimizations in computer science: inlining, partial evaluation, dead code elimination, & caching. It looks like AI researchers just discovered inlining & they already knew about caching so eventually they'll get to partial evaluation & dead code elimination.

