logoalt Hacker News

Compiling models to megakernels

31 pointsby jafiotiyesterday at 5:12 AM16 commentsview on HN

Comments

geremiiahtoday at 2:16 PM

So if I'm understanding correctly, you decompose kernels into their per_sm_workload, then you figure out per_sm_data_dependency and then you can schedule sm_workloads from the next kernel to start running as soon as the data dependency is satisfied, not needing to wait for the other sms from the previous kernel to finish.

In this case are you're strickly fusing pre defined kernels or are you also optimizing them? Is this complimentary to your earlier work on search-based compilers?

measurablefunctoday at 7:24 AM

There are only 4 optimizations in computer science: inlining, partial evaluation, dead code elimination, & caching. It looks like AI researchers just discovered inlining & they already knew about caching so eventually they'll get to partial evaluation & dead code elimination.

show 5 replies