logoalt Hacker News

geremiiahtoday at 2:16 PM0 repliesview on HN

So if I'm understanding correctly, you decompose kernels into their per_sm_workload, then you figure out per_sm_data_dependency and then you can schedule sm_workloads from the next kernel to start running as soon as the data dependency is satisfied, not needing to wait for the other sms from the previous kernel to finish.

In this case are you're strickly fusing pre defined kernels or are you also optimizing them? Is this complimentary to your earlier work on search-based compilers?