Any thoughts on layering on-GPU work stealing or cudf on top?
For gfql (a graph query language that maps down to cudf calls), we're trying to jettison the python->cpu->gpu hot loop, so we've been loosely watching cuTile evolve!