logoalt Hacker News

jaberjaber23yesterday at 8:36 AM0 repliesview on HN

absolutely. it really depends on the kernel type, target architecture, and what you're optimizing for. the 2x-4x isn’t the limit, it's just what users often see out of the box. we do real-time profiling on actual GPUs, so you get results based on real performance on a specific arch, not guesses. when the baseline is rough, we’ve seen well over 10x