Has there been much research into slightly flawed matrix multiplications?
If you have a measure of correctness and a measure of performance, is there a maximum of correctness per unit of processing that sits below a full matrix multiply?
Obviously it can be done with precision, since that is what floating point is. But is there any approach where you save x% of the computation and get fewer than x% incorrect values in a matrix multiplication?
Gradient descent wouldn't really care about a few (reliably) dud values.
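To make the trade-off concrete, here is a minimal sketch of one known family of approaches, Monte Carlo (sampled) matrix multiplication: instead of summing over every inner-dimension index, sum over a random subset and rescale. The function names (`sampled_matmul`, `relative_error`) and the pure-Python list-of-lists layout are assumptions for illustration, not anything from Burn or CubeCL.

```python
import random

def matmul(a, b):
    """Exact product of two row-major list-of-lists matrices."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]

def sampled_matmul(a, b, samples, rng):
    """Estimate a @ b from `samples` randomly drawn inner-dimension indices.

    Index k is drawn with probability proportional to
    norm(column k of a) * norm(row k of b), and each sampled outer product
    is rescaled by 1 / (samples * p_k) so the estimate stays unbiased.
    """
    inner, rows, cols = len(b), len(a), len(b[0])
    weights = []
    for k in range(inner):
        col_norm = sum(a[i][k] ** 2 for i in range(rows)) ** 0.5
        row_norm = sum(b[k][j] ** 2 for j in range(cols)) ** 0.5
        weights.append(col_norm * row_norm)
    total = sum(weights)
    out = [[0.0] * cols for _ in range(rows)]
    if total == 0.0:
        return out  # every outer product is zero, so the product is zero
    picks = rng.choices(range(inner), weights=weights, k=samples)
    for k in picks:
        scale = total / (samples * weights[k])  # equals 1 / (samples * p_k)
        for i in range(rows):
            aik = a[i][k] * scale
            for j in range(cols):
                out[i][j] += aik * b[k][j]
    return out

def relative_error(approx, exact):
    """Frobenius norm of the difference, relative to the exact result."""
    num = sum((x - y) ** 2 for r, s in zip(approx, exact) for x, y in zip(r, s))
    den = sum(y ** 2 for s in exact for y in s)
    return (num / den) ** 0.5
```

The cost is roughly `samples / inner` of the full multiply, and the expected error shrinks like `1 / sqrt(samples)`. Note that the error is smeared across every output entry rather than confined to a few dud values, so it answers a slightly different question than the "fewer than x% incorrect values" framing.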
One of the authors here. Don't hesitate to ask if you have any questions or comments!
Two years ago I bet that by now matmul would run on transformer-optimized hardware costing a fraction of a GPU, with first-class support in torch and no reason to use GPUs any more. Wrong.
Very interesting, willing to try burn
burn is awesome
I'm sorry, this is a lowbrow comment, but this is the dumbest thing you can do in this space:
> Unit (thread in CUDA, invocation in Vulkan/Wgpu): the smallest execution entity performing computations.
> Plane (warp in CUDA, subgroup in Vulkan/Wgpu): a group of (typically 32) units executing in lockstep and able to share data efficiently through registers.
> Cube (thread block in CUDA, workgroup in Vulkan/Wgpu): a group of units that execute on the same SM, sharing memory and able to synchronize
It's already bad enough that the vendors themselves insisted on different names, but why in the bejesus would you rename these concepts and diverge from literally all existing naming conventions when you're providing middleware? I.e., when using your tool I'm still going to reference NVIDIA's or AMD's docs to understand how the hardware actually works. Do you really think otherwise, that your thing is gonna be the end of the line???
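For anyone cross-referencing vendor docs, the quoted definitions boil down to a small lookup table plus an index split. This is only a sketch: `TERMS` and `locate` are hypothetical names, and it assumes a 1D launch, a plane size of 32, and units packed into planes in consecutive order (which holds for 1D CUDA blocks but is implementation-defined for Vulkan subgroups).

```python
# Generic name -> vendor/API name, taken from the definitions quoted above.
TERMS = {
    "unit":  {"cuda": "thread",       "vulkan_wgpu": "invocation"},
    "plane": {"cuda": "warp",         "vulkan_wgpu": "subgroup"},
    "cube":  {"cuda": "thread block", "vulkan_wgpu": "workgroup"},
}

def locate(global_unit, cube_size, plane_size=32):
    """Split a flat unit index into (cube, plane within cube, lane within plane).

    Assumes consecutive units fill a plane before the next plane starts.
    """
    cube, unit_in_cube = divmod(global_unit, cube_size)
    plane, lane = divmod(unit_in_cube, plane_size)
    return cube, plane, lane
```

With 256 units per cube, `locate(300, 256)` gives `(1, 1, 12)`: the second cube, its second plane, lane 12.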
FYI, the word warp isn't random technobabble; it's actually a very clever pun that fits very well conceptually:
GPUs came about because of the need for faster floating-point 4x4 and 3x3 matrix and 3- and 4-component vector math ops (multiply, multiply-accumulate, and such), and for faster pixel pushing with things like texture mapping. All hail OpenGL and dual Voodoo2 SLI. ;)