Has there been much research into slightly flawed matrix multiplications?
If you have a measure of correctness and a measure of performance, is there a maximum value of correctness per unit of processing that sits below a full matrix multiply?
Obviously it can be done with precision, since that is what floating point is. But is there anything where you can save x% of the computation and end up with fewer than x% incorrect values in a matrix multiplication?
Gradient descent wouldn't really care about a few (reliably) dud values.
Randomized matrix sketching is one way to get at this (see https://arxiv.org/abs/2302.11474). The problem is that hardware is heavily optimized for dense multiplies, so what you save in FLOPs doesn't translate into real runtime speedups.
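For a concrete feel of the trade-off, here's a minimal NumPy sketch of one classic instance of this idea, Monte Carlo approximate matmul via weighted column/row sampling (Drineas-Kannan-Mahoney style). The function name and parameters are my own for illustration, not taken from the linked paper:

```python
import numpy as np

def sketched_matmul(A, B, k, seed=None):
    """Approximate A @ B by sampling k of the n rank-one terms
    (column of A times row of B), importance-weighted by their norms.
    Unbiased estimator; error shrinks roughly like 1/sqrt(k)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # Sample index i with probability proportional to |A[:, i]| * |B[i, :]|
    p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = p / p.sum()
    idx = rng.choice(n, size=k, replace=True, p=p)
    # Rescale each sampled outer product so the estimate stays unbiased
    scale = 1.0 / (k * p[idx])
    return (A[:, idx] * scale) @ B[idx, :]

# Relative error vs. fraction of inner-dimension work actually done
rng = np.random.default_rng(0)
A = rng.standard_normal((256, 1024))
B = rng.standard_normal((1024, 256))
exact = A @ B
for k in (128, 256, 512):
    approx = sketched_matmul(A, B, k, seed=1)
    err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
    print(f"k={k:4d} ({k / 1024:.0%} of the work): rel. error {err:.3f}")
```

Running something like this shows the catch: you do cut the arithmetic by (n - k)/n, but the whole result is uniformly noisy rather than having a few reliably dud entries, and on a GPU the strided gather plus a skinny matmul is usually slower than just doing the dense multiply the hardware was built for.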