You can do that in a single matmul, of course.
So can you take an arbitrary transformer and somehow turn it into a compact set of low-power fast gates by some algorithm?