This is actually super cool: you can use those both as math accelerators and as I/O, and since they run in lockstep you can kind of use them as integer-only shader units. I don't know how this is useful yet.
Btw, I am curious about the edge cases. Maybe I missed it in the article, but what is the size of the FIFO?
Or the more dangerous part: timing is now hard to determine in complex cases. For example, if each read from the FIFO is an ISR, you only have as many instructions as fit before the next FIFO read; otherwise you stall the system, and that looks too hard to debug to me.
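
To put rough numbers on that worry, here is a back-of-the-envelope sketch. It is a toy model and not anything from the article: I am assuming a word is pushed into the FIFO every `period` cycles, that the ISR takes a constant `handler` cycles per word, and that the FIFO holds `depth` words. The function name and all values are made up for illustration.

```python
from collections import deque


def first_stalled_word(period, handler, depth, max_words=10_000):
    """Return the index (1-based) of the first word that finds the FIFO full,
    or None if the handler keeps up for max_words words.

    Toy model (all parameters are assumptions, not hardware facts):
      period  - cycles between two pushes into the FIFO
      handler - cycles the ISR spends on each word it pops
      depth   - number of words the FIFO can hold
    """
    fifo = deque()  # arrival times of words still waiting in the FIFO
    free_at = 0     # cycle at which the ISR is ready to pop the next word

    for n in range(1, max_words + 1):
        now = n * period

        # Let the ISR pop every word it could have started on by `now`.
        # A pop happens once the ISR is free and the word has arrived.
        while fifo and max(free_at, fifo[0]) <= now:
            start = max(free_at, fifo.popleft())
            free_at = start + handler

        if len(fifo) >= depth:
            return n  # producer would stall on this word

        fifo.append(now)

    return None


if __name__ == "__main__":
    # Handler fits in the budget: no stall expected in this model.
    print(first_stalled_word(period=100, handler=80, depth=4))
    # Handler overruns the budget: the FIFO only delays the stall.
    print(first_stalled_word(period=100, handler=150, depth=4))
    print(first_stalled_word(period=100, handler=150, depth=8))
```

In this model, as long as `handler` is no larger than `period` the FIFO never fills. Once the handler overruns, a deeper FIFO only pushes the stall further out, which is probably what makes it nasty to debug: the failure shows up many words after the code that caused it.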