MobiusHorizons • yesterday at 6:28 PM • 0 replies • view on HN

Ah I think I see what you mean. Outcomes from one clock cycle can impact the data input to various hardware on the next cycle.

I think the intuition I was trying to convey (and which I think feels odd for software folks) is that within a given clock cycle, you often compute all the outputs you might need and then choose between them with a mux. You really can design an ALU by passing the inputs through multiple operations in parallel and letting a mux decide the output. This technique generally produces shorter dependency chains, which increases the maximum clock speed for a given block. Like all optimization techniques it has trade-offs (in this case, using more gates in total), so you might find a middle ground, for instance by reusing the adder for both addition and subtraction at the cost of only a few additional gates' worth of latency. I have built an ALU this way.
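To make that concrete for software folks, here is a rough Python sketch of the idea, not real hardware and not the ALU I built. The 8-bit width, the op names, and the function names `adder` and `alu` are all made up for illustration. Every unit produces its result on every call, and the final dictionary lookup plays the role of the mux. The adder is shared: subtraction inverts `b` and sets the carry-in, which is the usual two's complement trick.

```python
MASK = 0xFF  # assumed 8-bit datapath, chosen only for illustration


def adder(a, b, carry_in):
    """Single shared adder; the result wraps to 8 bits like a fixed-width bus."""
    return (a + b + carry_in) & MASK


def alu(op, a, b):
    # Shared adder path: for "sub", invert b and set carry-in to 1,
    # so a + (~b) + 1 == a - b in two's complement.
    is_sub = op == "sub"
    b_eff = (~b & MASK) if is_sub else b
    sum_out = adder(a, b_eff, 1 if is_sub else 0)

    # All other units compute their outputs unconditionally,
    # the way parallel hardware blocks would within one clock cycle.
    and_out = a & b
    or_out = a | b
    xor_out = a ^ b

    # The "mux": pick exactly one of the already-computed outputs.
    mux = {
        "add": sum_out,
        "sub": sum_out,
        "and": and_out,
        "or": or_out,
        "xor": xor_out,
    }
    return mux[op]


if __name__ == "__main__":
    print(alu("add", 200, 100))  # wraps around at 8 bits
    print(alu("sub", 5, 7))      # two's complement representation of -2
```

Note there is no `if op == ...` branching around the logic units themselves; the only selection happens at the very end, which is the part that tends to feel backwards coming from software.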

Of course on modern CPUs the ALU isn’t one monolithic block any more, but rather multiple separate units where a scheduler does the work of sending data to the right queue over several clock cycles.
