That is one way of implementing such conditionals in hardware, but it's only one aspect of the computation. First, we can both agree that in the next clock cycle a single instruction will be executed, not both of the instructions that could follow the jump - so clearly the hardware doesn't always do both things.
Second, if we think about the instruction decoding itself, it should be pretty clear that while the hardware always exists and will always output something, that's not the same as saying it computes every option. If the next instruction is `add ax, bx`, the hardware is not going to compute ax + bx and ax - bx and ax & bx and ax | bx and so on, and then choose which result to feed back into ax through a mux. Instead, the ALU is built from logic gates that evaluate a single logical expression, one that maps each output bit to a logical combination of the input bits and the control signal.
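To make that concrete, here's a rough C sketch of a combined add/subtract bit-slice (textbook ripple-carry wiring, written as software purely for illustration, not any real CPU's netlist). The control bit is folded into the per-bit logic, so the same gates produce either a + b or a - b; at no point do both results exist to be chosen between.

```c
#include <stdint.h>
#include <stdio.h>

/* One bit-slice: each output bit is a single boolean expression of the
 * input bits, the carry-in, and the control signal 'sub'. Nothing here
 * computes both a+b and a-b and then discards one of them. */
static void addsub_bit(uint8_t a, uint8_t b, uint8_t cin, uint8_t sub,
                       uint8_t *out, uint8_t *cout)
{
    uint8_t bx = b ^ sub;                 /* sub=1 flips b (one's complement) */
    *out  = a ^ bx ^ cin;                 /* sum/difference bit               */
    *cout = (a & bx) | (cin & (a ^ bx));  /* carry out of this slice          */
}

/* 16-bit ripple-carry chain: feeding 'sub' in as the initial carry completes
 * the two's-complement negation, so the control signal selects the function
 * without duplicating the datapath. */
static uint16_t addsub16(uint16_t a, uint16_t b, uint8_t sub)
{
    uint16_t result = 0;
    uint8_t carry = sub;
    for (int i = 0; i < 16; i++) {
        uint8_t o, c;
        addsub_bit((a >> i) & 1, (b >> i) & 1, carry, sub, &o, &c);
        result |= (uint16_t)o << i;
        carry = c;
    }
    return result;
}

int main(void)
{
    printf("%d %d\n", addsub16(7, 3, 0), addsub16(7, 3, 1));  /* prints 10 4 */
    return 0;
}
```

A real ALU does more than add and subtract, of course, but the idea is the same: after the logic is minimized, the control signals are just more inputs to the gate network rather than a selector between fully computed alternatives.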
Ah I think I see what you mean. Outcomes from one clock cycle can impact the data input to various hardware on the next cycle.
I think the intuition I was trying to convey (and which I think feels odd to software folks) is that within a given clock cycle you often compute all the outputs you might need, and then choose between them with a mux. You really can design an ALU by passing the inputs through multiple operations in parallel and having the output decided by a mux. This technique generally produces shorter dependency chains, which raises the maximum clock speed for a given block. Like all optimization techniques it has trade-offs (in this case using more gates in total), so you might land on a middle ground, for instance by reusing the adder for both addition and subtraction at the cost of only a few gates' worth of extra latency. I have built an ALU this way.
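If it helps, here's a toy C sketch of that "compute everything, then mux" style (not the design I actually built; the operation set and selector encoding are made up for illustration). In hardware the switch statement would be a 4:1 mux, and all four candidate results would be produced every cycle by parallel datapaths.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative only: the op encoding and the operation set are invented
 * here and don't correspond to any particular ISA or ALU. */
enum { OP_ADD = 0, OP_SUB = 1, OP_AND = 2, OP_OR = 3 };

static uint16_t alu(uint16_t a, uint16_t b, unsigned op)
{
    /* Every candidate result is produced unconditionally, just as the
     * parallel datapaths would produce them every cycle. */
    uint16_t sum  = a + b;
    uint16_t diff = a - b;   /* the middle ground would share the adder: a + ~b + 1 */
    uint16_t conj = a & b;
    uint16_t disj = a | b;

    /* The final 4:1 mux: only the selected result reaches the output. */
    switch (op) {
    case OP_ADD: return sum;
    case OP_SUB: return diff;
    case OP_AND: return conj;
    default:     return disj;
    }
}

int main(void)
{
    printf("%d %d %d %d\n",
           alu(7, 3, OP_ADD), alu(7, 3, OP_SUB),
           alu(7, 3, OP_AND), alu(7, 3, OP_OR));  /* prints 10 4 3 7 */
    return 0;
}
```

The cost is visible right in the sketch: several operations' worth of logic exists and toggles every cycle even though only one output is kept, which is exactly the extra-gates-for-a-shorter-critical-path trade-off described above.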
Of course, on modern CPUs the ALU isn’t one monolithic block anymore, but rather multiple separate units, with a scheduler doing the work of sending data to the right queue over several clock cycles.