Why does it need the same bits of memory as steps?
Bits are a bit misleading here, it would be more accurate to say "units of computation". If you're problem space operates on 32-bit integers this would be 32bits * number of steps, these papers solve for individual bits as the smallest individual unit of computation we can commonly reason about.
The basic intuition of the two extremes are:
* High space requirement: we need a bucket per step to track the state
* Low space requirement: we only need a single bucket, if we can mix/unmix the various components of the state
High-space requirement algorithms will be those with a large amount of "unmixable" complex state. Low-space reqs will be those with either a small state, or an easily mixed/unmixed state.
Example of mixing: suppose we need an odd number and a parity (+,-) -- then we can store odd/even numbers: taking even to means (-1 * (number-1)) and odd means odd.
So 7 = +7, 8 = -7
I don't know if this example is really that good, but I believe the very basics of the intution is correct -- its to do with how we can trade "interpretation" of data (computation) for the amount of data in such a way that data can kinda be mixed/unmixed in the interpretation