I think I just misunderstood the writing; it does explicitly say 4 bits for castling. The prose around it is just describing what castling is. I thought it was implying that you could determine whether castling is possible from the positions of the pieces.
It starts off with 4 bits for castling, then optimizes it into a piece swap that takes 0 bits (though the piece swap as written might be flawed).
He starts out by using 4 bits for castling rights.
Then he introduces the other method (signify that castling is allowed by saying the rook on that side is on the same square as the king) and with that method he doesn't need any extra bits for castling rights.
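For anyone who wants to see the trick spelled out, here is a minimal sketch of how I read it. It assumes each side has a fixed king slot and fixed queenside/kingside rook slots, and that squares are numbered 0..63 with a1 = 0 and h8 = 63. The function names and the `HOME` table are mine, not the article's, and captured rooks are not handled.

```python
# Assumed square numbering: a1 = 0, h1 = 7, a8 = 56, h8 = 63.
HOME = {
    "white": {"king": 4, "rook_q": 0, "rook_k": 7},
    "black": {"king": 60, "rook_q": 56, "rook_k": 63},
}

def encode_side(color, king, rook_q, rook_k, can_castle_q, can_castle_k):
    """Fold castling rights into the rook slots: a rook that may still
    castle is written as if it stood on the king's square."""
    home = HOME[color]
    if can_castle_q:
        if king != home["king"] or rook_q != home["rook_q"]:
            raise ValueError("queenside right requires king and rook at home")
        rook_q = king
    if can_castle_k:
        if king != home["king"] or rook_k != home["rook_k"]:
            raise ValueError("kingside right requires king and rook at home")
        rook_k = king
    return king, rook_q, rook_k

def decode_side(color, king, rook_q_slot, rook_k_slot):
    """Invert encode_side: a rook slot equal to the king's square means
    'castling allowed', and the rook is really on its home corner."""
    home = HOME[color]
    can_q = rook_q_slot == king
    can_k = rook_k_slot == king
    rook_q = home["rook_q"] if can_q else rook_q_slot
    rook_k = home["rook_k"] if can_k else rook_k_slot
    return king, rook_q, rook_k, can_q, can_k
```

This works because a rook can never legally share a square with its own king, so that value is otherwise unused, and the rook's real square is implied whenever the right still exists.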
Edit: it would be better on average to keep the castling bits and omit the positions of kings and rooks whenever castling is possible, since those positions are then implied. But that makes the encoding variable-length, and in the worst case (no castling rights left) it simply costs 4 extra bits.
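A rough bit count for that variable-length idea, as a sketch only. It assumes 6 bits per square for each king and rook, which may not match the article's actual layout; `SQUARE_BITS` and both function names are made up for illustration.

```python
SQUARE_BITS = 6  # assumption: one of 64 squares per king/rook

def bits_rook_on_king_trick():
    """Fixed-length variant: 2 kings + 4 rooks, rights folded into rook squares."""
    return 6 * SQUARE_BITS

def bits_with_flags(white_k, white_q, black_k, black_q):
    """Variable-length variant: 4 flag bits, then only the king and rook
    squares that are not already implied by a castling right."""
    bits = 4
    for k_side, q_side in ((white_k, white_q), (black_k, black_q)):
        if not (k_side or q_side):
            bits += SQUARE_BITS  # king square is not implied
        # each rook without a castling right still needs its square
        bits += SQUARE_BITS * ((not k_side) + (not q_side))
    return bits
```

Under these assumptions the fixed variant always spends 36 bits on kings and rooks, while the flagged variant ranges from 4 bits (all four rights intact) to 40 bits (no rights left), which is where the "4 extra bits in the worst case" comes from.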