Lichess uses a scheme that is probably more efficient on average, described on revoof's blog[0]. Basically, it's a variable-length scheme: the first 64 bits encode square occupancy, followed by piece codes (which also cover castling rights, side to move, and en passant with some trickery), followed by half-move clocks if necessary.
0: https://lichess.org/@/revoof/blog/adapting-nnue-pytorchs-bin...
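To make the layout concrete, here is a minimal sketch of the general idea: a 64-bit occupancy mask followed by one 4-bit code per occupied square. This is not Lichess's actual format. The `PIECES` table, the `encode`/`decode` names, and the nibble packing order are all assumptions made for illustration, and the castling, side-to-move, en passant, and clock handling are left out entirely.

```python
# Hypothetical code table: the index of each letter is its 4-bit piece code.
# (The real scheme reportedly uses the spare codes for castling, side to move
# and en passant; that part is not modelled here.)
PIECES = "PNBRQKpnbrqk"


def encode(board: dict[int, str]) -> bytes:
    """Pack {square index 0..63: piece letter} into occupancy mask + nibbles."""
    occupied = 0
    nibbles = []
    for sq in sorted(board):
        occupied |= 1 << sq
        nibbles.append(PIECES.index(board[sq]))
    if len(nibbles) % 2:
        nibbles.append(0)  # pad to a whole byte; never read back by decode
    out = bytearray(occupied.to_bytes(8, "big"))
    for low, high in zip(nibbles[::2], nibbles[1::2]):
        out.append(low | (high << 4))
    return bytes(out)


def decode(data: bytes) -> dict[int, str]:
    """Inverse of encode: read the mask, then one nibble per occupied square."""
    occupied = int.from_bytes(data[:8], "big")
    squares = [sq for sq in range(64) if (occupied >> sq) & 1]
    board = {}
    for i, sq in enumerate(squares):
        byte = data[8 + i // 2]
        code = (byte >> 4) if i % 2 else (byte & 0x0F)
        board[sq] = PIECES[code]
    return board


def start_position() -> dict[int, str]:
    """Standard starting position, with a1 = 0 and h8 = 63."""
    board = {}
    for file, piece in enumerate("RNBQKBNR"):
        board[file] = piece
        board[8 + file] = "P"
        board[48 + file] = "p"
        board[56 + file] = piece.lower()
    return board
```

In this sketch the size depends only on the piece count: 8 bytes for the mask plus half a byte per piece, rounded up. A full board of 32 pieces takes 24 bytes, while a bare king-versus-king position takes 9, which is where the "more efficient on average" part comes from.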
It can also encode chess960 positions. With the article's encoding, uncastled rooks can only be decoded if their starting squares are known, which they aren't in chess960.
It’s mathematically dissatisfying, but often the best storage (or algorithm) solutions involve clever heuristics that are applied dynamically.
Some systems just have to be observed in order for solutions to be optimally designed around how they actually behave, rather than how they theoretically behave.