Yes, I made a few mistakes, and I did realize I didn't need the whole extra bit to store 65 possible locations; I was just being a bit lazy. Ba dum tiss.
Hmm, as a bit field followed by the piece order: that would be 8 bytes (one occupancy bit per square) followed by a variable number of pieces. Perhaps you could do a sort of compression where 0c means pawn and 1xxxc is any other piece (c stands for a color bit). With 16 pawns at 2 bits and 16 other pieces at 5 bits, that's another 14 bytes. That's 22 bytes!
The xxx, by the way, is one of 8 things: k (king), k not moved, r (rook), r not moved, n (knight), b (bishop), q (queen), en passant pawn.
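To make the 22-byte figure concrete, here is a minimal Python sketch of that layout. The square numbering (a1 = 0 to h8 = 63), the color convention (0 = white, 1 = black) and the particular 3-bit value given to each xxx state are my own assumptions, not part of the scheme above; `encode` and `starting_position` are hypothetical helper names.

```python
# Occupancy bitboard followed by a piece list using the 0c / 1xxxc code.
# The 3-bit value for each xxx state is an arbitrary (hypothetical) choice.
XXX = {
    "k": 0b000,          # king that has moved
    "k_unmoved": 0b001,  # king that has not moved (castling still possible)
    "r": 0b010,          # rook that has moved
    "r_unmoved": 0b011,  # rook that has not moved
    "n": 0b100,          # knight
    "b": 0b101,          # bishop
    "q": 0b110,          # queen
    "p_ep": 0b111,       # pawn that can be captured en passant
}

def encode(position):
    """position maps square -> (kind, colour); kind is "p" or an XXX key.
    Returns the encoding as a string of '0'/'1' characters."""
    occupancy = 0
    for square in position:
        occupancy |= 1 << square
    # 64-bit occupancy field (8 bytes); the first character is square 63.
    out = [format(occupancy, "064b")]
    # Piece list: one entry per occupied square, in ascending square order.
    for square in sorted(position):
        kind, colour = position[square]
        if kind == "p":
            out.append(f"0{colour}")                 # 0c: 2 bits
        else:
            out.append(f"1{XXX[kind]:03b}{colour}")  # 1xxxc: 5 bits
    return "".join(out)

def starting_position():
    back = ["r_unmoved", "n", "b", "q", "k_unmoved", "b", "n", "r_unmoved"]
    pos = {}
    for f in range(8):
        pos[f] = (back[f], 0)       # white back rank
        pos[8 + f] = ("p", 0)       # white pawns
        pos[48 + f] = ("p", 1)      # black pawns
        pos[56 + f] = (back[f], 1)  # black back rank
    return pos

bits = encode(starting_position())
print(len(bits), len(bits) / 8)  # 176 22.0
```

For the starting position this gives 64 + 16 × 2 + 16 × 5 = 176 bits, the 22 bytes above. A decoder would walk the set occupancy bits in the same ascending square order and read 2 or 5 bits depending on the leading bit of each entry.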
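From the previous HN discussion, 00c for pawns and xxxc for other pieces (with xxx never starting with 00) is probably the optimal way to encode pieces into whole bits. Because you can sacrifice 1 pawn to enable 3 promotions, you need to handle up to 28 non-pawns. With the 2/5 split (0c / 1xxxc) you start at 14 bytes but can need up to 17.5 (28 pieces at 5 bits = 140 bits). With the 3/4 split you start at 14 bytes and never need more than 14, since each sacrificed pawn frees 3 bits and each of the 3 promotions it enables costs only 1 extra bit.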
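A quick way to check the 17.5 vs 14 byte figures is to brute-force the piece mix under the rule of thumb quoted above (each captured pawn enables up to 3 promotions). This is only a sketch of that simplified model: it ignores captures of non-pawns and doesn't check whether a given mix is actually reachable in a game; `worst_case_bits` is a hypothetical helper.

```python
def worst_case_bits(pawn_bits, other_bits):
    """Largest piece-list size (in bits) over all piece mixes allowed by
    the simplified model: start with 16 pawns and 16 other pieces, and
    let every captured pawn enable up to 3 promotions."""
    worst = 0
    for captured in range(17):
        max_promotions = min(3 * captured, 16 - captured)
        for promoted in range(max_promotions + 1):
            pawns = 16 - captured - promoted
            others = 16 + promoted
            worst = max(worst, pawns * pawn_bits + others * other_bits)
    return worst

print(worst_case_bits(2, 5) / 8)  # 17.5  (0c / 1xxxc)
print(worst_case_bits(3, 4) / 8)  # 14.0  (00c / xxxc)
```

Under 3/4 the size is 112 - 3 × captured + promoted bits, and promoted never exceeds 3 × captured, so the starting position is already the worst case.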
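With only 6 xxx states instead of 8, you can merge "king not moved", "rook not moved", and "en passant pawn" into a single state, since the square the piece sits on tells them apart (e1/e8 for an unmoved king, a1/h1/a8/h8 for an unmoved rook, the fourth or fifth rank for an en passant pawn). (Though if you're encoding one pawn the long way, that means you need 113 bits, i.e. 14.125 bytes; oh well.)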
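Here is a minimal Python sketch of the 3/4 code with the merged state, showing how a decoder can resolve the three special cases from the square alone. The 3-bit values, the square numbering (a1 = 0 to h8 = 63), the color convention and the helper names are hypothetical; the only real constraint is that no xxx value starts with 00, which keeps the code prefix-free.

```python
# 00c for pawns, xxxc for everything else. No xxx value starts with "00",
# so a decoder can tell a 3-bit pawn entry from a 4-bit piece entry.
# The value chosen for each state is an arbitrary (hypothetical) assignment.
STATE_CODES = {"k": "010", "r": "011", "n": "100", "b": "101", "q": "110",
               "special": "111"}
CODE_STATES = {code: state for state, code in STATE_CODES.items()}
MERGED = ("k_unmoved", "r_unmoved", "p_ep")

def special_meaning(square, colour):
    """Resolve the merged state from its square (a1 = 0 ... h8 = 63)."""
    rank, file = divmod(square, 8)
    home_rank = 0 if colour == 0 else 7       # colour 0 = white, 1 = black
    if rank == home_rank and file == 4:
        return "k_unmoved"                    # king still on e1 / e8
    if rank == home_rank and file in (0, 7):
        return "r_unmoved"                    # rook still on a1/h1 or a8/h8
    if rank == (3 if colour == 0 else 4):
        return "p_ep"                         # pawn that just double-stepped
    raise ValueError(f"merged state cannot occur on square {square}")

def encode_piece(kind, colour):
    if kind == "p":
        return f"00{colour}"                  # 3 bits
    state = "special" if kind in MERGED else kind
    return f"{STATE_CODES[state]}{colour}"    # 4 bits

def decode_pieces(bits, squares):
    """Decode a piece list, given the occupied squares in ascending order."""
    pieces, i = {}, 0
    for square in squares:
        if bits[i:i + 2] == "00":
            pieces[square] = ("p", int(bits[i + 2]))
            i += 3
        else:
            state, colour = CODE_STATES[bits[i:i + 3]], int(bits[i + 3])
            if state == "special":
                state = special_meaning(square, colour)
            pieces[square] = (state, colour)
            i += 4
    return pieces

# Position after 1. e4, with the e4 pawn stored as the en passant pawn.
back = ["r_unmoved", "n", "b", "q", "k_unmoved", "b", "n", "r_unmoved"]
pos = {}
for f in range(8):
    pos[f], pos[56 + f] = (back[f], 0), (back[f], 1)
    pos[8 + f], pos[48 + f] = ("p", 0), ("p", 1)
del pos[12]            # white pawn leaves e2 ...
pos[28] = ("p_ep", 0)  # ... and lands on e4

squares = sorted(pos)
bits = "".join(encode_piece(*pos[sq]) for sq in squares)
print(len(bits), len(bits) / 8)  # 113 14.125
assert decode_pieces(bits, squares) == pos
```

Here 15 ordinary pawns at 3 bits plus 17 other entries at 4 bits give 45 + 68 = 113 bits, the 14.125 bytes from the "one pawn the long way" case; the round trip recovers every merged state purely from its square and color.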