Some metadata gets stored alongside each chunk, depending on the vector db and context (some systems add permissions-related metadata so that the LLM won’t pull chunks that the user doesn’t have access to).
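A minimal sketch of that permissions idea, assuming each chunk carries a set of allowed groups and retrieval filters on it before anything reaches the LLM. The `Chunk` class, the `allowed_groups` field, and `visible_chunks` are hypothetical names for illustration, not any particular vector db's API:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    vector: list[float]
    # Hypothetical permissions metadata stored next to the embedding.
    allowed_groups: set[str] = field(default_factory=set)


def visible_chunks(chunks: list[Chunk], user_groups: set[str]) -> list[Chunk]:
    """Keep only chunks that share at least one group with the user."""
    return [c for c in chunks if c.allowed_groups & user_groups]


chunks = [
    Chunk("Q3 salary bands", [0.1, 0.2], {"hr"}),
    Chunk("Office wifi setup", [0.3, 0.4], {"hr", "staff"}),
]

# A user in "staff" only ever sees the second chunk.
print([c.text for c in visible_chunks(chunks, {"staff"})])
```

In a real system the filter would typically be pushed down into the vector search itself rather than applied afterwards, but the effect is the same: restricted chunks never become candidates.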
The vector itself is pretty large (512 dimensions here, though the exact size depends on the embedding model).
The chunks overlap with each other (iirc by about 30%, but someone feel free to correct me).
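To make the overlap concrete, here is a rough sketch of a sliding-window chunker. The function name, the `overlap_pct` parameter, and the chunk size of 10 are all assumptions for illustration; the 30% figure is just the number recalled above, not a confirmed default:

```python
def chunk_with_overlap(tokens, chunk_size, overlap_pct=30):
    """Split tokens into windows where each one repeats the tail of the previous."""
    overlap = chunk_size * overlap_pct // 100
    step = chunk_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the chunk size")

    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


tokens = list(range(20))  # stand-in for 20 tokens of text
chunks = chunk_with_overlap(tokens, chunk_size=10)

for c in chunks:
    print(c)
```

With a chunk size of 10 and 30% overlap, each window starts 7 tokens after the previous one, so the last 3 tokens of one chunk are the first 3 of the next. The point of the overlap is that a sentence straddling a boundary still shows up whole in at least one chunk.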
I don’t _think_ the data is typically compressed (not sure why, but I assume it’s for performance).
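For a rough sense of scale, here is the back-of-the-envelope arithmetic for uncompressed storage. It assumes each dimension is stored as a 4-byte float32, which is common but is my assumption, and the one-million-chunk corpus is a made-up figure:

```python
import struct

DIMENSIONS = 512        # vector size mentioned above
NUM_CHUNKS = 1_000_000  # hypothetical corpus size

# Size of one vector packed as 512 little-endian float32 values.
bytes_per_vector = struct.calcsize(f"<{DIMENSIONS}f")
total_bytes = bytes_per_vector * NUM_CHUNKS

print(bytes_per_vector)                 # 2048
print(round(total_bytes / 1024**3, 2))  # 1.91 (GiB)
```

So roughly 2 KB per chunk for the vector alone, before the chunk text and any metadata are counted.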