I do sometimes wonder if we will get "detailed enough" vector embeddings in LLMs to bring the grain of resolution down below human perception - like having enough bits to fully capture what's on tape in the audio world. Maybe this is never possible, and (I hope) some details are unresolvable, but it will be interesting to see.
LLMs are already used in signal processing, so the idea is being explored.
Simply put, anything that can be encoded is a language, so you just need sensors to capture and classify the incoming data and build that into a model. The real question is post-training the model to behave correctly, as these domains are far less explored than things at the human scale. RLHF may be a poor choice because the models may see actual patterns that humans can't perceive, and humans will discount them as incorrect.
I suspect the curse of dimensionality makes this an optimization dead end. You hit prohibitive latency limits on retrieval long before the resolution approaches human perception. Even with current dimensions, the trade-off between index size and query speed is already the main constraint for production systems.
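To make the curse-of-dimensionality point concrete, here's a quick sketch (using random Gaussian vectors as synthetic stand-ins for embeddings, not any real model's output) of the distance-concentration effect: as dimension grows, the nearest and farthest neighbors of a query become nearly equidistant, which is what degrades retrieval quality and forces the index-size/query-speed trade-offs mentioned above.

```python
import math
import random

def concentration_ratio(dim, n_points=2000):
    """Spread of distances from a random query to random points,
    relative to the nearest distance: (max - min) / min.

    A large ratio means neighbors are well separated (low dim);
    a ratio near 0 means everything is roughly equidistant (high dim),
    i.e. the curse of dimensionality.
    """
    query = [random.gauss(0, 1) for _ in range(dim)]
    dists = [
        math.dist(query, [random.gauss(0, 1) for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

random.seed(0)  # reproducibility for this demo
for dim in (2, 32, 512, 4096):
    print(dim, round(concentration_ratio(dim), 3))
```

Running this, the ratio collapses by orders of magnitude between 2 and 4096 dimensions, which is one intuition for why pushing embedding resolution ever higher runs into retrieval limits rather than smoothly approaching "tape quality."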