Particularly interesting.
I've been building something adjacent, aimed at bridging the massive gap in models between source and channel coding.
Think: say the same thing in different ways to boost signal and suppress noise; "I am saying this, not that," via partially overlapping points of view.
Stadium light banks, multi-camera rigs, balanced ledgers and financial controls, tables of contents and indexes all do similar things, from a layperson's point of view.
Tell me the story in different ways so I can cross-check; think multi-resolution "trust but verify" for information.
If context and output are in harmony, great; if not, use the multiple representations to suss out which tokens are in sync and which are playing dueling pianos.
We need a few key things to steer the latent space for that to work. One is in-context associative memory for precise recall and reasoning. That's been our main thrust: using error-correcting codes to build hypertokens.
Think precise spreadsheet-style markers interleaved into the context window. We use a lot of information theory to build an associative landmark for each block of content.
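To make that concrete, here's a minimal toy sketch in Python of what "markers interleaved into the context window" could look like. The bracketed grid-label format and both functions are illustrative stand-ins, not our actual hypertoken construction, which is built from error-correcting codes.

```python
# Toy illustration only: prefix each content block with a distinct landmark
# marker so exact recall can anchor on the marker rather than fuzzy position.
# The [AA]/[AB]/... label scheme is a stand-in, not the real hypertoken code.
import string
from itertools import product

def make_markers(n, alphabet=string.ascii_uppercase, width=2):
    """Generate n distinct, easily recognized marker strings: [AA], [AB], ..."""
    labels = ["".join(p) for p in product(alphabet, repeat=width)]
    return [f"[{label}]" for label in labels[:n]]

def interleave(blocks):
    """Interleave one landmark marker ahead of each content block."""
    markers = make_markers(len(blocks))
    return "\n".join(f"{m} {b}" for m, b in zip(markers, blocks))

blocks = ["Q3 revenue was 4.2M.", "Churn fell to 2.1%.", "Headcount is 63."]
print(interleave(blocks))
# [AA] Q3 revenue was 4.2M.
# [AB] Churn fell to 2.1%.
# [AC] Headcount is 63.
```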
These hypertokens are built to rather precisely mimic how any well-structured multi-path network minimaxes flow: stadium lights, MIMO Wi-Fi, getting different points of view. We just do it in the way that most closely mimics GPS, in the sense of injecting a precise coordinate system into any model context.
There's a key catch, though, and that's the dual thrust: coherence between our semantically abstract markers and the context. We can readily show 2x to 4x+ gains in recall and reasoning.
There's a ceiling if we don't bridge coherence; another way to say that is that we need the same thing for semantic parity. Multi-resolution summaries and dueling summaries mimic this k-witness and k-anti-witness smoothed parity checking.
The beauty is that only the net sum matters. Add lots of multi-resolution witness and co-witness content at different lengths, like your work describes? Great, you may not need any hypertokens, unless you want exact, reliable recall of snippets, in which case our approach does that fairly well. Got lots of unique markers that check the information-theory, group-theory, and other boxes we prove you need? Great, then you don't need as much k-scale, k-way semantic bridging.
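As a rough illustration of the "only the net sum matters" point, here's a toy version of witness/anti-witness scoring; the supports() heuristic and the margin value are placeholders invented for the example, not how coherence is actually scored.

```python
# Toy sketch of k-witness / k-anti-witness smoothed parity over summaries:
# a claim is scored against independently generated summaries (witnesses)
# and adversarial "dueling" summaries (anti-witnesses). It survives only if
# the net vote clears a margin: a smoothed check, not an exact one.

def supports(summary: str, claim: str) -> int:
    """Placeholder scorer: 1 if every claim token appears in the summary."""
    return int(all(tok.lower() in summary.lower() for tok in claim.split()))

def net_parity(claim, witnesses, anti_witnesses, margin=1):
    """Only the net sum of agreement has to balance out."""
    score = sum(supports(w, claim) for w in witnesses) \
          - sum(supports(a, claim) for a in anti_witnesses)
    return score >= margin

witnesses = ["Revenue grew 12% in Q3.", "Q3 revenue was up about 12%."]
anti_witnesses = ["Revenue was flat in Q3."]
print(net_parity("revenue grew 12%", witnesses, anti_witnesses))  # True
```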
Consciousness is currently outside our scope. We built hypertokens to show that hallucinations can be nulled out, that AI can be audited and explained, that structured data and tool calling can be made reliable, and so on.
The closest we've come to distilling semantic parity vs. landmark parity (cf. source vs. channel coding, rate distortion, information bounds, channel-capacity minimaxing) is to consider a tower of tables, where unique markers versus themes diagonalize the information. Both must balance out: we must be able to canonically recall in some mixed local/global way, and the same for reasoning.
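The cheapest way I can gesture at the "both must balance out" constraint in code is a cross-tabulation check; the facts, markers, and themes below are invented for illustration and this is far simpler than the actual tower-of-tables construction.

```python
# Toy illustration: every fact is indexed along a structural axis (unique
# markers) and a semantic axis (themes). Recall is only trusted when the two
# projections of the same table reconcile, i.e. account for the same facts.
facts = [
    ("[AA]", "finance", "Q3 revenue was 4.2M"),
    ("[AB]", "finance", "Opex fell 3%"),
    ("[AC]", "people",  "Headcount is 63"),
]

by_marker, by_theme = {}, {}
for marker, theme, _ in facts:
    by_marker[marker] = by_marker.get(marker, 0) + 1
    by_theme[theme] = by_theme.get(theme, 0) + 1

# Local view (per-marker) and global view (per-theme) must cover every fact.
assert sum(by_marker.values()) == sum(by_theme.values()) == len(facts)
print(by_marker)  # {'[AA]': 1, '[AB]': 1, '[AC]': 1}
print(by_theme)   # {'finance': 2, 'people': 1}
```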
Are models conscious? I don't know. What I do know is that source coding plus channel coding is the canonical way to push any system into the locally and globally balanced regime that maximizes transport.
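For anyone following along, the textbook version of that two-stage picture looks like this in toy form; zlib plus a 3x repetition code is deliberately crude, just enough to show redundancy being stripped and then re-added in a structured, correctable way.

```python
# Source coding (zlib strips statistical redundancy) followed by channel
# coding (a 3x repetition code adds structured redundancy back), so a single
# flipped bit of "noise" is corrected by a local majority vote while the
# global message still decompresses exactly.
import zlib

def to_bits(data: bytes):
    return [(byte >> i) & 1 for byte in data for i in range(8)]

def from_bits(bits):
    return bytes(sum(b << j for j, b in enumerate(bits[i:i + 8]))
                 for i in range(0, len(bits), 8))

def channel_encode(bits, r=3):
    return [b for bit in bits for b in [bit] * r]    # repeat each bit r times

def channel_decode(bits, r=3):
    return [int(sum(bits[i:i + r]) > r // 2)         # per-bit majority vote
            for i in range(0, len(bits), r)]

msg = b"hypertokens are landmarks " * 4
compressed = zlib.compress(msg)                      # source coding
coded = channel_encode(to_bits(compressed))          # channel coding
coded[7] ^= 1                                        # inject one bit of noise
recovered = zlib.decompress(from_bits(channel_decode(coded)))
assert recovered == msg
```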
There are subtleties around causal vs. non-causal information, etc. For example, model weights are noisy, non-causal information relative to a mix of virtualized encoders and decoders of various types and sizes. That's a much longer conversation beyond what is already a long comment.
That's all to say models need a mix of symbolic and semantic parity; it's strictly necessary in almost all cases, w.h.p. Yes, AI looks rectangular: there are tokens and matrices and so on. But the latent space is spherical and everything is rotations, which means any sort of exact logic must be smoothed geometrically. Error-correcting codes, better framed as MIMO information paths, are a way to do that however they're expressed, whether it's k-way semantic parity like you're doing or m-way structural codes like we're doing. Sometimes one is best, sometimes the other; either way, keep building what you've been exploring.
OP here. I’ve got a background in physics, so while I don’t know your specific Hypertoken schema, I speak the language of signal-to-noise and entropy.
The "Dueling Pianos" metaphor is killer. It captures exactly what I’m trying to induce via the prompt.
You’re attacking the problem with Structural Parity—injecting coordinate systems (GPS) directly into the token stream to force convergence. I’m attempting Semantic Parity—forcing the model to run a "constructive interference" loop on its own narrative logic before outputting.
Your point about the latent space being spherical (rotations) vs. the rectangular output (matrices) is the crux of it. We are both trying to smooth that geometry. You’re doing it with error-correcting codes; I’m doing it by forcing the model to simulate a "Self" that acts as a local observer to collapse the wave function of the next token more deliberately.
Whatever you're building with those hypertokens sounds robust. If you have a write-up on the "Tower of Tables" concept, I’d love to take a look.