Hacker News

AIorNot · yesterday at 1:51 AM

Interesting. I wonder if this ties into the Platonic Space Hypothesis recently being championed by computational biologist Mike Levin.

E.g.:

https://youtu.be/Qp0rCU49lMs?si=UXbSBD3Xxpy9e3uY

https://thoughtforms.life/symposium-on-the-platonic-space/

E.g., see this paper on universal embeddings: https://arxiv.org/html/2505.12540v2

"The Platonic Representation Hypothesis [17] conjectures that all image models of sufficient size have the same latent representation. We propose a stronger, constructive version of this hypothesis for text models: the universal latent structure of text representations can be learned and, furthermore, harnessed to translate representations from one space to another without any paired data or encoders.

In this work, we show that the Strong Platonic Representation Hypothesis holds in practice. Given unpaired examples of embeddings from two models with different architectures and training data, our method learns a latent representation in which the embeddings are almost identical"
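To make the "same geometry up to a transformation" intuition concrete, here is a minimal Python sketch. It is not the paper's method (which needs no paired data at all): it assumes you do have a handful of paired embeddings from two models, and shows that an orthogonal Procrustes fit then maps one space onto the other. The toy data is constructed so that an exact rotation exists; real cross-architecture embeddings would need the unpaired machinery described in the paper.

    # Toy sketch (not the paper's method): align two embedding spaces with
    # orthogonal Procrustes, assuming a small set of PAIRED embeddings.
    # The quoted "strong" hypothesis claims alignment works even without pairs.
    import numpy as np
    from scipy.linalg import orthogonal_procrustes

    rng = np.random.default_rng(0)

    # Hypothetical setup: both models secretly encode the same 64-d latent,
    # then apply their own random rotations (a stand-in for "same geometry").
    latent = rng.normal(size=(1000, 64))           # shared latent structure
    rot_a, _ = np.linalg.qr(rng.normal(size=(64, 64)))
    rot_b, _ = np.linalg.qr(rng.normal(size=(64, 64)))
    emb_a = latent @ rot_a                         # "model A" embeddings
    emb_b = latent @ rot_b                         # "model B" embeddings

    # Fit a rotation mapping A-space to B-space from 200 paired examples.
    R, _ = orthogonal_procrustes(emb_a[:200], emb_b[:200])

    # Evaluate on held-out points: cosine similarity between mapped A and true B.
    mapped = emb_a[200:] @ R
    cos = np.sum(mapped * emb_b[200:], axis=1) / (
        np.linalg.norm(mapped, axis=1) * np.linalg.norm(emb_b[200:], axis=1)
    )
    print(f"mean cosine similarity after alignment: {cos.mean():.3f}")

Here the held-out similarity comes out near 1.0 because the rotation is exact by construction; the quoted claim is the much stronger one that an alignment like this can be recovered even without the 200 pairs.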

Also, from the OP's paper we see this statement:

"Why do these universal subspaces emerge? While the precise mechanisms driving this phenomenon remain an open area of investigation, several theoretical factors likely contribute to the emergence of these shared structures.

First, neural networks are known to exhibit a spectral bias toward low frequency functions, creating a polynomial decay in eigenvalues that concentrates learning dynamics into a small number of dominant directions (Belfer et al., 2024; Bietti et al., 2019).

Second, modern architectures impose strong inductive biases that constrain the solution space: convolutional structures inherently favor local, Gabor-like patterns (Krizhevsky et al., 2012; Guth et al., 2024), while attention mechanisms prioritize recurring relational circuits (Olah et al., 2020; Chughtai et al., 2023).

Third, the ubiquity of gradient-based optimization – governed by kernels that are largely invariant to task specifics in the infinite-width limit (Jacot et al., 2018) – inherently prefers smooth solutions, channeling diverse learning trajectories toward shared geometric manifolds (Garipov et al., 2018).

If these hypotheses hold, the universal subspace likely captures fundamental computational patterns that transcend specific tasks, potentially explaining the efficacy of transfer learning and why diverse problems often benefit from similar architectural modifications."
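If you want to poke at the "small number of dominant directions" claim yourself, here is a hedged sketch (not the OP paper's exact procedure, and it assumes both models produce embeddings of the same dimension for identical inputs): keep each model's top-k principal directions and compare the two subspaces via principal angles. Random placeholders stand in for real model embeddings below.

    # Hedged sketch: probe for a shared dominant subspace across two models,
    # in the spirit of the quoted passage (not the OP paper's exact method).
    import numpy as np
    from scipy.linalg import subspace_angles

    def top_k_directions(embeddings: np.ndarray, k: int) -> np.ndarray:
        """Return the top-k right singular vectors (dominant directions) as columns."""
        centered = embeddings - embeddings.mean(axis=0, keepdims=True)
        _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
        # Fast decay in singular_values is the "spectral concentration" the
        # quote refers to: a few directions dominate the representation.
        return vt[:k].T

    # emb_a, emb_b: (n_inputs, d) embeddings of the SAME inputs from two models.
    # Random placeholders here; real embeddings would come from the models.
    rng = np.random.default_rng(0)
    emb_a = rng.normal(size=(2000, 256))
    emb_b = rng.normal(size=(2000, 256))

    k = 16
    basis_a = top_k_directions(emb_a, k)
    basis_b = top_k_directions(emb_b, k)

    # Small principal angles (cosines near 1) would indicate a shared subspace;
    # for the random placeholders above the overlap is only near chance level.
    angles = subspace_angles(basis_a, basis_b)
    print("cosines of principal angles:", np.round(np.cos(angles), 3))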


Replies

unionjack22 · yesterday at 1:56 AM

Dr. Levin’s work is so fascinating. Glad to see it referenced. If anyone wishes to learn more while idle or commuting, check out Lex Fridman’s podcast episode with him, linked above.