Hacker News

Show HN: Zero-power photonic language model - code

6 points by damir00 today at 6:45 PM | 5 comments

The model uses a 1024-dimensional complex Hilbert space with 32 layers of programmable Mach–Zehnder meshes (Reck architecture) and derives token probabilities directly via the Born rule.

Despite using only unitary operations and no attention mechanism, a 1024×32 model achieves coherent TinyStories generation after < 1.8 hours of training on a single consumer GPU.
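The forward pass described above can be sketched in a few lines of NumPy. This is a hedged illustration, not the author's code: the random unitaries stand in for the trained Mach–Zehnder mesh settings, and the input state stands in for an embedded token.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_unitary(n, rng):
    # QR decomposition of a complex Gaussian matrix gives a Haar-random unitary
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))  # normalize column phases

dim, layers = 1024, 32
# stand-ins for the 32 programmable mesh layers (Reck architecture)
unitaries = [random_unitary(dim, rng) for _ in range(layers)]

# normalized complex input state in the 1024-dimensional Hilbert space
state = rng.normal(size=dim) + 1j * rng.normal(size=dim)
state /= np.linalg.norm(state)

for U in unitaries:
    state = U @ state  # each layer is a lossless unitary transform

# Born rule: token probabilities are the squared amplitudes
probs = np.abs(state) ** 2
```

Because every layer is unitary, `probs` sums to 1 by construction, with no softmax needed.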

This is Part 1 - the next step is physical implementation with $50 of optics from AliExpress.


Comments

cpldcpu today at 11:00 PM

"Zero power" does not include the power needed to translate information between electronic and optical domains and the light source itself.

tliltocatl today at 8:26 PM

Stupid question - how is this even possible given that you lose information at each layer? And how does one implement a non-linear activation function without an amplifier of some sort?
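The non-linearity concern can be made concrete. This small NumPy sketch (not from the post) shows that a stack of purely unitary layers with nothing between them collapses to a single linear map applied once:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_unitary(n, rng):
    # Haar-random unitary via QR of a complex Gaussian matrix
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

# five unitary "layers" with no activation between them...
layers = [random_unitary(8, rng) for _ in range(5)]
x = rng.normal(size=8) + 1j * rng.normal(size=8)

out_layered = x
for U in layers:
    out_layered = U @ out_layered

# ...are equivalent to one combined unitary applied once
combined = np.linalg.multi_dot(layers[::-1])
out_single = combined @ x
```

So without some non-linear element, depth adds expressivity only through how the single product matrix is parameterized.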

bastawhiz today at 8:48 PM

This is a neat idea, but it's extremely light (no pun intended) on real details. Translating a simulation into real hardware that can do real computation in a reliable manner is properly hard. As much as I'd love to be an optimist about this project, I have to say I'll believe it when I see it actually running on a workbench.

If it does work, I think one of the biggest challenges will be adding enough complexity to it for it to do real, useful computation. Running the equivalent of GPT-2 is a cool tech demo, but if there's not an obvious path to scaling it up, it's a bit of a dead end.

ifuknowuknow today at 8:37 PM

meds