Yon – a topos-oriented language with a content-addressed lattice heap

43 points • by amenn • last Friday at 12:28 AM • 44 comments • view on HN

Hello everyone. In the last two years I spent, as a dev, part of my free time stretching the limits of my knowledge. Not being a mathematician myself, I discovered that formalizing concepts in mathematical language could nonetheless be useful to improve symbolic reasoning about the concepts themselves. I made use of both books and AI, and I followed the development of the latter, mainly with a critical eye. I have several open projects, and from some observations and explorations on one of them I started asking myself what the current limits of reasoning, of logic, of mathematics itself are. So I explored categories, and topoi, above all starting from Mazzola's theory of music. I asked myself whether this could influence type theory in programming, and I ran some experiments. Out of this came this programming language, Yon, inspired by Yoneda and by morphisms. From another project I drew observations on the Leech lattice; from yet another, some experiments with mmap and coordinate-based allocation in a structure that would be advantageous, again, in a topological sense. The language certainly has mistakes here and there and I wrote the documentation in a hurry; the work took 3 weeks in total. It compiles to LLVM for performance reasons, and for now I preferred to avoid a VM and a GC. It contains unusual data structures that turn out to be performant. It's worth a look, and I hope it will win some converts, and that someone will want to help me with its development. I'd love for it to bring fresh stimuli to programming and maybe open a few new frontiers. A few concrete details, for those who want to look under the hood. The compiler is a real pipeline, not an interpreter: an OCaml frontend takes .yon source into a custom MLIR dialect I called "topos", where the categorical constructs live as first-class operations; its lowering passes take everything down to LLVM IR and from there to a native executable. A single command, yonc, drives the whole chain, and you can stop at any intermediate stage to see what a categorical construct actually becomes on its way to silicon. The runtime is where the Leech lattice observations ended up. The heap is content-addressed over Λ₂₄: every value is mapped to a lattice point and canonicalized under the Conway group Co₀ (via libmmgroup), so the same content always lives at the same address. That buys three things I would now find hard to give up: equality is a single machine comparison no matter how big the value is (string equality benches flat at ~17 ns up to 32,768-character strings, because it compares handles, never bytes); deduplication is global and automatic, with no interning logic in user code; and giving up the GC stopped being a renunciation, since cells are immutable and content-addressed, so there is nothing to trace and nothing to move. Concurrency I kept deliberately simple-minded: no threads, no shared mutable state. A program splits into isolated "Spaces" (separate processes, isolation enforced by the MMU) that talk over shared-memory channels with explicit failure semantics. About what is verified and what is just hope: the ground truth is a regression suite of 112 examples plus a cross-Space scenario suite, with exit codes identical on Linux x86-64 and macOS Apple Silicon (Intel Macs: untested). The book on the site, 21 chapters plus appendices, had every snippet compiled and run before being written down. The benchmarks appendix declares its environment and method; I tried not to publish any number without one. The limits of 1.0 are written down as well, in a baseline document that lists every fixed pool (256 heaps per chain, 64 Spaces, 16 concurrent RPC sessions, and so on), with the rationale that a hard limit that fails loudly is a specification, while a soft limit that degrades silently is a bug. For the license I went with the GCC model: compiler and toolchain are AGPLv3, the runtime is AGPLv3 with an explicit linking exception, so the language itself stays free, and the programs you write in it are entirely yours, under any license you choose.

Site + book: https://yon-lang.org Repo: https://github.com/yon-language/yon (tag v1.0.0)

Happy to answer anything: the topos dialect, why a lattice rather than a hash, what the categorical constructs lower to, what broke along the way.

Comments

mccoyb • today at 1:18 PM

I'm not sure where or how to convey this, because I've seen several of these languages designed with AI, documentation created using AI, etc -- posted on Hacker News in the last months or so, and I've responded to each one with roughly the same feedback (and I'm assuming good faith: that the intent is that the poster wishes to grow as a language designer).

Your audience, or whoever you aim your work at, should be treated with respect. Otherwise, why should they give you the time of day? Why would you expect them to respond positively to effort alone when effort (in code and in shit prose) is extremely cheap right now? Their time is not cheap ...

When I read the documentation, and it is extremely clear that you haven't taken the time to clarify your ideas, when much of it is LLM prose, when much of the content introduces highfalutin ideas without motivation, blending categorical concepts (which, by the way, should never be mixed with vague prose claims about the language), violating my reader context model, preventing me from understanding what problem exactly your language design is solving (where is that problem stated clearly?), it is a waste of my time.

> The work took 3 weeks in total ... it's worth a look, and I hope it will win some converts, and that someone will want to help me with its development.

You've gone too fast, too much is vague, nothing is clear.

I'd delete everything, start over, and try and explain just one of the ideas clearly. Seriously. This sounds harsh, but it's honestly the correct approach to something as subtle and nuanced as programming language design.

➕ show 4 replies

jrmg • today at 1:30 PM

Just a comment: this sounds a lot like when someone I knew mildly succumbed to AI psychosis, and thought he, with Gemini, had made some physics/metaphysics breakthrough. If you’re losing sleep and feeling distressed or euphoric, maybe lay off for a few days, no matter how hard that is. Talk to friends and/or family about unimportant things. Get outside for a while. Go back to old hobbies (reading, hiking, just going to coffee shops or thrift stores - whatever) and then reassess.

This language looks interesting, but I don’t understand the concepts. Does this stuff make sense to other people?

The heap is content-addressed over Λ₂₄: every value is mapped to a lattice point and canonicalized under the Conway group Co₀ (via libmmgroup), so the same content always lives at the same address.

What is ‘Λ₂₄’? What is a ‘lattice point’?

giving up the GC stopped being a renunciation, since cells are immutable and content-addressed, so there is nothing to trace and nothing to move

This kind of sounds like you’re saying that there’s nothing to free, which implies that nothing takes up memory, which I presume is not the case. Do you mean everything is immutable and content-addressed (like Git)? Doesn’t stuff still need to be freed somehow when the programs done with it, otherwise memory will grow for ever?

➕ show 4 replies

Chinjut • today at 4:17 PM

I have a PhD in category theory and know what the Leech lattice is and I still don't understand what is going on here. What is the value of using the Leech lattice to store memory?

cjs_ac • today at 1:49 PM

The documentation is a work of art. Every time I try to work out what just one of the unexplained ideas is, it just introduces new unexplained ideas. I don't know where these ideas came from, how they fit together, or why putting them together is useful. I certainly don't know why I would want to write a program in this language, as opposed to any other language I already know.

➕ show 2 replies

canyp • today at 4:08 PM

> Content addressing is extensionality made physical (chapter 11)

Actually, that's in chapter 12; 11 is the standard library. Maybe the LLM got confused because the chapters are 0-indexed.

I was curious about that topic but it seems over my head. I don't think it works outside of mathematics? In programming, one can have two objects that are identical in both structure and value but have different identities. It's why lisp has eq, eql, equal, etc. How'd you get around that other than adding an identity property?

Also:

> A handle, what your variables actually hold for strings, sections, lists, trees, is that slot index, carried as an f64

Why does the handle need floating point?

➕ show 1 reply

solomonb • today at 5:30 PM

As someone genuinely interested in programming language design, type theory, and category theory this sort of thing really saddens me. There is so much passion and rigor that has gone into developing these fields. Chucking all their jargon into an ai slop blender, imo, is actually incredibly disrespectful to those who have worked so hard.

Imagine someone honestly interested in learning about category theory but not yet knowing where to start. Projects like this only serve to muddy the waters obscuring paths to actual learning and giving the impression that the subject is a joke.

iterateoften • today at 2:30 PM

I noticed since 5.5 GPT has been adding "lattice" to a lot of things. Not sure if it is the new Gremlins.

danieltanfh95 • today at 4:25 PM

As I understand it, content addressing function content is problematic because it does not actually "normalise" the content of functions into something interchangeable. A function of input A and output B with performance signature X can still be very different in terms of actual code, but the actual comparison between both is hard to specify.

I was exploring this as a means to solving the open source, or rather the github conundrum, the problem of sharing code socially is that we need a canonical source, and this is sociologically driven than performance driven, and as it turns out, have devastating consequences for FOSS funding. I wanted to explore sharing code "interchangeably" in some sense to avoid this problem, but ultimately this seems unsolveable, even with exploration by Unison etc.

GreenSalem • today at 2:45 PM

Advanced AI psychosis.

Professional help might be necessary.

swiftcoder • today at 3:17 PM

Honestly, as someone who has at least a moderate tolerance for PL jargon, most of this is completely impenetrable. It's like someone put the whole field of PL in an LLM-powered blender.

omegafixedpoint • today at 4:08 PM

A few notes, because this is obviously vibe coded, and does not work in many ways.

1. Yon's documentation mentions "Homotopy type theory:"

  > the runnable HoTT fragment is refl/pair/fst/snd

  These are basic features of martin-lof type theory, not homotopy type theory. The documentation makes no reference of an interval type, which is generally the way to go for decidable type-checking in HoTT without univalence as an opaque axiom.
  
  Pi types are mentioned, but Yon does not have dependent types. From what I can tell, they are polymorphic, maybe even just simply-typed (except for identity types under Pi). See here in the repo: https://github.com/yon-language/yon/blob/523e363a4a00e8da1410a2521b1d7d1309d360ce/frontend/ast.ml#L37
  The datatype Ty only refers to other Ty's. Thus, it is not dependent. Terms cannot appear in types. Pi is explicitly indexed by a type defining its domain and codomain. A pi type is not a pi type if its codomain cannot depend on its domain.

2. Normalization can fail in Yon. Yon's docs say that its universe of propositions has booleans (https://yon-lang.org/book/heyting-core?_highlight=prop). It also says its logic is intuitionistic (AKA, constructive). However, it also says the logical connectives on booleans are CLASSICAL. This implies law of excluded middle, which is NOT constructive without careful sandboxing (e.g., Linear logic).

  Yon dangerously allows propositions to be lifted to booleans. If I am interpreting correctly (docs are very vague), this means propositions can be lifted to terms. This causes an obvious failure of normalization due to assumed proof irrelevance (otherwise Prop would not be distinguished) (also see Coquand's paper on this https://arxiv.org/abs/1911.08174).

3. Yon's type definitional equality does not actually reduce types. See here. This is the function used by the type-checker to check if types are equal. https://github.com/yon-language/yon/blob/523e363a4a00e8da141... No reduction actually occurs, conveniently because none of the types actually contain terms (that is, it is simply typed).

  Yon claims to be dependently-typed. See this in the repo: https://github.com/yon-language/yon/blob/523e363a4a00e8da1410a2521b1d7d1309d360ce/frontend/ast.ml#L21
  > Types in Yon Core kernel — the small dependent type theory used for the operational semantics.

  Suppose I'm reading the source code wrong. Conveniently, the comment one line below reveals, once again, that the "type theory" is simply-typed:

  >  *   T, U ::= Type_n                            universe of level n
  >  *         |  Pi(x:T). U                         dependent function

  Pi types eliminate into a FIXED type that does not depend on x. This means there is also no purpose for having a universe hierarchy.

  To confirm, I scoured the docs a bit for any examples using Pi types or Sigma types. I searched the docs, and could not find any, besides this example:

> world W { Code is X }

> place Account in W { balance number }

> fun takes_sub(s: { a : Account where Pi(x: Account). Pi(y: Account). Id(Account, x, y) }): number { return 0 }

> fun main(): number { return 0 }

Notably, the identity is the only constructor for Ty indexed by a term. That is, Pi types can ONLY eliminate into the identity. What if I want my Pi type to eliminate into anything else living in Prop? e.g., an existential like \forall (x : Nat), \exists (y : Nat), x < y. Unfortunately impossible in Yon.

This project is clearly produced by AI, and clearly dangerously incorrect. Do not use this for anything serious.

➕ show 2 replies

wavemode • today at 1:47 PM

> No garbage collector

> Slots are stable for the life of the process; the heap grows with distinct content only.

So how is a program supposed to handle lots of unique content? Like a web server handling user requests?

➕ show 2 replies

z4lis • today at 2:57 PM

You must not mix up technical mathematical words with softer prose words. For instance, "Yon's data model is categorical. A world is a category, a semantic site;"

So if you want to define a world, I expect you to tell me how to supply objects + morphisms + the composition law + the site structure. I don't know what a "semantic site" is, just what a "site" is. You'd need to define it. Anyway, we then get to our first examples of declaring worlds:

  world Currency { Code is EUR, USD }
  world Status { State is on, off }

This maybe gives me the first bit of data we need for a site. Definitely not the rest. Then we hit this

  world Sub subset of Currency

Two issues with this. One is stylistic. Why on earth would you call it a "subset"? It's not a set! "subworld" is the obvious choice... But the real issue is that like the initial definition, this doesn't tell me how to build `Sub`. I need to know which objects and morphisms of Currency to include into the category Sub? What's the site structure?

So now I think, "OK, maybe you just declare part of the structure and fill it in later, before you actually use it..."

But then your example disproves that notion! You have

  world Shop { Code is X }

  place Account in Shop {
    balance number
    owner String
  }

with no mention of Account when you declared Shop, I'm still not sure what Code or X are, and then you give what is seemingly supposed to be some working code

  fun main(): number {
    be a holds new Account { balance 40 owner "ada" }
    be _p holds String.print(a.owner)
    return a.balance + 2
  }

So your motivating example really kills off the interest from your two main communities that would use this thing: 1) category theorists have no idea what you're talking about, because nothing here looks like categories - there are no morphisms, no site structure 2) computer/software folks look at your example and think "why on earth would I learn topos theory to do something that sure looks like OOP"

I think a "topos inspired programming language" would be kind of cool if you could pull it off, but I think you really need to figure out how to sell it in the docs to at least one of the two communities above.

➕ show 1 reply

esafak • today at 2:07 PM

Could you name a few languages you had in mind while developing this, their respective problems, and how your language improves them, feature by feature?

> Yon allocates into xleech2, a content-addressed heap whose geometry is the Leech lattice Λ24: exactly 196,560 slots per heap.

What is the computational complexity of memory allocation into this Leech lattice? What applications did you have in mind where making allocation a maths problem in order to save time on comparisons makes sense? What is going to happen when a program exhausts your little heap?

➕ show 2 replies

drob518 • today at 2:32 PM

The buzzwords are strong in this one.

vitriol83 • today at 4:29 PM

this has templeOS vibes

Retr0id • today at 3:02 PM

[dead]

tribal808 • today at 1:18 PM

[dead]

alt Hacker News

Yon – a topos-oriented language with a content-addressed lattice heap

Comments