Building a C compiler is definitely hard for humans, but I don’t think it’s particularly strong evidence of "intelligence" from an LLM. It’s a very well understood, heavily documented problem with lots of existing implementations and explanations in the training data.
These kinds of tasks are relatively easy for LLMs: they're operating in a solved design space and recombining known patterns. It looks impressive to us because writing a compiler from scratch is difficult and time-consuming for a human, not because of the problem itself.
That doesn't mean LLMs aren't useful; even if progress plateaued tomorrow, they'd still be very valuable tools. But building yet another C compiler or browser isn't that compelling as a benchmark. The industry keeps making claims about reasoning and general intelligence, but I'd expect to see systems producing genuinely new approaches or clearly better solutions, not just derivations of existing OSS.
Instead of copying a big project, I'd be more impressed if they could innovate in a small one.
Something that bothers me here is that Anthropic claimed in their blog post that the Linux kernel could boot on x86 - is this not actually true then? They just made that part up?
It seemed pretty unambiguous to me from the blog post that they were saying the kernel could boot on all three architectures, but clearly that's not true unless they did some serious hand-waving with kernel config options. Looking closer at the repo, they only show a claimed Linux boot for RISC-V, so...
[0]: https://www.anthropic.com/engineering/building-c-compiler - "build a bootable Linux 6.9 on x86, ARM, and RISC-V."
[1]: https://github.com/anthropics/claudes-c-compiler/blob/main/B... - only shows a test of RISC-V
It's really cool to see how slow unoptimised C is. You get so used to seeing C easily beat any other language in performance that you assume it's really just intrinsic to the language. The benchmark shows CCC's SQLite3 build running 12x slower than GCC's unoptimised build, and 20x slower than its optimised build. That's enormous!
I'm not dissing CCC here; rather, I'm impressed with how much speed GCC squeezes out of what is assumed to be an already intrinsically fast language.
> the build failed at the linker stage
> The compiler did its job fine
> Where CCC Succeeds
> Correctness: Compiled every C file in the kernel (0 errors)
I don't think that follows. It's entirely possible that the compiler produced garbage assembly for a bunch of the kernel code that would make it totally not work even if it did link. (The SQLite code passing its self tests doesn't convince me otherwise, because the Linux kernel uses way more advanced/low-level/uncommon features than SQLite does.)
It's really difficult for me to understand the level of cynicism in the HN comments on this topic. The amount of goalpost-moving and redefining is absolutely absurd. I really get the impression that the majority of the HN comments are just people whining out of sour grapes, with very little value added to the discussion.
I'd like to see someone disagree with the following:
Building a C compiler, targeting three architectures, is hard. Building a C compiler which can correctly compile (maybe not link) the modern linux kernel is damn hard. Building a C compiler which can correctly compile sqlite and pass the test suite at any speed is damn hard.
To the specific issues with the concrete project as presented: this was the equivalent of a "weekend project", and it's amazing.
So what if some gcc is needed for the 16-bit stuff? So what if a human was required to steer claude a bit? So what if the optimizing pass practically doesn't exist?
Most companies are not software companies; software is a line-item, an expense, an unavoidable cost. The amount of code (not software engineering, or architecture, but programming) developed tends towards glue of existing libraries to accomplish business goals, which, in comparison with a correct modern C compiler, is far less performance-critical, complex, broad, etc. No one is seriously saying that you have to use an LLM to build your high-performance math library, or that you have to use an LLM to build anything, much in the same way that no one is seriously saying that you have to rewrite the world in Rust, or TypeScript, or React, or whatever is bothering you at the moment.
I'm reminded of a classic Slashdot comment about attempting to solve a non-technical problem with technology, which is doomed to fail. It really seems that the complaints here aren't about the LLMs themselves, or the agents, but about what people and organizations do with them, which is a complaint about people, not about the technology.
As a neutral observation: it’s remarkable how quickly we as humans adjust expectations.
Imagine five years ago saying that you could have a general-purpose AI write a C compiler that can handle the Linux kernel, by itself, from scratch, for $20k, by writing a simple English prompt.
That would have been completely unbelievable! Absurd! No one would take it seriously.
And now look at where we are.
"Ironically, among the four stages, the compiler (translation to assembly) is the most approachable one for an AI to build. It is mostly about pattern matching and rule application: take C constructs and map them to assembly patterns.
The assembler is harder than it looks. It needs to know the exact binary encoding of every instruction for the target architecture. x86-64 alone has thousands of instruction variants with complex encoding rules (REX prefixes, ModR/M bytes, SIB bytes, displacement sizes). Getting even one bit wrong means the CPU will do something completely unexpected.
The linker is arguably the hardest. It has to handle relocations, symbol resolution across multiple object files, different section types, position-independent code, thread-local storage, dynamic linking and format-specific details of ELF binaries. The Linux kernel linker script alone is hundreds of lines of layout directives that the linker must get exactly right."
I've worked on compilers, assemblers and linkers, and this is almost exactly backwards.
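To illustrate: the encoding rules that quote makes sound scary are tedious but entirely mechanical. A minimal sketch (illustrative only, not how any real assembler is organized) of hand-encoding one 64-bit register-to-register add, using the standard REX.W + 0x01 /r form:

    // Encode `add <dst>, <src>` for 64-bit GP registers (REX.W + 01 /r).
    // Registers are numbered 0-15 in hardware encoding order (rax=0 .. r15=15).
    fn encode_add_reg64(dst: u8, src: u8) -> Vec<u8> {
        assert!(dst < 16 && src < 16);
        // REX prefix 0100WRXB: W=1 selects 64-bit operand size,
        // R extends the reg field (src), B extends the r/m field (dst).
        let rex = 0x48 | ((src >> 3) << 2) | (dst >> 3);
        // ModR/M: mod=11 (register-direct), reg=src low bits, rm=dst low bits.
        let modrm = 0xC0 | ((src & 7) << 3) | (dst & 7);
        vec![rex, 0x01, modrm]
    }

    fn main() {
        assert_eq!(encode_add_reg64(3, 1), vec![0x48, 0x01, 0xCB]); // add rbx, rcx
        assert_eq!(encode_add_reg64(0, 9), vec![0x4C, 0x01, 0xC8]); // add rax, r9
        println!("encodings check out");
    }

Every instruction form boils down to a handful of byte-level rules like this; the argument is really about whether having thousands of them is hard or merely tedious.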
The 158,000x slowdown on SQLite is the number that matters here, not whether it can parse C correctly. Parsing is the solved problem — every CS undergrad writes a recursive descent parser. The interesting (and hard) parts of a compiler are register allocation, instruction selection, and optimization passes, and those are exactly where this falls apart.
That said, I think the framing of "CCC vs GCC" is wrong. GCC has had thousands of engineer-years poured into it. The actually impressive thing is that an LLM produced a compiler at all that handles enough of C to compile non-trivial programs. Even a terrible one. Five years ago that would've been unthinkable.
The goalpost everyone should be watching isn't "can it match GCC" — it's whether the next iteration closes that 158,000x gap to, say, 100x. If it does, that tells you something real about the trajectory.
Nice article. I believe the Claude C Compiler is an extraordinary research result.
The article is clear about its limitations. The code README opens by saying “don’t use this” which no research paper I know is honest enough to say.
As for hype, it’s less hyped than most university press releases. Of course since it’s Anthropic, it gets more attention than university press.
I think the people most excited are getting ahead of themselves. People who aren’t impressed should remember that there is no C compiler written in Rust for it to have memorized. But, this is going to open up a bunch of new and weird research directions like this blog post is beginning to do.
I think one of the issues is that the register allocation algorithm -- alongside the SSA generation -- is not enough.
Generally, after the SSA pass you convert everything into a register transfer language (RTL) and then run a register allocation pass. GCC is even more extreme -- it has GIMPLE in the middle doing more aggressive optimization, similar to rustc's MIR. CCC doesn't have any of that. For register allocation you can get away with a simple linear scan, just as the usual JIT compiler would (and, from my understanding, something CCC could do fairly cheaply), but most of the "hard part" of a compiler today is actually optimization -- the frontend is mostly a solved problem if you accept some hacks, unlike me, who is still looking for an elegant academic solution to the typedef problem.
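For reference, the linear scan in question is roughly the following (a toy over precomputed live intervals in the Poletto/Sarkar style; a real allocator also has to deal with fixed registers, call clobbers and interval splitting, and this is not CCC's actual code):

    // Toy linear-scan register allocation over precomputed live intervals.
    struct Interval { vreg: usize, start: usize, end: usize }

    #[derive(Debug)]
    enum Loc { Reg(usize), Spill(usize) }

    fn linear_scan(mut intervals: Vec<Interval>, num_regs: usize) -> Vec<(usize, Loc)> {
        intervals.sort_by_key(|iv| iv.start);
        let mut free: Vec<usize> = (0..num_regs).rev().collect();
        let mut active: Vec<(Interval, usize)> = Vec::new(); // (interval, assigned register)
        let mut result = Vec::new();
        let mut next_spill_slot = 0;

        for cur in intervals {
            // Expire intervals that ended before the current one starts,
            // returning their registers to the free pool.
            active.retain(|(iv, reg)| {
                if iv.end < cur.start { free.push(*reg); false } else { true }
            });
            if let Some(reg) = free.pop() {
                result.push((cur.vreg, Loc::Reg(reg)));
                active.push((cur, reg));
            } else {
                // No register free. A real allocator would spill whichever
                // active interval ends last; this toy just spills the current one.
                result.push((cur.vreg, Loc::Spill(next_spill_slot)));
                next_spill_slot += 1;
            }
        }
        result
    }

    fn main() {
        let ivs = vec![
            Interval { vreg: 0, start: 0, end: 5 },
            Interval { vreg: 1, start: 1, end: 3 },
            Interval { vreg: 2, start: 2, end: 8 },
            Interval { vreg: 3, start: 4, end: 6 },
        ];
        for (vreg, loc) in linear_scan(ivs, 2) {
            println!("v{vreg} -> {loc:?}");
        }
    }

It is linear-time and easy to bolt on, which is why JITs like it, but the quality gap between this and a graph-coloring or live-range-splitting allocator is part of where GCC's advantage comes from.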
> Combined over a billion iterations: 158,000x total slowdown
I don't think that's a valid explanation. If something takes 8x as long, then doing it a billion times still takes 8x as long overall; it's just 1 billion vs 8 billion instead of 1 vs 8.
I'd be curious to know what's actually going on here to cause a multiple-order-of-magnitude degradation compared to the simpler test cases (i.e. ~10x becomes ~150,000x). Rather than I-cache misses, I wonder if register spilling in the nested loop managed to completely overwhelm L3, causing it to stall on every iteration waiting for RAM. But even that theory seems like it could only account for approximately one order of magnitude, leaving an additional three (!!!) orders of magnitude unaccounted for.
I think there's a lot more to the story here.
"The miracle is not that the bear can dance well, it's that the bear can dance at all."
- Old Russian proverb.
This compiler experiment mirrors the recent work of Terence Tao and Google. The "recipe" is an LLM paired with an external evaluator (GCC) in a feedback loop.
By evaluating the objective (successful compilation) in a loop, the LLM effectively narrows the problem space. This is why the code compiles even when the broader logic remains unfinished/incorrect.
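Roughly, the loop looks like this (an illustrative sketch only; "propose_change" stands in for the model call, and nothing here is Anthropic's actual harness):

    use std::process::Command;

    // Stand-in for the LLM: in a real harness this would be a model call
    // that returns a code change based on the latest feedback.
    fn propose_change(feedback: &str) -> String {
        format!("// patch derived from feedback:\n// {feedback}\n")
    }

    // The external evaluator is the only source of ground truth: run the
    // project's test suite (or diff the generated compiler's output against
    // GCC's on a corpus) and hand any failures back verbatim.
    fn evaluate() -> Result<(), String> {
        let out = Command::new("cargo")
            .args(["test", "--quiet"])
            .output()
            .map_err(|e| e.to_string())?;
        if out.status.success() {
            Ok(())
        } else {
            Err(String::from_utf8_lossy(&out.stderr).into_owned())
        }
    }

    fn main() {
        let mut feedback = String::from("initial task description");
        for iteration in 0..100 {
            let patch = propose_change(&feedback);
            // Applying the patch to the working tree is elided here.
            println!("iteration {iteration}: proposed a {}-byte patch", patch.len());
            match evaluate() {
                Ok(()) => { println!("objective met, stopping"); break; }
                Err(errors) => feedback = errors, // failures narrow the next attempt
            }
        }
    }

The point being: the model never has to be globally right, it only has to keep producing changes that the evaluator stops rejecting.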
It’s a good example of how LLMs navigate complex, non-linear spaces by extracting optimal patterns from their training data. It’s amazing.
p.s. if you translate all this to marketing jargon, it’ll become “our LLM wrote a compiler by itself with a clean room setup”.
Edit: typo
A few things to note:
1. In the real world, for a similar task, there is little reason not to give the model access to all the papers about optimizations, the ISA PDFs, and MIT-licensed compilers of every kind. It will perform much better, and this is proof that the "it's just uncompressing GCC" line is merely a claim (point 2 shows it even more).
2. Of all the tasks, the assembler is the part where memorization would help the most. Instead, the LLM can't perform without the ISA documentation that it saw repeated countless times during pre-training. Guess what?
3. Rust is a bad language for this test, at least as a first target. If you want an LLM-coded C compiler in Rust and you have LLM experience, you would go C compiler first, then Rust port. Rust is hard when there are mutable data structures with tons of references around, and a C compiler is exactly that (see the sketch after this list). Composing complexity from different layers is an LLM anti-pattern that anyone who has worked a lot with automatic programming knows very well.
4. In the real world, you don't do a task like that without steering, and steering will do wonders. That's not to say the experiment was ill-conceived; the experimenter was simply trying to make a different point than the one the Internet took away (as usual).
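On point 3, the sketch mentioned above: the standard way Rust compilers dodge that fight with the borrow checker is to keep IR nodes in a flat arena and have them refer to each other by integer IDs rather than by references. A toy illustration (not taken from CCC or rustc):

    // Index-based IR: nodes live in one Vec (the arena) and point at each
    // other with plain IDs, so a pass can mutate nodes without ever holding
    // aliasing &mut references into a graph.
    #[derive(Clone, Copy, Debug)]
    struct NodeId(usize);

    #[derive(Clone, Copy, Debug)]
    enum Expr {
        Const(i64),
        Add(NodeId, NodeId),
    }

    #[derive(Default)]
    struct Arena {
        nodes: Vec<Expr>,
    }

    impl Arena {
        fn push(&mut self, e: Expr) -> NodeId {
            self.nodes.push(e);
            NodeId(self.nodes.len() - 1)
        }
        // A toy "optimization pass": fold Add(Const, Const) in place.
        fn fold_constants(&mut self, id: NodeId) {
            if let Expr::Add(a, b) = self.nodes[id.0] {
                if let (Expr::Const(x), Expr::Const(y)) = (self.nodes[a.0], self.nodes[b.0]) {
                    self.nodes[id.0] = Expr::Const(x + y);
                }
            }
        }
    }

    fn main() {
        let mut arena = Arena::default();
        let one = arena.push(Expr::Const(1));
        let two = arena.push(Expr::Const(2));
        let sum = arena.push(Expr::Add(one, two));
        arena.fold_constants(sum);
        println!("{:?}", arena.nodes[sum.0]); // prints Const(3)
    }

It works, but you pay for it in indirection everywhere, which is exactly the kind of friction that trips up an agent trying to compose a C compiler and an unfamiliar ownership model at the same time.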
CCC was and is a marketing stunt for a new model launch. Impressive, but it still suffers from the same 80:20 rule. That 20% is the optimizations, and we all know where the devil hides in "let me write my own language" projects.
I think AI will definitely help to get new compilers going. Maybe not the full product, yet, but it helps a lot to create all the working parts you need. Taking lengthy specs and translating them into code is something AI does quite well - I asked it to give me a disassembler, and it did well. So, if you want to make a new compiler, you now don't have to read all the specs and details beforehand. Just let the AI mess with e.g. PE headers and only step in later if something in that area doesn't work.
Great article, but you have to keep in mind that it was pure marketing. The really interesting question is to give the same benchmark to CC, ask it to optimize in a loop, and see how long it takes to come up with something decent.
That's the whole promise of reaching AGI: that it will be able to improve itself.
I think Anthropic ruined this by releasing it too early. It would have been way more fun to have a live website where you could watch it iterating and see the progress it's making.
Vibe coding is entertainment. Nothing wrong with entertainment, but when totally clueless people connect vibe-coded programs to their bank accounts, or control their devices with them, someone will be entertained for sure.
Large language models and small language models are very strong at solving problems when the problem is narrow enough.
They are above human average for solving almost any narrow problem, independent of time, but when time is a factor, let's say less than a minute, they are better than experts.
An OS kernel is exactly the kind of problem that everyone prefers to be solved as correctly as possible, even if arriving at the solution takes longer.
The author mentions the stability and correctness of CCC, but these are properties of Rust, not of vibe coding. Still an impressive feat by Claude Code, though.
Ironically, if they had first populated the repo with objects, functions and methods with just todo! bodies, made sure the architecture compiles and is sane, and only then let the agent fill in the bodies with implementations, most features would work correctly.
I am writing a program to do exactly that for Rust, but even then, how would the user/programmer know beforehand how many architectural details to specify using todo!, to be sure that the problem the agent tries to solve is narrow enough? That's impossible to know! If the problem is not narrow enough, then the implementation is going to be a mess.
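Concretely, the kind of skeleton I mean (a toy illustration, not my actual tool):

    // A compiler skeleton where every body is todo!(): the crate type-checks,
    // so the architecture is pinned down before any implementation exists.
    pub struct Token;
    pub struct Ast;
    pub struct Ir;

    pub fn lex(_source: &str) -> Vec<Token> {
        todo!("tokenize the source text")
    }

    pub fn parse(_tokens: &[Token]) -> Ast {
        todo!("build the AST")
    }

    pub fn lower(_ast: &Ast) -> Ir {
        todo!("lower the AST to an IR")
    }

    pub fn codegen(_ir: &Ir) -> Vec<u8> {
        todo!("emit machine code")
    }

    fn main() {
        // `cargo check` passes; running this just panics at the first todo!,
        // which is exactly the contract the agent is then asked to fulfil,
        // one narrow body at a time.
        let obj = codegen(&lower(&parse(&lex("int main(void){return 0;}"))));
        println!("{} bytes", obj.len());
    }

Each todo! is a narrow, separately checkable problem, which is the whole point.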
One missing analysis, which IMHO is the most important one right now, is: what is the quality of the generated code?
Having an LLM generate a first complete iteration of a C compiler in Rust is super useful if the code is of good enough quality that it can be maintained and improved by humans (or other AIs). It is (almost) completely useless otherwise.
And that is the case for most of today's code generated by AIs. Most of it will still have to be maintained by humans, or at least a human will ultimately be responsible for it.
What I would like to see is whether that C compiler is a horrible mess of tangled spaghetti code with horrible naming, or something with a clear structure, good naming, and sensible comments.
The prospect of going the last mile to fix the remaining problems reminds me of the old joke:
"The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time."
Can someone explain to me what's the big deal about this? The AI model was trained on lots of code and spat out something similar to gcc. Why is this revolutionary?
I'm curious: maybe the AI learned too much from human-written compilers. What if we invented a fresh new language and let the AI write the compiler for it? If that compiler worked well, I'd consider that true intelligence.
> Someone got it working on Compiler Explorer and remarked that the assembly output “reminds me of the quality of an undergraduate’s compiler assignment”. Which, to be fair, is both harsh and not entirely wrong when you look at the register spilling patterns.
This is what I've noticed about most LLM-generated code: it's about the quality of an undergrad's, and I think there's a good reason for this. Most of the code it's been trained on is of undergrad quality: Stack Overflow questions and a lot of undergrad open source projects. There are some professional-quality open source projects (e.g. SQLite), but they are outweighed by the mass of other code. Also, things like SQLite don't compare to things like Oracle or SQL Server, which are proprietary.
Seeing that Claude can code a compiler doesn't help anyone if the compiler's output isn't efficient, because getting it to be efficient is the hardest part, and it will be interesting to see how long it takes to make it efficient. No one is going to use a compiler that makes binaries run 700x longer.
I'm surprised that this wasn't possible before with just a bigger context size.
They should have gone one step further and also optimized for query performance (without editing the source code).
I have, *cough*, AI-generated an x86-to-x86 compiler (it takes x86 in, replaces arbitrary instructions with function calls, and spits x86 out). At first it was horrible, but after letting it work for two more days it was actually down to only a 50% to 60% slowdown when every memory-read instruction was replaced.
Now that's when people should get scared. But it's also reasonable to assume that CCC will look closer to GCC at that point, maybe influenced by other compilers as well. Tell it to write an ARM compiler and it will never succeed (probably; maybe it can use an intermediary representation and shove it into LLVM and it'll work, but at that point it is no longer a "C" compiler).
> CCC compiled every single C source file in the Linux 6.9 kernel without a single compiler error (0 errors, 96 warnings). This is genuinely impressive for a compiler built entirely by an AI.
It would be interesting to compare CCC's source code to that of other projects. I have a slight suspicion that CCC stole a lot of code from other projects.
You know, it sure does add some additional perspective to the original Anthropic marketing materia... ahem, I mean article, to learn that CCC-compiled SQLite could potentially run up to 158,000 times slower than a GCC-compiled build...
Nevertheless, the victories continue to be closer to home.
What does the smallest (simplest in terms of complexity / lines of code) C compiler that can compile and run SQLite look like?
Perhaps that would be a more telling benchmark to evaluate the Claude compiler against.
Give me self hosting: LLM generates compiler which compiles LLM training and inference suite, which then generates compiler which...
It seems like if Anthropic released a super cool and useful _free_ utility (like a compiler, for example) that was better than existing counterparts or solved a problem that hadn’t been solved before[0] and just casually said “Here is this awesome thing that you should use every day. By the way our language model made this.” it would be incredible advertising for them.
But they instead made a blog post about how it would cost you twenty thousand dollars to recreate a piece of software that they do not, with a straight face, actually recommend that you use in any capacity beyond as a toy.
[0] I am categorically not talking about anything AI related or anything that is directly a part of their sales funnel. I am talking about a piece of software that just efficiently does something useful. GCC is an example, Everything by voidtools is an example, Wireshark is an example, etc. Claude is not an example.
GCC and Clang are part of the training set; the fact that it did as badly as it did is what's shocking.
Does it work better for the intended purpose than their browser experiments? No… no it doesn’t
I wonder how well an LLM would do for a new CPU architecture for which no C compiler exists yet, just assembler.
It might be interesting to feed this report in and see what the coding agent swarm can improve on.
I had no idea that SQLite performance was in fact compiler-dependent. The more you know!
Why don't LLMs directly generate machine code?
Honest question: would a normal CS student, junior, senior, or expert software developer be able to build this kind of project, and in what amount of time?
I am pretty sure everybody agrees that this result is somewhere between slop code that barely works and the pinnacle of AI-assisted compiler technology. But discussions shouldn't be held only from the extremes. Instead, I am looking for a realistic estimation from the HN community about where to place these results in a human context. Since I have no experience with compilers, I would welcome any of your opinions.
Correct me if I am wrong, but Claude has probably been trained on gcc, so why oh why doesn't it one-shot a faster and better compiler?
Did Anthropic release the scaffolding, harnesses, prompts, etc. they used to build their compiler? That would be an even cooler flex to be able to go and say "Here, if you still doubt, run this and build your own! And show us what else you can build using these techniques."
The level of discourse I've seen on HN about this topic is really disappointing. People not reading the actual article in detail, just jumping to conclusions "it basically copied gcc" etc etc. Taking things out of context, or worse completely misrepresenting what the author of the article was trying to communicate.
We act so superior to LLMs but I'm very unimpressed with humanity at this stage.
This is a good example of ALL AI slop. You get something barely working, and are faced with the next problems:
- Deal with legacy code from day one.
- Have a mess of a codebase that is most likely 10-20x the amount of LOC compared to human code
- Have your program be really slow and filled with bugs and edge cases.
This is the battlefield for programmers. You either just build the damn thing or fix bugs for the next decade.
mehh
But gcc is part of its training data, so of course it spat out an autocomplete of a working compiler
/s
This is actually a nice case study in why agentic LLMs do kind of think. It's by no means the same code or compiler. It had to figure out lots and lots of problems along the way to get to the point of tests passing.
Since Claude Code can browse the web, is it fair to think of it as “rewriting and simplifying a compiler originally written in C++ into Rust”?
I think this is a great example of both points of view in the ongoing debate.
Pro-LLM coding agents: look! a working compiler built in a few hours by an agent! this is amazing!
Anti-LLM coding agents: it's not a working compiler, though. And it doesn't matter how few hours it took, because it doesn't work. It's useless.
Pro: Sure, but we can get the agent to fix that.
Anti: Can you, though? We've seen that the more complex the code base, the worse the agents do. Fixing complex issues in a compiler seems like something the agents will struggle with. Also, if they could fix it, why haven't they?
Pro: Sure, maybe now, but the next generation will fix it.
Anti: Maybe. While the last few generations have been getting better and better, we're still not seeing them deal with this kind of complexity better.
Pro: Yeah, but look at it! This is amazing! A whole compiler in just a few hours! How many millions of hours were spent getting GCC to this state? It's not fair to compare them like this!
Anti: Anthropic said they made a working compiler that could compile the Linux kernel. GCC is what we normally compile the Linux kernel with. The comparison was invited. It turned out (for whatever reason) that CCC failed to compile the Linux kernel when GCC could. Once again, the hype of AI doesn't match the reality.
Pro: but it's only been a few years since we started using LLMs, and a year or so since agents. This is only the beginning!
Anti: this is all true, and yes, this is interesting. But there are so many other questions around this tech. Let's not rush into it and mess everything up.