What is the downside of using them to prototype? To generate throwaway code? What do we lose if we default to that behavior?
As usual with Oxide's RFDs, I found myself vigorously nodding along while reading. Rather more rarely, I found a part I disagreed with:
> Unlike prose, however (which really should be handed in a polished form to an LLM to maximize the LLM’s efficacy), LLMs can be quite effective writing code de novo.
Don't the same arguments against using LLMs to write one's prose also apply to code? Were the structure of the code and the ideas within it the engineer's, or did they come from the LLM? And so on.
Before I'm misunderstood as an LLM minimalist, I want to say that I think they're incredibly good at solving blank-page syndrome -- just getting a starting point on the page is useful. But the code you actually want to ship is so far from what LLMs write that I think of them more as a crutch for blank-page syndrome than as "good at writing code de novo".
I'm open to being wrong and want to hear any discussion on the matter. My worry is that this is another one of the "illusion of progress" traps, similar to the one that currently fools people with the prose side of things.
> LLMs are especially good at evaluating documents to assess the degree that an LLM assisted their creation!)
That's a bold claim. Do they have data to back this up? I'd only have the confidence to say this after testing it against many LLM outputs. Does it really hold up against, e.g., the em dash leaderboard of HN, or people who tell an LLM to avoid the ten most LLM-y writing cliches? I'd need to see their reasoning before I believe it.
Strange to see no mention of potential copyright violations in LLM-generated code (e.g., LLMs reproducing code from GitHub verbatim without respecting the license). I would think that would be a pretty important consideration for any software development company, especially one that produces so much free software.
> Wherever LLM-generated code is used, it becomes the responsibility of the engineer. As part of this process of taking responsibility, self-review becomes essential: LLM-generated code should not be reviewed by others if the responsible engineer has not themselves reviewed it. Moreover, once in the loop of peer review, generation should more or less be removed: if code review comments are addressed by wholesale re-generation, iterative review becomes impossible.
My general procedure for using an LLM to write code, which is in the spirit of what is advocated here, is:
1) First, feed the existing relevant code into the LLM. This is usually just a few source files from a larger project.
2) Describe what I want to do, either giving an architecture or letting the LLM propose one. I tell it not to write code at this point.
3) Let it talk through the plan and make sure I like it. I converse with it to address any deficiencies I see, and I almost always see some.
4) Tell it to generate the code.
5) Skim and test the code to see if it's generally correct, and have it make corrections as needed.
6) Closely read the entire generated artifact and make manual corrections (occasionally automated ones, like "replace all C-style casts with the appropriate C++-style casts", followed by a review of the diff -- see the sketch below).
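To make concrete what I mean by that cast cleanup in step 6, here's a minimal sketch; the function and its names are made up for illustration, not taken from any real project:

```cpp
#include <cstdint>

// Hypothetical example of the mechanical cleanup in step 6: swapping
// C-style casts for explicit C++ casts, then reviewing the resulting diff.
double first_byte_fraction(const void* raw, int len) {
    // Before: const std::uint8_t* bytes = (const std::uint8_t*)raw;
    const std::uint8_t* bytes = static_cast<const std::uint8_t*>(raw);

    // Before: return (double)bytes[0] / (double)len;
    return static_cast<double>(bytes[0]) / static_cast<double>(len);
}
```

The C++ casts are greppable and make the intent explicit, which makes the resulting diff quick to review.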
The hardest part for me is #6, where I feel a strong emotional bias against doing it, since I am not yet aware of any errors that would compel it.
This allows me to operate at a higher level of abstraction (architecture) and removes the drudgery of turning an architectural idea into written, precise code. But in doing so, you are abandoning those details to a non-deterministic system. This is different from, for example, using a compiler or a higher-level VM language: with those tools you can understand how they work, quickly develop a good idea of what you're going to get, and rely on robust assurances. Understanding LLMs helps, but not to the same degree.
The guide is generally very well thought out, but I see an issue in this part:
It sets the rule that things must actually be read when there's a social expectation (code interviews, for example), but otherwise remarks that using LLMs to assist comprehension has little downside.
I find two problems with this:
- There is an incoherence there. If LLMs are flawless at reading and summarization, there is no difference from reading the original; and if they aren't flawless, that flaw also extends to the non-social cases.
- In practice, I haven't found LLMs to be such good reading assistants. I've sent them to check a linked doc and they've just read the index and inferred the rest, for example. Just yesterday I asked for a comparison of three technical books on a similar topic, and the model simply guessed (wrongly) about the third one rather than following the three links.
There is a significant risk in placing a translation layer between content and reader.
> LLM-generated writing undermines the authenticity of not just one’s writing but of the thinking behind it as well.
I think this gets at a key point, but I'm not sure of the right way to articulate it.
A human-written comment may be worth something, but an LLM-generated one is cheap, practically worthless.
The nicest phrase I've seen capturing the thought: "I'd rather read the prompt".
Publishing something written by an LLM is probably no better than just letting the reader have an LLM generate it again.
I've had the same thought about 'written' text with an LLM: if you didn't spend time writing it, don't expect me to read it. I'm glad he seems to be taking a hard stance on that, saying they won't use LLMs to write non-code artifacts. The principle extends to code as well, to some degree: you shouldn't expect other people to peer review 'your' code that was simply generated because, again, you spent no time making it. You have to be the first reviewer. Whether these cultural norms are held firmly remains to be seen (I don't work there), but I think they represent a thoughtful application of emerging technologies.
> Wherever LLM-generated code is used, it becomes the responsibility of the engineer. As part of this process of taking responsibility, self-review becomes essential: LLM-generated code should not be reviewed by others if the responsible engineer has not themselves reviewed it.
I think the review by the prompt writer should be held to a higher standard than that of another person who reviews the code.
If I know how to do something, it is easier for me to avoid mistakes while doing it. Reviewing it requires different pathways in my brain: since the code is already there, I'm drawn down its path, and I might not always spot the problem points. Or the code might be written in a way I don't recognize but that still exhibits the same mistake.
In the past, as a reviewer I used to be able to count on my colleagues' professionalism to be a moat.
The size of that moat is inversely proportional to the amount of LLM-generated code in a PR or project. At a certain point you can no longer guarantee that you stand behind everything.
Combine that with the push to do more, faster, with less, and we're increasing the amount of tech debt we're taking on.
I would have expected at least some consideration of public perception, given the extremely negative opinions many people hold about LLMs being trained on stolen data. Whether it's an ethical issue or a brand hazard depends on your opinions about that, but it's definitely at least one of those currently.
> it is presumed that of the reader and the writer, it is the writer that has undertaken the greater intellectual exertion. (That is, it is more work to write than to read!)
This applies to natural language, but, interestingly, the opposite is true of code (in my experience and that of other people that I've discussed it with).
> assurance that the model will not use the document to train future iterations of itself.
Believing this in 2025 is really fascinating. It's like believing Meta won't use info they (il)legally collected about you to serve you ads.
> Oxide employees bear responsibility for the artifacts we create, whatever automation we might employ to create them.
Yes, allow the use of LLMs, encourage your employees to use them to move faster by rewarding "performance" regardless of risks, but make sure to place the responsibility for failure upon them, so that when it happens the company culture can't be blamed.
> Ironically, LLMs are especially good at evaluating documents to assess the degree that an LLM assisted their creation
Is there any evidence for this?
Nothing new here. Antirez, for one, has taken a similar stance on his YouTube channel, which has material on the topic. But it's worthwhile having a document like this publicly available from a company that the tech crowd seems to respect.
<offtopic> The "RFD" here stands for "Reason/Request for Decision" or something else? (Request for Decision doesn't have a nice _ring_ on it tbh). I'm aware of RFCs ofc and the respective status changes (draft, review, accepted, rejected) or ADR (Architectural Decision Record) but have not come across the RFD acronym. Google gave several different answers. </offtopic> </offtopic>
The idea that LLMs are amazing at comprehension, yet we are still expected to read original documents, seems contradictory to me. I'm also wary of using them as editors and losing the writer's voice; that feels heavily dependent on the prompt and on whether the writer does a final pass without any LLM. Asking someone else to rewrite your work means losing your voice if you don't have an opinion on how the rewrite turns out.
"LLMs are amazingly good at writing code" that one was good. I cant stop laughing.
> LLM-generated code should not be reviewed by others if the responsible engineer has not themselves reviewed it.
To extend that: If the LLM is the author and the responsible engineer is the genuine first reviewer, do you need a second engineer at all?
Typically in my experience one review is enough.
I wonder if they would be willing to publish the "LLMs at Oxide" advice, linked in the OP [1], but currently publicly inaccessible.
[1] https://github.com/oxidecomputer/meta/tree/master/engineerin...
Oxide’s approach is interesting because it treats LLMs as a tool inside a much stricter engineering boundary. Makes me wonder how many teams would avoid chaos if they adopted the same discipline.
>Wherever LLM-generated code is used, it becomes the responsibility of the engineer. As part of this process of taking responsibility, self-review becomes essential: LLM-generated code should not be reviewed by others if the responsible engineer has not themselves reviewed it
By this article's own standards, now there are two authors who don't understand what they've produced.
> LLMs are superlative at reading comprehension, able to process and meaningfully comprehend documents effectively instantly.
I couldn't disagree more. (In fact I'm shocked that Bryan Cantrill uses words like "comprehension" and "meaningfully" in relation to LLMs.)
Summaries provided by ChatGPT, conclusions drawn by it, contain exaggerations and half-truths that are NOT there in the actual original sources, if you bother enough to ask ChatGPT for those, and to read them yourself. If your question is only slightly suggestive, ChatGPT's tuning is all too happy to tilt the summary in your favor; it tells you what you seem to want to hear, based on the phrasing of your prompt. ChatGPT presents, using confident and authoritative language, total falsehoods and deceptive half-truths, after parsing human-written originals, be the latter natural language text, or source code. I now only trust ChatGPT to recommend sources to me, and I read those -- especially the relevant-looking parts -- myself. ChatGPT has been tuned by its masters to be a lying sack of shit.
I recently asked ChatGPT a factual question: the identity of a public figure (an artist) whom I had seen in a video on YouTube. ChatGPT answered with "Person X", and even explained why Person X's contribution to the piece of art in question was so great. I knew the answer was wrong, so I retorted only with: "Source?". ChatGPT apologized, then did the exact same thing, just with "Person Y", again explaining why Person Y was so influential in making that piece of art so great. I knew the answer was still wrong, so I again said: "Source?". On the third attempt, ChatGPT finally said "Person Z", with a verifiable reference to a human-written document that identified the artist.
FUCK ChatGPT.
Nobody has yet explained how an LLM can be better than a well-paid human expert.
I know I'm walking into a den of wolves here and will probably get buried in downvotes, but I have to disagree with the idea that using LLMs for writing breaks some social contract.
If you hand me a financial report, I expect you used Excel or a calculator. I don't feel cheated that you didn't do long division by hand to prove your understanding. Writing is no different. The value isn't in how much you sweated while producing it. The value is in how clear the final output is.
Human communication is lossy. I think X, I write X' (because I'm imperfect), you understand Y. This is where so many misunderstandings and workplace conflicts come from. People overestimate how clear they are. LLMs help reduce that gap. They remove ambiguity, clean up grammar, and strip away the accidental noise that gets in the way of the actual point.
Ultimately, outside of fiction and poetry, writing is data transmission. I don't need to know that the writer struggled with the text. I need to understand the point clearly, quickly, and without friction. Using a tool that delivers that is the highest form of respect for the reader.
> When debugging a vexing problem one has little to lose by using an LLM — but perhaps also little to gain.
This probably doesn't give them enough credit. If you can feed an LLM a list of crash dumps, it can do a remarkable job producing both analyses and fixes -- and I don't mean just for super obvious crashes. I was most impressed by a deadlock where numerous engineers had tried and failed to understand exactly how to fix it.
The empathy section is quite interesting
Hmmm, I'm a bit confused by their conclusion (encouraging use) given some of the really damning caveats they point out. A tool they themselves determine to need such careful oversight probably just shouldn't be used near prod at all.
Here's the only simple, universal law that should apply:
THOU SHALT OWN THE CODE THAT THOU DOST RENDER.
All other values should flow from that, regardless of whether the code itself is written by you or AI or by your dog. If you look at the values in the article, they make sense even without LLMs in the picture.
The source of workslop is not AI; it's a lack of ownership. This is especially true for open source projects, which are seeing a wave of AI slop PRs precisely because the onus of ownership is largely on the maintainers and not the upstart "contributors."
Note also that this does not imply a universal set of values. Different organizations may well have different values for what ownership of code means -- e.g., in the "move fast, break things" era of Facebook, workslop may have been perfectly fine for Zuck! (I'd bet it may even have hastened the era of "move fast with stable infrastructure.") But those values must be consistently applied regardless of how the code came to be.
"LLMs can be quite effective writing code de novo."
Maybe for simple braindead tasks you can do yourself anyway.
Try it on something actually hard or complex and they get it wrong 100 times out of 100 if they don't have adequate training data, and 90 times out of 100 if they do.
Cantrill jumps on every bandwagon. When he assisted in cancelling a Node developer (not a native English speaker) over pronouns, he was following the Zeitgeist; now it's "Broadly speaking, LLM use is encouraged at Oxide."
He is a long way from Sun.
I fully disagree with 1) the stance, 2) the conclusions.
The problem with this text is that it's a written anecdote. It could all be fake.
I find it interesting that the section about an LLM's tells when it's used for writing is itself absolutely littered with em dashes.
I disagree with "LLMs as editors". The amount of "—" in the post is crazy.
Based on paragraph length, I would assume that "LLMs as writers" is the most extensive use case.
I had trouble getting past the Early Modern English tinge of the language used here. It's fun, but it distracts from comprehension in an attempt to just sound epic. That's fine if you're writing literature, but it comes off as uppity in a practical doc for devs. Writing is not just about conveying something in a mood you wish to set. Study how Richard Feynman and Warren Buffett communicated with their audiences; part of their success is that they spoke to their people in language everyone could easily understand.
Funny how the article states that "LLMs can be excellent editors" and then the post itself makes all the mistakes that no editor would let through:
1. Because reading posts like this 2. Is actually frustrating as hell 3. When everything gets dragged around and filled with useless anecdotes and three-adjective mumbo jumbo and endless em dashes — because somehow it's better than actually just writing something up.
Which just means that people in tech, or in general, have no understanding of what an editor does.
A measured, comprehensive, and sensible take. Not surprising from Bryan. This was a nice line:
> it’s just embarrassing — it’s as if the writer is walking around with their intellectual fly open.
I think Oxide didn't include this in the RFD because they exclusively hire senior engineers, but in an organization with junior engineers, I'd add something specific to help them understand how they should approach LLM use.
Bryan has 30+ years of challenging software (and now hardware) engineering experience. He memorably said that he's worked on and completed a "hard program" (an OS), which he defines as a program you doubt you can actually get working.
The way Bryan approaches an LLM is super different from how a 2025 junior engineer does. That junior engineer has possibly never programmed without the tantalizing, even desperately tempting, option of being assisted by an LLM.