Your job is to deliver code you have proven to work

696 points • by simonw • yesterday at 2:52 PM • 559 comments

Comments

Your job isn’t to deliver code that works, it’s to successfully[1] operationalize business logic.

[1] I.e., it should work

That may seem pedantic, but it's a huge difference. Code is a means to an end. If no-code suddenly became better than code through some miracle, that would be your job.

This also means that if one day AI stops making mistakes, tossing AI requests over the wall may be a legitimate modus operandi.

lifeisstillgood • yesterday at 8:10 PM

I’m going to go with this as probably in the top three definitions of software developer …

along with

- the job was better titled as “Analyst Programmer” - you need both.

And

- you can make a changeset, but you have to also sell the change

acrophiliac • yesterday at 4:45 PM

Perhaps off-topic, but: "Testing doesn't show the absence of errors, it shows the presence of errors" Willison says we need to submit code we have proven to work but then argues for empirical testing, not actual correctness proofs.

gaigalas • yesterday at 3:22 PM

> Make your coding agent prove it first

Agents love to cheat. That's an issue I don't see changing anytime soon.

Here's Opus 4.5 trying to cheat its way out of properly implementing compatibility and cross-platform support, despite the clear requirements:

https://gist.github.com/alganet/8531b935f53d842db98157e1b8c0...

> Should popen handles work with fgets/fread/fwrite? PHP supports this. Option A: Create a minimal pipe_io_stream device / Option B: Store FILE* in io_private with a flag / Option C: Only support pclose, require explicit stream wrapper for reads.

If I asked for compatibility, why give me options that won't fully achieve it?

It actually tried to "break check" my knowledge about the interpreter (testing whether I knew enough to catch it), and proposed shortcuts all the way through the chat.

I don't want to have to pepper my chats with variations on "don't cheat". I mean, I can do it, but it seems like boilerplate.

I wish I had some similar testing-related chats to share. Agents do that all the time.

This is the major blocker right now for AI-assisted automated verification, and one of the reasons why it isn't well developed beyond general directions (give it screenshots, make it run the command, etc.).

givemeethekeys • yesterday at 5:21 PM

Sorry that’s not what it says in my job description.

mellosouls • yesterday at 4:22 PM

Thing is, this has always been the case. One of the problems with LLM-assisted coding is the idea that just because we're in a new era (we certainly are), the old rules can all be discarded.

The title doesn't go far enough - slop (AI or otherwise) can work and pass all the tests, and still be slop.

gorjusborg • yesterday at 7:38 PM

Your actual job is to produce positive outcomes for your stakeholders. Code can be part of that, but doesn't have to be.

If you are dumping AI slop on your team to sort through, you are creating drag on the entire team's efforts toward those positive outcomes.

As someone getting dumped upon, you probably should make the decision (in line with the objective of producing positive outcomes) to not waste your time weeding through that stuff.

Review everything else, make it clear that the mess is not reviewable, and communicate that upward if needed.

casey2 • today at 12:09 AM

It comes out of the AI, that is proof enough. Why would I have prompted it and given it to you if I didn't think that the AI could handle it? The real risk is closer to "people carry some preconceived notion about code that doesn't map to AI code." For example, the notion that the person who contributed the code knows the problem in enough detail to be accountable in the short term, or at the very least can tell you why they made a PR at all.

How to prove it has been the subject of debate for the past century; the answer is that it's context-dependent to what degree you will, or even can, prove the program and its exposed identifiers correct. Programming is a communication problem as well as a math problem, and often an engineering problem too. Only the math portion can be proved, and a small but critical amount of the engineering portion tested.

Communication is the most important factor for velocity: it's the difference between hand-rolling machine code and SSHing into a computer halfway across the world and having every tool you expect. If you don't trust that webdevs know what they are doing, then you can be the most amazing dev in the world, but your actual ability to contribute will be hampered. The same is true of vibe coding: if people aren't on the same page as to what is and isn't acceptable, velocity starts to slow down.

Languages have not caught up to AI tools. Since AI operates well above the function level, what level would be appropriate to be named and signed off on? A pull request, with a link to the chat as a commit? (What is wrong with that which could be fixed at the naming level?)

Honest communication is the most important. Amazon telling investors that they use TLA+ is just signaling that they "for realz take uptime very seriously guize", "we know distributed systems" and engineering culture. The honest reality is that they could prove all their code and not IMprove their uptime one lick, because most of what they run isn't their code. It's a communication breakdown if effort gets spent on that outside a research department.

t1234s • yesterday at 6:32 PM

Bravo. Best headline I've read in a long time. This phrase should be a desktop background.

nrhrjrjrjtntbt • yesterday at 7:57 PM

Always has been

johnea • yesterday at 10:10 PM

I couldn't agree more with the sentiment.

If you, the development engineer, haven't demonstrated the product to work as expected, and preferably this testing is independently confirmed by a product test group, then you can't claim to be delivering a functional product.

I would add though, that management, specifically marketing management setting unreasonable demands and deadlines, are a bigger threat to testing than LLMs.

Of course, the damage done by untested LLM-generated code is additive to the damage management is doing.

So this isn't any kind of apologism; both sources are making the problem worse.

nish__ • yesterday at 5:02 PM

Good framing.

llm_nerd • yesterday at 5:46 PM

"the junior engineer, empowered by some class of LLM tool, who deposits giant, untested PRs on their coworkers—or open source maintainers—and expects the “code review” process to handle the rest."

Kind of depressing how blaming juniors for every ill or bad habit has become such a trope. In all likelihood the reader of this comment has a number of terrible habits, working on teams with terrible habits, and juniors play zero part in it.

And, I mean, on that theme, developers have been doing this for as long as we've had large teams. I've worked on a large number of teams where there was the fundamental principle that QA / UA holds responsibility: they are responsible for tests, and they are responsible for bad code making it through to the product / solution. Developers (grizzled devs with excellent CVs) would toss over garbage code and call it a day.

annjose • yesterday at 5:48 PM

I came here to say

1) Amen
2) I wonder if this is isolated to junior devs only? Perhaps it seems like that because junior devs do more AI-assisted coding than seniors?

morning-coffee • yesterday at 4:59 PM

Amen

imiric • yesterday at 3:38 PM

The job of a software developer is not just to prove that the software "works". The definition of "works" is itself often fuzzy and difficult to prove.

That is part of it, yes, but there are many others, such as ensuring that the new code is easy to understand and maintain by humans, makes the right tradeoffs, is reasonably efficient and secure, doesn't introduce a lot of technical debt, and so on.

These are things that LLMs often don't get right, and that junior engineers need guidance and mentoring from more experienced engineers to properly learn. Otherwise, software that "works" today will be much more difficult to make "work" tomorrow.

emsign • yesterday at 3:10 PM

As if! :)

fjfaase • yesterday at 6:15 PM

One more reason to work without branches and PRs. The future for CI/CD is bright ;-).

ekjhgkejhgk • yesterday at 3:18 PM

Oh look another "an opinionated X". Everything is opinionated these days, even opinions.

throwaway2027 • yesterday at 3:13 PM

It works on my machine ¯\_(ツ)_/¯

6510 • yesterday at 8:11 PM

I work alone, and I have a considerable amount of unfinished code lying around, sometimes even multiple instances of a thing. I could see how that would be annoying in a team setting. The cause is not having the thing but how you organize it. As with LLM slop, it is wonderful to be able to scroll over something that shows what the solution might look like.

venturecruelty • yesterday at 6:37 PM

Lmao no, my job is to make the line go up and make my boss happy. It was ever thus.

9rx • yesterday at 3:04 PM

> Your job is to deliver code you have proven to work.

Your job is to solve customer problems. Their problems may only be solvable with code that is proven to work, but it is equally likely (I dare say even more likely) that their problem isn't best solved with code at all, or even solved with code that doesn't work properly but works well enough.

daedrdev • yesterday at 3:02 PM

Maybe in an ideal world

webdev1234568 • yesterday at 3:09 PM

The whole article seems very much LLM-generated.

Edit: I'm an idiot, ignore me.

zkmon • yesterday at 3:12 PM

How about letting LLMs maintain a vast number of product versions, all available at the same time, which receive multiple untested versions of the same patch from LLMs, and then letting the models elect a version of the software based on probabilistic or gradient methods? This elected version could change for different assessments. No human touches or looks at the code!

Just a wild thought, nothing serious.

Rperry2174 • yesterday at 3:15 PM

I'm not fully convinced by "a computer can never be held accountable".

We already delegate accountability to non-humans all the time:
- CI systems block merges
- monitoring systems page people
- test suites gate different things

In practice, accountability is enforced by systems, not humans. Humans are definitely "blamed" after the fact, but the day-to-day control loop is automated.

As agents get better at running code, inspecting UI state, correlating logs, screenshots, etc., they're starting to be operationally "accountable": preventing bad changes from shipping and producing evidence when something goes wrong.

At some point the human's role shifts from "I personally verify this works" to "I trust this verification system and am accountable for configuring it correctly".

That's still responsibility, but it's kind of different from what's described here. Taken to a logical extreme, the argument here would suggest that CI shouldn't replace manual release checklists.

SunshineTheCat • yesterday at 5:39 PM

I know this won't be popular; however, I think the idea of differentiating a "real developer" from one who relies mostly, or even solely, on an LLM is coming to an end. Right now, I fully agree that relying wholly upon an LLM and failing to test its output is very irresponsible.

LLMs do make mistakes. They do a sloppy job at times.

But give it a year. Two years. Five years. It seems unreasonable to assume they will hit a plateau that will prevent them from being able to build, test, and ship code better than any human on earth.

I say this because it's already happened.

It was thought impossible for a computer to reach the point of being able to beat a grandmaster at chess.

There was too much "art," experience, and nuance to the game for a computer to ever fully grasp or understand. Sure, there was the "math" of it all, but it lacked the human intuition that many thought was essential to winning and could only be achieved through a lifetime of practice.

Many years after Deep Blue vs. Garry Kasparov, the best players in the world laugh at the idea of even getting close to beating Stockfish or any other even mediocre game engine.

I say all of this as a 15-year developer. This happens over and over again throughout history. Something comes along to disrupt an industry or profession and people scream about how dangerous or bad it is, but it never matters in the end. Technology is undefeated.

bluesnowmonkey • yesterday at 6:27 PM

> Your job is to deliver code you have proven to work.

First of all, no it’s not. Your job is to help the company succeed. If you write code that works but doesn’t help the company succeed, you failed. People do this all the time. Resume padding, for example.

Sometimes it’s better for the business to have two sloppy PRs than a single perfect one. You should be able to deliver that way when the situation demands.

Second, no one is out there proving anything. Like formal software correctness proofs? Yeah nobody does that. We use a variety of techniques like testing and code review to try to avoid shipping bugs, but there’s always a trade off between quality and speed/cost. You’re never actually 100% certain software works. You can buy more nines but they get expensive. We find bugs in 20+ year old software.

just_once • yesterday at 4:47 PM

I don't know if there's a word for this but this reads to me as like, software virtue signaling or software patronizing. It's bizarre to me to tell an engineer what their job is as a matter of fact and to claim a particular usage of a tool as mandated (a tool that no one really asked for, mind you), leveraging duty of all things.

I guess to me, it's either the case that LLMs are just another tool, in which case the already existing teachings of best practice should cover them (and therefore the tone and some content of this article is unnecessary) or they're something totally new, in which case maybe some of the already existing teachings apply, but maybe not because it's so different that the old incentives can't reasonably take hold. Maybe we should focus a little bit more attention on that.

The article mentions rudeness, shifting burdens, wasting people's time, dereliction. Really loaded stuff and not a framing that I find necessary. The average person is just trying to get by, not topple a social contract. For that, look upwards.
