Wut? I pilot LLMs all day but there's no way in hell I'd agree to be at the helm of...

iandanforth • today at 1:11 PM • 18 replies

Wut? I pilot LLMs all day but there's no way in hell I'd agree to be at the helm of a finance product. That first pillar is still there. Maybe the author isn't aware of the impact they have, but I know, with the evidence of reverted PRs, that when I step outside my area of deep knowledge I can no longer call BS on the agents. Our most capable agent, with access to the same kind of distributed systems the author talks about, is regularly wrong, frequently myopic, and just outright dumb constantly. It's the expertise of engineers on the team that push it back on track.

Replies

t34t34r43 • today at 1:46 PM

Posting this under a burner so I don't dox myself: I work in FinTech on a regulated product. We have access to Mythos. Mythos identified part of our codebase that it confidently asserted was not compliant with a particular regulation, and said we were at grave risk by allowing it to operate the way it was.

Except this was not the case: it had, of course, hallucinated what the regulation actually required (I know this because the code in question had already been reviewed by human counsel). This is (supposedly) the most bleeding-edge model available.

We use a lot of genAI to help us write code, but there is no way in the mid-term we could ever rely on these tools to actually build compliant financial products. We'd have to be totally mad. Yes, lots of Fintech companies are using these agents to accelerate, but anyone who's using them to actually ship product without a human actually digging into it is opening themselves up to a world of risk.

dakiol • today at 1:35 PM

> It's the expertise of engineers on the team that push it back on track.

But how are you so sure your colleagues are not more "expert" than you? Prior to LLMs there was room for very good engineers and mediocre engineers to work together in 99% of the companies out there. With LLMs, only the "best" engineers will survive, because nobody needs mediocre engineers anymore.

This being HN, I imagine every engineer reading this thinks they are in the top 5-10% of their company/city/country, and therefore that they are not the "mediocre" engineers who will be affected by the introduction of LLMs. Statistically, they are probably wrong. So, it's all about ego. Chances are you are not a rockstar and LLMs will eventually take over your job.

As usual, the only winners here are corporations and executives. Most of us are the last monkeys in the chain, and so we'll get screwed.

lelanthran • today at 1:33 PM

> Wut? I pilot LLMs all day but there's no way in hell I'd agree to be at the helm of a finance product.

Dunno how much longer that is going to remain true for your specific employer - all the fintech companies I deal with personally have had some sort of AI account for their devs since last year.

Even places like Jane Street have employees posting blogs (one of which was on the HN front page about 60m ago) saying they mostly direct agents.

How long do you think your specific employer is going to hold out?

jalev • today at 1:17 PM

Unfortunately every software-related industry is embracing LLM/codegen. Your banks, fintechs, insurance. Everyone. Your concerns are the same ones I'm having, yet they're regularly dismissed or hand-waved away as "don't worry about it, the delivery velocity/ROI is worth it".

quijoteuniv • today at 5:47 PM

Norwegians have a saying: “Den som er ferdig utlært, er ikke utlært – men ferdig.” Roughly: whoever thinks they are done learning is not learned, just done. Typical Scandinavian hard, cold truth…

I understand the frustration of spending years nurturing a skill and then seeing its value decline. But this isn’t really an LLM problem. The same thing happened to factory workers, typists, draftsmen, and many others before. The technology changes, but the underlying issue is the economic system we live in, where the market can suddenly decide that something you’ve spent years mastering is worth much less than before.

LLMs are not creating that dynamic. They’re just accelerating it.

abhgh • today at 2:37 PM

Regarding PRs - for the ones with complex requirements, what I am seeing is that the time to initial PR is very short, and then a ping-pong between the reviewer and developer begins, because in my cases (not all) the developer vibe-coded parts and didn't really understand the requirements or their own code deeply, so it takes multiple iterations for them to fix it. You can argue this is a human problem, but this is the net effect I'm seeing.

I am not sure, but for complex cases it seems to me that the earlier sum of moderately long PR time + moderately long review time has been replaced by very short PR time + even longer review time. I am not sure there's a net gain in these cases. Sometimes even if the code is functionally correct, it's verbose enough (e.g., too many intermediate functions) that I think it will slow down future reviews.

SlinkyOnStairs • today at 2:25 PM

> That first pillar is still there. Maybe the author isn't aware of the impact they have, but I know, with the evidence of reverted PRs, that when I step outside my area of deep knowledge I can no longer call BS on the agents. Our most capable agent, with access to the same kind of distributed systems the author talks about, is regularly wrong, frequently myopic, and just outright dumb constantly. It's the expertise of engineers on the team that push it back on track.

I'd posit there's another layer. You have domain knowledge, certainly. But more valuable still is the wisdom to find more.

Anthropic and OpenAI can stick financial regulations in the training data all they want, but the AI systems will never learn to anticipate the future, or reach out to clients, partners, or regulators in complicated situations.

abustamam • today at 4:43 PM

Yeah, I'm constantly shocked at how simultaneously smart and dumb Opus can be. It can tell me a LOT about my codebase, but it will miss very critical clarifications that I begin with. And when I call it out, it turns out it obviously remembered them; it just ignored them.

jrockway • today at 4:52 PM

I agree with this experience. LLMs are great and save me a lot of time, but they need frequent nudges to avoid going down a completely wrong path. I just don't feel like the management dream of "every engineer has 3 agents working for them full time" is quite a reality yet. I'm not saying it won't get there, or that I feel secure being a software engineer until I'm of retirement age, but I also think it's important to understand the limitations of the tools. You do need to know your codebase. You do need to iterate on small chunks of it at a time. You do need to carefully understand every line of code you're putting into production. LLMs are amazing at generating a lot of proposals, but you need to carefully consider each one.

Most surprising to me about the article was the desire for OP's company to use AI for design docs. I feel like AI-generated design docs are some of the worst -- basically treating English as a programming language. They aren't enjoyable to read, and they often miss the forest for the trees. A human written sketch explaining why we're here and what we're working towards is still meaningful and important. If you want code-level details of every decision and algorithm, we have code for that.

I have mixed feelings on whether these documents are useful LLM inputs. I did a project where I carefully paired with Claude Code on producing a specification that another model would actually implement. I'm not sure it saved me any time, and it was very un-fun. (I kind of blame Opus 4.7 xhigh for this. It ain't speedy.) I feel like I can nitpick code to get exactly what I want, but defining exactly what I want an auto-mode LLM to go and do, in English, is much more difficult. I don't think the PLAN.md I generated would have been useful for a human trying to understand the system (too verbose), and Claude Code still made its usual mistakes that I have reminded it a billion times not to make (t.Context() in tests, not context.Background()!), so I'm just not sure it was worth it. I would say I probably wouldn't do it again in the near future. A rough sketch to get humans on board and to get the high level details worked out, written by hand, and then pairing with the LLM on actually typing in the code seems the most productive to me. But I do try to go outside my comfort zone once in a while to test the edges of these tools. They are very impressive and are worth a lot of the hype. (I know I will never write a YAML file again. I hate it more than anything, and Claude is amazing at it. But I worry I wouldn't feel the same way if I hadn't already had 8 years of k8s experience.)

bwfan123 • today at 2:57 PM

> I pilot LLMs all day

Love the metaphor. Planes are sophisticated machines capable of auto-piloting, but humans are still needed to ultimately pilot the beast.

throwaway201606 • today at 6:15 PM

In software dev, for a big finance corp

I like your comment, want to try to expand on it

Comment long but there is a TL;DR at the bottom

My theory is that there are 4 areas of domain knowledge worth talking about here - there may be more but I like 2x2 matrices

1) explicit internal requirements - core of how the app should work towards achieving your business objectives - code expresses what should be done and, to a pretty large extent, why it should be done - from business unit requirements - we are building a tool to do “X”

2) implicit internal requirements - core of how the app should work towards achieving your business parameters and constraints

eg profit = selling price - ( total of costs )

  - code expresses what should be done but really can’t express why. At best it is in the comments

eg if market is EU then tax = 30% (or some value from a table), AI can see what is being done but rationale is not explicit

3) implicit internal requirements - core of how the app should work towards achieving your business constraints - code expresses what should be done but really can’t express why. At best it is in the comments

eg if item is “rocket” , shipping = $1m ( we only make rockets in Antarctica and shipping from there is $1m)

4) implicit external requirements - core of how the app should work towards achieving your business constraints - code expresses what should be done but really can’t express why. At best it is in the comments

eg if item is “rocket” , add a 3 month gating stage to get approval from government to sell the item and do not collect payment till gating approved - AI can see the code but has no idea why it has to be done

These come from partners, regulation, compliance, auditability etc
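
A rough Python sketch of the examples above, to make the "what but not why" point concrete. Everything here is an assumption for illustration: the `Order` shape, the function names, and the zero defaults are hypothetical, and the 30% and $1m figures are just the placeholder values from the examples, not real rates. Note that the only place the rationale shows up is in the comments.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical tax table. The 30% EU figure is the placeholder from the
# example above, not a real rate.
TAX_RATE_BY_MARKET = {"EU": Decimal("0.30")}


@dataclass
class Order:
    item: str
    market: str
    selling_price: Decimal
    costs: tuple[Decimal, ...]
    government_approved: bool = False


def profit(order: Order) -> Decimal:
    # profit = selling price - (total of costs)
    return order.selling_price - sum(order.costs, Decimal("0"))


def tax(order: Order) -> Decimal:
    # The code shows WHAT: look up a rate by market and apply it.
    # It does not show WHY the EU rate is what it is.
    # Assumption: markets missing from the table are taxed at 0.
    rate = TAX_RATE_BY_MARKET.get(order.market, Decimal("0"))
    return order.selling_price * rate


def shipping(order: Order) -> Decimal:
    # WHY (only recoverable from this comment): rockets are made in
    # Antarctica and shipping from there costs $1m.
    # Assumption: everything else ships for 0 in this sketch.
    if order.item == "rocket":
        return Decimal("1000000")
    return Decimal("0")


def can_collect_payment(order: Order) -> bool:
    # WHY (external, invisible in the code): selling a rocket needs
    # government approval, so payment waits until the gate is passed.
    if order.item == "rocket":
        return order.government_approved
    return True


if __name__ == "__main__":
    order = Order(
        item="rocket",
        market="EU",
        selling_price=Decimal("5000000"),
        costs=(Decimal("1000000"), Decimal("500000")),
    )
    print(profit(order), tax(order), shipping(order), can_collect_payment(order))
```

Delete the comments and every function still runs, which is the problem: the branches on "rocket" and "EU" look arbitrary unless someone already knows where they came from.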

So, my theory

AI can be good at the explicit stuff trivially (1, 2) but cannot be good at the implicit stuff (3,4)

It might be able to figure out implicit stuff needs to be done but will probably not be able to figure out why it needs to be done and it will definitely not be able to definitively figure out edge cases for when to do it / not do it

As long as you focus on implicit stuff, you will be fine for a little bit

TL;DR - become good and keep being good at being the person who understands the implicit external drivers of software dev

znpy • today at 1:51 PM

You pilot LLMs all day but that might not last.

A lot of companies are investing money in “AI factories” that are going to automate a lot of software development (that is, steer LLMs) on the basis of Jira tickets (or Linear/Trello cards or whatever).

micromacrofoot • today at 2:29 PM

a year ago I would have agreed, but the gap is getting smaller all the time... these things can do 90% of the work, and how many people does a company really need for the remaining 10%? certainly not as many as they needed before

root-parent • today at 5:40 PM

[dead]

jkwang • today at 2:47 PM

[flagged]

keyle • today at 1:13 PM

[flagged]
