I work at a FAANG.
Professionally, I have had almost no luck with it, outside of summarizing design docs or literally just finding something in the code that a simple search might not find: such is this team's code that does X?
I am yet to successfully prompt it and get a working commit.
Further, I will add that I also don't know any ICs personally who have successfully used it. Though, there's endless posts of people talking about how they're now 10x more productive, and everyone needs to do x y an z now. I just don't know any of these people.
Non-professionally, it's amazing how well it does on a small greenfield task, and I have seen that 10x improvement in velocity. But, at work, close to 0 so far.
Of the posts I've seen at work, they typically tend to be teams doing something new / greenfield-ish or a refactor. So I'm not surprised by their results.
I can second this. I’ve never had a problem writing short scripts and glue code in stuff ive mastered. In places I actually need help, I’m finding it slows me down.
Wow, that's such a drastic different experience than mine. May I ask what toolset are you using? Are you limited to using your home grown "AcmeCode" or have full access to Claude Code / Cursor with the latest and greatest models, 1M context size, full repo access?
I see it generating between 50% to 90% accuracy in both small and large tasks, as in the PRs it generates range between being 50% usable code that a human can tweak, to 90% solution (with the occasional 100% wow, it actually did it, no comments, let's merge)
I also found it to be a skillset, some engineers seem to find it easier to articulate what they want and some have it easier to think while writing code.
Also at FAANG. I think I am using the tools more than my peers based on my conversations. The first few times I tried our AI tooling, it was extremely hit and miss. But right around December the tooling improved a lot, and is a lot more effective. I am able to make prototypes very quickly. They are seldom check-in ready, but I can validate assumptions and ideas. I also had a very positive experience where the LLM pointed out a key flaw in an API I had been designing, and I was able to adjust it before going further into the process.
Once the plan is set, using the agentic coder to create smaller CLs has been the best avenue for me. You don't want to generate code faster than you and your reviewers can comprehend it. It'll feel slow, but check ins actually move faster.
I will say it's not all magic and success. I have had the AI lead me down some dark corners, assuring me one design would work when actually it is a bit outdated or not quite the right fit for the system we are building for because of reasons. So, I wouldn't really say that it's a 10x multiplier or anything, but I'm definitely getting things done faster than I could on my own. Expertise on the part of the user is still crucial.
One classic issue I used to run into, is doing a small refactor and then having to manually fix a bunch of tests. It is so much simpler to ask the LLM to move X from A to B and fix any test failures. Then I circle back in a few minutes to review what was done and fix any issues.
The other thing is, it has visibility for the wider code base, including some of our infrastructure that we're dependent on. There have been a couple times in the past quarter where our build is busted by an external team, and I am able to ask the LLM given the timeframe and a description of the issue, the exact external failure that caused it. I don't really know how long it would have taken to resolve the issue otherwise, since the issues were missed by their testing. That said, I gotta wonder if those breakages were introduced by LLM use.
My job hasn't been this fun in a long, long time and I am a little uneasy about what these tools are going to mean for my personal job security, but I don't know how we can put the genie back into the bottle at this point.
Same here. My take is that the codebase is too large and complex for it to find the right patterns.
It does work sometimes. The smaller the task, the better.
This checks out logical speaking.
The FANG code basis are very large and date back years might not necessarily be using open source frameworks rather in house libraries and frameworks none of which are certainly available to Anthropic or OpenAI hence these models have zero visibility into them.
Therefore combined with the fact that these are not reasoning or thinking machines rather probabilistic (image/text) generators, they can't generate what they haven't seen.
Not a FAANG engineer but also working at a pretty large company and I want to say you're spot on 1000%. It's insane how many "commenters" come out of the woodwork to tell you you're doing x or y wrong. They may not even frame it that way, but use a veneer of questions "what is your process like? Have you tried this product, etc." as a subtle way of completely dismissing your shared experience.
Can you elaborate on the shortcomings you find in professional setting that aren't coming up on personal projects? With it handling greenfield tasks are you perhaps referring to the usual sort of boilerplate code/file structure setup that is step 0 with using a lot of libraries?
>I am yet to successfully prompt it and get a working commit.
May I ask what you're working on?
Can you provide an example of how you actually prompt AI models? I get the feeling the difference among everyone's experiences has to do with prompting and expectation.
Experience depends on which FAANG it is. Amazon for example doesn't allow Claude Code or Codex so you are stuck with whatever internal tool they have
Meta, despite competing with these, is open to let their devs use better off the shelf tools.
Could you say more on how the tasks where it works vs. doesn't work differ? Just the fact that it's both small and greenfield in the one case and presumably neither in the other?
This is wild. I’m on the other end.
I’ve probably prompted 10,000 lines of working code in the last two months. I started with terraform which I know backwards and forwards. Works perfectly 95% of the time and I know where it will go wrong so I watch for that. (Working both green field, in other existing repos and with other collaborators)
Moved on to a big data processing project, works great, needed a senior engineer to diagnose one small index problem which he identified in 30s. (But I’d bonked on for a week because in some cases I just don’t know what I don’t know)
Meanwhile a colleague wanted a sample of the data. Vibe coded that. (Extract from zip without decompressing) He wanted randomized. One shot. Done. Then he wanted randomized across 5 categories. Then he wanted 10x the sample size. Data request completed before the conversion was over. I would have worked on that for three hours before and bonked if I hit the limit of my technical knowledge.
Built a monitoring stack. Configured servers, used it to troubleshoot dozens of problems.
For stuff I can’t do, now I can do. For stuff I could do with difficulty now I can do with ease. For stuff I could do easily now I can do fast and easy.
Your vastly different experience is baffling and alien to me. (So thank you for opening my eyes)