Most comments here surprise me: I'm using GitHub Copilot / GPT-4 at work on a codebase that mostly implements a basic CRUD service... and outside of small/trivial examples (where the generated code is mostly okay), prompting is more often than not a total waste of time. Now I wonder whether I'm simply unable to write/refine good prompts for the LLM (since it works for smaller samples, I hope I'm not too far off), or what else could explain the huge discrepancy in experience. (Just for the record: I would not mind at all if the LLM wrote the code for the stuff I have to do at work.)
To clarify my questions:

- Who here uses LLMs to generate code for bigger projects at work (>= 20k lines of code)?
- If you use LLMs for bigger projects: do you need to change your prompting strategy to get good results?
- What programming languages are you using in your code bases?
- Is anyone else here finding that LLMs are no help for non-trivial problems?
My employer gives me access to JetBrains AI; I work on a Vue frontend with a Kotlin Spring Boot backend.
The codebase is not too old and has grown without much technical debt, yet with complex prompts I've never had decent success. It's useful for quick "what does this do" checks, but for any real functionality it falls short.
Maybe I'm not refining my prompts well enough, but doing so would take longer than implementing the thing myself.
Recently I tried JetBrains Junie, which works similarly to Claude, if I understand it correctly.
I had a really refined prompt and ran it three times with adjustments and fine-tuning, but the result was still lacking. So I tossed it and wrote the code myself. Still, watching the machine nearly get it right was impressive.
You are just bad at prompting, or working with a very obscure language/framework, or using bad coding patterns, or all of the above.

I had a talk with a seasoned engineer who has been coding for 50 years and has created many amazing things over his lifetime, about the really bad results he was getting from AI tools I had suggested to him. When I use AI for the same purposes in the same repo he's working on, it works nicely. When he does it, the results are never what he wants. It comes down to a combination of him not understanding how to guide the LLM in the right direction, and using a language/framework he's not familiar with, so he can't judge the LLM's output.

It is really important to know what you want and to be able to describe it in short but important points: the points you know the AI will mess up if you don't specify them. You also need to figure out early which direction the AI is heading with the solution and correct it then rather than later, avoid overloading the context/memory with unnecessary things, focus on the key areas to improve, and much more.

I'm using AI to get solutions done that I could definitely do myself, but that would take a certain amount of time spent hunting down documentation, API/library calls, etc. With AI, a tenth of that time is enough.
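To make the "short points" idea concrete, a prompt from me looks roughly like this (a hypothetical sketch; the endpoint and names are invented):

    Add a GET /orders endpoint to OrderController.
    - Reuse the existing OrderRepository; don't create a new one.
    - Return a page of OrderDto, never the JPA entity.
    - Cap the page size at 100; reject larger values with a 400.
    - Follow the error-response format used in CustomerController.

Each bullet is one of the things I know the model would otherwise get wrong in this repo; everything else I leave to it.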
I've had massive success with Java, JS/TS, HTML/CSS, Go, Rust, Python, Bitbucket Pipelines / GitHub Actions, CDK, Docker Compose, SQL, Flutter/Dart, Swift, etc.
Play with Cursor or Claude Code a bit and then make a decision. I'm not on the "this is going to replace devs" boat, but this has changed the way I code and approach things.
Copilot is just plain bad. The difference is night and day compared with Cursor + Gemini 2.5 (with good prompting, of course).
> Now, I wonder if I am just totally unable to write/refine good prompts for the LLM (as it works for smaller samples, I hope I am not too far off) or what could explain the huge discrepancy of experience.
Programming language / stack plays a big role, I presume.
Same here. We have a massive codebase with large classes, and the LLMs are not very helpful. Frontend stuff is sometimes okay, but the backend models are too complex at this point, I guess.
Tooling and available context size matter a lot. I'm having decent luck with Gemini 2.5 and Roo Code.
I'm in the same boat. I've largely stopped using these tools, other than for asking questions about a language I'm less familiar with, or about a complex TypeScript type, where they can be helpful (sometimes). Otherwise, I felt like I was just wasting my time and becoming a lazier/worse developer. I do wonder whether LLMs have hit a wall and we're in a hype cycle.