The demos I see for these types of tools are always some toy project and doesn't reflect day to day work I do at all. Do you have any example PRs on larger more complex projects that have been written with codebuff and how much of that was human interactive?
The real problem I want someone to solve is helping me with the real niche/challenging portion of a PR, ex: new tiptap extension that can do notebook code eval, migrate legacy auth service off auth0, record and replay API GET requests and replay a % of them as unit tests, etc.
So many of these tools get stuck trying to help me "start" rather than help me "finish" or unblock the current problem I'm at.
Watching the demo it seems like it would be more effective to learn the skills you need rather than using this for a decade.
It takes 5+ seconds just to change one field to dark mode, I don't even want to imaigne a situation where I have two fields and I want to explain that I need to change this field and not that field.
I'm not sure who is the target audience for this, people who want to be programmers without learning programming ?
Every experience I have had with LLMs generating code. LLMs tend to follow the prompt much too closely and produce large amounts of convoluted code that in the end prove not only unnecessary but quite toxic.
Where LLMs shine is in being a personal Stack Overflow: asking a question and having a personalized, specific answer immediately, that uses one's data.
But solving actual, real problems still seem out of reach. And letting them touch my files sound crazy.
(And yes, ok, maybe I just suck at prompting. But I would need detailed examples to be convinced this approach can work.)
> Do you have any example PRs on larger more complex projects that have been written with codebuff and how much of that was human interactive?
We have a lot of code in production which are AI written. The important thing is that you need to consciously make a module or project AI-ready. This means that things like modularity and smaller files are even more important than they usually are.
I can't share those PRs, but projects on my profile page are almost entirely AI written (except the https://bashojs.org/ link). Some of them might meet your definition of niche based on the example you provided.
Kind of like "please describe the solution and I will write code to do it". That's not how programming works. Writing code and testing it against expectations to get to the solution, that's programming.
> ex: new tiptap extension that can do notebook code eval
Claude wrote me a prosemirror extension doing a bunch of stuff that I couldn’t figure out how to do myself. It was very convenient.
+1; Ideally I want a tool I don't have to specify the context for. If I can point it via config files at my medium-sized codebase once (~2000 py files; 300k LOC according to `cloc`) then it starts to get actually usable.
Cursor Composer doesn't handle that and seems geared towards a small handful of handpicked files.
Would codebuff be able to handle a proper sized codebase? Or do the models fundamentally not handle that much context?
It's pretty good for complex projects imo because codebuff can understand the structure of your codebase and which files to change to implement changes. It still struggles when there isn't good documentation, but it has helped me finish a number of projects
Absolutely! Imaging setting a bunch of css styles through a long winded AI conversation, when you could have an IDE to do it in a few seconds. I don't need that.
The long tail of niche engineering problems is the time consuming bit now. That's not being solved at all, IMHO.
Great question – we struggled for a long time to put our demo together precisely for this reason. Codebuff is so useful in a practical setting, but we can't bore the audience with a ton of background on a codebase when we do demos, so we have to pick a toy project. Maybe in the future, we could start our demo with a half-built project?
Hopefully the demo on our homepage shows a little bit more of your day-to-day workflows than other codegen tools show, but we're all ears on ways to improve this!
To give a concrete example of usefulness, I was implementing a referrals feature in Drizzle a few weeks ago, and Codebuff was able to build out the cli app, frontend, backend, and set up db schema (under my supervision, of course!) because of its deep understanding of our codebase. Building the feature properly requires knowing how our systems intersect with one another and the right abstraction at each point. I was able to bounce back and forth with it to build this out. It felt akin to working with a great junior engineer, tbh!
EDIT: another user shared their use cases here! https://news.ycombinator.com/item?id=42079914
I hear you. This is actually a foundational idea for Codebuff. I made it to work within the large-ish codebase of my previous startup, Manifold Markets.
I want the demos to be of real work, but somehow they never seem as cool unless it's a neat front end toy example.
Here is the demo video I sent in my application to YC, which shows it doing real stuff: https://www.loom.com/share/fd4bced4eff94095a09c6a19b7f7f45c?...