Hey HN! We’re James and Brandon building Codebuff (https://codebuff.com). Codebuff is like Cursor Composer, but in your terminal: it modifies files based on your natural language requests. You can try it with `npm i -g codebuff` and start using it immediately for free. We have no login gate, and we give all accounts up to $20 worth of credits.
Codebuff is different because we simplified the input to one step: you type what you want done in your terminal and hit enter. Then Codebuff looks through your whole codebase and makes the edits it wants, to existing source files or new ones. It also can run your tests, the type checker, or install packages to fulfill your request.
Demo video: https://www.youtube.com/watch?v=dQ0NOMsu0dA
It all started at a hackathon. I was trying out Sonnet 3.5 which had recently come out and seeing if I could use it to write code. The script I cobbled together that day pulled codebase context in one step and used it to rewrite files with changes in the second step. This two step process still exists today. Incidentally, my hackathon script worked rather poorly and my demo failed to produce any useful code.
But that weekend I thought about the kind of errors it made, and realized that with more context on our codebase, it might have been able to get the change right. For example, it tried to create an endpoint on our server (at my previous startup), but it didn't know that you needed to edit 3 specific files to do this (yeah... our backend was not that clean). So I hand-wrote a guide to our codebase, like I was instructing a new hire. I put it in a markdown file and passed it into Sonnet 3.5's system prompt. And the crazy thing is that it started producing wayyy better code. So, I started getting excited. In fact, this code guide idea still exists in Codebuff today as knowledge.md files which are automatically read on every request.
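To make the knowledge-file idea concrete, here is a minimal sketch of folding hand-written guides into a system prompt. It is an illustration under stated assumptions, not Codebuff's actual code: the function name `build_system_prompt`, the "Codebase guide" heading, and the choice to search subdirectories are all hypothetical; only the `knowledge.md` filename comes from the post.

```python
from pathlib import Path


def build_system_prompt(root: str, base_prompt: str) -> str:
    """Append every knowledge.md found under `root` to the base prompt.

    Hypothetical helper: the prompt layout here is an assumption.
    """
    sections = []
    for path in sorted(Path(root).rglob("knowledge.md")):
        rel = path.relative_to(root).as_posix()
        text = path.read_text(encoding="utf-8").strip()
        # Label each guide with its location so the model knows its scope.
        sections.append(f"## {rel}\n{text}")
    if not sections:
        # No guides found: leave the prompt untouched.
        return base_prompt
    return base_prompt + "\n\n# Codebase guide\n\n" + "\n\n".join(sections)
```

Calling `build_system_prompt(".", "You are a coding assistant.")` on every request would pick up edits to the guide without any extra step.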
I didn't think of this project as a startup idea at first. I thought it was just a simple script anyone could write. But after another week, I could see there were more problems to solve and it should be a product.
In the week between applying to YC and the interview, I could not get Codebuff to edit files consistently. I tried many prompting strategies to get it to replace strings in the original file, but nothing worked reliably. How could I face my interviewer if I could not get something basic like this to work? On the day before my interview, in a Hail Mary attempt, I fine-tuned GPT-4o to turn Claude's sketch of changes into a git patch, which would add and remove lines to make the edits. I only finished generating the training data late at night, and the fine-tuning job ran as I slept.
And, holy hell, the next morning it worked! I pushed it to production just in time for my YC interview with Dalton. Soon after, Brandon joined and we were off to the races.
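For readers unfamiliar with the target format in that story, a patch is just a list of lines to remove and lines to add. The sketch below is not the fine-tuned model; it only shows what such a patch looks like, using Python's standard `difflib` to derive one from a before and after version of a file. The helper name `make_patch` and the `a/` and `b/` path prefixes (which mimic git's convention) are assumptions for illustration.

```python
import difflib


def make_patch(path: str, original: str, edited: str) -> str:
    """Return a unified diff that turns `original` into `edited`."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        edited.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    # unified_diff yields one string per line; join them into one patch.
    return "".join(diff)


before = "def greet():\n    print('hi')\n"
after = "def greet(name):\n    print('hi', name)\n"
print(make_patch("src/app.py", before, after))
```

Lines starting with `-` are removed and lines starting with `+` are added, which is the add-and-remove shape described above.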
So, how does Codebuff work exactly? You invoke it in your terminal, and it starts by running through the source files in that directory and subdirectories and parsing out all the function and class names (or equivalents in 11 languages). We use the tree-sitter library to do this. It builds out a codebase map that includes these symbols and the file tree.
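As a rough illustration of what a codebase map of symbols might look like, here is a small sketch. It swaps in Python's built-in `ast` module for tree-sitter, so it only handles Python files; the function name `codebase_map` and the shape of the result are assumptions, not Codebuff's real data structure.

```python
import ast
from pathlib import Path


def codebase_map(root: str) -> dict[str, list[str]]:
    """Map each .py file under `root` to the function and class names it defines."""
    result: dict[str, list[str]] = {}
    for path in sorted(Path(root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError, ValueError):
            # Skip files that cannot be read or parsed instead of failing.
            continue
        names = [
            node.name
            for node in ast.walk(tree)
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        ]
        # The keys double as a flat file tree; the values are the symbols.
        result[path.relative_to(root).as_posix()] = names
    return result
```

A map like this is far smaller than the source itself, which is what makes it practical to send along with every request.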
Then, it fires off a request to Claude Haiku 3.5 to cache this codebase context so user inputs can be responded to with lower latency. (Prompt caching is OP!) We have a stateless server that passes messages along to Anthropic or OpenAI. We use websockets to ferry data back and forth to clients. We didn't have authentication or even a database for the first three months. Codebuff was free to install and used our API keys for all requests. Luckily, no one exploited us for too much free Claude usage haha. Major thanks to Brandon for saving this situation by building out our database (Postgres + Drizzle), server (Bun, hosted on Render), auth (using the free Auth.js), website (NextJS also hosted on Render), billing (Stripe), logging (BetterStack), and dashboard (Retool). This is the best tech stack I’ve ever had.
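To show roughly what caching the codebase context involves, here is a sketch of the `system` field for an Anthropic Messages API request, with a cache breakpoint placed on the large, stable block. This follows the prompt caching format as I understand it from Anthropic's documentation; the helper name `cached_system_blocks` is hypothetical, and nothing here is confirmed to match Codebuff's actual requests.

```python
def cached_system_blocks(instructions: str, codebase_context: str) -> list[dict]:
    """Build system content blocks with the codebase context marked cacheable.

    Hypothetical helper. The cache_control marker asks the API to cache the
    prompt prefix up to and including the block that carries it.
    """
    return [
        {"type": "text", "text": instructions},
        {
            "type": "text",
            "text": codebase_context,
            # Assumption: "ephemeral" is the documented cache type.
            "cache_control": {"type": "ephemeral"},
        },
    ]
```

The design point is ordering: stable content (instructions, codebase map) goes first so that later requests share the same prefix, and the changing user input goes after the breakpoint.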
When the user sends an input message, we prompt Claude to pick files that would be relevant (step 1). After picking files, we load them into context and the agent responds. It invokes tools using XML tags that we parse. It literally writes out <edit_file path="src/app.ts">…</edit_file> to edit a particular file, and has other tags to run terminal commands, or to ask to read more files. This is all we really need, since Anthropic has already trained Claude with very similar tools to reach state of the art on the SWE benchmark.
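Parsing tool calls written as tags can be sketched in a few lines. This is a simplified illustration, not Codebuff's parser: it handles only the `edit_file` tag quoted above, uses a regular expression where a real implementation may need something sturdier (for example, file content that itself contains a closing tag), and the `FileEdit` and `parse_edits` names are hypothetical.

```python
import re
from dataclasses import dataclass

# Non-greedy match so several edit_file blocks in one response stay separate.
EDIT_TAG = re.compile(r'<edit_file path="([^"]+)">(.*?)</edit_file>', re.DOTALL)


@dataclass
class FileEdit:
    path: str
    content: str


def parse_edits(response: str) -> list[FileEdit]:
    """Extract every edit_file block from a model response, in order."""
    return [
        FileEdit(path=m.group(1), content=m.group(2).strip("\n"))
        for m in EDIT_TAG.finditer(response)
    ]


reply = (
    "I'll update the entry point.\n"
    '<edit_file path="src/app.ts">\nconsole.log("hello");\n</edit_file>\n'
    "Done."
)
for edit in parse_edits(reply):
    print(edit.path, "->", edit.content)
```

Text outside the tags is ignored by the parser, so the model can still explain what it is doing in plain prose around each tool call.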
Codebuff has limited free usage, but if you like it you can pay $99/mo to get more credits. We realize this is a lot more than competitors, but that’s because we do more expensive LLM calls with more context.
We’re already seeing Codebuff used in surprising ways. One user racked up a $500 bill by building out two Flutter apps in parallel. He never even looked at the code it generated. Instead, he had long conversations with Codebuff to make progress and fix errors, until the apps were built to his satisfaction. Many users built real apps over a weekend for their teams and personal use.
Of course, those aren't the typical use cases. Users also frequently use Codebuff to write unit tests. They would build a feature in parallel with unit tests and have Codebuff do loops to fix up the code until the tests pass. They would also ask it to do drudge work like set up Oauth flows or API scaffolding.
What's really exciting with all of these examples is that we're seeing people's creativity becoming unbridled. They're spending more of their time thinking about architecture and design, instead of implementation details. It's so cool that we're just at the beginning, and the technology is only going to improve from here.
If you want to use Codebuff inside your own systems, we have an alpha SDK that exposes the same natural language interface for your apps to call and receive code edits! You can sign up here for early access: https://codebuff.retool.com/form/c8b15919-52d0-4572-aca5-533....
Thank you for reading! We’re excited for you to try out Codebuff and let us know what you think!
The demos I see for these types of tools are always some toy project and don't reflect the day-to-day work I do at all. Do you have any example PRs on larger, more complex projects that have been written with codebuff, and how much of that was human interactive?
The real problem I want someone to solve is helping me with the real niche/challenging portion of a PR, ex: new tiptap extension that can do notebook code eval, migrate legacy auth service off auth0, record and replay API GET requests and replay a % of them as unit tests, etc.
So many of these tools get stuck trying to help me "start" rather than help me "finish" or unblock the current problem I'm at.
I'm not paying $20 for my ssh keys and rest of the clipboard to be sent to multiple unknown 3rd parties, thanks, not for me.
Would however pay for actual software that I can just buy instead of rent to do the task of inline shell assistance, without making network calls behind my back that I'm not in complete perfectionist one hundred point zero zero per cent control of.
Sorry, just my opinion in general on these types of products. If you don't have the skills to make a fully self-contained language model type of product or something to do this, then you are not a skilled enough team for me to trust with my work shell.
Noting Codebuff is manicode renamed.
It's become my go-to tool for handling fiddly refactors. Here’s an example session from a Rust project where I used it to break a single file into a module directory.
https://gist.github.com/cablehead/f235d61d3b646f2ec1794f656e...
Notice how it can run tests, see the compile error, and then iterate until the task is done? Really impressive.
For reference, this task used ~100 credits
Does this send code via your servers? If so, why? Nothing you've described couldn't be better implemented as a local service.
Could this tool get a command from the LLM which would result in file-loss? How would you prevent that?
Allowing LLMs to execute unrestricted commands without human review is risky and insecure.
Have an upvote! I've been trying it out, it's quite nice. What I like about this vs CoPilot and Cursor is that I feel like (especially with CoPilot) I'm always "racing" the editor. Also Cursor conflicts with some longstanding keybindings I have, vs this which is just the terminal. Having worked on a similar system before, I know it's difficult to implement some of these things, but I am concerned about security. For instance, how well does it handle sensitive files like .env or gitignored files? At some point an audit, given that you're closed source, would go a long way.
Quality of code wise, is it worse or better than Cursor? I pay for Cursor now and it saves me a LOT of time to not copy files around. I actually still use the chatGPT/claude interfaces to code as well.
> I fine-tuned GPT-4o to turn Claude's sketch of changes into a git patch, which would add and remove lines to make the edits. I only finished generating the training data late at night, and the fine-tuning job ran as I slept
Could you say more about this? What was the entirety of your training data, exactly, and how did the sketch of changes and git patch play into that?
How does it work if I'm not adding features, but want to refactor my codebase? E.g., the OOD is poor, and I want to totally change it and split the code into new files? Would it work properly, given that it requires extensive reads + creating new files + writes ...
Manicode is really awesome, did some actual dev for live apps and it does work.
You must, though, learn to code in a different way if you are not that disciplined. I had excellent results asking for small changes, step by step, and committing often so I can undo and go back to a working version easily.
Net result was very positive, built two apps simultaneously (customer side and professional side).
I gave this a spin, this is the best iteration I've seen of a CLI agent, or just best agent period actually. Extremely impressed with how well it did making some modifications to my fairly complex 10,000 LOC codebase, with minimal instruction. Will gladly pay $99/mo when I run out of credits if it keeps up this level.
Very excited for codebuff, it's been a huge productivity boost for me! I've been putting it to use on a monorepo that has Go, Typescript, terraform and some sql and it always looks at the right files for the task. I like the UX way better than cursor - I like reviewing all changes at once and making minor tweaks when necessary. Especially for writing Go, I love being able to stick with Goland IDE while using codebuff.
I've been using Codebuff (formerly manicode) for a few weeks. I think they have nailed the editing paradigm and I'm using it multiple times a day.
If you want to make a multi-file edit in cursor, you open composer, probably have to click to start a new composer session, type what you want, tell it which files it needs to include, watch it run through the change (seeing only an abbreviated version of the changes it makes), click apply all, then have to go and actually look at the real diff.
With codebuff, you open codebuff in terminal and just type what you want, and it will scan the whole directory to figure out which files to include. Then you can see the whole diff. It's way cleaner and faster for making large changes. Because it can run terminal commands, it's also really good at cleaning up after itself, e.g., removing files, renaming files, installing dependencies, etc.
Both tools need work in terms of reliability, but the workflow with Codebuff is 10x better.
What do you think about having codebuff write a parser for javascript? Something that is specifically built to enhance itself that goes beyond the regular parsers and creates a more useful structure of the codebase to be then used for RAG for code writing? This would be double useful as a great demo for your product as well as enhancing your product intrinsically. For example the new parser can not only build the syntax tree but also provide relevant commentary for each method to describe what it does to better pick code context.
"...in a Hail Mary attempt"
I'm curious how often others have experienced this. There have been so many times on many different projects where I've struggled with something hard and had the breakthrough only right before the deadline (self-imposed or actual deadline).
Congrats, sounds like an awesome project. I'll have to try it out.
Congrats on the launch guys! Tried the product early on and it’s clearly improved a ton. I’m still using Cursor every day mainly because of how complete the feature set is - autocomplete, command K, highlight a function and ask questions about it, and command L / command shift L. I am not sure what it’ll take for me to switch - maybe I’m not an ideal user somehow… I’m working in a relatively simple codebase with few collaborators?
I’m curious what exactly people say causes them to make the switch from Cursor to Codebuff? Or do people just use both?
What if you have a microservice system with a repo-per-service setup, where to add functionality to a FE site you would have to edit code in three or four specific repos (FE site repo + backend service repo + API-client npm package repo + API gateway repo) out of hundreds of total repos?
I just tried it out in the context of a small but messy side project. It did exactly what I asked for. The ease of use is bliss. Impressive!
am I the only one who is scared of "it can run any command in your terminal"?
Why is there stuff for Manifold Markets in the distributed package?
/codebuff/dist/manifold-api.js
I've been using Codebuff for the last few weeks, and it's been really nice for working in my Elixir repo. And as someone who uses Neovim in the terminal instead of VS Code, it's nice to actually be able to have it live in the tmux split beside Neovim instead of having to switch to a different editor.
I have noticed some small oddities, like every now and then it will remove the existing contents of a module when adding a new function, but between a quick glance over the changes using the diff command and our standard CI suite, it's always pretty easy to catch and fix.
Congrats on the launch! I tried this on a migration project I'm working on (which involves a lot of rote refactoring) and it worked very well. I think you've nailed the ergonomics for terminal-based operations on the codebase.
I've been using Zed editor as my primary workhorse, and I can see codebuff as a helper CLI when I need to work. I'm not sure if a CLI-only interface outside my editor is the right UX for me to generate/edit code — but this is perfect for refactors.
I really like the vibes on this: the YouTube video is pretty good, there’s a little tongue-in-cheek humor but it’s good natured, and the transparency around how it came together at the last minute is a great story.
It’s a crowded space and I don’t know how it’ll play, but in a space that hasn’t always brought out the best in the community, this Launch HN is a winner in my book.
I hope it goes great. Congratulations on the launch.
Love the demo video! Three quick questions:
Any specific reason to choose the terminal as the interface? Do you plan to make it more extensible in the future? (sounds like this could be wrapped with an extension for any IDE, which is exciting)
Also, do you see it being a problem that you can't point it to specific lines of code? In Cursor you can select some lines and CMD+K to instruct an edit. This takes away that fidelity, is it because you suspect models will get good enough to not require that level of handholding?
Do you plan to benchmark this with swe-bench etc.?
been using cline extension in vscode (which can execute commands and look at the output on terminal) and it's an incredibly adept sysadmin, cloud architect and data engineer. I like that cline lets you approve/decline execution requests and you can run it without sending the output which is safer from a data perspective.
It's cool to have this natively on the remote system though. I think a safer approach would be to compile a small binary locally that is multi-platform, and which has the command plus the capture of output to relay back, and transmit that over ssh for execution (like how MGMT config management compiles golang to static binary and sends it over to the remote node vs having to have mgmt and all its deps installed on every system it's managing).
Could be low lift vs having a package, all its dependencies and credentials running on the target system.
I've seen similar projects, but they all rely on paid LLMs, and can't work with local models, even if the endpoint is changed... what are the possibilities for this project to be run locally?
whooooot! it's been a wild ride thus far, but we've been super thrilled at how people are using it and can't wait for you all to try it out!
we've seen our own productivity increase tenfold – using codebuff to buff our own code hah
let us know what you think!
congrats on the launch! that's super cool, but also wonder about your vision on number of calls / open source. tks!
Does anyone know of a “copilot” style autocomplete in the CLI? I don’t want it to run anything for me, just predict what command I might type next
Are there any plans to add a sandbox? This seems cool, but it seems susceptible to prompt injection attacks when for example asking questions about a not necessarily trusted open source codebase.
I've been playing with Codebuff for a few days (building out some services with Node.js + Typescript) - been working beautifully! Feels like I'm watching a skilled surgeon at work.
The product design is really thoughtful and thanks for sharing your story – Cannot wait to try this and see how you iterate on it!
How do you end up handling line numbers in patches? Counting has always been a sticking point for LLMs.
I don't see the value. Why is this better than Cursor? What guarantees that you won't steal my code?
Wasn't there a recent startup in F24 that stole code from another YC company and fire was quickly put out by everyone?
This is much needed! Gonna try this out. I haven't seen a good tool that lets me generate code via CLI.
The ergonomics of using unit tests + this to pass said unit tests is actually pretty good. Just tried it.
> One user racked up a $500 bill by building out two Flutter apps in parallel.
Is that through the Enterprise plan?
Really like the look of this interface. You're definitely onto something. Good work.
This looks so awesome! Congrats on your launch. Eager to use it!
Amazing stuff! The rebrand is great and it's cool to read the whole story!
brilliant - and thank you - so impressed with your work, i finally made an account to just comment - out of the box worked, a few minor glitches, but this is the start of awesome. keep doing what you are doing.
Does Codebuff / the tree sitter implementation support Svelte?
Extra context length looks valuable! Excited to try this out!
$99/month lol. I have Perplexity, OpenAI, Claude and Cursor subscriptions and I end up paying way less than $99/month. Clearly you haven't done any research on price. Aider, Cline are open source, I'm not sure why someone would subscribe to it unless it's the top model on http://swebench.com/
Congratulations on your launch! But I confess that I am really confused. This sounds exactly like Aider, but closed source and it's locked into a single LLM API? I just watched you use it, and looks a lot like Aider too? Why would I use this over Aider?
I've seen people say "you don't have to add files to Codebuff", but Aider tells me when the LLM has requested to see files. I just have to approve it. If that bothers you, it's open source, so you could probably just add a config to always add files when requested.
Aider can also run commands for you.
What am I missing?