Launch HN: Codebuff (YC F24) – CLI tool that writes code for you

285 points • by jahooma • 11/07/2024 • 239 comments

Hey HN! We’re James and Brandon building Codebuff (https://codebuff.com). Codebuff is like Cursor Composer, but in your terminal: it modifies files based on your natural language requests. You can try it with `npm i -g codebuff` and start using it immediately for free. We have no login gate, and we give all accounts up to $20 worth of credits.

Codebuff is different because we simplified the input to one step: you type what you want done in your terminal and hit enter. Then Codebuff looks through your whole codebase and makes the edits it wants, to existing source files or new ones. It also can run your tests, the type checker, or install packages to fulfill your request.

Demo video: https://www.youtube.com/watch?v=dQ0NOMsu0dA

It all started at a hackathon. I was trying out Sonnet 3.5, which had recently come out, to see if I could use it to write code. The script I cobbled together that day pulled codebase context in one step and used it to rewrite files with changes in a second step. This two-step process still exists today. Incidentally, my hackathon script worked rather poorly and my demo failed to produce any useful code.

But that weekend I thought about the kind of errors it made, and realized that with more context on our codebase, it might have been able to get the change right. For example, it tried to create an endpoint on our server (at my previous startup), but it didn't know that you needed to edit 3 specific files to do this (yeah... our backend was not that clean). So I hand-wrote a guide to our codebase, like I was instructing a new hire. I put it in a markdown file and passed it into Sonnet 3.5's system prompt. And the crazy thing is that it started producing wayyy better code. So, I started getting excited. In fact, this code guide idea still exists in Codebuff today as knowledge.md files which are automatically read on every request.

I didn't think of this project as a startup idea at first. I thought it was just a simple script anyone could write. But after another week, I could see there were more problems to solve and it should be a product.

In the week between applying to YC and the interview, I could not get Codebuff to edit files consistently. I tried many prompting strategies to get it to replace strings in the original file, but nothing worked reliably. How could I face my interviewer if I could not get something basic like this to work? On the day before my interview, in a Hail Mary attempt, I fine-tuned GPT-4o to turn Claude's sketch of changes into a git patch, which would add and remove lines to make the edits. I only finished generating the training data late at night, and the fine-tuning job ran as I slept.

And, holy hell, the next morning it worked! I pushed it to production just in time for my YC interview with Dalton. Soon after, Brandon joined and we were off to the races.

So, how does Codebuff work exactly? You invoke it in your terminal, and it starts by running through the source files in that directory and subdirectories and parsing out all the function and class names (or equivalents in 11 languages). We use the tree-sitter library to do this. It builds out a codebase map that includes these symbols and the file tree.
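As a rough illustration of the codebase map described above, here is a minimal sketch. It is not Codebuff's implementation: plain regular expressions stand in for tree-sitter so the example stays dependency-free, it only recognizes a few TypeScript/JavaScript declaration forms, and the names `buildCodebaseMap`, `extractSymbols`, and `CodebaseMap` are hypothetical.

```typescript
// Hypothetical sketch: regexes stand in for tree-sitter parsing.
type SourceFiles = Record<string, string>; // path -> file contents

interface CodebaseMap {
  fileTree: string[];                // sorted list of file paths
  symbols: Record<string, string[]>; // path -> function/class names found
}

// Matches `function foo`, `class Foo`, and `const foo = (` / `const foo = async (`.
const SYMBOL_PATTERNS: RegExp[] = [
  /\bfunction\s+([A-Za-z_$][\w$]*)/g,
  /\bclass\s+([A-Za-z_$][\w$]*)/g,
  /\bconst\s+([A-Za-z_$][\w$]*)\s*=\s*(?:async\s*)?\(/g,
];

function extractSymbols(source: string): string[] {
  const names = new Set<string>(); // a Set drops duplicate names
  for (const pattern of SYMBOL_PATTERNS) {
    pattern.lastIndex = 0; // global regexes keep state between calls
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(source)) !== null) {
      names.add(match[1]);
    }
  }
  return Array.from(names);
}

function buildCodebaseMap(files: SourceFiles): CodebaseMap {
  const fileTree = Object.keys(files).sort();
  const symbols: Record<string, string[]> = {};
  for (const path of fileTree) {
    symbols[path] = extractSymbols(files[path]);
  }
  return { fileTree, symbols };
}

// Example with two in-memory files instead of a real directory walk.
const demoMap = buildCodebaseMap({
  "src/app.ts": "export class App {}\nexport function start() {}\n",
  "src/util.ts": "export const add = (a: number, b: number) => a + b;\n",
});
console.log(JSON.stringify(demoMap, null, 2));
```

A real parser such as tree-sitter would handle methods, nested declarations, and other languages that these regexes miss. The point here is only the shape of the output: a file tree plus a per-file list of symbols, compact enough to send as context.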

Then, it fires off a request to Claude Haiku 3.5 to cache this codebase context so user inputs can be responded to with lower latency. (Prompt caching is OP!) We have a stateless server that passes messages along to Anthropic or OpenAI. We use websockets to ferry data back and forth to clients. We didn't have authentication or even a database for the first three months. Codebuff was free to install and used our API keys for all requests. Luckily, no one exploited us for too much free Claude usage haha. Major thanks to Brandon for saving this situation by building out our database (Postgres + Drizzle), server (Bun, hosted on Render), auth (using the free Auth.js), website (NextJS, also hosted on Render), billing (Stripe), logging (BetterStack), and dashboard (Retool). This is the best tech stack I’ve ever had.
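For readers unfamiliar with prompt caching, the sketch below shows roughly what a cache-priming request body could look like with Anthropic's Messages API, where a `cache_control` marker of type `ephemeral` on a content block asks the API to cache the prompt up to that block. How Codebuff actually structures its requests is not described in the post, so the function name `buildCachedRequest`, the instruction text, the model ID, and the `max_tokens` value are all assumptions. The code only builds the JSON body and does not send anything.

```typescript
// Hypothetical sketch of a request body that marks codebase context as cacheable.
interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

interface MessagesRequest {
  model: string;
  max_tokens: number;
  system: TextBlock[];
  messages: { role: "user" | "assistant"; content: string }[];
}

function buildCachedRequest(codebaseContext: string, userInput: string): MessagesRequest {
  return {
    model: "claude-3-5-haiku-20241022", // assumed model ID
    max_tokens: 1024,                   // arbitrary illustrative limit
    system: [
      { type: "text", text: "You are a coding assistant." }, // assumed instruction
      // The large, rarely changing context goes last and carries the cache marker,
      // so later requests that share this prefix can reuse the cached prompt.
      { type: "text", text: codebaseContext, cache_control: { type: "ephemeral" } },
    ],
    messages: [{ role: "user", content: userInput }],
  };
}

const demoRequest = buildCachedRequest("src/app.ts: App, start", "Add a /health endpoint");
console.log(JSON.stringify(demoRequest, null, 2));
```

The design idea is that the expensive part of the prompt (the codebase map) stays byte-for-byte identical across requests, while only the short user message changes.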

When the user sends an input message, we prompt Claude to pick files that would be relevant (step 1). After picking files, we load them into context and the agent responds. It invokes tools using XML tags that we parse. It literally writes out <edit_file path="src/app.ts">…</edit_file> to edit a particular file, and has other tags to run terminal commands, or to ask to read more files. This is all we really need, since Anthropic has already trained Claude with very similar tools to reach state of the art on SWE-bench.
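To make the tag-based tool calls concrete, here is a minimal sketch of how such tags could be pulled out of a model response. Only the `edit_file` tag and its `path` attribute come from the post; the tag names `run_terminal_command` and `read_files`, the `ToolCall` shape, and the `parseToolCalls` function are hypothetical, and Codebuff's real parser may work quite differently.

```typescript
// Hypothetical sketch: extract tool-call tags from a model response.
interface ToolCall {
  name: string;
  attributes: Record<string, string>;
  body: string;
}

// Only edit_file appears in the post; the other two names are made up.
const TOOL_NAMES = ["edit_file", "run_terminal_command", "read_files"];

function parseToolCalls(response: string): ToolCall[] {
  const calls: ToolCall[] = [];
  // <name attr="value" ...>body</name>, with a lazy body match up to the closing tag.
  const tagPattern = new RegExp(
    "<(" + TOOL_NAMES.join("|") + ")((?:\\s+[\\w-]+=\"[^\"]*\")*)\\s*>([\\s\\S]*?)</\\1>",
    "g"
  );
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(response)) !== null) {
    const attributes: Record<string, string> = {};
    const attrPattern = /([\w-]+)="([^"]*)"/g;
    let attr: RegExpExecArray | null;
    while ((attr = attrPattern.exec(match[2])) !== null) {
      attributes[attr[1]] = attr[2];
    }
    calls.push({ name: match[1], attributes, body: match[3].trim() });
  }
  return calls;
}

const reply = [
  "I'll add a health check.",
  '<edit_file path="src/app.ts">',
  'app.get("/health", () => "ok");',
  "</edit_file>",
  "<run_terminal_command>npm test</run_terminal_command>",
].join("\n");

for (const call of parseToolCalls(reply)) {
  console.log(call.name, call.attributes, call.body);
}
```

A regex-based parser like this breaks if the file content itself contains the closing tag text, so a production version would likely need escaping or a streaming parser.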

Codebuff has limited free usage, but if you like it you can pay $99/mo to get more credits. We realize this is a lot more than competitors, but that’s because we do more expensive LLM calls with more context.

We’re already seeing Codebuff used in surprising ways. One user racked up a $500 bill by building out two Flutter apps in parallel. He never even looked at the code it generated. Instead, he had long conversations with Codebuff to make progress and fix errors, until the apps were built to his satisfaction. Many users built real apps over a weekend for their teams and personal use.

Of course, those aren't the typical use cases. Users also frequently use Codebuff to write unit tests. They would build a feature in parallel with unit tests and have Codebuff do loops to fix up the code until the tests pass. They would also ask it to do drudge work like setting up OAuth flows or API scaffolding.

What's really exciting with all of these examples is that we're seeing people's creativity becoming unbridled. They're spending more of their time thinking about architecture and design, instead of implementation details. It's so cool that we're just at the beginning, and the technology is only going to improve from here.

If you want to use Codebuff inside your own systems, we have an alpha SDK that exposes the same natural language interface for your apps to call and receive code edits! You can sign up here for early access: https://codebuff.retool.com/form/c8b15919-52d0-4572-aca5-533....

Thank you for reading! We’re excited for you to try out Codebuff and let us know what you think!

Comments

draebek • 11/07/2024

Congratulations on your launch! But I confess that I am really confused. This sounds exactly like Aider, but closed source and it's locked into a single LLM API? I just watched you use it, and looks a lot like Aider too? Why would I use this over Aider?

I've seen people say "you don't have to add files to Codebuff", but Aider tells me when the LLM has requested to see files. I just have to approve it. If that bothers you, it's open source, so you could probably just add a config to always add files when requested.

Aider can also run commands for you.

What am I missing?

haxton • 11/07/2024

The demos I see for these types of tools are always some toy project and don't reflect the day-to-day work I do at all. Do you have any example PRs on larger, more complex projects that have been written with Codebuff, and how much of that was human interactive?

The real problem I want someone to solve is helping me with the real niche/challenging portion of a PR, ex: new tiptap extension that can do notebook code eval, migrate legacy auth service off auth0, record and replay API GET requests and replay a % of them as unit tests, etc.

So many of these tools get stuck trying to help me "start" rather than help me "finish" or unblock the current problem I'm at.

nisten • 11/07/2024

I'm not paying $20 for my ssh keys and rest of the clipboard to be sent to multiple unknown 3rd parties, thanks, not for me.

Would however pay for actual software that I can just buy instead of rent to do the task of inline shell assistance, without making network calls behind my back that I'm not in complete perfectionist one hundred point zero zero per cent control of.

Sorry, just my opinion in general on these types of products. If you don't have the skills to make a fully self-contained language model type of product or something to do this, then you are not a skilled enough team for me to trust with my work shell.

ndyg • 11/07/2024

Noting Codebuff is manicode renamed.

It's become my go-to tool for handling fiddly refactors. Here’s an example session from a Rust project where I used it to break a single file into a module directory.

https://gist.github.com/cablehead/f235d61d3b646f2ec1794f656e...

Notice how it can run tests, see the compile error, and then iterate until the task is done? Really impressive.

For reference, this task used ~100 credits

papa-vova • 11/19/2024

Speaking of naming: did you guys see this one? https://arxiv.org/abs/1606.08866

And this: https://github.com/antlr/codebuff

boratanrikulu • 11/07/2024

Allowing LLMs to execute unrestricted commands without human review is risky and insecure.

iandanforth • 11/07/2024

Does this send code via your servers? If so, why? Nothing you've described couldn't be better implemented as a local service.

Could this tool get a command from the LLM which would result in file-loss? How would you prevent that?

v3ss0n • 11/07/2024

We already have AIDE, Continue, Cody, Aider, Cursor... Why this?

toisanji • 11/07/2024

Quality of code wise, is it worse or better than Cursor? I pay for Cursor now and it saves me a LOT of time to not copy files around. I actually still use the chatGPT/claude interfaces to code as well.

fragmede • 11/07/2024

Sounds pretty interesting, I was thinking that would be the way to work past limited context window sizes automatically.

> Codebuff has limited free usage, but if you like it you can pay $99/mo to get more credits...

> One user racked up a $500 bill...

Those two statements are kind of confusing together. Past the free tier, what does $99/month get you? It sounds like there's some sort of credit, but that's not discussed at all here. How much did this customer do to get to that kind of bill? I get that they built a Flutter app, but did it take an hour to run up a $500 bill? 6 hours? A whole weekend? Is there a way to set a limit?

The ability to rack up an unreasonable bill by accident, even just conceptually, is a non-starter for many. This is interactive so it's not as bad as accidentally leaving a GPU EC2 instance on overnight, but I'll note that Aider shows per query and session costs.

cellis • 11/08/2024

Have an upvote! I've been trying it out, it's quite nice. What I like about this vs CoPilot and Cursor is that I feel like (especially with CoPilot) I'm always "racing" the editor. Also Cursor conflicts with some longstanding keybindings I have, vs this which is just the terminal. Having worked on a similar system before, I know it's difficult to implement some of these things, but I am concerned about security. For instance, how well does it handle sensitive files like .env or gitignored files? At some point an audit, given that you're closed source, would go a long way.

shardool97 • 11/07/2024

Very excited for Codebuff, it's been a huge productivity boost for me! I've been putting it to use on a monorepo that has Go, TypeScript, Terraform and some SQL, and it always looks at the right files for the task. I like the UX way better than Cursor: I like reviewing all changes at once and making minor tweaks when necessary. Especially for writing Go, I love being able to stick with the GoLand IDE while using Codebuff.

Finbarr • 11/07/2024

I've been using Codebuff (formerly manicode) for a few weeks. I think they have nailed the editing paradigm and I'm using it multiple times a day.

If you want to make a multi-file edit in cursor, you open composer, probably have to click to start a new composer session, type what you want, tell it which files it needs to include, watch it run through the change (seeing only an abbreviated version of the changes it makes), click apply all, then have to go and actually look at the real diff.

With codebuff, you open codebuff in terminal and just type what you want, and it will scan the whole directory to figure out which files to include. Then you can see the whole diff. It's way cleaner and faster for making large changes. Because it can run terminal commands, it's also really good at cleaning up after itself, e.g., removing files, renaming files, installing dependencies, etc.

Both tools need work in terms of reliability, but the workflow with Codebuff is 10x better.

marcusbuffett • 11/08/2024

I gave this a spin, this is the best iteration I've seen of a CLI agent, or just best agent period actually. Extremely impressed with how well it did making some modifications to my fairly complex 10,000 LOC codebase, with minimal instruction. Will gladly pay $99/mo when I run out of credits if it keeps up this level.

evntdrvn • 11/07/2024

What if you have a microservice system with a repo-per-service setup, where to add functionality to a FE site you would have to edit code in three or four specific repos (FE site repo + backend service repo + API-client npm package repo + API gateway repo) out of hundreds of total repos?

hiatus • 11/07/2024

Why is there stuff for Manifold Markets in the distributed package?

/codebuff/dist/manifold-api.js

https://www.npmjs.com/package/codebuff?activeTab=code

israrkhan • 11/07/2024

am I the only one who is scared of "it can run any command in your terminal"?

hubris24 • 11/08/2024

I don't see the value. Why is this better than Cursor? What guarantees that you won't steal my code?

Wasn't there a recent startup in F24 that stole code from another YC company and fire was quickly put out by everyone?

dnsbty • 11/07/2024

I've been using Codebuff for the last few weeks, and it's been really nice for working in my Elixir repo. And as someone who uses Neovim in the terminal instead of VS Code, it's nice to actually be able to have it live in the tmux split beside Neovim instead of having to switch to a different editor.

I have noticed some small oddities, like every now and then it will remove the existing contents of a module when adding a new function, but between a quick glance over the changes using the diff command and our standard CI suite, it's always pretty easy to catch and fix.

imranhou • 11/08/2024

What do you think about having codebuff write a parser for javascript? Something that is specifically built to enhance itself that goes beyond the regular parsers and creates a more useful structure of the codebase to be then used for RAG for code writing? This would be double useful as a great demo for your product as well as enhancing your product intrinsically. For example the new parser can not only build the syntax tree but also provide relevant commentary for each method to describe what it does to better pick code context.

darweenist • 11/08/2024

Congrats on the launch guys! Tried the product early on and it’s clearly improved a ton. I’m still using Cursor every day mainly because of how complete the feature set is - autocomplete, command K, highlight a function and ask questions about it, and command L / command shift L. I am not sure what it’ll take for me to switch - maybe I’m not an ideal user somehow… I’m working in a relatively simple codebase with few collaborators?

I’m curious what exactly people say causes them to make the switch from Cursor to Codebuff? Or do people just use both?

dakshgupta • 11/07/2024

Love the demo video! Three quick questions:

Any specific reason to choose the terminal as the interface? Do you plan to make it more extensible in the future? (sounds like this could be wrapped with an extension for any IDE, which is exciting)

Also, do you see it being a problem that you can't point it to specific lines of code? In Cursor you can select some lines and CMD+K to instruct an edit. This takes away that fidelity, is it because you suspect models will get good enough to not require that level of handholding?

Do you plan to benchmark this with swe-bench etc.?

marvin-hansen • 11/09/2024

How is this different from Qodo? Why isn’t it mentioned as a competitor?

I have a hard time figuring out what Codebuff brings to the table that hasn’t been done before, other than being YC backed. I think to win in this massively competitive and fast-moving market, you really have to put forward something significantly better than an expensive cobbled-together script replicating OSS solutions…

I know this sounds harsh, but believe me, differentiation makes or breaks you sooner than later. Proper differentiation doesn’t have to be hard, it just needs to answer the question what you offer that I can’t get anywhere else at a similar price point. Right now, your offer is more expensive for basically something I get elsewhere better for 1/5 the price… I’m seriously worried whether your venture will be around in one or two years from now without a more convincing value prop.

From my experience of leaning more into full end-to-end AI workflows building Rust, it seems that

1) context has clearly won over RAG. There is no way back.

2) workflow is the next obvious evolution and gets you an extra mile

3) adversarial GAN training seems a path forward to get from just okay generated code to something close to a home run on the first try

4) generating a style guide based on the entire code base and feeding that style guide together with the task and context into the LLM is your ticket to enterprise customers because no matter how good your stuff might be, if the generated code doesn’t fit the mold you are not part of the conversation. Conversely, if you deliver code in the same style and formatting and it actually works, well, price doesn’t matter much.

5) in terms of marketing to developers, I suggest starting by listening to their pain points working with existing AI tools. I don’t have a single one of the problems you try to solve. I’m sitting on a massive Rust monorepo and I’ve seen virtually every existing AI coding assistant fail one way or another. The one I have now works miracles half the time and only fails the other half. That is already a massive improvement compared to everything else I tried over the past four years.

Point is, there is a massive need for coding assistance on complex systems and for CodeBuff to make a dime of a difference, you have to differentiate from what’s out there by starting with the challenges engineers face today.

sanketsaurav • 11/07/2024

Congrats on the launch! I tried this on a migration project I'm working on (which involves a lot of rote refactoring) and it worked very well. I think you've nailed the ergonomics for terminal-based operations on the codebase.

I've been using Zed editor as my primary workhorse, and I can see codebuff as a helper CLI when I need to work. I'm not sure if a CLI-only interface outside my editor is the right UX for me to generate/edit code — but this is perfect for refactors.

benreesman • 11/08/2024

I really like the vibes on this: the YouTube video is pretty good, there’s a little tongue-in-cheek humor but it’s good natured, and the transparency around how it came together at the last minute is a great story.

It’s a crowded space and I don’t know how it’ll play, but in a space that hasn’t always brought out the best in the community, this Launch HN is a winner in my book.

I hope it goes great. Congratulations on the launch.

wouterjanl • 11/08/2024

I just tried it out in the context of a small but messy side project. It did exactly what I asked for. The ease of use is bliss. Impressive!

abossy • 11/08/2024

> I fine-tuned GPT-4o to turn Claude's sketch of changes into a git patch, which would add and remove lines to make the edits. I only finished generating the training data late at night, and the fine-tuning job ran as I slept

Could you say more about this? What was the entirety of your training data, exactly, and how did the sketch of changes and git patch play into that?

eraad • 11/08/2024

Manicode is really awesome, did some actual dev for live apps and it does work.

You must, though, learn to code in a different way if you are not that disciplined. I had excellent results asking for small changes, step by step, and committing often so I can undo and go back to a working version easily.

Net result was very positive, built two apps simultaneously (customer side and professional side).

froggy • 11/08/2024

"...in a Hail Mary attempt"

I'm curious how often others have experienced this. There have been so many times on many different projects where I've struggled with something hard and had the breakthrough only right before the deadline (self-imposed or actual deadline).

Congrats, sounds like an awesome project. I'll have to try it out.

tgtweak • 11/07/2024

been using cline extension in vscode (which can execute commands and look at the output on terminal) and it's an incredibly adept sysadmin, cloud architect and data engineer. I like that cline lets you approve/decline execution requests and you can run it without sending the output which is safer from a data perspective.

It's cool to have this natively on the remote system though. I think a safer approach would be to compile a small binary locally that is multi-platform, and which has the command plus the capture of output to relay back, and transmit that over ssh for execution (like how MGMT config management compiles golang to a static binary and sends it over to the remote node vs having to have mgmt and all its deps installed on every system it's managing).

Could be low lift vs having a package, all its dependencies and credentials running on the target system.

ilrwbwrkhv • 11/07/2024

It couldn't write a simple test for my typescript node system. Kept telling me credits left, login. I don't know who gets success from these tools and what they are building but none of them actually work for me. Yesterday there was Aide which I tried and found to be broken and so is this one.

zh2408 • 11/08/2024

How does it work if I'm not adding features, but want to refactor my code bases? E.g., the OOD is poor, and I want to totally change it and split the codes into new files? Would it work properly as it requires extensive reads + create new files + writes ...

anonzzzies • 11/07/2024

Comparison with Aider?

brandonchen • 11/07/2024

whooooot! it's been a wild ride thus far, but we've been super thrilled at how people are using it and can't wait for you all to try it out!

we've seen our own productivity increase tenfold – using codebuff to buff our own code hah

let us know what you think!

la64710 • 11/08/2024

https://www.codebuff.com/

The demo right there is worth $5 of software development ( in offshored upwork cost) . Imagine when this can be done at scale for huge existing codebase.

nubinetwork • 11/08/2024

I've seen similar projects, but they all rely on paid LLMs, and can't work with local models, even if the endpoint is changed... what are the possibilities for this project to be run locally?

kfajdsl • 11/07/2024

Are there any plans to add a sandbox? This seems cool, but it seems susceptible to prompt injection attacks when for example asking questions about a not necessarily trusted open source codebase.

nseth • 11/07/2024

I've been playing with Codebuff for a few days (building out some services with Node.js + Typescript) - been working beautifully! Feels like I'm watching a skilled surgeon at work.

jerpint • 11/08/2024

Does anyone know of a “copilot” style autocomplete in the CLI? I don’t want it to run anything for me, just predict what command I might type next

CHERHU • 11/07/2024

The product design is really thoughtful and thanks for sharing your story – cannot wait to try this and see how you iterate on it!

loondri • 11/07/2024

This is much needed! Gonna try this out. I haven't seen a good tool that lets me generate code via CLI.

jc4883 • 11/07/2024

The ergonomics of using unit tests + this to pass said unit tests is actually pretty good. Just tried it.

carom • 11/07/2024

How do you end up handling line numbers in patches? Counting has always been a sticking point for LLMs.

handfuloflight • 11/07/2024

> One user racked up a $500 bill by building out two Flutter apps in parallel.

Is that through the Enterprise plan?

iimaginary • 11/07/2024

Really like the look of this interface. You're definitely onto something. Good work.

mitch7w • 11/07/2024

Amazing stuff! The rebrand is great and it's cool to read the whole story!

maldous • 11/07/2024

brilliant - and thank you - so impressed with your work, i finally made an account to just comment - out of the box worked, a few minor glitches, but this is the start of awesome. keep doing what you are doing.
