logoalt Hacker News

Launch HN: Codebuff (YC F24) – CLI tool that writes code for you

285 pointsby jahooma11/07/2024239 commentsview on HN

Hey HN! We’re James and Brandon building Codebuff (https://codebuff.com). Codebuff is like Cursor Composer, but in your terminal: it modifies files based on your natural language requests. You can try it with `npm i -g codebuff` and start using it immediately for free. We have no login gate, and we give all accounts up to $20 worth of credits.

Codebuff is different because we simplified the input to one step: you type what you want done in your terminal and hit enter. Then Codebuff looks through your whole codebase and makes the edits it wants, to existing source files or new ones. It also can run your tests, the type checker, or install packages to fulfill your request.

Demo video: https://www.youtube.com/watch?v=dQ0NOMsu0dA

It all started at a hackathon. I was trying out Sonnet 3.5 which had recently come out and seeing if I could use it to write code. The script I cobbled together that day pulled codebase context in one step and used it to rewrite files with changes in the second step. This two step process still exists today. Incidentally, my hackathon script worked rather poorly and my demo failed to produce any useful code.

But that weekend I thought about the kind of errors it made, and realized that with more context on our codebase, it might have been able to get the change right. For example, it tried to create an endpoint on our server (at my previous startup), but it didn't know that you needed to edit 3 specific files to do this (yeah... our backend was not that clean). So I hand-wrote a guide to our codebase, like I was instructing a new hire. I put it in a markdown file and passed it into Sonnet 3.5's system prompt. And the crazy thing is that it started producing wayyy better code. So, I started getting excited. In fact, this code guide idea still exists in Codebuff today as knowledge.md files which are automatically read on every request.

I didn't think of this project as a startup idea at first. I thought it was just a simple script anyone could write. But after another week, I could see there were more problems to solve and it should be a product.

In the week between applying to YC and the interview, I could not get Codebuff to edit files consistently. I tried many prompting strategies to get it to replace strings in the original file, but nothing worked reliably. How could I face my interviewer if I could not get something basic like this to work? On the day before my interview, in a Hail Mary attempt, I fine-tuned GPT-4o to turn Claude's sketch of changes into a git patch, which would add and remove lines to make the edits. I only finished generating the training data late at night, and the fine-tuning job ran as I slept.

And, holy hell, the next morning it worked! I pushed it to production just in time for my YC interview with Dalton. Soon after, Brandon joined and we were off to the races.

So, how does Codebuff work exactly? You invoke it in your terminal, and it starts by running through the source files in that directory and subdirectories and parsing out all the function and class names (or equivalents in 11 languages). We use the tree-sitter library to do this. It builds out a codebase map that includes these symbols and the file tree.

Then, it fires off a request to Claude Haiku 3.5 to cache this codebase context so user inputs can be responded to with lower latency. (Prompt caching is OP!). We have a stateless server that passes messages along to Anthropic or OpenAI. We use websockets to ferry data back and forth to clients. We didn't have authentication or even a database for the first three months. Codebuff was free to install and used our API keys for all requests. Luckily, no one exploited us for too much free Claude usage haha. Major thanks to Brandon for saving this situation by building out our database (Postgres + Drizzle), server (Bun, hosted on Render, auth (using the free Auth.js), website (NextJS also hosted on Render), billing (Stripe), logging (BetterStack), and dashboard (Retool). This is the best tech stack I’ve ever had.

When the user sends an input message, we prompt Claude to pick files that would be relevant (step 1). After picking files, we load them into context and the agent responds. It invokes tools using xml tags that we parse. It literally writes out <edit_file path="src/app.ts">…</edit_file> to edit a particular file, and has other tags to run terminal commands, or to ask to read more files. This is all we really need, since Anthropic has already trained Claude with very similar tools reach state of the art on the SWE benchmark.

Codebuff has limited free usage, but if you like it you can pay $99/mo to get more credits. We realize this is a lot more than competitors, but that’s because we do more expensive LLM calls with more context.

We’re already seeing Codebuff used in surprising ways. One user racked up a $500 bill by building out two Flutter apps in parallel. He never even looked at the code it generated. Instead, he had long conversations with Codebuff to make progress and fix errors, until the apps were built to his satisfaction. Many users built real apps over a weekend for their teams and personal use.

Of course, those aren't the typical use cases. Users also frequently use Codebuff to write unit tests. They would build a feature in parallel with unit tests and have Codebuff do loops to fix up the code until the tests pass. They would also ask it to do drudge work like set up Oauth flows or API scaffolding.

What's really exciting with all of these examples is that we're seeing people's creativity becoming unbridled. They're spending more of their time thinking about architecture and design, instead of implementation details. It's so cool that we're just at the beginning, and the technology is only going to improve from here.

If you would want to use Codebuff inside your own systems, we have an alpha SDK that exposes the same natural language interface for your apps to call and receive code edits! You can sign up here for early access: https://codebuff.retool.com/form/c8b15919-52d0-4572-aca5-533....

Thank you for reading! We’re excited for you to try out Codebuff and let us know what you think!


Comments

gmaster144011/07/2024

Does Codebuff / the tree sitter implementation support Svelte?

show 1 reply
a2ronprice1211/08/2024

This looks so awesome! Congrats on your launch. Eager to use it!

ali-netproject11/07/2024

Extra context length looks valuable! Excited to try this out!

RoxaneFischer111/08/2024

congrats on the laucnh! thays super cool/ but also wonder youre vision about number of calls / open source. tks!

ishbaid11/07/2024

Looks awesome! Great work team.

iLoveOncall11/07/2024

Your website has a serious issue. Trying to play the YouTube video makes the page slow down to a crawl, even in 1080p, while playing it on YouTube directly has no issue, even in 4K.

On the project itself, I don't really find it exciting at all, I'm sorry. It's just another wrapper for a 3rd party model, and the fact that you can 1) describe the entire workflow in 3 paragraphs, and 2) built it and launched it in around 4 months, emphasizes that.

Congrats on launch I guess.

show 3 replies
sagarpatil11/08/2024

$99/month lol. I have Perplexity, OpenAI, Claude and Cursor subscription and I end up paying way less than $99/month. Clearly you haven't done any research on price. Aider, Cline are open source, in not sure why someone would subscribe to it unless it's the top model on http://swebench.com/

show 1 reply
captaincrunch11/07/2024

Couldn't get through the video, your keyboard sounds are very annoying.

show 1 reply
palebone11/07/2024

I'm a big fan! It's better than cursor in many ways

show 2 replies
tsarchitect11/07/2024

[dead]

rileycrane11/07/2024

[dead]

OhNoNotAgain_9911/07/2024

[dead]

recepxx3411/08/2024

[flagged]

brandonchen11/08/2024

brace yourself.

the night critics are coming.

show 1 reply