A Brief History of Ralph

46 points • by dhorthy • today at 6:00 PM • 23 comments

Comments

I've really jumped into this since I watched Geoffrey's videos last week. I ended up creating my own version of this, and have been throwing small projects at it so far.

I created a small Claude skill that helps create the "specs" for a new or existing project. It adds a /specs folder with a README that acts as a lookup for topics and features of the app, its technical approach, and its feature set. Once we've chatted, it spawns subagents to do research and present their findings in the relevant spec. As an improvement there, I'd almost like a more opinionated back-and-forth between "PM-type" agents, to help test ideas and implementation approaches.

I've got the planning and build loop set up in the Claude devcontainer, which is somewhat fragile at the moment but works for now.

In terms of chewing up context, I've noticed that, depending on the size of the project, the "IMPLEMENTATION_PLAN.md" can get pretty massive. If each agent run needs to parse that whole plan to figure out what to do next, that feels like a lot of wasted parsing. I'm working on making the implementation plan more granular so there is less to parse when figuring out what to do next.
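A minimal sketch of the "less to parse" idea, assuming the plan is kept as one markdown checkbox per task. That format, and the names `CHECKBOX` and `next_task`, are assumptions for illustration and not something the setup above is known to use. The point is that a small script can pull out only the next unchecked item, so the agent's prompt carries one task instead of the full plan.

```python
import re
from typing import Optional

# Hypothetical plan format: one markdown checkbox per task, e.g.
#   - [x] set up repo
#   - [ ] add login
CHECKBOX = re.compile(r"^\s*[-*] \[( |x|X)\] (.+)$")


def next_task(plan_text: str) -> Optional[str]:
    """Return the first unchecked task in the plan, or None if all are done."""
    for line in plan_text.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            return match.group(2).strip()
    return None
```

Each loop iteration could then be handed only the string returned by `next_task`, with the full plan file left on disk for bookkeeping.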

Overall, it's been fun and has kept me really engaged the past week.

jes5199 • today at 7:46 PM

I forked the Anthropic Ralph Wiggum plugin: https://github.com/jes5199/chief-wiggum

There’s some debate about whether this is in the spirit of the _original_ Ralph, because it keeps too much context history around. But in practice, Claude Code compactions are so low-quality that it’s basically the same as clearing the history every few turns.

I’ve had good luck giving it goals like “keep working until the integration test passes on GitHub CI”. That was my longest run, actually: it ran unattended for 24 hours before solving the bug.
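The "keep working until the check passes" pattern can be sketched roughly as below. This is an illustration under stated assumptions, not the plugin's actual code: `run_agent` and `goal_met` are hypothetical callables standing in for "invoke the coding agent once with the prompt" and "did the integration test pass", and `max_iterations` is an added safety cap.

```python
from typing import Callable


def ralph_loop(
    run_agent: Callable[[str], None],   # hypothetical: runs one agent session
    goal_met: Callable[[], bool],       # hypothetical: e.g. checks the CI result
    prompt: str,
    max_iterations: int = 100,
) -> int:
    """Feed the same prompt to the agent until goal_met() is true.

    Returns the number of iterations used, or raises if the cap is hit.
    """
    for iteration in range(1, max_iterations + 1):
        run_agent(prompt)
        if goal_met():
            return iteration
    raise RuntimeError(f"goal not met after {max_iterations} iterations")
```

In a real setup `run_agent` would most likely shell out to an agent CLI and `goal_met` would query the CI status; both are left abstract here because the exact commands depend on the tool.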

Juvination • today at 6:25 PM

I've been working with the Ralphosophy? for iterative behavior in my workflow and it seems pretty promising for cutting out a few manual steps.

I still have a manual part, which is breaking the design document down into multiple small GitHub issues after a review, but I think that is fine for now.

Using codex exec, we start working on a GitHub issue with a supplied design document, creating a PR on completion. Then we perform a review using a made-up review skill, which is effectively just a "cite your sources" skill applied to the review, along with Open Questions.

Then we iterate through the open questions, doing a minimum of 3 reviews (somewhat arbitrary, but sometimes multiple reviews catch things). Finally, I have a step for checking SonarCloud, fixing the issues it reports, and pushing the changes. Realistically, this step should be broken out into multiple iterations to avoid heavy context rot.

What I miss the most is output: seeing what's going on in either Codex or Claude in real time. I can output the last response, but it just gets messy until I make something a bit more formal.
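The "minimum of 3 reviews" rule could be expressed as a small loop like the one below. It is a sketch under assumptions: `review` and `resolve` are hypothetical callables standing in for the review skill and for whatever addresses an open question, and `max_reviews` is an added cap so the loop always ends.

```python
from typing import Callable, List


def review_until_settled(
    review: Callable[[], List[str]],   # hypothetical: returns open questions
    resolve: Callable[[str], None],    # hypothetical: addresses one question
    min_reviews: int = 3,
    max_reviews: int = 10,
) -> int:
    """Run at least min_reviews reviews; stop on the first clean one after that.

    Returns the number of reviews performed (max_reviews if never clean).
    """
    for count in range(1, max_reviews + 1):
        questions = review()
        for question in questions:
            resolve(question)
        if count >= min_reviews and not questions:
            return count
    return max_reviews
```

A clean early review does not end the loop before the minimum is reached, which matches the idea that repeated reviews sometimes catch things a single pass misses.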

fallinditch • today at 9:27 PM

Has anyone used this technique with other LLMs that are good at coding but less expensive, for example Qwen 3 Coder?

skybrian • today at 6:22 PM

There's a lot of irrelevant detail, but the article never actually explains what "Ralph" does or how it works.

ossa-ma • today at 6:15 PM

So it took the author 6 months and several 1-to-1s with the creator to get value from this. As in he literally spent more time promoting it than he did using it.

And it all ends with the grift of all grifts: promoting a crypto token in a nonchalant 'hey what's this??!!??' way...

articulatepang • today at 7:00 PM

This is so poorly written. What is "Ralph"? What is its purpose? How does it work? A single sentence at the top would help. The writer imagines that the reader cares enough to have followed their entire journey, or to decode this enormously distended pile of words.

More generally, I've noticed that people who spend a lot of time interacting with LLMs sometimes develop a distinct brain-fried tone when they write or talk.

f311a • today at 6:14 PM

Just look at the code quality produced by these loops. That's all you need to know about it.

It's complete garbage, and since it runs in a loop, the amount of garbage multiplies over time.
