This doesn't make too much sense to me. * This isn't a language, it's some tooling ...

the_duke • today at 3:20 PM • 12 replies • view on HN

This doesn't make too much sense to me.

* This isn't a language, it's some tooling to map specs to code and re-generate

* Models aren't deterministic - every time you would try to re-apply you'd likely get different output (without feeding the current code into the re-apply and let it just recommend changes)

* Models are evolving rapidly, this months flavour of Codex/Sonnet/etc would very likely generate different code from last months

* Text specifications are always under-specified, lossy and tend to gloss over a huge amount of details that the code has to make concrete - this is fine in a small example, but in a larger code base?

* Every non-trivial codebase would be made up of of hundreds of specs that interact and influence each other - very hard (and context - heavy) to read all specs that impact functionality and keep it coherent

I do think there are opportunities in this space, but what I'd like to see is:

* write text specifications

* model transforms text into a *formal* specification

* then the formal spec is translated into code which can be verified against the spec

2 and three could be merged into one if there were practical/popular languages that also support verification, in the vain of ADA/Spark.

But you can also get there by generating tests from the formal specification that validate the implementation.

Replies

onion2k • today at 3:51 PM

Models aren't deterministic - every time you would try to re-apply you'd likely get different output (without feeding the current code into the re-apply and let it just recommend changes)

If the result is always provably correct it doesn't matter whether or not it's different at the code level. People interested in systems like this believe that the outcome of what the code does is infinity more important than the code itself.

➕ show 9 replies

pron • today at 5:22 PM

If what you're after is determinism, then your solution doesn't offer it. Both the formal specification and the code generated from it would be different each time. Formal specifications are useful when they're succinct, which is possible when they specify at a higher level of abstraction than code, which admits many different implemementations.

➕ show 1 reply

wenc • today at 6:53 PM

Rehashing my comment from before:

I use Kiro IDE (≠ Kiro CLI) primarily as a spec generator. In my experience, it's high-quality for creating and iterating on specs. Tools like Cursor are optimized for human-driven vibing -- they have great autocomplete, etc. Kiro, by contrast, is optimized around spec, which ironically has been the most effective approach I've found for driving agents.

I'd argue that Cursor, Antigravity, and similar tools are optimized for human steering, which explains their popularity, while Kiro is optimized for agent harnesses. That's also why it’s underused: it's quite opinionated, but very effective. Vibe-coding culture isn't sold on spec driven development (they think it's waterfall and summarily dismiss it -- even Yegge has this bias), so people tend to underrate it.

Kiro writes specs using structured formats like EARS and INCOSE (which is the spc format used in places like Boeing for engineering reqs). It performs automated reasoning to check for consistency, then generates a design document and task list from the spec -- similar to what Beads does. I usually spend a significant amount of time pressure-testing the spec before implementing (often hours to days), and it pays off. Writing a good, consistent spec is essentially the computer equivalent of "writing as a tool of thought" in practice.

Once the spec is tight, implementation tends to follow it closely. Kiro also generates property-based tests (PBTs) using Hypothesis in Python, inspired by Haskell's QuickCheck. These tests sweep the input domain and, when combined with traditional scenario-based unit tests, tend to produce code that adheres closely to the spec. I also add a small instruction "do red/green TDD" (I learned this from Simon Willison) and that one line alone improved the quality of all my tests. Kiro can technically implement the task list itself, but this is where agents come in. With the spec in hand, I use multiple headless CLI agents in tmux (e.g., Kiro CLI, Claude Code) for implementation. The results have been very good. With a solid Kiro spec and task list, agents usually implement everything end-to-end without stopping -- I haven’t found a need for Ralph loops. (agents sometimes tend to stop mid way on Claude plans, but I've never had that happen with Kiro, not sure why, maybe it's the checklist, which includes PBT tests as gates).

didn't have the strongest start, but the Kiro IDE is one of the best spec generators I've used, and it integrates extremely well with agent-driven workflows.

DrJokepu • today at 4:21 PM

> Models aren't deterministic

Is that really true? I haven’t tried to do my own inference since the first Llama models came out years ago, but I am pretty sure it was deterministic: if you fixed the seed and the input was the same, the output of the inference was always exactly the same.

➕ show 1 reply

davedx • today at 3:34 PM

My process has organically evolved towards something similar but less strictly defined:

- I bootstrap AGENTS.md with my basic way of working and occasionally one or two project specific pieces

- I then write a DESIGN.md. How detailed or well specified it is varies from project to project: the other day I wrote a very complete DESIGN.md for a time tracking, invoice management and accounting system I wanted for my freelance biz. Because it was quite complete, the agent almost one-shot the whole thing

- I often also write a TECHNICAL-SPEC.md of some kind. Again how detailed varies.

- Finally I link to those two from the AGENTS. I also usually put in AGENTS that the agent should maintain the docs and keep them in sync with newer decisions I make along the way.

This system works well for me, but it's still very ad hoc and definitely doesn't follow any kind of formally defined spec standard. And I don't think it should, really? IMO, technically strict specs should be in your automated tests not your design docs.

➕ show 3 replies

jnpnj • today at 6:19 PM

Maybe we're entering the non-deterministic applications too. No more mechanical predictable thing.. more like 90% regular and then weird.

Slightly sarcastic but not sure this couldn't become a thing.

rco8786 • today at 5:07 PM

How is your 2 step process not susceptible to all the exact same pitfalls you listed above?

dist-epoch • today at 4:41 PM

> Models aren't deterministic - every time you would try to re-apply you'd likely get different output

So like when you give the same spec to 2 different programmers.

➕ show 3 replies

pessimizer • today at 3:48 PM

I think your objections miss the point. My informal specs to a program are user-focused. I want to dictate what benefits the program will give to the person who is using it, which may include requirements for a transport layer, a philosophy of user interaction, or any number of things. When I know what I want out of a program, I go through the agony of translating that into a spec with database schemas, menu options, specific encryption schemes, etc., then finally I turn that into a formal spec within which whether I use an underscore or a dash somewhere becomes a thing that has to be consistent throughout the document.

You're telling me that I should be doing the agonizing parts in order for the LLM to do the routine part (transforming a description of a program into a formal description of a program.) Your list of things that "make no sense" are exactly the things that I want the LLMs to do. I want to be able to run the same spec again and see the LLM add a feature that I never expected (and wasn't in the last version run from the same spec) or modify tactics to accomplish user goals based on changes in technology or availability of new standards/vendors.

I want to see specs that move away from describing the specific functionality of programs altogether, and more into describing a usefulness or the convenience of a program that doesn't exist. I want to be able to feed the LLM requirements of what I want a program to be able to accomplish, and let the LLM research and implement the how. I only want to have to describe constraints i.e. it must enable me to be able to do A, B, and C, it must prevent X,Y, and Z; I want it to feel free to solve those constraints in the way it sees fit; and when I find myself unsatisfied with the output, I'll deliver it more constraints and ask it to regenerate.

➕ show 3 replies

fnord77 • today at 4:17 PM

[delete]

➕ show 2 replies

hkonte • today at 3:50 PM

[dead]

alt Hacker News

Replies