I'm a bit annoyed by the feeling that we're kind of stuck when it comes to using LLMs for programming.
I use Claude Code and Codex, but I haven't been able to enter flow state like I can when I hand write code.
This is kind of ironic to me, since AI should be a bicycle for the mind, but right now it feels like a bicycle that just brakes abruptly every couple of minutes. I stop, wait, review, prompt again.
Is there anyone exploring something fundamentally different from the prompt-response loop we have today?
I actually think the idea of a tab model is directionally better than prompt response.
Would love to hear about any startups, personal experiments, etc.
The tab model was a lot of fun; you felt like you were getting a speed boost while coding. I think vibe coding (or agentic engineering) is a different paradigm altogether.
I have tried out some of the popular tools: I'm using opencode on desktop, and pi via termux on Android when I'm on the go. I think the current direction of PRD -> review -> execute -> debug is, in many cases, the right mindset.
Working with a team of fresh graduates, I see that using any vibe coding tool is like being a manager, not a developer. I think that's what you miss: you miss being a developer, but the vibe coding tools make you a manager, which isn't something you might enjoy.
Nonetheless, I do think there are some interesting things to do with pi. I'm just getting started, so if anyone has an interesting workflow in pi, I would be interested in trying it out!
Just yesterday I tried to find an annoying and persistent bug in the communication between a Lyrion Media Server and my player. I used Opencode's native Big Pickle AI, and at first it was a pain in the neck: it gave me new code, I had to start the player and test the controls in the server's web GUI, report the errors back, and so forth. It tried a lot, but never found the real cause.
Then I got tired and told it to use Playwright to control the browser and test by itself. After a few hangs that I had to stop manually, it did everything by itself and finally fixed the bug. I had to increase the agent's steps setting in the config, but that was it. While it was fixing the bug, I surfed the web and kept an eye on it, but it did everything on its own. Impressive.
I have also noticed that waiting for an LLM answer makes my mind wander to completely unrelated topics.
What I've found useful is to create a tasks.md file where each bullet point is one task to implement. Bullet points that belong together and can be done in the same chat session are grouped together.
I easily enter a flow state while writing these detailed implementation plans. I can also start multiple chat sessions for parts that don't interfere with each other: while I'm waiting for an LLM answer on one part, I can get started on the next or review one of the previous answers.
I have also explored more complex setups, e.g. using a Kanban board for tasks, but I found great value in these simple yet effective ones.
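A minimal sketch of how such a tasks.md could be turned into one opening prompt per chat session. It assumes a hypothetical layout where `## ` headings mark a group and `- ` bullets are individual tasks (the comment doesn't specify the exact format), and `parse_tasks_md` / `session_prompt` are illustrative names, not part of any tool.

```python
from dataclasses import dataclass, field


@dataclass
class TaskGroup:
    """Related tasks that can be handled in one chat session."""
    title: str
    tasks: list[str] = field(default_factory=list)


def parse_tasks_md(text: str) -> list[TaskGroup]:
    """Parse a tasks.md where '## ' headings start a group and '- ' bullets are tasks."""
    groups: list[TaskGroup] = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            groups.append(TaskGroup(title=line[3:].strip()))
        elif line.startswith("- "):
            if not groups:
                # Bullets before any heading go into an implicit first group.
                groups.append(TaskGroup(title="(ungrouped)"))
            groups[-1].tasks.append(line[2:].strip())
    return groups


def session_prompt(group: TaskGroup) -> str:
    """Opening prompt for one chat session, listing its tasks in order."""
    steps = "\n".join(f"{i}. {task}" for i, task in enumerate(group.tasks, 1))
    return f"Implement the following tasks for '{group.title}', in order:\n{steps}"


example = """
## Auth
- Add login endpoint
- Hash passwords before storing them

## Docs
- Document the login endpoint
"""

# Groups that don't touch the same code can each go to a separate session.
for group in parse_tasks_md(example):
    print(session_prompt(group))
```

Keeping the grouping in the file itself means the decision about what can run in parallel is made while writing the plan, which is the part of the work described above as flowing easily.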
Well, it always depends on your environment. In my case, nothing forces me to heavily use AI, so my workflow is kind of the old way, but with less hassle.
- Do your thinking alone. (AI part: search, understanding)
- Speccing. (AI part: search, understanding, completing some text)
- Coding like the old days. (AI part: search, understanding, code examples)
- Okay, now I have a good idea of how my feature is going to work.
- Look for fluff code and delegate it to AI to write/review it.
- Focus on the part of the code I want to have fun doing.
- Review.
- Repeat.
It’s slower than writing specs and letting AI do the rest while limiting your role to code review. However, I’m more in control of what I build, I can explain what I built better than anyone else, and I build up my knowledge. (Also, I have fewer problems, because there’s less code, haha.)
Will I go fully agentic? Maybe, but I will find a way to slow it down so I can be in control.
I've been working on inverting the control theory for the agent loop. Instead of the user initiating everything, the agent runs automatically in the background and calls the user for feedback as part of tool use. The end game for me is to get rid of the chat interface altogether and move back toward async email and other messaging channels. The chatbot UI as a means of driving the business always felt like a temporary stepping stone / clever demo.
I think there are 10-100x productivity gains lurking in here. It is very expensive for a human to reserialize their mental state into a prompt each time a task needs working on. An agent can do this almost instantly, at high frequency, 24/7. The higher the rate of evaluation, the less change has to be dealt with between any two iterations, so the likelihood that a given iteration needs human help goes down as you increase the rate of evaluation per unit of wall-clock time. Tighter and faster control loops tend to require less severe corrective measures than slow and sloppy ones.
This is the most plausible reason we'll need so many tokens in the future. I can actually see a million tokens per second making sense. I have a pretty good idea how I'd approach this if I actually had access to this kind of infrastructure. 1Mtok/s is baby tier in terms of raw information theory. The politics of employing a system like this are far more terrifying to me than any technological aspects. Humans really like having control over things, even when that control is pure downside for the business.
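A toy model of the evaluation-rate argument, assuming changes arrive as (time, magnitude) events and that the agent calls the human (as a tool) only when the change accumulated since its last check exceeds a fixed threshold. Everything here, including `run_background_agent` and the `ask_human` callback, is hypothetical; it only illustrates the claim that checking more often leaves less change per iteration, not how the commenter's system actually works.

```python
from typing import Callable


def run_background_agent(
    events: list[tuple[float, float]],
    interval: float,
    horizon: float,
    threshold: float,
    ask_human: Callable[[float], None],
) -> int:
    """Re-evaluate every `interval` time units until `horizon`.

    `events` are (time, magnitude) changes to the world. At each tick the agent
    looks at the change accumulated since the previous tick: small batches it
    handles on its own, batches above `threshold` trigger the `ask_human` tool.
    Returns how many times the human was called.
    """
    escalations = 0
    previous = 0.0
    for k in range(1, round(horizon / interval) + 1):
        now = k * interval
        pending = sum(mag for when, mag in events if previous < when <= now)
        if pending > threshold:
            ask_human(pending)  # the human is just another tool call
            escalations += 1
        previous = now
    return escalations


# Same total change (0.5 per time unit over 100 units), two evaluation rates.
events = [(k + 0.5, 0.5) for k in range(100)]
calls: list[float] = []
fast = run_background_agent(events, 1.0, 100.0, 3.0, calls.append)
slow = run_background_agent(events, 10.0, 100.0, 3.0, calls.append)
print(fast, slow)  # prints "0 10"
```

Real drift is obviously not this regular, but the shape of the argument is the same: with the same total amount of change, smaller batches per evaluation mean fewer batches that cross the "needs a human" line.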
Just use both: VS Code + Copilot for autocomplete and Antigravity for prompting. My main editor is VS Code and I do heavy manual edits, but I also use Antigravity heavily. I feel like I've become very productive without actually renouncing the ways of the programmer, augmenting them instead, as long as I'm careful with the generated code and check every file it touched.
I don't think you would expect to get into a flow state if you were intermittently directing another (human) programmer to do work, and you shouldn't expect to with LLM-driven coding either. Perhaps you'd be better off finding ways to extend the length of time the LLM can work without prompting, then using that downtime to focus on other tasks that will help you guide it better the next time you need to prompt it.
I spent 3 months on Pi and the Opencode Go plan for open model inference. I've never had so much fun on a computer. If you are looking for a place to start, that should be it. Or check out: https://github.com/huggingface/tau at https://twotimespi.dev
I now use GPT-5.5 as my primary model. The code quality was just higher. I tinker and do R&D with open models, then come back and refactor my slop into usable code I can save for future use.
As Boris said, you shouldn't be manually prompting anymore but asking the AI to prompt itself, in the form of workflows. I usually have 3 to 5 different sessions running at once all autonomously.
I recently started an internship in a field I am very interested in. I began using Claude to write a lot of my code, but realized that:
a) It was way too easy to just auto-approve everything. Answering the 5-10 spec questions it asked me made me feel like I was an important part of the loop, but really it was just a way to make me feel important while spraying my slop cannon.
b) I wasn't actually learning anything, defeating the whole purpose of the internship I worked hard to get.
I am now using a workflow where the brainstorming process is the same, but I have Claude write an instructional document for me to implement. It has instructions to ask me questions about what I know / want to know, to lay out the plan iteratively with lots of verification steps, and to heavily explain portions of the code that are unfamiliar to me. It's sorta like making my own custom tutorials specifically for the problem I am working on.
It's a little slower, but not too bad since it does still put whole codeblocks in the instructions. I have a much better understanding of what I am doing, I still get to enjoy learning and programming and improving, and I don't feel like a reverse centaur.
Stop searching for “different ways of coding”. This is it.
LLMs have barely been around for a few years. People are addicted, seriously addicted, to the next shiny workflow. It’s like JavaScript frameworks all over again.
The way to get over this addiction is to just stop talking about it. Stop reading another BS article about how someone used agents to get some 10x improvement already. Unsubscribe from company channels where people endlessly bikeshed how to use some new LLM or agent harness or whatever.
By now, everything you need to know to maximize productivity has already been discovered. There are no new tricks, and even if there were, you’re really not missing out; the old tricks still work just fine. Just get out there and work.
A) spec-driven development
B) opinionated skills that use GitHub tickets, merge gates and execution of ticket graphs
I keep a TODO file where I just write my ideas in free text, and every once in a while I tell claude "I updated the TODO file".
This is basically like queueing up prompts.
I wish Claude Code had a thing like that builtin. Like a "user ideas scratchpad".
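Until something like that is built in, the TODO-as-queue trick could be approximated with a small helper that remembers which lines were already handed over and only forwards the new ones. This is a hypothetical sketch (`TodoQueue` and `build_prompt` are made-up names); it assumes one idea per line and doesn't call any agent, it just builds the message you would paste or pipe in.

```python
from pathlib import Path
from typing import Optional


class TodoQueue:
    """Treat a free-text TODO file as a queue of pending prompts.

    Each non-empty line is one idea. Lines already handed to the agent are
    remembered, so each check only surfaces what was added since the last one.
    """

    def __init__(self) -> None:
        self.seen: set[str] = set()

    def take_new(self, text: str) -> list[str]:
        """Return ideas not yet sent to the agent (deduplicated) and mark them as sent."""
        lines = (line.strip() for line in text.splitlines())
        fresh = list(dict.fromkeys(line for line in lines if line and line not in self.seen))
        self.seen.update(fresh)
        return fresh

    def check(self, path: Path) -> Optional[str]:
        """Read the TODO file and build a prompt for the new items, if any."""
        return build_prompt(self.take_new(path.read_text()))


def build_prompt(items: list[str]) -> Optional[str]:
    """Turn new TODO items into a single message, or None if nothing changed."""
    if not items:
        return None
    bullets = "\n".join(f"- {item}" for item in items)
    return f"I updated the TODO file. New items:\n{bullets}"
```

Tracking what was already sent is the part that makes it behave like a queue rather than re-sending the whole scratchpad each time.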
Using tools like Claude Code and Codex constantly boosts our dopamine, making hyperflow impossible. These days, most engineers work on multiple projects simultaneously to satisfy their dopamine receptors.
I've built a couple of things in the past few months that have leaned heavily on an LLM as my programmer, mainly Claude Code but occasionally Codex too. It's a different way to produce. I spend more time doing something like plain-text feature mapping in simple .md files, with good flow and creativity. Then, once I'm happy with it, I pass it off to the dev team: Claude, to code it up and integrate it. I feel like I'm flowing in the same part of the process I always was. But the buzz of getting something working is gone; it's more like the slow satisfaction of getting something useful at the end.
I saw another post today about a startup making an oven for baking bread. I feel that often the main issue lies with what you need 'the code' to accomplish.
My flow state is thinking about and understanding this: am I solving a problem that needs to be solved now, for the right person?
I created this to help me understand it (project foundations + create milestones) and then bring it to reality (ship milestones).
Computers are like a bicycle for the mind.
LLM AI is like Uber for the mind.
I was thinking it would be interesting to make an environment where, instead of the LLM just crapping out a bunch of code really fast, it works more like a pair programming exercise going at human speed.
The LLM would explain what it's doing, then write a bit of code, then you have time to look at it and understand it, and go to the next step. At any point you can interject and discuss or change it.
I find the biggest problem is that once an LLM generates a bunch of code, it's really hard for a human to build up the context for what the code is doing and why. When you're coding normally or pairing, then you're gradually absorbing the context and what the code is doing throughout the process.
The reality is that writing code fast was never the bottleneck. It's understanding the code and making sure it's actually doing what's needed that's hard.
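A rough sketch of the human-speed pairing loop, with the LLM parts stubbed out as callbacks: the assistant proposes one small step (explanation plus code), the human reviews it, and any feedback is an interjection that revises the step before moving on. `Step`, `pair_session`, `review` and `revise` are hypothetical names; a real version would generate steps lazily from a model rather than take a fixed list.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Step:
    explanation: str  # what the assistant is about to do, and why
    code: str         # a chunk small enough for a human to read and absorb


def pair_session(
    steps: Iterable[Step],
    review: Callable[[Step], str],
    revise: Callable[[Step, str], Step],
    max_rounds: int = 5,
) -> list[Step]:
    """Walk through planned steps one at a time, at the reviewer's pace.

    `review` returns "ok" to accept a step, or free-text feedback to interject.
    Feedback goes to `revise` (in practice an LLM call), and the revised step is
    shown again before moving on. Raises if a step isn't accepted in `max_rounds`.
    """
    accepted: list[Step] = []
    for step in steps:
        for _ in range(max_rounds):
            verdict = review(step)
            if verdict == "ok":
                accepted.append(step)
                break
            step = revise(step, verdict)
        else:
            raise RuntimeError(f"step not accepted after {max_rounds} rounds: {step.explanation}")
    return accepted
```

Because `steps` is any iterable, a generator that asks the model for the next step only after the previous one is accepted would keep the human's understanding and the code growing at the same rate.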
I have built a radically different system in general called Abject (https://abject.world). I have been thinking about how agents are the wrong abstraction and what an operating system can look like now that we have LLMs. It's not designed to help you with your website or app, but it does code and make different kinds of apps within its system. In principle it should also be able to code apps outside its ecosystem, but I haven't tried that in any serious way.
The fundamental problem I keep seeing across all harnesses is that they use the exact same UX afforded by a git-based backend. If we want to stay in flow, the LLM's edit backend would have to be based on something like CRDTs to handle simultaneous edits.
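To make the CRDT idea concrete, here is a deliberately tiny sketch of one well-known sequence CRDT family, RGA (Replicated Growable Array), assuming character-level inserts and deletes from two editors (say, a human and an agent) on the same buffer. It isn't taken from any existing harness and ignores what a real implementation needs (tombstone garbage collection, efficient indexing, transport); it only shows how concurrent edits can merge without a git-style conflict step.

```python
from collections import defaultdict
from dataclasses import dataclass, field

Id = tuple[int, str]  # (Lamport clock, site id): unique per insert
ROOT: Id = (0, "")    # virtual element that every sequence hangs off


@dataclass
class Replica:
    """Minimal RGA-style text CRDT: state merges by union, so merge order doesn't matter."""

    site: str
    clock: int = 0
    elems: dict[Id, tuple[Id, str]] = field(default_factory=dict)  # id -> (inserted after, char)
    tombstones: set[Id] = field(default_factory=set)              # deleted ids are kept, just hidden

    def _order(self) -> list[Id]:
        # Each element is a child of the element it was inserted after; siblings are
        # ordered newest-first, and a pre-order walk of that tree gives the sequence.
        children: dict[Id, list[Id]] = defaultdict(list)
        for eid, (parent, _) in self.elems.items():
            children[parent].append(eid)
        out: list[Id] = []
        stack = sorted(children[ROOT])  # ascending, so pop() yields the newest first
        while stack:
            node = stack.pop()
            out.append(node)
            stack.extend(sorted(children[node]))
        return out

    def _visible(self) -> list[Id]:
        return [e for e in self._order() if e not in self.tombstones]

    def text(self) -> str:
        return "".join(self.elems[e][1] for e in self._visible())

    def insert(self, index: int, ch: str) -> None:
        parent = self._visible()[index - 1] if index > 0 else ROOT
        self.clock += 1  # newer than anything this replica has seen
        self.elems[(self.clock, self.site)] = (parent, ch)

    def delete(self, index: int) -> None:
        self.tombstones.add(self._visible()[index])

    def merge(self, other: "Replica") -> None:
        self.elems.update(other.elems)
        self.tombstones |= other.tombstones
        self.clock = max(self.clock, other.clock)
```

For example, if replica `a` appends "!" to "hi" while replica `b` concurrently prepends ">" and deletes "i", merging in either direction gives ">h!" on both sides, with no conflict to resolve by hand.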
Related question: are there any good autocompletion models close to GPT-5.5/Opus level?
I assume you're not talking about solo work. For me, the quickest way to "flow state" has always started with human discussion. Only then does writing the code become a trivial task I can do alone. LLMs will never provide the flow because they do not set the goals. The tail does not wag the dog. They can only suggest implementation details.
Unless I'm working on totally unfamiliar problems, I don't want that advice for the majority of the code. Contrary to popular belief, there exist so many situations where there's exactly one right answer and countless wrong ones. There are less important miscellaneous parts I might have it fill in.
The only reason I cannot completely delegate to AI is because it cannot read my mind. Even then, it would probably still suggest crap since it is the averaging of those countless wrong answers. And still, even if it could overcome all that, I'm only saving a few hours at the end of several days of meetings.
I'm just not getting where the value is for anyone beyond entry level. I'm being totally honest when I say that I even stopped needing most search and documentation (for mature tools) over a decade ago. Back then, Stack Overflow was at its peak and I had the same questions about it. Offline coding is not only possible, but increasingly easier.
What am I doing differently here?
My current approach, which I've been testing on two MVPs with what I would call 'moderate success' (but hey, actual success!):
Three tiers: philosophy, spec, design, in increasing detail. The design files include DB model explanations and pseudocode/function headers, that level of detail.
For each thing I need to change, I have a prompt ready that asks the agent to follow about 5 steps and output a 'reviewfile' with details of what it thinks about the thing I posited. I review its output. I have another prompt ready to then get an agent to generate a taskfile and update the design documentation. The taskfile explains in great detail what has changed and what needs to be implemented. I review the taskfile and the git diffs of the design doc changes. Finally, an agent implements the taskfile. I review all changed code and commit.
It gets there, but still definitely misses some stuff. I'm finding it very adequate for an MVP.
Edit: this seems to only work with Opus. Sonnet can't do it. (Maybe Opus is seriously compensating for an awful approach and I'm just lucky?)
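The review-gated pipeline described above can be sketched as a list of stages, each an agent call followed by a human approval gate, with the three document tiers passed along as shared context. This is an illustrative sketch with hypothetical names (`Stage`, `run_change`, `build_context`); the agent calls and reviews are plain callbacks here, and the actual prompts (the ~5-step review prompt, the taskfile prompt) aren't reproduced.

```python
from dataclasses import dataclass
from typing import Callable

TIERS = ("philosophy", "spec", "design")  # increasing level of detail


def build_context(docs: dict[str, str]) -> str:
    """Concatenate the three tiers, most abstract first, as shared agent context."""
    return "\n\n".join(f"# {tier}\n{docs[tier]}" for tier in TIERS)


@dataclass
class Stage:
    name: str                       # e.g. "reviewfile", "taskfile", "implementation"
    run: Callable[[str, str], str]  # agent call: (context, previous artifact) -> artifact
    approve: Callable[[str], bool]  # human review gate on the new artifact


def run_change(request: str, docs: dict[str, str], stages: list[Stage]) -> tuple[list[str], str]:
    """Push one change request through each stage, stopping at the first rejected gate.

    Returns the names of the approved stages and the last artifact produced.
    """
    context = build_context(docs)
    approved: list[str] = []
    artifact = request
    for stage in stages:
        artifact = stage.run(context, artifact)
        if not stage.approve(artifact):
            break
        approved.append(stage.name)
    return approved, artifact
```

The point of the structure is that nothing reaches the implementation stage without two human sign-offs, which is where the "still misses some stuff" risk gets caught.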
Interesting
I have noticed this too, but have a different problem
The "flow state" has never been where I want to be; it is where I make my worst mistakes and where the details swamp the bigger picture. A "cannot see the wood for the trees" problem.
I am developing a practice for agentic coding, involving plans, reviews and checkpoints. But the "twirling my thumbs" while waiting for the agent to do its thing is a related problem for me.
> but I haven't been able to enter flow state like I can when I hand write code.
Fixing that for you.
I haven't been able to enter flow state like I can when I write code.
Thanks to being unemployed, the last few months I've been experimenting a lot with coding agents, harnesses and most importantly, the workflows around them.
Currently I'm refining what I think works best for me, which I'd call something like "issues/PR based LLM workflow", powered mainly by this action I'm building on top of the Pi coding agent SDK: https://github.com/shaftoe/pi-coding-agent-action
Essentially I issue prompts swapping between the terminal and the git forge web app (GitHub and my own Forgejo instance) and it currently looks something like this:
- create an issue, with the detail/quality of the spec depending on how important the task or the project is
- trigger a Pi session by starting a comment in the forge with "/pi " to work on it, either to produce a report or, e.g., to implement the change in a new PR
- trigger more sessions in the same thread, be it an issue or a PR, to steer or to add more requests, like forking out a new PR or similar. This also works for reviews, so I just add comments and then submit a review with "/pi follow the comments instructions" or similar
- if I want finer-grained control and I am at the workstation, I use the bridging Pi extension to pick up the work locally: https://github.com/shaftoe/pi-coding-agent-action/tree/devel...
- rinse and repeat until I'm either happy with the change or the PR is so bloated that I get rid of it and start anew
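The comment-trigger part of this workflow boils down to recognising a prefix and turning the rest of the comment into a prompt. A hypothetical sketch of that dispatch step (not taken from pi-coding-agent-action; `PiJob` and `parse_comment` are made-up names). In a real setup, the forge's CI would supply the comment body and the issue/PR number from the triggering event.

```python
from dataclasses import dataclass
from typing import Optional

TRIGGER = "/pi "  # comment prefix that starts an agent session


@dataclass(frozen=True)
class PiJob:
    thread: str       # "issue" or "pull_request"
    number: int       # issue / PR number in the forge
    instruction: str  # the rest of the comment, passed to the agent as its prompt


def parse_comment(body: str, thread: str, number: int) -> Optional[PiJob]:
    """Return a job if the comment starts with the trigger, else None."""
    text = body.lstrip()
    if not text.startswith(TRIGGER):
        return None
    instruction = text[len(TRIGGER):].strip()
    return PiJob(thread, number, instruction) if instruction else None
```

Since the job is tied to a thread, every session's prompt and output end up in the issue or PR history, which is the "memory for free" the ticketing system provides.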
I know it's probably something Claude / Codex / Cursor offer with their web apps, but I want the freedom and flexibility to use the LLM provider/model I want, and Pi as a harness does that and everything else remarkably well. Another advantage is that I can fit the LLM action into any pipeline I want and take care of chores like automated changelog generation and whatnot.
As I said, it's still mostly a work in progress, but in general I think there's a lot of potential in this kind of workflow: it forces me to keep the scope of changes small (I still want to review the PR content, after all) and gives me memory for free just by leveraging the ticketing system. I also like that the harness runs most of the time in the CI/CD sandbox, which, in the case of Forgejo, I control fully.
PS I try to keep my work with/on AI tools on my website at https://a.l3x.in/ai
1. Find a problem that LLMs suck at and you're good at. Then you'll have no choice but to enter the flow state.
There's lots of those still. Portable shell programming is my favorite. Even the most capable models limp at it, but I thrive on my own, so it becomes an interaction where I really feel I need to think.
2. Work on dense programs, and use the LLM for debugging only. LLMs suck at writing dense code. They thrive on redundancy and verbosity, so this pushes you to avoid them for the main thing and use them for adjacent work instead.
3. Multitask. Ride several bikes at once, but not for the sake of doing more (for that you could automate); do it for the multitasking itself. Parallelize, split projects into multiple work fronts, and work on reducing the time it takes to mentally switch between contexts. It's not coding per se, but it's a great skill, AI involved or not.
I built a terminal that is also an agent comms system. Way back when Claude Code first came out, I hacked together something to get two instances talking to each other, inserting text into and reading from each other's TTYs. It was horribly hacky, so I set about actually understanding how CLIs, TUIs, and terminals work (I had written a simplified terminal based on jquery-terminal a long while ago during Covid that hacked tool runs using GPT-2, so this was overdue). I've been writing code for decades, so I have all of it handy in case I need to point the LLMs to a particular way of doing things. Refactoring constantly is key.
There's cmux in this space, but I had already used Hyper for years, so I decided it was time to fork something and build on it. Cmux does tabs in panes AND panes in tabs. Hyperia does addressable panes in tabs and windows. I've tried to keep it minimalistic, which helps with flowing back and forth between different projects (I typically work on 3-4 at a time). I added a Rust sidecar, making all objects addressable over MCP, so Claude Code, Codex, or a small local model on Ollama can split panes, run commands, and read screens, with one hard rule enforced in the harness rather than the prompt: an agent can never move my focus, other than asking for permission to access a new object. ACLs too. Hyperia also carries an agent loop that wires into its own MCP server, so a local model in a JavaScript "shell" can control resizing the terminal (handy for videos), or opening a project and setting up the agent panes.
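The "enforced in the harness rather than the prompt" rule is easy to picture as a policy check that every agent request passes through before it touches a pane. A hypothetical sketch (the action names, the `win1/tab2/pane3` address format and `PaneACL` are made up, not Hyperia's API): focus changes are always denied, granted panes are allowed, and anything else becomes a permission prompt for the human.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ASK = "ask"  # prompt the human for permission


@dataclass
class PaneACL:
    """Per-agent access list over addressable panes, checked by the harness."""

    grants: dict[str, set[str]] = field(default_factory=dict)  # agent -> pane addresses

    def check(self, agent: str, action: str, pane: str) -> Decision:
        if action == "focus":
            # Hard rule: agents never move the user's focus, whatever the prompt says.
            return Decision.DENY
        if pane in self.grants.get(agent, set()):
            return Decision.ALLOW
        return Decision.ASK

    def grant(self, agent: str, pane: str) -> None:
        """Record the human's approval for this agent to use this pane."""
        self.grants.setdefault(agent, set()).add(pane)
```

Putting the rule in code rather than in a system prompt means a confused or prompt-injected agent still can't steal focus; the worst it can do is trigger a permission request.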
I stay typing in a pane while the agents work in theirs, in my peripheral vision, and web panes sit right next to terminals so docs, webapps/sites and the agent chat live in the same window. Reviewing becomes glancing instead of context switching, which is the closest thing to ideal flow with prompts I've gotten out of this auto-AI stuff. Tab and pane clicks copy the address into the buffer, then I paste and issue commands referencing what I want dealt with. I have an SDR radio on my box that allows me to talk to a given pane (WIP not in the build yet). Working on getting the local agent stuff done and wired to the radio.
The upshot of this approach is enabling agents running in one tab, all mounting the same directory, with one in charge of the others. Claude Code is great at this, and it saves on the tokens it would normally use for doing it itself. I talk to Claude, or whatever I pick, and it talks to the rest of the agents and coordinates the work. I like Antigravity a lot because it moves crazy fast for coding, with Claude in control and GLM-5.2 auditing and explaining to me how development is going, as an example. No unseeable agent army here. No need for it, actually.
About the only thing that trips me up at the moment is having to work on Hyperia itself, which I don't do inside of it because of restarts. When I work on Hyperia, I start an agent in Windows Terminal and wire it into the MCP for testing. I build installers constantly as well, and then run through the QA process by using it to work on my other projects.
I use Zed for code editing and viewing, but rarely. I also just open things in special sticky notes (or have the agent do it) so I understand how we're doing things. GLM-5.2 took to the planning stickies like a fish to water.
https://github.com/deepbluedynamics/hyperia
https://github.com/deepbluedynamics/nemesis8 (n8)
Both are open source, obviously. It's worth mentioning they will remain that way and will never require a service plan or any other cost. I built them because I needed it for another project I will be selling, not aimed at developers at all.
n8 implements the agent runs in containers. This is a separation of concerns: it runs in any terminal, controls session starts, and searches for previous sessions (as well as monitoring token usage, CPU, network, and file access). I'm working on the dashboard for that now, so I can easily see which files are changing, how much they changed, and what changed in them. I co-founded Loggly, so that crap is in my wheelhouse.
This isn't the tab completion model. It works great for the way my brain works, but I also think having an agentic terminal is a good move for anyone writing code, and we'd all be better off running agents in containers rather than on bare metal. It makes it way easier to see what the agent is doing (and to resume later), and allows it to do most of its work in the container, as opposed to running loose on my box.
I build a lot of my own tools, to suit exactly how I want to work. Obviously, having a little thinky guy in the computer to do most of the busy work of making new tools accelerates that, but tools that make the LLMs suit me also accelerates my general work.
Some stuff I've built:
https://github.com/swelljoe/tandem - Tandem is a sysadmin buddy that travels with you over ssh. Just a wrapper over tmux and claude code (or whatever agent you like), it opens two panes in tmux, one with an ssh session to one of the hundreds of devices I maintain, and one with a local Claude Code configured to use a local work space and instructed via CLAUDE.md/AGENTS.md to use tmux to interact with the remote machine. I built it because a lot of my coworkers were installing Claude Code on our robots and authenticating there to get help with robot troubles, and that felt bad. This allows them to keep all sensitive stuff locally and still get help troubleshooting directly on the device. I happen to find it useful, sometimes, too.
https://github.com/swelljoe/nelson - Nelson is a fancy Ralph loop for security bug hunting that I built to help audit my own software. It's also grown to include a benchmark suite I'm using to figure out which models are worth using for security work. I've published some of those benchmark results, and have a few hundred hours/dollars worth of new ones to publish this weekend. Turns out the benchmarking is more interesting, so that's gotten more attention than the bug-hunting side, but the benchmarks inform how the bug-hunting side works, and I added multi-model/multi-pass scans and de-dupe features recently because I found that letting models have a couple bites at the apple increases discovery, and there are bugs that only some models catch, and it's not always the top model that finds them. There's some overlap, but also some divergence. This research has also led me to start working on a harness for security auditing tasks; giving the agent tools and project structure data to lift detection and reduce false positives.
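The multi-model/multi-pass idea needs some way to merge overlapping reports. A minimal sketch, assuming each finding carries a file, a line and a category such as a CWE id, and treating findings as duplicates when those roughly match; the line-bucket heuristic and all names here are hypothetical, not Nelson's actual de-dupe logic.

```python
from dataclasses import dataclass

Key = tuple[str, str, int]  # (file, category, line bucket)


@dataclass(frozen=True)
class Finding:
    model: str
    file: str
    line: int
    category: str  # e.g. a CWE id such as "CWE-787"


def dedupe(findings: list[Finding], line_window: int = 3) -> dict[Key, set[str]]:
    """Group findings that likely describe the same bug, recording which models reported it.

    Findings merge when they share file and category and their line numbers fall
    in the same `line_window`-sized bucket. With the default window, lines 40 and
    41 merge but 41 and 42 don't, so this is only a crude heuristic.
    """
    merged: dict[Key, set[str]] = {}
    for f in findings:
        key = (f.file, f.category, f.line // line_window)
        merged.setdefault(key, set()).add(f.model)
    return merged


def unique_to_one_model(merged: dict[Key, set[str]]) -> list[Key]:
    """Bugs only a single model caught: the case that makes extra passes worthwhile."""
    return sorted(key for key, models in merged.items() if len(models) == 1)
```

Keeping the set of reporting models per bug is what makes the "some bugs are only caught by some models" observation measurable across runs.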
https://github.com/swelljoe/flar - FLAR is the Fast Light Agent Restrictor. It bubblewraps an agent so it is quite safe to use agents on your local machine, even with `--dangerously-skip-permissions` (which makes agents more fun to use). The sandbox feature found in most agents is porous and can be expanded by the agent harness itself. Similarly, if the agent introduces a supply chain attack into your code and runs it before you get a chance to audit/review it in a PR or run it through an SBOM dependency checker, the blast radius is exactly the project directory and the credentials/history of the one agent. (Whereas, without flar, the blast radius is your whole .ssh, github creds, all agent creds, your keyring, whatever secrets are in your home, etc.) This one is new. Just made it because I was talking about how I always put agents in VMs because I don't trust them. Someone suggested `srt` (https://github.com/anthropic-experimental/sandbox-runtime) and I like the idea but I don't like how complicated and huge and JavaScript it is. You can read and understand the entirety of `flar` in one sitting. Anyway, to break out of "prompt/response", you have to skip permissions, or call it via `claude -p` or API with tasks to perform. Nelson does the latter and `flar` does the former.
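For a feel of what bubblewrapping an agent involves, here is a sketch that only builds a `bwrap` command line; it is not flar's implementation. It assumes a Linux host with bubblewrap installed, a dedicated directory to serve as the agent's home (so the agent's own credentials and history live there instead of in your real home), and that read-only `/usr` and `/etc` are enough for the agent binary to run, which varies by distro (DNS or certificate paths may need extra binds). Your real home, and with it `~/.ssh`, is simply never mounted.

```python
from pathlib import Path


def bwrap_argv(project: Path, agent_home: Path, command: list[str]) -> list[str]:
    """Build a bubblewrap command line that confines `command` to `project`.

    Only the project directory and the agent's own home are writable; the
    user's real home (~/.ssh, keyrings, other credentials) is not mounted.
    """
    proj = str(project.resolve())
    home = str(agent_home.resolve())
    argv = ["bwrap", "--unshare-all", "--share-net", "--die-with-parent", "--new-session"]
    # System files, read-only. Paths are distro-dependent assumptions.
    argv += ["--ro-bind", "/usr", "/usr", "--ro-bind", "/etc", "/etc"]
    for path in ("/bin", "/sbin", "/lib", "/lib64"):
        argv += ["--ro-bind-try", path, path]  # often symlinks into /usr; skipped if absent
    argv += ["--dev", "/dev", "--proc", "/proc", "--tmpfs", "/tmp"]
    # A throwaway home holding only this agent's config and credentials.
    argv += ["--bind", home, home, "--setenv", "HOME", home]
    # The project is the only other writable location: the blast radius.
    argv += ["--bind", proj, proj, "--chdir", proj]
    return argv + command
```

Usage would be something like `subprocess.run(bwrap_argv(Path.cwd(), agent_home, ["claude", "--dangerously-skip-permissions"]))`, run from the project directory. Network stays shared because the agent needs to reach its API; everything else is unshared.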
That's not to mention all the side projects and other stuff I've been able to make a lot of progress on.
The biggest one is finishing https://venturous.app/ (or, at least I made it do what I most wanted it to do, which is provide map overlays of US public lands and mobile data provider coverage so I can find cool places to camp free while staying connected). This is a re-implementation of an old defunct app called FreeRoam that I absolutely loved when I traveled full-time. I built half of it over several months by hand, and then Claude helped finish it in a few weekends and holidays. I'll get Claude to help build the mobile apps someday.
I’m writing a JSX templating language — to manage context, branching, etc automatically. You hand it a spec/existing work and it automatically applies a recipe.
So far that’s been much nicer for anything large or complex, because I was spending all my time on context piping.
You are the bottleneck.
Why should AI be limited to human time? Is a mountain? A galaxy?
I think the fundamental aspect of flow is that it requires a high amount of cognitive engagement. Most of the time you're just not getting that from interacting with an LLM because the process is relatively passive. There are also forced breaks while it does its internal CoT, which breaks flow.
I think a lot of people get a sort of novelty effect when first interacting with an LLM which can feel superficially like flow, but it's different in that it eventually wanes and what really happens in practice is you're encouraged to disengage and this makes it almost impossible to get into a true flow state.
The risk here I think is that if you get humans disengaging from the task at hand, there's a higher chance of bugs being introduced. You might move slightly faster in the short term but be forced to hit the brakes in the medium/long term.
I am currently in the process of launching my AI teams platform that I've been working on since at least January. It's https://PersonaStack.ai. I'm doing it without VC money and all by myself. I've used over 110B tokens so far building it.
You get some amazing results with teams of AIs if you do it right. The key is to control behavior with what integrations and responsibilities each agent has. That way they naturally adapt, delegate, fact check each other, and generally act more autonomously.
This is already running the automated news site ainews.personastack.ai complete with social media posts 100% automated.
It also runs the issue triage, coding, reviews, and releases for the Kuberhealthy open source CNCF project, which is another thing of mine.
I don't think the next step is really smarter models. It's how we make the models more effective, and teams, when done right, net the best results I've seen.
Hoping to get noticed here soon, but it's extremely hard to do solo I'm finding.