> In Elixir tests, each test runs in a database transaction that rolls back at the end. Tests run async without hitting each other. No test data persists.
And it confuses Claude.
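For anyone who hasn't seen the mechanism: this is the standard Ecto SQL Sandbox setup the quote describes. A minimal sketch, assuming a conventional Phoenix-style app named `MyApp`:

```elixir
# test/test_helper.exs
ExUnit.start()
Ecto.Adapters.SQL.Sandbox.mode(MyApp.Repo, :manual)

# test/support/data_case.ex (excerpt) — each test checks out its own
# sandboxed connection inside a transaction that rolls back on exit,
# which is what makes `async: true` safe.
setup tags do
  pid = Ecto.Adapters.SQL.Sandbox.start_owner!(MyApp.Repo, shared: not tags[:async])
  on_exit(fn -> Ecto.Adapters.SQL.Sandbox.stop_owner(pid) end)
  :ok
end
```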
This way of running tests is also what Rails does, and AFAIK Django too. Tests are isolated and can be run in random order. In fact, Rails randomizes the order, so tests that depend on execution order for any reason will eventually fail. To help debug those cases, it prints the seed, which can be used to rerun the tests deterministically, including the calls to methods that return random values.
I thought this was how all test frameworks worked in 2026.
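ExUnit does the same thing: test order is randomized, the seed is printed at the end of every run ("Randomized with seed ..."), and a failing order can be replayed deterministically with:

```
mix test --seed 12345
```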
I can attest to everything. Using Tidewave MCP to give your agent access to the runtime via a REPL is a superpower, especially with Elixir being functional. It's able to proactively debug and get runtime feedback on your modular code as it's being written. It can also access the DB via your Ecto modules. It's a perfect fit and an incredibly productive workflow.
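A rough sketch of what that runtime feedback looks like in practice (module and function names here are hypothetical, assuming a typical `MyApp` with an `Accounts` context):

```elixir
# Evaluated in the live runtime (IEx, or via Tidewave's MCP tools):
# the agent can verify what the data actually looks like before
# writing code that depends on it.
iex> MyApp.Repo.aggregate(MyApp.Accounts.User, :count)
42
iex> MyApp.Accounts.get_user!(1) |> Map.take([:email, :role])
%{email: "a@example.com", role: :admin}
```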
Great article that concretizes a lot of intuitions I've had while vibe coding in Elixir.
We don't 100% AI it but this very much matches our experience, especially the bits about defensiveness.
Going to do some testing this week to see if a better agents file can't improve some of the author's testing struggles.
It seems like "100% vibe coded" is an exaggeration, given that Claude fails at certain tasks.
The new generation of code assistants is great. But when I dogmatically try to let only the AI work on a project, it usually fails and shoots itself in the proverbial foot.
If this is indeed 100% vibe coded, then there is some magic I would love to learn!
OK, so I'm "vibe-"building out my company's lab notebook in Elixir ahead of the first funding check coming in.
I'm doing some heavy-duty shit: almost everything is routed through a custom CQRS-style events table [0] before rollup into the DB tables (for lab notebook integrity). Editing is done through a custom implementation of Quill.js's delta OT [1]. 100% of my tests are async.
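(To make the events-table idea concrete, here's a minimal sketch of the shape such a table might take; the module and field names are illustrative, not Spector's actual API:)

```elixir
defmodule Lab.Events.Event do
  use Ecto.Schema

  # Append-only event log; the read-model tables are rolled up from this.
  schema "events" do
    field :stream_id, :binary_id   # which notebook/aggregate the event belongs to
    field :seq, :integer           # per-stream sequence number, for ordering
    field :type, :string           # e.g. "entry_edited"
    field :payload, :map           # the delta/OT data
    timestamps(updated_at: false)  # events are immutable
  end
end
```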
I've never once run into the Ecto issues mentioned.
I haven't had issues with GenServers (but I have none* in my project).
Claude knows Oban really well. Honestly, I was always afraid to use Oban until Claude just suggesting "let's use Oban" gave me the courage. I'll be sending Parker and Shannon a first check when the startup's check comes in.
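(For readers who haven't used Oban: a job is just a module. A minimal, hypothetical rollup worker, using real Oban calls but made-up app modules:)

```elixir
defmodule Lab.Workers.Rollup do
  use Oban.Worker, queue: :rollups, max_attempts: 5

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"event_id" => event_id}}) do
    # Fold one event into the read-model tables.
    # Returning {:error, _} or raising makes Oban retry with backoff.
    Lab.Rollups.apply_event(event_id)
  end
end

# Enqueue from anywhere:
#   %{event_id: event.id} |> Lab.Workers.Rollup.new() |> Oban.insert()
```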
The article is absolutely spot-on about everything else. I think at this point, what I've built in a month-ish would have taken me years to build out by myself.
The biggest annoyance is the over-defensiveness mentioned, and that Claude keeps trying to use Jason instead of the built-in JSON module. Claude also has some bad habits around aliases, even though they're pretty explicitly called out in CLAUDE.md, and other annoying things like writing `case functioncall() do nil -> ... end` instead of `if var = functioncall() do ... else ... end`.
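(Concretely, with placeholder names:)

```elixir
# What Claude reaches for:
case fetch_thing() do
  nil -> :noop
  thing -> process(thing)
end

# The shorter idiom preferred above:
if thing = fetch_thing() do
  process(thing)
else
  :noop
end
```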
*None that I've written, except LiveViews and one ETS table cache.
[0] CQRS library: https://hexdocs.pm/spector/Spector.html
[1] Quill impl: https://hexdocs.pm/otzel/Otzel.html
I don't understand how the author can simultaneously argue that Claude is great at Elixir because it's a small language with only one paradigm, and also that Claude is bad at Elixir, spewing out non-idiomatic code that makes little sense in the functional paradigm.
It's interesting that Claude can write Elixir effectively (even if not super idiomatically without established styles in the codebase), considering Elixir is a pretty niche and relatively recent language.
What I'd really like to see, though, is experiments on whether you can few-shot prompt an AI to in-context-learn a new language with any level of success.
I'm a bit lost on a few of the bad and ugly points.
They could've been sorted out with precise context injection via CLAUDE.md files and/or dedicated subagents, no?
My experience using Claude suggests you should spend a good amount of time scaffolding its instructions in documents it can follow and refer to, if you don't want it to end up in the same loops over and over.
The author hasn't said whether this was tried.
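For example, a hypothetical CLAUDE.md section targeting the complaints elsewhere in this thread (the rule wording is mine, not from the article):

```markdown
## Elixir style rules
- Use the built-in `JSON` module; never add or call `Jason`.
- Prefer pattern matching and `with` over chained nil checks.
- Declare `alias` lines once at the top of each module, never inside functions.
```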
Async or mildly complex thread stuff is like kryptonite for LLMs.
The imperative thing is so frustrating. Even the latest models still write Elixir like a JS developer: nil checks and `maybe_do_blah` helper functions everywhere, 30 lines when 8 would do.
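A sketch of the difference (all function names hypothetical):

```elixir
# The JS-flavored version the models default to
# (get_* return a struct or nil):
def ship(order_id) do
  order = get_order(order_id)
  if order != nil do
    address = get_address(order)
    if address != nil do
      send_package(order, address)
    else
      {:error, :no_address}
    end
  else
    {:error, :not_found}
  end
end

# The idiomatic version, assuming fetch_* return {:ok, _} or {:error, _}:
def ship(order_id) do
  with {:ok, order} <- fetch_order(order_id),
       {:ok, address} <- fetch_address(order) do
    send_package(order, address)
  end
end
```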
"It writes 100% of our code"
- Silently closes the tab and makes a mental note to avoid said software at any cost.
I don't know Erlang. My hobby LLM project is having it write a fully featured ERP in Erlang.
An ERP is practically an OS.
It now has:
- pluggable modules with a core system
- Users/Roles/ACLs/etc.
- an event system (i.e., so we can roll up Sales Order journal entries into the G/L)
- G/L, SO, AR, AP
- rollback/retries on transactions
I haven't written a line of code.
Everyone always ends these articles with "I expect it will get better."
What if it doesn't? What if LLMs just stay at roughly their current level of usefulness, but the costs continue to rise as the subsidization wears off?
Is it still worth it? Maybe, but it's not worth abandoning actual knowledge of what you're doing.
It's the second time today that I've seen a higher LoC count presented as something positive. I would put it strictly in the "Ugly" category. I understand the business logic that says as long as you can vibe-code your way around any problem, what's the point of even looking at the code.