The deeper reason agents write good Bonsai_term code is that the entire UI renders as plain text, so a "screenshot" is just a block of text and a failing screenshot test is just a text diff the model can read and verify on its own. A GUI's visual state needs a vision model to inspect, but a TUI's output already lives in the agent's native modality, which closes the feedback loop for free.
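To make that concrete, here is a rough sketch of the idea in Python rather than OCaml, and it does not use Bonsai_term's actual API. `render_counter` and `snapshot_diff` are hypothetical names invented for this illustration; the only real library call is `difflib.unified_diff` from the standard library. The point is just that the "screenshot" is a string, so a regression shows up as an ordinary text diff.

```python
import difflib


def render_counter(label: str, value: int, width: int = 20) -> str:
    """Render a hypothetical counter widget as a bordered text frame."""
    inner = width - 2  # space between the left and right border characters
    body = f"{label}: {value}"
    lines = [
        "+" + "-" * inner + "+",
        "|" + body.ljust(inner) + "|",
        "+" + "-" * inner + "+",
    ]
    return "\n".join(lines)


def snapshot_diff(expected: str, actual: str) -> str:
    """Return a unified diff of two frames, or an empty string if they match."""
    return "\n".join(
        difflib.unified_diff(
            expected.splitlines(),
            actual.splitlines(),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )


# The stored snapshot says 3 clicks; the current render says 4.
expected_frame = render_counter("clicks", 3)
actual_frame = render_counter("clicks", 4)
print(snapshot_diff(expected_frame, actual_frame))
```

An empty diff means the frame matches the snapshot. A non-empty diff points at the exact row of the screen that changed, which is something a text-only model can inspect without any image input.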
For snapshot tests, it seems better to diff a data representation, such as a YAML string, than to diff the rendered UI.