Hacker News

jstanley today at 8:09 AM

The most interesting thing in here is https://github.com/smhanov/laconic, which is the author's "agentic research orchestrator for Go that is optimized to use free search & low-cost limited context window llms".

I have been doing this kind of thing with Cursor and Codex subscriptions, but they have annoying rate limits, and Cursor on the Auto model seems to perform poorly if you ask it to do too much work. So I am keen to try out laconic on my local GPU.

EDIT:

Having tried it out, this may be a false economy.

The way it works is that it has a set of different prompts for the LLMs (Planner, Synthesizer, Finalizer).

The "Planner" is given your input question and the "scratchpad" and has to come up with DuckDuckGo search terms.

Then the harness runs the DuckDuckGo search and gives the question, results, and scratchpad to the Synthesizer. The Synthesizer updates the scratchpad with new information that is learnt.

This continues in a loop, with the Planner coming up with new search queries and the Synthesizer updating the scratchpad, until eventually the Planner decides to give a final answer, at which point the Finalizer summarises the information in a user-friendly final answer.
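The loop described above can be sketched in Python. This is a minimal illustration under assumptions, not laconic's code (laconic itself is written in Go): the functions `planner`, `search`, `synthesizer`, and `finalizer` are hypothetical stand-ins for the three LLM prompts and the DuckDuckGo call, and `max_steps` is an assumed safety cap on the number of iterations.

```python
from typing import Callable, List, Optional

# Hypothetical role signatures (assumptions, not laconic's actual API):
#   planner(question, scratchpad)              -> next search query, or None to stop
#   search(query)                              -> list of result snippets
#   synthesizer(question, results, scratchpad) -> updated scratchpad
#   finalizer(question, scratchpad)            -> user-facing answer
Planner = Callable[[str, str], Optional[str]]
Search = Callable[[str], List[str]]
Synthesizer = Callable[[str, List[str], str], str]
Finalizer = Callable[[str, str], str]


def research(question: str, planner: Planner, search: Search,
             synthesizer: Synthesizer, finalizer: Finalizer,
             max_steps: int = 10) -> str:
    # The scratchpad is the only state carried between iterations, so each
    # LLM call sees the question, the scratchpad, and at most one batch of results.
    scratchpad = ""
    for _ in range(max_steps):
        query = planner(question, scratchpad)
        if query is None:  # the Planner has decided it can answer
            break
        results = search(query)
        scratchpad = synthesizer(question, results, scratchpad)
    return finalizer(question, scratchpad)


# Toy stubs so the loop runs without a model or network access.
def toy_planner(question: str, scratchpad: str) -> Optional[str]:
    return None if "capital" in scratchpad else "capital of France"

def toy_search(query: str) -> List[str]:
    return ["Paris is the capital of France."]

def toy_synthesizer(question: str, results: List[str], scratchpad: str) -> str:
    return (scratchpad + "\n" + results[0]).strip()

def toy_finalizer(question: str, scratchpad: str) -> str:
    return "Answer based on notes: " + scratchpad

print(research("What is the capital of France?",
               toy_planner, toy_search, toy_synthesizer, toy_finalizer))
```

In a real setup each stub would be an LLM call with its own prompt. Whether the scratchpad actually stays small depends on the Synthesizer rewriting it rather than simply appending, as the toy stub does.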

That is a pretty clever design! It allows you to do relatively complex research with only a very small amount of context window. So I love that.

However, I have found that the Synthesizer step is extremely slow on my RTX 3060, and I also think it would cost me about £1/day extra to run the RTX 3060 flat out versus idle. For the amount of work laconic can do in a day (not a lot!), I think I am better off just sending the money to OpenAI and getting the results more quickly.
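As a back-of-envelope check on the £1/day figure, the arithmetic works out under some assumed numbers that are not from the comment: roughly 170 W board power for an RTX 3060 under full load (NVIDIA's published figure for the 12 GB card), an assumed 15 W at idle, and an assumed UK electricity price of £0.27/kWh. It ignores the power drawn by the rest of the system.

```python
# Assumed figures (not from the comment); substitute your own measurements.
GPU_LOAD_WATTS = 170.0     # approx. RTX 3060 board power under full load
GPU_IDLE_WATTS = 15.0      # rough idle draw; varies by card and driver
PRICE_GBP_PER_KWH = 0.27   # approximate UK domestic unit rate; check your tariff


def extra_cost_per_day(load_w: float, idle_w: float,
                       price_per_kwh: float, hours: float = 24.0) -> float:
    # Extra energy from running flat out instead of idling, in kWh.
    extra_kwh = (load_w - idle_w) * hours / 1000.0
    return extra_kwh * price_per_kwh


cost = extra_cost_per_day(GPU_LOAD_WATTS, GPU_IDLE_WATTS, PRICE_GBP_PER_KWH)
print(f"~£{cost:.2f}/day")  # (170 - 15) W * 24 h = 3.72 kWh
```

Plugging in your own tariff and measured idle and load draw (for example from `nvidia-smi`) gives a better estimate than these assumed values.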

But I still love the design; this is a very creative way to use a very small context window, and it has the obvious privacy and freedom advantages over depending on OpenAI.


Replies

andai today at 9:24 AM

Yeah, came here to mention that too!

From the article:

>To manage all this, I built laconic, an agentic researcher specifically optimized for running in a constrained 8K context window. It manages the LLM context like an operating system's virtual memory manager—it "pages out" the irrelevant baggage of a conversation, keeping only the absolute most critical facts in the active LLM context window.
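As a toy illustration of the "paging out" idea quoted above, the sketch below keeps a prioritized set of facts under a fixed token budget and evicts the lowest-priority ones when it overflows. It is an assumption-laden sketch, not laconic's mechanism: the `ContextPager` and `Fact` names, the integer priorities, and the rough 4-characters-per-token estimate are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


def approx_tokens(text: str) -> int:
    # Crude heuristic (assumption): ~4 characters per token for English text.
    # A real system would count tokens with the model's own tokenizer.
    return max(1, len(text) // 4)


@dataclass
class Fact:
    text: str
    priority: int  # higher = more critical to keep in the active context


@dataclass
class ContextPager:
    budget_tokens: int
    active: List[Fact] = field(default_factory=list)
    paged_out: List[Fact] = field(default_factory=list)

    def used_tokens(self) -> int:
        return sum(approx_tokens(f.text) for f in self.active)

    def add(self, fact: Fact) -> None:
        self.active.append(fact)
        self._evict()

    def _evict(self) -> None:
        # Sort by priority, highest first; the sort is stable, so among equal
        # priorities the most recently added fact ends up last and is evicted first.
        self.active.sort(key=lambda f: f.priority, reverse=True)
        while self.active and self.used_tokens() > self.budget_tokens:
            self.paged_out.append(self.active.pop())

    def render(self) -> str:
        # The text that would actually be placed in the LLM prompt.
        return "\n".join(f.text for f in self.active)


pager = ContextPager(budget_tokens=10)
pager.add(Fact("a" * 20, priority=5))  # ~5 tokens
pager.add(Fact("b" * 20, priority=1))  # ~5 tokens, total 10: still fits
pager.add(Fact("c" * 20, priority=3))  # over budget: the priority-1 fact is paged out
print([f.priority for f in pager.active], [f.priority for f in pager.paged_out])
# [5, 3] [1]
```

A real agent would also need a way to page facts back in when they become relevant again; this sketch only shows the eviction half.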

The 8K part is the most startling to me. Is that still a thing? I worked under that constraint in 2023, in the early GPT-4 days. I believe Ollama still has the default context window set to 8K for some reason. But the model mentioned on the laconic GitHub page (Qwen3:4B) should support 32K (still pretty small, but... ;)
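If the context window is the bottleneck, Ollama lets you raise it per request through the `num_ctx` option of its REST API (a `PARAMETER num_ctx` line in a Modelfile sets it persistently instead). A minimal sketch, assuming a local Ollama server on its default port and the `qwen3:4b` model tag; the 32768 value is the 32K figure mentioned above, and whether your GPU has enough memory for that context is something to check rather than assume.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(prompt: str, model: str = "qwen3:4b", num_ctx: int = 32768) -> dict:
    # "options.num_ctx" overrides the server's default context length for this request.
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for a single JSON object instead of a stream
        "options": {"num_ctx": num_ctx},
    }


def generate(prompt: str, **kwargs) -> str:
    # Sends the request and returns the generated text from the "response" field.
    data = json.dumps(build_payload(prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(OLLAMA_URL, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("...")` requires the Ollama server to be running; `build_payload` can be inspected without it.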

I'll have to take a proper look at the architecture; extreme context engineering is a special interest of mine :) Back when Auto-GPT was a thing (think OpenClaw but in 2023), I realized that what most people were using it for was just internet research, and that you could get better results, cheaper, faster, and deterministically, by just writing a 30-line Python script.

Google search (or DDG) -> Scrape top N results -> Shove into LLM for summarization (with optional user query) -> Meta-summary.
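The fixed pipeline above fits in a few lines. A sketch under assumptions: `search`, `fetch`, and `summarize` are hypothetical callables you would back with a search API, an HTTP client plus an HTML-to-text step, and an LLM call respectively; `top_n` defaults to an arbitrary 5.

```python
from typing import Callable, List, Optional


def research_pipeline(
    query: str,
    search: Callable[[str], List[str]],                  # query -> result URLs
    fetch: Callable[[str], str],                         # URL -> page text
    summarize: Callable[[str, Optional[str]], str],      # (text, user query) -> summary
    top_n: int = 5,
) -> str:
    """Search -> scrape top N -> summarize each page -> meta-summary.

    The control flow is plain code; the LLM never decides what to do next."""
    urls = search(query)[:top_n]
    summaries = []
    for url in urls:
        try:
            page = fetch(url)
        except OSError:
            continue  # skip pages that fail to download
        summaries.append(summarize(page, query))
    # Meta-summary: summarize the per-page summaries together.
    return summarize("\n\n".join(summaries), query)
```

Because the control flow is ordinary code, the pipeline makes at most `top_n + 1` LLM calls, which is what makes it cheaper and more predictable than letting an agent choose each step.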

In such straightforward, specialized scenarios, letting the LLM drive was, and still is, "swatting a fly with a plasma cannon."

(The analog these days would be that many people would be better off asking Claw to write a scraper for them than having it drive Chromium 24/7...)
