Hacker News

The Hardest Document Extraction Problem in Insurance

24 points by sgondala_ycapp yesterday at 9:09 PM | 3 comments

Comments

cadamsdotcom today at 12:22 AM

There are some patterns here that everyone using AI should build in. Examples below are things I’m doing in my harness that enhances Claude Code (https://codeleash.dev)

1. Self-correction (human out of the loop) - give the AI opportunities to see the mistakes it made and correct them. Think linting, but your agent wrote the linter weeks ago, it's code, and its output is fed back with line numbers and recommended fixes. Maybe you have a specific architecture - why not have your agent write a script that walks your entire codebase's AST and flags violations? If you guarantee that check gets run, you'll never see a violation again, because the agent will fix them before declaring itself done. Bye bye dumb AI mistakes.
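A minimal sketch of the kind of AST-walking check described above, using Python's stdlib `ast` module. The layering rule here (no direct `sqlalchemy` imports outside a db layer) is a hypothetical example, not something from the original comment; the point is the shape: walk the tree, emit violations with file and line number so the agent can feed them back to itself.

```python
import ast

# Hypothetical architecture rule for illustration: application code must
# not import sqlalchemy directly; all DB access goes through a db layer.
FORBIDDEN_TOP_LEVEL = {"sqlalchemy"}

def find_violations(source: str, filename: str = "<mem>") -> list[str]:
    """Walk one file's AST and return violation reports with line numbers."""
    reports = []
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        # Collect the top-level module name for both import forms.
        if isinstance(node, ast.Import):
            names = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [(node.module or "").split(".")[0]]
        else:
            continue
        for name in names:
            if name in FORBIDDEN_TOP_LEVEL:
                reports.append(
                    f"{filename}:{node.lineno}: direct import of "
                    f"'{name}' violates layering; go through the db layer"
                )
    return reports

bad = "import os\nfrom sqlalchemy import create_engine\n"
print(find_violations(bad, "services/user.py"))
```

Run over every file in the repo and pipe the output back into the agent's context; an empty report list is the "done" signal.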

2. Success criteria - a more subjective version of self-correction. When the agent declares itself done, automatically run a process (agentic, code, or both) which determines if the agent's job is truly done. For example, make your coding agent harness's stop hook run the full test suite and feed back any failures. If the agent actually does get to finish work and your harness notifies you, you have certainty that your tests all pass. For extra credit, you can have your harness tell the agent to review its own work against a self-review checklist. I've got this so dialed in that the only time I interact with an agent is to approve its initial plan, then again to approve the plan it comes up with after its self-review.
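The stop-hook gate described above can be sketched as a small script: run the test command, and only allow the agent to stop if it exits cleanly, otherwise block and feed the failure output back. The exact hook wiring and return shape are harness-specific assumptions here; the technique is just exit-code gating.

```python
import subprocess
import sys

def stop_gate(test_cmd: list[str]) -> dict:
    """Run the test suite; allow the stop only if everything passes.

    On failure, return a 'block' decision whose reason carries the tail
    of the test output, so the agent sees what broke and keeps working.
    """
    result = subprocess.run(test_cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return {"decision": "allow_stop"}
    # Keep only the tail so the feedback fits in the context window.
    tail = (result.stdout + result.stderr)[-2000:]
    return {"decision": "block", "reason": f"Tests failed:\n{tail}"}

# Demo with a trivially passing "suite" (sys.executable stands in for
# whatever test runner your project uses, e.g. pytest).
print(stop_gate([sys.executable, "-c", "print('ok')"]))
```

Swap the demo command for your real runner (e.g. `["pytest", "-x", "-q"]`) and have the harness treat a `block` decision as "not done yet".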

3. Self-reflection & continuous improvement. As the agent works, the harness should be generating logs of its work. Did it try to escape the test-driven-development state machine? Did it edit with shell commands instead of edit tools? Just before the agent is about to drop context (hits compaction or stops work), ask it to output details of anything it learned, and have it review its own work logs. These learnings can then be used to improve the harness, the self-review checklist, the docs (agent OR human docs), and the automations, to uncover process gaps, or even to guide you to code refactorings so the agent doesn't get surprised in the future.
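A sketch of the log review step, assuming a simple JSONL work log (the schema and the "shell edit" heuristic are illustrative assumptions, not a real harness format): scan the log for process violations like edits made via shell commands instead of the edit tool, and surface them as candidate learnings.

```python
import json

# Shell fragments that suggest the agent edited files via bash instead
# of the edit tool (a heuristic for illustration, not an exhaustive list).
SHELL_EDIT_MARKERS = ("sed -i", ">>", "| tee")

def audit_log(jsonl_lines: list[str]) -> list[str]:
    """Scan a JSONL work log and return process-violation learnings."""
    learnings = []
    for line in jsonl_lines:
        event = json.loads(line)
        if event.get("tool") == "bash":
            command = event.get("command", "")
            if any(marker in command for marker in SHELL_EDIT_MARKERS):
                learnings.append(
                    f"step {event.get('step')}: shell edit {command!r}; "
                    "prefer the edit tool"
                )
    return learnings

log = [
    '{"step": 1, "tool": "edit", "file": "a.py"}',
    '{"step": 2, "tool": "bash", "command": "sed -i s/x/y/ a.py"}',
]
print(audit_log(log))
```

Feed the resulting learnings back into the harness config or the self-review checklist so the same violation class doesn't recur.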

The result is you can trust the output. It’s all about putting a deterministic shell around the AI, by supervising a process while it does the work - because the surface area and complexity of a powerful process is so much lower than that of the work itself.

It’s just like management: set up guardrails, define the outcome, monitor the process, trust that it’ll lead to quality work, continuously improve everything at every opportunity.

lschueller yesterday at 11:05 PM

I find this approach quite interesting. Treating the model's text/context skills as pure logical and structured processing wouldn't work; basically, you're dealing with edge cases, as almost everything is an edge case in doc processing. I'd like to know if and how easily this transfers to banking, e.g. processing consumer loans, which involves a fair amount of insurance paperwork as well.

chrisjj yesterday at 11:12 PM

> We built a self-correcting extraction system that went from 80% to 95% row count accuracy

Got to wonder who has any use for 95% accuracy - at just counting rows.