jiggawatts • yesterday at 8:00 PM • 2 replies

I recently wrote a small CLI tool for scanning through legacy codebases. For each file, it does a light parse step to find every external identifier (function calls, etc.), reads their definitions into the context, and then asks the model questions about the file being analyzed.

It's amazing for trawling through hundreds of thousands of lines of code looking for a complex pattern, a bug, bad style, or anything else a regex could never hope to find.
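
The tool itself wasn't shared, so its language and internals are unknown. Below is a minimal sketch of the described pipeline, assuming a Python codebase so that the standard `ast` module can serve as the "light parse step". All function names are hypothetical, and the actual model call is left out: the sketch only assembles the prompt (definitions of external callees, then the main file, then the question). For stored procedures, the parse step would have to target SQL instead.

```python
import ast


def called_names(source: str) -> set[str]:
    """Light parse step: collect the names of every function/method called in `source`."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):          # e.g. helper(x)
                names.add(func.id)
            elif isinstance(func, ast.Attribute):   # e.g. obj.method(x)
                names.add(func.attr)
    return names


def defined_names(source: str) -> set[str]:
    """Names of functions and classes defined in `source` (these are not external)."""
    defs = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    return {n.name for n in ast.walk(ast.parse(source)) if isinstance(n, defs)}


def index_definitions(files: dict[str, str]) -> dict[str, str]:
    """Map each top-level function/class name to its source text.

    Hypothetical simplification: on name collisions across files, the last file wins.
    """
    defs = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    index: dict[str, str] = {}
    for source in files.values():
        for node in ast.parse(source).body:
            if isinstance(node, defs):
                index[node.name] = ast.get_source_segment(source, node)
    return index


def build_prompt(target: str, files: dict[str, str], question: str) -> str:
    """Assemble the context for one file: external definitions, the file, the question."""
    source = files[target]
    index = index_definitions({p: s for p, s in files.items() if p != target})
    external = sorted(called_names(source) - defined_names(source))
    # Builtins and third-party calls (e.g. len) have no project definition and are skipped.
    context = [f"# definition of {name}\n{index[name]}" for name in external if name in index]
    return "\n\n".join([*context, f"# main file: {target}\n{source}", f"Question: {question}"])
```

For example, `build_prompt("main.py", files, "Any bugs?")` with `files` mapping paths to source text would pull in the definition of any project function `main.py` calls, ahead of `main.py` itself. Sending the result to a model is left to whichever client library you use.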

For example, I recently went through tens of megabytes(!) of stored procedures looking for transaction patterns that would be incompatible with read committed snapshot isolation (RCSI).

I got an astonishing report out of Gemini Pro 3; it was absolutely spot on. Most other models barfed on this request: they got confused or started complaining about future maintainability issues, stylistic problems, and so on, no matter how carefully I prompted them to focus on the task at hand. (Gemini Pro 2.5 did okay too, but it missed a few issues and had a lot of false positives.)

Fixing RCSI incompatibilities in a large codebase used to be a Herculean task, effectively a no-go for most of my customers. Now it's eminently possible in a month or less, at the cost of maybe $1K in tokens.

Replies

jammaloo • yesterday at 8:24 PM

Is there any chance you'd be willing to share that tool? :)


mrtesthah • yesterday at 8:20 PM

If this is a common task for you, I'd suggest instead using an LLM to translate your search query into CodeQL[1], which is designed to scan for semantic patterns in a codebase.

1. https://codeql.github.com/
