This has become especially true for me in the past four months. The new long-context reasoning models are shockingly good at digging through larger volumes of gnarly code. o3, o4-mini and Claude 3.7 Sonnet "thinking" all have 200,000-token context limits, and Gemini 2.5 Pro and Flash can do 1,000,000. As "reasoning" models they are much better suited to following the chain of a program to figure out the source of an obscure bug.
Makes me wonder how many of the people who continue to argue that LLMs can't help with large existing codebases are missing that you need to selectively copy the right chunks of that code into the model to get good results.
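To make "selectively copy the right chunks" a bit more concrete, here's a rough sketch of one way to do it: walk a repo, keep only the files that mention some keywords related to the bug, and concatenate them with path headers until you hit a rough token budget. The function name, the keyword filter, and the 4-characters-per-token estimate are all my own illustrative assumptions, not anything from a specific tool.

```python
# A minimal sketch of "selectively copy the right chunks": gather only the
# files that look relevant and paste them into one prompt. The keyword
# filter and the chars-per-token estimate are crude assumptions.
from pathlib import Path

MAX_TOKENS = 200_000      # rough budget for a 200,000-token context window
CHARS_PER_TOKEN = 4       # crude estimate; real tokenizers vary


def build_prompt(repo_root: str, keywords: list[str], extensions=(".py",)) -> str:
    """Concatenate files that mention any keyword, with path headers,
    stopping before the estimated token budget runs out."""
    parts, used = [], 0
    for path in sorted(Path(repo_root).rglob("*")):
        if not path.is_file() or path.suffix not in extensions:
            continue
        text = path.read_text(errors="ignore")
        if not any(k in text for k in keywords):
            continue                      # skip files unrelated to the bug
        chunk = f"\n--- {path} ---\n{text}"
        cost = len(chunk) // CHARS_PER_TOKEN
        if used + cost > MAX_TOKENS:
            break                         # stay under the context limit
        parts.append(chunk)
        used += cost
    question = "Where is the source of the bug described below?\n"
    return question + "".join(parts)


# Example: pull in only the billing-related modules before asking about a bug.
# prompt = build_prompt("my_project/", keywords=["invoice", "charge_customer"])
```

Crude keyword matching like this is only one option; the point is that curating a few hundred thousand tokens of relevant code is usually enough, and far better than dumping in the whole repository or nothing at all.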