Comparing against Stockfish isn't fair. That's comparing against enormous amounts of compute spent experimenting with strategies, training neural nets, etc.
The LLM will lose so badly that there will be no point in the comparison.
Besides, you could compare models (and harnesses) directly against each other.
Stockfish is a good reference point, an objective measure of how far the LLM has advanced.