I am guarded about today's LLMs after doing a lot of unit testing myself.
As long as LLMs rely on probabilistic next-token prediction as their algorithm, there will be glaring errors when they encounter complex logic (or even compound logic).
In short, use AND, OR, NOR, XOR sparingly when writing AI prompts. Raise your err-dar whenever you do.
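
One way to keep that err-dar honest is to compare whatever the model says against a reference you compute yourself. The snippet below is only a minimal sketch of that idea: it does not call any LLM, the names `OPERATORS`, `truth_table`, and `mismatches` are made up for illustration, and it assumes you have already parsed the model's reply into a dict of input pairs and results.

```python
from itertools import product

# Reference definitions of the four operators named above.
OPERATORS = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
    "NOR": lambda a, b: not (a or b),
    "XOR": lambda a, b: a != b,
}


def truth_table(op):
    """Return {(a, b): result} for every pair of boolean inputs."""
    fn = OPERATORS[op]
    return {(a, b): fn(a, b) for a, b in product([False, True], repeat=2)}


def mismatches(op, claimed):
    """List the input pairs where a claimed table disagrees with the reference.

    `claimed` is assumed to be a dict shaped like truth_table()'s output,
    for example one parsed by hand from a model's reply.
    """
    expected = truth_table(op)
    return [pair for pair, value in expected.items() if claimed.get(pair) != value]


# Hypothetical case: a reply that treats XOR as if it were plain OR.
print(mismatches("XOR", truth_table("OR")))  # [(True, True)]
```

The same pattern extends to compound expressions: evaluate the expression yourself over all input combinations, then diff that against what the model claims.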