The tests are absolutely essential. Without them there's no signal to guide the LLM toward correct behavior, and hallucinations accumulate until any hope of forward progress collapses.
Obviously the signal is comparison against the behavior of the original.
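Here is a minimal sketch of that comparison signal, usually called differential testing. It runs the same inputs through the original and the port and reports every input where they disagree. Everything in it is a hypothetical stand-in, not code from any real project: `original_slugify` plays the original program, `ported_slugify` plays a faithful port, and `buggy_slugify` plays a flawed one. A real harness would call the actual original code, in whatever way that code can be invoked, as the reference.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical "original" implementation whose behavior the port must match.
def original_slugify(text: str) -> str:
    return "-".join(text.lower().split())

# Hypothetical faithful port: different code, same observable behavior.
def ported_slugify(text: str) -> str:
    words: List[str] = []
    current = ""
    for ch in text:
        if ch.isspace():
            if current:
                words.append(current.lower())
                current = ""
        else:
            current += ch
    if current:
        words.append(current.lower())
    return "-".join(words)

# Hypothetical flawed port: splits only on single spaces, so tabs,
# newlines, and runs of spaces behave differently from the original.
def buggy_slugify(text: str) -> str:
    return "-".join(text.lower().split(" "))

def observe(fn: Callable[[str], str], x: str) -> Tuple[str, str]:
    """Record the return value or the exception type, so raising counts as behavior too."""
    try:
        return ("ok", fn(x))
    except Exception as exc:
        return ("raised", type(exc).__name__)

def differential_test(reference, candidate, inputs):
    """Return (input, expected, actual) for every input where the two behaviors differ."""
    mismatches = []
    for x in inputs:
        expected = observe(reference, x)
        actual = observe(candidate, x)
        if expected != actual:
            mismatches.append((x, expected, actual))
    return mismatches

def random_inputs(n: int, seed: int = 0) -> List[str]:
    """Generate reproducible random strings from a small alphabet that includes whitespace."""
    rng = random.Random(seed)
    alphabet = "abcXYZ \t\n"
    return ["".join(rng.choice(alphabet) for _ in range(rng.randint(0, 12)))
            for _ in range(n)]

if __name__ == "__main__":
    inputs = random_inputs(500)
    print(len(differential_test(original_slugify, ported_slugify, inputs)))  # prints 0
    print(differential_test(original_slugify, buggy_slugify, ["a  b"]))
```

Two design choices matter here. Wrapping each call in `observe` makes "behavior" include exceptions, not just return values. Seeding the input generator makes every mismatch reproducible, so each mismatch can serve as a concrete failing case instead of a vague complaint.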