> If a harness is needed, it can make its own. If tools are needed, it can chose to bring out these tools.
If I understand correctly the model can carry only very limited memory among tests, so it looks like it's not really possible for the model to self specialize itself under this assumptions.