"a harness for a memory" so it still requires external tools to work well. The whole point of this benchmark is to validate the systems can solve problems without any sort of outside help.