The test isn't about how well an LLM can find or replace a string. It's about how well it can carry out the instructions it's given... Is that not obvious?