>Three whole books likely exceeds their context window size of course
This was not “read all three books”; it was “check these three links with the (known) book synopsis/reviews there”, and it made up the third one.
>I would consider this a failure in their tool use capabilities, not their reading ones.
I'd give it to you if I had gotten an error message, but the text being enhanced with wrong-but-plausible data is clearly a failure of reliability.