The point is that you probably don't care that much how exactly the dependency is called, as long as it is called in such a way that it does the action you want and returns the results you're interested in. The test shouldn't be "which methods of the dependency does this function call?" but rather "does this function produce the right results, assuming the dependency works as expected?".
This is most obvious with complex interfaces where there are multiple ways to call the dependency that do the same thing. For example if my dependency was an SQL library, I could call it with a string such as `SELECT name, id FROM ...`, or `SELECT id, name FROM ...`. For the dependency itself, these two strings are essentially equivalent. They'll return results in a different order, but as long as the calling code parses those results in the right order, it doesn't matter which option I go for, at least as far as my tests are concerned.
So if I write a test that checks that the dependency was carried with `SELECT name, id FROM ...`, and later I decide that the code looks cleaner the other way around, then my test will break, even though the code still works. This is a bad test - tests should only fail if there is a bug and the code is not working as expected.
In practice, you probably aren't mocking SQL calls directly, but a lot of complex dependencies have this feature where there are multiple ways to skin a cat, but you're only interested in whether the cat got skinned. I had this most recently using websockets in Node - there are different ways of checking, say, the state of the socket, and you don't want to write tests that depend on a specific method because you might later choose a different method that is completely equivalent, and you don't want your tests to start failing because of that.
The fakes vs mocks distinction here feels like a terminology debate masking violent agreement. What you’re describing as a “fake” is just a well-designed mock. The problem isn’t mocks as a concept, it’s mocking at the wrong layer. The rule: mock what you own, at the boundaries you control. The chaos you describe comes from mocking infrastructure directly. Verifying “deleteUserById was called exactly once with these params” is testing implementation, not behavior. Your HashMap-backed fake tests the right thing: is the user gone after the operation? Who cares how. The issue is finding the correct layers to validate behavior, not the implementation detail of mocks or fakes… that’s like complaining a hammer smashed a hole in the wall.