Read Brooks' argument in detail, if you haven't. He has spent decades getting robots to play nicely in human environments, and he gets invited to an enormous number of modern robotics demonstrations.
His hardware argument is primarily sensory. Specifically, current-generation robots, no matter how clever they might be, have an incredibly impoverished physical sensorium, about on par with a human with severe frostbite. Even with human teleoperators, controlling one is awkward and frustrating, and the operators have to massively over-rely on vision. Fine-detail manual dexterity is hopeless. When you can watch someone teleoperate a robot to knit a patterned hat, or even to detach two stuck Lego bricks, then robots will have the sensors needed for human-level dexterity.
I did read it, and I found it so lacking that it baffles me to see people actually believe it to be a well-crafted argument.
Again: we can't even make a universal robot work in a sim with perfect sensor streams! If the issue were "universal robots work fine in sims, suffer in the real world", then his argument would have a leg to stand on. As is? It's a "robot AI caught lacking" problem - and ignoring the elephant in the room in favor of nitpicking at hardware isn't doing anyone any favors.
It's not like we don't know how to make sensors. Wrist-mounted cameras cover a multitude of sins, if your AI knows how to leverage them - they give you a data stream about as rich as anything a human gets from the skin - and every single motor in a robot is a force feedback sensor, giving it a rudimentary sense of touch.
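To make the "every motor is a force sensor" point concrete, here is a minimal sketch of the usual idea: motor torque is roughly proportional to current, so whatever torque is left over after subtracting what you expected to need is a crude estimate of contact. All names and numbers here (`JointModel`, `estimate_external_torque`, the threshold) are hypothetical and for illustration only; a real arm would also need to account for gearbox efficiency, velocity-dependent friction, and sensor noise.

```python
from dataclasses import dataclass


@dataclass
class JointModel:
    """Hypothetical per-joint parameters, e.g. taken from a motor datasheet."""
    torque_constant: float  # N*m per ampere at the motor shaft
    gear_ratio: float       # output torque multiplier (gear losses ignored)
    friction_torque: float  # N*m, crude constant friction at the joint output


def estimate_external_torque(model: JointModel,
                             current_a: float,
                             velocity_rad_s: float,
                             expected_torque_nm: float) -> float:
    """Return the torque residual not explained by the robot's own motion.

    expected_torque_nm is what a dynamics model says the joint needs
    (gravity, inertia) when it is touching nothing.
    """
    # Torque delivered at the joint, from measured motor current.
    motor_torque = model.torque_constant * current_a * model.gear_ratio
    # Friction opposes motion, so its sign follows the velocity.
    direction = (velocity_rad_s > 0) - (velocity_rad_s < 0)
    friction = model.friction_torque * direction
    return motor_torque - friction - expected_torque_nm


def in_contact(residual_nm: float, threshold_nm: float = 0.5) -> bool:
    """Flag contact when the residual exceeds an assumed noise threshold."""
    return abs(residual_nm) > threshold_nm


if __name__ == "__main__":
    joint = JointModel(torque_constant=0.1, gear_ratio=50.0, friction_torque=0.2)
    residual = estimate_external_torque(joint, current_a=1.0,
                                        velocity_rad_s=0.5,
                                        expected_torque_nm=4.0)
    print(f"residual torque: {residual:.2f} N*m, contact: {in_contact(residual)}")
```

This is rudimentary by design: it only tells you that something is pushing back on a joint, not where or how, which is why it counts as a sense of touch only in the loosest sense.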
Nothing stops you from getting more touch sensing with dedicated piezos, if you want better "touchy-feely" capabilities. But do you want to? We are nowhere near being limited by "robot skin isn't good enough". We are at "if we made a perfect replica of a human hand for a robot to work with, it wouldn't allow us to do anything we can't already do". The bottleneck lies elsewhere.