Have you done this yourself? If you haven't, I think you'd discover that server hardware is actually shockingly reliable. You could go years without needing to physically touch anything on a single machine. I find that people who are used to cloud assume stuff is breaking all the time. That's true at scale, but when you have a handful of machines you can go a very long time between failures.
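To put rough numbers on the "true at scale, not for a handful" point, here is a back-of-the-envelope sketch. The 3% annual failure rate is purely an assumed, illustrative figure (not a measurement), and the function names are made up for this example. It also assumes failures are independent, which real hardware doesn't strictly obey.

```python
def p_any_failure(machines: int, annual_failure_rate: float) -> float:
    """Chance that at least one of `machines` fails within a year,
    assuming each fails independently with the same annual rate."""
    return 1.0 - (1.0 - annual_failure_rate) ** machines


def expected_failures(machines: int, annual_failure_rate: float) -> float:
    """Expected number of machine failures per year across the fleet."""
    return machines * annual_failure_rate


# Hypothetical per-machine annual failure rate, chosen only for illustration.
ASSUMED_AFR = 0.03

for n in (1, 5, 1000):
    print(
        f"{n:>5} machines: "
        f"P(at least one failure/year) = {p_any_failure(n, ASSUMED_AFR):.3f}, "
        f"expected failures/year = {expected_failures(n, ASSUMED_AFR):.1f}"
    )
```

Under that assumed rate, a handful of machines will usually get through a year untouched, while a fleet of a thousand sees failures routinely. Plug in your own rate; the shape of the result is what matters, not the specific numbers.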
If you have some kind of failover redundancy for services across your systems to mitigate failures, then great. With a proper setup, no worries. I guess it depends on how much you want to take on vs. hand off.
Yes, having done this for decades, it happens often enough that you need to plan for it. You need to have redundancy, spare parts, and staffing or you are basically gambling. All of this has to be tested, too, or you might find that your failover mechanism has dependencies you didn’t plan for or unexpected failure modes (I’ve twice experienced data center hard outages due to the power distribution system failing oddly when switching between mains and UPS power, or UPS and generator).
Using something like AWS can make it easy to assume that servers don’t fail often, but that’s because the major players have all of that behind the scenes, heavily tested, and will migrate VMs when pre-failure indicators trigger, before anything actually goes down.