Yeah, my current boss spent time weeding out such hardware bugs: https://arxiv.org/abs/2110.11519 (EDIT: maybe https://x.com/Tesla_AI/status/1930686196201714027 is a more relevant citation)
They found a bimodal distribution in failures over the lifetime of chips. Infant mortality was well understood. Silicon aging over time was much less well understood, and I still find it surprising.