If this were a peer-reviewed paper, it wouldn't pass.
- Is the wearable accurate enough to be sure that 3bpm is not a measurement fluke?
- Why did you use the minimum heart rate value (which could be a measurement glitch) rather than a low percentile (e.g., the 2.5th percentile)?
- Were all assumptions of the paired t-test valid? How did you account for likely temporal correlations in the data (e.g., a sauna session could still affect a night 2 days later, and the same goes for exercise)?
- How can you define a "comparable-intensity exercise day" if you don't know the characteristics of the sauna?
> Is the wearable accurate enough to be sure that 3bpm is not a measurement fluke
If the statistical tests show significance (and are valid), the answer to this question is yes. If you have enough data, you can draw strong conclusions even with imperfect hardware.
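To make that concrete, here is a rough simulation sketch. The numbers are assumptions for illustration, not values from the post: a true 3 bpm effect and a sensor with 5 bpm of independent, zero-mean noise per reading. The helper names `paired_t_statistic` and `simulate_diffs` are made up for this example. It only covers random noise; a systematic bias in the sensor would not average out this way.

```python
import math
import random
import statistics


def paired_t_statistic(diffs):
    """t statistic for testing whether the mean paired difference is zero."""
    n = len(diffs)
    mean = statistics.fmean(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return mean / standard_error


def simulate_diffs(n, true_effect=3.0, noise_sd=5.0, seed=0):
    """Simulate n paired differences (e.g. control night minus sauna night).

    Each night's reading carries independent Gaussian sensor noise, so the
    difference contains two noise terms. All parameters are assumed values.
    """
    rng = random.Random(seed)
    return [
        true_effect + rng.gauss(0, noise_sd) - rng.gauss(0, noise_sd)
        for _ in range(n)
    ]


# The same noisy sensor gives a larger t statistic as the sample grows,
# because the standard error shrinks roughly like 1 / sqrt(n).
for n in (5, 50, 500):
    print(n, round(paired_t_statistic(simulate_diffs(n)), 2))
```

With only a handful of nights the effect can easily drown in the noise; with hundreds of nights the standard error of the mean difference becomes much smaller than 3 bpm, even though each individual reading is still off by several bpm.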