I think it's the opposite, the fact that it is slow means profiling is more important. This is because a 10x difference between unoptimized 0.1ms and optimized 0.01ms in Go could translate to 10s vs 1s in an equivalent python script, which is considerably more noticeable difference