If you have a less-popular CPU, compilers today can be utter trash. GCC doesn't understand the TinyAVR core and emits insane assembly. Take iterating an array: instead of putting the array pointer in the Z register and using the single-instruction load-with-post-increment (ld Rd, Z+), it will add to the pointer, read, subtract from the pointer, and loop. It also uses the slower load instruction. Overall, looping over an array in C is 4 times slower than hand-written assembly and consumes three times as much program space. Try examining the assembly from your next program; you'll probably be quite surprised at how awful it is.
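If you want to check this on your own toolchain, here's a rough sketch of the kind of loop I mean. The function names are made up for illustration. The second form is written so that each element is one read plus one pointer bump, which is the shape that maps onto a post-increment load. Whether your compiler actually emits that is not guaranteed, so look at the listing rather than trusting the source.

```c
#include <stdint.h>
#include <stddef.h>

/* Indexed form: the compiler has to derive an address for every element. */
uint16_t sum_indexed(const uint8_t *buf, size_t len)
{
    uint16_t total = 0;
    for (size_t i = 0; i < len; i++) {
        total += buf[i];
    }
    return total;
}

/* Pointer-walk form: read through the pointer, then advance it.
 * This is the pattern a post-increment load could cover in one
 * instruction, if the compiler chooses to use it. */
uint16_t sum_walk(const uint8_t *buf, size_t len)
{
    uint16_t total = 0;
    const uint8_t *end = buf + len;
    while (buf != end) {
        total += *buf++;
    }
    return total;
}
```

Both functions return the same result; the only thing worth comparing is the generated assembly (for example from avr-gcc -S or avr-objdump -d) at the optimization level you actually ship with.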
I had to implement Morton ordering on this platform. The canonical C for this blows up to over 300 instructions per iteration. I unrolled the loop, used special CPU hardware and got the entire thing in under 100 instructions.
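For reference, this is roughly what I mean by the canonical C: a bit-by-bit interleave of two coordinates. The 8-bit inputs, the 16-bit result, and the function name are my assumptions for the sketch, not the exact code I started from. Note the variable shifts: AVR has no barrel shifter, so shifting by a runtime amount gets expensive fast.

```c
#include <stdint.h>

/* Interleave the bits of x and y into a 16-bit Morton code:
 * bit i of x goes to bit 2*i, bit i of y goes to bit 2*i + 1. */
uint16_t morton_encode_8(uint8_t x, uint8_t y)
{
    uint16_t z = 0;
    for (uint8_t i = 0; i < 8; i++) {
        z |= (uint16_t)((x >> i) & 1u) << (2 * i);
        z |= (uint16_t)((y >> i) & 1u) << (2 * i + 1);
    }
    return z;
}
```

With this layout, morton_encode_8(0xFF, 0x00) gives 0x5555 and morton_encode_8(0x00, 0xFF) gives 0xAAAA, which is a quick sanity check before you start comparing it against an unrolled or hand-written version.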
Compilers, even modern ones, are not magic and only understand CPUs popular enough to receive specific attention from compiler devs. If your CPU is unpopular, you're doing optimizations yourself.
Assembly doesn't matter to Arduino script kiddies, but it's still quite important if you care at all about execution speed, binary size, or resource usage.