I used to think that zero alloc = zero malloc, and all stack allocations are of statically known fixed size (you know the max call depth), so you can preallocate your stack area with some confidence, and will never run out of RAM.
The line you point at creates a single local pointer variable which is used in a tight loop; I don't see why it wouldn't stay entirely in a register.
I'm not a real embedded developer though; last time I worked as one I worked on 8-bit devices. Maybe things changed since then.
I think your experience on 8-bit is just fine. Imagine, if you will, that your 8-bit micro has 2 kB of RAM, such as the famous ATmega328P of the Arduino UNO. Sure, the compiler might put that pointer into a register, but it might not. It most certainly won't do that for the three 66-byte arrays they define on the stack later in the code, but maybe that's OK. The question is: how do you preallocate the stack safely? How do you know exactly what your usage is, so that you don't overflow the stack and wreak havoc? Maybe you profile the code with debug on and it's X bytes, then in release mode it's Y because of register packing. This affects all code, but it's something we need to be cognizant of when we're trying to make the most of 2 kB. It's easy to throw kilobytes of stack around on desktop. Megabytes even. I've done gigabytes before for quick and dirty stuff. But on deeply embedded 8-bit parts, you don't want to be doing that.
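One common way to get a rough answer to "what is my usage?" is stack painting: fill the stack region with a known byte at startup, run the firmware, then count how many bytes still hold the pattern. Below is a minimal sketch of that idea, not anything from the code under discussion. The names `stack_paint`, `stack_unused`, `stack_high_water` and the `0xA5` pattern are made up for illustration, and it assumes a stack that grows downward from the high end of the region. On a real target you would pass the linker-provided stack bounds instead of a plain array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fill pattern; any value unlikely to appear by chance works. */
#define STACK_PAINT 0xA5u

/* Fill the whole region with the pattern. Call before the stack is used. */
void stack_paint(uint8_t *region, size_t size)
{
    assert(region != NULL);
    memset(region, STACK_PAINT, size);
}

/*
 * Count bytes at the low end that still hold the pattern.
 * Assumes a descending stack: usage starts at the high end and
 * grows toward region[0], so the untouched part is a low prefix.
 */
size_t stack_unused(const uint8_t *region, size_t size)
{
    size_t n = 0;
    while (n < size && region[n] == STACK_PAINT) {
        n++;
    }
    return n;
}

/* Deepest observed usage in bytes (an estimate, see note below). */
size_t stack_high_water(const uint8_t *region, size_t size)
{
    return size - stack_unused(region, size);
}
```

This only reports the deepest point reached on the paths that actually ran, and the number can differ between debug and release builds, so it is a measurement to add margin to, not a guarantee. Scanning from the low end means a stray pattern byte inside the used area does not matter, but if the deepest byte written happens to equal the pattern, the result is slightly low.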
My bigger point was that "no malloc" should be called "stack allocated" or some other more technically correct term. That tells me "hey, if you run this code and something goes haywire, check that your stack isn't corrupted", because nine times out of ten, for me, that's the problem.