From what I can tell, only Apple even wants to try doing any of the processing on-device, including parsing the speech. (This may be out of date at this point, but I haven't heard of Amazon or Google doing on-device processing for Alexa or Assistant.)
So there's no way for them to do anything without sending the audio off to the datacenter.
Alexa actually had the option to process all requests locally (on at least some hardware) for the first ~10 years, from launch until earlier this year. The stated reason for removing the feature was generative AI.