TFA does directly mention the NPU: "Arm-China Zhouyi: 30 TOPS (Dedicated)"
"you cannot simply use standard versions of PyTorch or TensorFlow out of the box. You must use the NeuralONE AI SDK."
Neon is a SIMD instruction set for the CPU, not a separate accelerator. It doesn't need an SDK; it's supported through compiler intrinsics and assembly language in any modern ARM compiler.
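To illustrate the "no SDK needed" point, here is a minimal sketch of Neon used through compiler intrinsics. The function name add_f32 is made up for this example, and the scalar fallback is only there so the same file also builds on non-ARM targets; it assumes a compiler that ships <arm_neon.h> and defines __ARM_NEON when Neon is available (GCC and Clang both do).

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>   /* Neon intrinsics header shipped with the compiler */
#endif

/* Hypothetical helper: element-wise add, out[i] = a[i] + b[i]. */
void add_f32(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Neon path: four floats per iteration in 128-bit vector registers. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);      /* load 4 floats from a */
        float32x4_t vb = vld1q_f32(b + i);      /* load 4 floats from b */
        vst1q_f32(out + i, vaddq_f32(va, vb));  /* add and store 4 results */
    }
#endif

    /* Scalar tail, and the whole loop on targets without Neon. */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

On AArch64 this compiles with a plain `cc -O2 file.c`, since Neon is part of the baseline there; on 32-bit ARM you would typically have to enable it with a flag such as -mfpu=neon.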
Quite right, I mixed up Neon with Arm NN:
https://www.arm.com/products/silicon-ip-cpu/ethos/arm-nn