I actually had a fantastic experience with Fortran lately. I ported a compute kernel from Python/Numpy to Fortran 2018, partly because of the GIL and partly so I could use Intel's compiler. The performance improvement was tremendous: several times faster per core, multiplied further because I could take advantage of threading. In all, the three-day project increased actual throughput about 450x.
(I considered JAX, but the code in question was not amenable to a compute graph. Another option was to parallelise by forking worker processes and using IPC.)
I liked the language itself more than expected. You get something like "generics" with tensors: suppose you pass an integer argument N, and you also want to pass a tensor whose shape must be (N, N). You can do exactly that, because the shape in an argument's declaration can reference other arguments.
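Roughly like this (just a sketch with made-up names, not my actual kernel):

    ! The declared shape of `a` refers to the other dummy argument `n`,
    ! so the compiler knows `a` is n-by-n.
    subroutine scale_diag(n, a, s)
        use iso_fortran_env, only: dp => real64
        implicit none
        integer,  intent(in)    :: n
        real(dp), intent(inout) :: a(n, n)
        real(dp), intent(in)    :: s
        integer :: i
        do i = 1, n
            a(i, i) = s * a(i, i)   ! scale the diagonal in place
        end do
    end subroutine scale_diag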
Tensors and the operations on them are first class in the language, so the compiler can optimise them easily for the system you're building on. In my case, I got an 80% improvement from ifx over gfortran.
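Whole-array expressions and intrinsics like matmul, sum and where are part of the language, which is a big part of why the compiler has so much room to optimise. A toy example of what the code tends to look like:

    program array_ops
        use iso_fortran_env, only: dp => real64
        implicit none
        real(dp) :: a(3, 3), b(3, 3), c(3, 3), r(3)

        call random_number(a)
        call random_number(b)

        c = matmul(a, b)               ! numpy: c = a @ b
        r = sum(c * c, dim=1)          ! numpy: r = (c * c).sum(axis=0)
        where (r > 1.0_dp) r = sqrt(r) ! numpy: r[r > 1] = np.sqrt(r[r > 1])
        print *, r
    end program array_ops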
Invocation from Python was basically the same as calling a C library. Both Python and Fortran have facilities for C interop, and Numpy can be asked to lay out tensors in a Fortran-compatible (column-major) way.
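The Fortran half boils down to iso_c_binding plus bind(c); on the Python side I used np.asfortranarray and ctypes (f2py would also work). A rough sketch, with invented names:

    ! Build as a shared library, e.g.:  ifx -shared -fPIC kernel.f90 -o libkernel.so
    ! From Python (roughly): load with ctypes.CDLL, convert the array with
    ! np.asfortranarray(..., dtype=np.float64), and pass n and s by value.
    module kernel_mod
        use iso_c_binding, only: c_int, c_double
        implicit none
    contains
        subroutine scale_diag_c(n, a, s) bind(c, name="scale_diag_c")
            integer(c_int), value         :: n
            real(c_double), intent(inout) :: a(n, n)
            real(c_double), value         :: s
            integer :: i
            do i = 1, n
                a(i, i) = s * a(i, i)
            end do
        end subroutine scale_diag_c
    end module kernel_mod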
Part of what eased the port was that Numpy seems to be a kind of "Fortran wrapper". The ergonomics of tensor addressing, slicing and views are essentially identical.
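A quick illustration of how closely the section syntax lines up (1-based indices and inclusive upper bounds are the main difference):

    program slicing_demo
        use iso_fortran_env, only: dp => real64
        implicit none
        real(dp) :: a(10, 10), sub(4, 10)

        call random_number(a)

        sub = a(2:5, :)        ! numpy: sub = a[1:5, :]
        a(:, 1) = 0.0_dp       ! numpy: a[:, 0] = 0.0
        a(1:9:2, 2) = 1.0_dp   ! numpy: a[0:9:2, 1] = 1.0
        print *, sum(sub)
    end program slicing_demo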
I did something similar many years ago. I was amazed that Fortran was not discussed more often as an option for writing performant code within a Python / Numpy codebase.
At the time everyone seemed to default to using C instead. But Fortran is so much easier! It even has slicing notation for arrays, and, as you say, the code ends up looking so much like Numpy.
I've never found anything to back this up, but my impression was that both the Python / Numpy and Fortran 90 slicing operations were directly inspired by MATLAB (although most of the ideas go back to at least Algol 68).
It also helps that Fortran compatibility, column-major layout in particular, is a must for pretty much anything that expects to use BLAS.
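The reference BLAS is itself Fortran, so from Fortran your column-major arrays and leading dimensions go straight in with no transposing or wrapper layer. Something like this, as a rough sketch assuming any BLAS is linked in (-lopenblas, MKL, etc.):

    program gemm_demo
        use iso_fortran_env, only: dp => real64
        implicit none
        integer, parameter :: n = 4
        real(dp) :: a(n, n), b(n, n), c(n, n)
        external :: dgemm

        call random_number(a)
        call random_number(b)

        ! c := 1.0*a*b + 0.0*c; the arrays are already column-major, so the
        ! leading dimensions are just the declared first extents.
        call dgemm('N', 'N', n, n, n, 1.0_dp, a, n, b, n, 0.0_dp, c, n)
        print *, c(1, 1)
    end program gemm_demo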