Because both of them are optimized for hardware. Neural networks, despite the name, bear very little resemblance to biological neurons.
There's a lot of multiplying numbers in parallel, so it makes sense to map that onto matrix operations.
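A minimal sketch of that point (the sizes and weights here are arbitrary, just for illustration): each neuron in a layer is a dot product of the input with that neuron's weights, and stacking the weight vectors as rows collapses the whole layer into one matrix-vector product, which is exactly the shape of work GPUs are built for.

```python
import numpy as np

# A toy layer: 4 inputs, 3 neurons. Each neuron is a dot product of the
# input with its own weight vector.
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))  # one row of weights per neuron
x = rng.standard_normal(4)

per_neuron = np.array([W[i] @ x for i in range(3)])  # 3 separate dot products
as_matmul = W @ x                                    # the same thing, in one op

assert np.allclose(per_neuron, as_matmul)
```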
Cryptography is built bottom-up, but there too it makes sense to exploit the data structures that already exist in silicon.
In addition, both have a property akin to diffusion (in crypto, the avalanche effect): each change to an input bit should cascade through as many output bits as possible. In ML, each output should depend on as many of the input bits (and hidden layers) as possible. So they both feature a similar maximization of entropy.
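The crypto half of this is easy to observe directly. A rough sketch, using SHA-256 from the standard library: flip a single input bit and count how many of the 256 output bits change. For a well-diffusing hash the answer should hover around half of them.

```python
import hashlib

def avalanche(msg: bytes, bit: int) -> int:
    """Flip one input bit and count how many SHA-256 output bits change."""
    flipped = bytearray(msg)
    flipped[bit // 8] ^= 1 << (bit % 8)
    a = hashlib.sha256(msg).digest()
    b = hashlib.sha256(bytes(flipped)).digest()
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Try every single-bit flip of an 11-byte (88-bit) message.
changes = [avalanche(b"hello world", i) for i in range(88)]
avg = sum(changes) / len(changes)
print(avg)  # typically close to 128, i.e. ~half of the 256 output bits
```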
While modern LLMs are a far cry from biological synapses, I do find it fascinating that if you take the highly reciprocal connectivity of a biological connectome and unroll it into a DAG, motifs suddenly pop up that look similar to what we find in AI. I saw this both when temporally unrolling RNNs and when mapping the layer activation weights of a Transformer. Totally agree though: the current LLM architecture itself is driven by the need to shove all of this nicely into parallelized compute hardware.
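The "unroll into a DAG" step can be sketched concretely (sizes and variable names here are illustrative, not from any real connectome): a recurrent update h_t = tanh(W h_{t-1} + U x_t) is a cycle in the graph, but unrolled over time it becomes a plain feed-forward stack of layers with tied weights, which is where the familiar-looking motifs appear.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((5, 5)) * 0.1  # recurrent weights, shared across time
U = rng.standard_normal((5, 3)) * 0.1  # input weights, shared across time
xs = rng.standard_normal((4, 3))       # a sequence of 4 input vectors

# Recurrent view: a loop over time with a single hidden state.
h = np.zeros(5)
for x in xs:
    h = np.tanh(W @ h + U @ x)

# Unrolled view: the same computation as 4 stacked feed-forward
# layers, every layer reusing the same (tied) W and U.
h0 = np.zeros(5)
h1 = np.tanh(W @ h0 + U @ xs[0])
h2 = np.tanh(W @ h1 + U @ xs[1])
h3 = np.tanh(W @ h2 + U @ xs[2])
h4 = np.tanh(W @ h3 + U @ xs[3])

assert np.allclose(h, h4)
```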