Because both of them are optimized for hardware. Neural networks, despite the name, bear very little resemblance to biological neurons.
There's a lot of multiplication of numbers in parallel, so it makes sense to try to fit that to matrices.
Cryptography is built bottom-up, but likewise it makes sense to exploit data structures that already exist in silicon.
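To make the matrix point concrete, here's a toy sketch (plain NumPy, nothing framework-specific):

```python
import numpy as np

# A single "neural" layer is just a matrix-vector product plus a nonlinearity:
# every output element is a weighted sum of all inputs, and all of those
# multiply-accumulates are independent, so they map straight onto parallel hardware.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 128))   # weights: 256 outputs, 128 inputs
x = rng.standard_normal(128)          # one input vector

h = np.maximum(W @ x, 0.0)            # matrix multiply, then ReLU
print(h.shape)                        # (256,)
```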
Ironic that both Shannon and Turing laid the foundations for both cryptography and AI. I think it boils down to information, which is related to language and text.
Can anyone recommend any good content for learning cryptography? Like, even when I read the AES algorithm, I have zero understanding of why it works the way it does.
I've already finished the Cryptography I course on Coursera. Can't recommend it enough.
In evolutionary biology there is a term, “carcinization”: the tendency for different organisms to independently evolve crab-like forms.
What drives carcinization is usually described as a set of shared conditions, and that is what I felt after reading this article.
In other words, isn’t it possible that systems end up under similar pressures whenever they need to mix information?
1. There is a state space.
2. Each part of the input affects many parts of the output.
3. A simple rule is not enough, so nonlinearity becomes necessary.
4. But the hardware cannot be allowed to stall, so the system evolves toward a structure where simple transformations are repeated many times (sketched below).
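A toy sketch of those four pressures together (a made-up ARX-style round, not any real or secure cipher, just to show how repeating a cheap transform mixes everything):

```python
# Purely illustrative round function: a small state, a slightly nonlinear
# mixing step, and many cheap repetitions.
def round_fn(state, k):
    a, b, c, d = state
    a = (a + b) & 0xFFFFFFFF                 # addition: carries make this nonlinear over XOR
    d ^= a
    d = ((d << 7) | (d >> 25)) & 0xFFFFFFFF  # rotation: spreads bits across positions
    c = (c + d + k) & 0xFFFFFFFF
    b ^= c
    return (a, b, c, d)

def mix(state, rounds=20):
    for k in range(rounds):                  # the same simple transform, repeated
        state = round_fn(state, k)
    return state

print(mix((1, 2, 3, 4)))
print(mix((1, 2, 3, 5)))                     # one small input change, very different output
```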
Ultimately, even across different fields, the core question is how to decompose complexity into atomic units. The choice of those units tends to converge under the pressures imposed by the underlying substrate. This seems to be the central thesis of the article.
This feels similar to how humans solve nonlinear differential equations.
If so, perhaps the structure of human cognition itself works in a similar way: when facing nonlinearity, we break it into smaller structures and design around those smaller parts.
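As a rough illustration of the analogy (my own toy example, not from the article): a nonlinear ODE like dy/dt = -y^3 is handled numerically by repeating a very simple local step many times.

```python
# Forward Euler on the nonlinear ODE dy/dt = -y**3, y(0) = 1.
# The nonlinear problem is decomposed into many tiny, almost-linear steps.
def euler(f, y0, t_end, n_steps):
    y, dt = y0, t_end / n_steps
    for _ in range(n_steps):
        y = y + dt * f(y)        # one simple local update, repeated
    return y

print(euler(lambda y: -y**3, 1.0, 5.0, 10_000))  # close to 1/sqrt(11) ~ 0.3015
```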
Because my academic background is limited, I find it difficult to express this properly in language. But I think this kind of pressure can also be applied to programming and software theory.
When I think about software engineering, it also often starts from the smallest element that does not change easily, and then builds larger systems by composing those elements. In OOP, that unit is the conceptual object. In FP, it is the function. In DOP, it is data.
FP is mathematical. DOP is aligned with the data that computers store and transmit. OOP is connected to our abstract model of the world. That may be why different people are good at different paradigms.
OOP compresses the world into objects and responsibilities. FP compresses the world into functions and composition. DOP compresses the world into data and transformable structures. Ultimately, it is a question of how we cut complexity: what we choose as the minimal unit of decomposition.
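A contrived little example of "same problem, different minimal unit" (my own toy code, not from the article): applying a discount to an order in each style.

```python
from dataclasses import dataclass

# OOP: the unit is the object; behavior lives with the state it owns.
@dataclass
class Order:
    total: float
    def discounted(self, rate: float) -> float:
        return self.total * (1 - rate)

# FP: the unit is the function; complexity is handled by composing functions.
def discount(rate):
    return lambda total: total * (1 - rate)

def compose(f, g):
    return lambda x: f(g(x))

# DOP: the unit is plain data; functions turn one shape of data into another.
def apply_discount(order: dict, rate: float) -> dict:
    return {**order, "total": order["total"] * (1 - rate)}

print(Order(100.0).discounted(0.1))                  # 90.0
print(compose(discount(0.1), discount(0.2))(100.0))  # 72.0
print(apply_discount({"id": 1, "total": 100.0}, 0.1))
```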
Then what should this idea be called? And if we apply this to AI coding, what would it imply?
I have thoughts, but because I did not study enough, I feel frustrated that I cannot express them more fully. I wish I had learned more.
are they really? seems not accurate to me, the devil is in the details
I would argue that they are not the same, but there is a symmetry between them.
The central problem of cryptology is to prevent inference about either the key or the plaintext, despite the requirement to be able to reconstruct the plaintext from the ciphertext+key. So ciphers have to almost perfectly mix information.
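To make "almost perfectly mix" concrete: flip a couple of input bits and roughly half of the output bits change (the avalanche effect). A quick check with SHA-256 from Python's standard library, standing in for a block cipher:

```python
import hashlib

def as_int(h: bytes) -> int:
    return int.from_bytes(h, "big")

a = hashlib.sha256(b"attack at dawn").digest()
b = hashlib.sha256(b"attack at dawm").digest()   # one-character (two-bit) change

differing = bin(as_int(a) ^ as_int(b)).count("1")
print(f"{differing} of 256 output bits differ")  # typically close to 128
```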
Machine learning is possible because, in the absence of perfect mixing, inference is possible (given many input/output pairs) even when the information sits many decibels below the noise. The information about which parameters need changing is still present in the output despite many subsequent layers of processing. That means a lot of mixing can be tolerated, and it is needed, because you don't know in advance what the data flow should look like in detail, so the NN has to provide as many options as possible.
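A toy illustration of inference surviving noise, given enough input/output pairs (an ordinary least-squares fit, not a claim about any particular network):

```python
import numpy as np

# The "true" parameters are tiny compared to the noise added to each output,
# yet averaging over many input/output pairs recovers them anyway.
rng = np.random.default_rng(0)
w_true = np.array([0.05, -0.03, 0.02])            # weak signal
X = rng.standard_normal((100_000, 3))             # many input examples
y = X @ w_true + rng.standard_normal(100_000)     # noise roughly 24 dB above the signal

w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)     # least squares: one big averaging step
print(w_true)
print(w_hat)                                      # close to w_true despite the noise
```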