
chronicler | last Sunday at 1:02 PM

I don't even have enough knowledge to grasp the first video. Is there a list of knowledge requirements to look at?


Replies

jsight | last Sunday at 2:42 PM

3blue1brown videos are great if you want to go deep on the math behind it.

If you are struggling with the neural network mechanics themselves, though, I'd recommend just skimming them once and then going back for a second watch later. The high level overview will make some of the early setup work make much more sense in a second viewing.

HarHarVeryFunny | last Sunday at 6:16 PM

IMO that's a bit of a strange video for Karpathy to start with, perhaps even to include at all.

Let me explain why ...

Neural nets are trained by giving them lots of example inputs and outputs (the training data) and incrementally tweaking their initially random weights until they do better and better at matching the desired outputs. This is done by expressing the difference between the desired outputs and the network's current outputs (during training) as an error function, parameterized by the weights, and then finding the weight values that minimize this error function (minimum error = fully trained network!).
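To make that concrete, here's a tiny sketch (plain Python/numpy, with made-up toy data and names) of an error function parameterized by a single weight, for a 1-D linear model:

```python
import numpy as np

# Toy training data for a 1-D linear model y = w * x (numbers are illustrative).
xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = np.array([2.0, 4.0, 6.0, 8.0])   # desired outputs (the "true" w is 2)

def error(w):
    """Mean squared difference between desired and current outputs,
    viewed as a function of the weight w."""
    preds = w * xs
    return np.mean((preds - ys) ** 2)

print(error(0.0))   # large error for an untrained weight
print(error(2.0))   # zero error at the fully trained weight
```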

The minimum of the error function is found simply by following its gradient (slope) downhill until you can't go down any more, which is hopefully the global minimum. This requires having the gradient (derivative) of the error function available, so you know in which direction (+/-) to tweak each weight to move downhill in error, which brings us to Karpathy's video ...
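Here's a minimal sketch of "following the gradient downhill" for that same toy error function, with a hand-derived gradient (the learning rate and step count are arbitrary choices):

```python
import numpy as np

xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = np.array([2.0, 4.0, 6.0, 8.0])

def error(w):
    return np.mean((w * xs - ys) ** 2)

def error_grad(w):
    # d/dw mean((w*x - y)^2) = mean(2 * (w*x - y) * x)
    return np.mean(2 * (w * xs - ys) * xs)

w = 0.0                        # initially "random" weight
lr = 0.01                      # step size
for step in range(200):
    w -= lr * error_grad(w)    # step downhill along the gradient
print(w, error(w))             # w converges towards 2.0, the error towards 0
```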

Neural nets are mostly built out of lego-like building blocks - individual functions (sometimes called nodes, or layers) that are chained/connected together to incrementally transform the neural network's input into its output. You can then consider the entire neural net as a single giant function, outputs = f(inputs, weights), and from this network function you can create the error function needed to train it.
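As a sketch of what "lego-like blocks chained into one big function" means (the layer sizes and names here are just illustrative):

```python
import numpy as np

# Two lego-like blocks: a linear layer and a ReLU nonlinearity.
def linear(x, W, b):
    return x @ W + b

def relu(x):
    return np.maximum(0.0, x)

# The whole network is just these blocks chained together:
# outputs = f(inputs, weights)
def f(inputs, weights):
    W1, b1, W2, b2 = weights
    h = relu(linear(inputs, W1, b1))
    return linear(h, W2, b2)

rng = np.random.default_rng(0)
weights = (rng.normal(size=(3, 4)), np.zeros(4),
           rng.normal(size=(4, 1)), np.zeros(1))
x = rng.normal(size=(5, 3))    # a batch of 5 inputs with 3 features each
print(f(x, weights).shape)     # (5, 1)
```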

One way to create the derivative of the network/error function is to use the "chain rule" of calculus to derive the combined derivative of all these chained functions from their individual pre-defined derivative functions. This is how most machine learning frameworks, such as TensorFlow and the original Torch (pre-PyTorch), worked. If you were using a framework like this, you would not need Karpathy's video to understand how it works under the hood (if indeed that is something you care about at all!).
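Here's a rough sketch of that older style: each block ships with a hand-written derivative, and the chain rule stitches them together as you walk back through the chain (the function names here are mine for illustration, not any framework's actual API):

```python
# Each block comes with a pre-defined derivative ("backward") function.
def linear_forward(x, w):
    return x * w

def linear_backward(x, w, grad_out):
    # d(x*w)/dx = w, d(x*w)/dw = x; chain rule: multiply by the upstream grad
    return grad_out * w, grad_out * x        # (grad wrt x, grad wrt w)

def square_forward(x):
    return x ** 2

def square_backward(x, grad_out):
    return grad_out * 2 * x

# Chained network/error function: err(w) = (x*w - y)^2
x, y, w = 3.0, 6.0, 0.5
diff = linear_forward(x, w) - y
err = square_forward(diff)

# Walk backwards through the chain, combining the local derivatives.
grad_diff = square_backward(diff, 1.0)
_, grad_w = linear_backward(x, w, grad_diff)
print(err, grad_w)    # error and d(err)/dw = 2*(x*w - y)*x = -27.0
```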

The alternative, PyTorch, way of deriving the derivative of the neural network function is more flexible, and doesn't require you to build the network only out of nodes/layers you already have derivative functions for. PyTorch lets you define your neural network function in regular Python code, then records that code as it runs to capture what it is doing as the definition of the network function. Given this dynamically recorded network function, PyTorch (and other similar machine learning frameworks) uses a built-in "autograd" (automatic gradient) capability to create the derivative (gradient) of your network function automatically, without anyone having had to write it by hand, as was the case for each of the lego building blocks in the old approach.
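In PyTorch itself that looks something like this (same toy numbers as above): you write ordinary Python, and autograd records the operations and produces the gradient for you:

```python
import torch

# Weights we want gradients for are marked with requires_grad=True.
w = torch.tensor(0.5, requires_grad=True)
x = torch.tensor(3.0)
y = torch.tensor(6.0)

def f(x, w):
    return x * w          # any regular Python code/control flow is fine here

err = (f(x, w) - y) ** 2  # the error, as a function of w
err.backward()            # autograd computes d(err)/dw from the recording
print(w.grad)             # tensor(-27.), matching 2*(x*w - y)*x
```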

What that first video of Karpathy's explains is how this "autograd" capability works, which would help you build your own machine learning framework if you wanted to, or at least understand how PyTorch works under the hood to create the network/error function derivative that it will use to train the weights. I'm sure many PyTorch users happily use it without caring how it works under the hood, just as most developers happily use compilers without caring exactly how they work. If all you care about is understanding generally what PyTorch is doing under the hood, then this post may be enough!
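This is roughly the kind of thing the video builds, stripped to the bare minimum: a scalar Value object that records the operations applied to it, then applies the chain rule backwards through that recording (an illustrative sketch of the idea, not Karpathy's actual code):

```python
class Value:
    """A tiny autograd scalar, in the spirit of what the video builds."""
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._children = children

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the recorded graph, then apply the chain rule.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for c in v._children:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# err = (x*w - y)^2, the same toy example as above (y folded in as -6)
x, y, w = Value(3.0), Value(-6.0), Value(0.5)
diff = x * w + y
err = diff * diff
err.backward()
print(err.data, w.grad)   # 20.25 and -27.0
```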

For an introduction to machine learning, including neural networks, that assumes no prior knowledge other than hopefully being able to program a bit in some language, I'd recommend Andrew Ng's Introduction to ML course on Coursera. He has modernized the course over the years, so I can't speak for the latest version, but he is a great educator and I trust that the current version is just as good as the old one that was my own intro to ML (building neural nets just using MATLAB rather than any framework!).