built a tiny pytorch clone in c after going through prof. vijay janapa reddi's mlsys book: mlsysbook.ai/tinytorch/
perfect for learning how ml frameworks work under the hood :)
Any reason for creating a new tensor when accumulating grads instead of updating the existing one?
Edit: I asked this before reading the design decisions. The reasoning, as far as I understand, is that for simplicity there are no in-place operations, hence accumulation is done on a new tensor.
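To make the distinction concrete, here is a minimal sketch of the two options under discussion (tensor_t, accumulate_new, etc. are hypothetical names for illustration, not the repo's actual API):

    /* sketch of the two gradient-accumulation strategies;
       hypothetical names, not the actual autograd.c API */
    #include <stdlib.h>

    typedef struct {
        float *data;
        size_t size;
    } tensor_t;

    /* out-of-place: allocate a fresh tensor holding grad + incoming
       (what a "no in-place ops" rule implies) */
    tensor_t *accumulate_new(const tensor_t *grad, const tensor_t *incoming) {
        tensor_t *out = malloc(sizeof(tensor_t));
        out->size = grad->size;
        out->data = malloc(out->size * sizeof(float));
        for (size_t i = 0; i < out->size; i++)
            out->data[i] = grad->data[i] + incoming->data[i];
        return out; /* caller swaps the old grad pointer and frees it */
    }

    /* in-place alternative (roughly what PyTorch's .grad += delta does):
       no allocation, the existing buffer is updated directly */
    void accumulate_inplace(tensor_t *grad, const tensor_t *incoming) {
        for (size_t i = 0; i < grad->size; i++)
            grad->data[i] += incoming->data[i];
    }

The out-of-place version is simpler to reason about (every tensor is immutable once created), at the cost of an extra allocation and copy per accumulation.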
Cool. But this makes me wonder: that negates most of the advantages of C. Is there a compiler-style autograd "library"? Something that compiles down to C specifically, so it executes as fast as possible on CPUs with no indirection at all.
woah, this got way more attention than i expected. thanks a lot.
if you are interested in the technical details, the design specs are here: https://github.com/sueszli/autograd.c/blob/main/docs/design....
if you are working on similar mlsys or compiler-style projects and think there could be overlap, please reach out: https://sueszli.github.io/