This is beautifully written, thanks for sharing.
I could see myself using some of the source code in the classroom to explain how transformers "really" work; code is more concrete/detailed than all those pictures of attention heads etc.
Two points of minor criticism/suggestions for improvement:
- Libraries should not print to stdout, as that output may corrupt the application's own output (imagine I want to use the library in a text editor to offer style checking). So it's best to write to a string buffer owned by a logging class instance associated with an lm.rs object.
- Is it possible to do all this without "unsafe", and without twisting one's arm? I see there are uses of "unsafe", e.g. to force data alignment in the model reader.
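
To make the second point concrete, here is a minimal sketch of one safe alternative, assuming the weights are stored as little-endian f32 values. The function name `decode_f32_le` is hypothetical and is not taken from lm.rs. It copies the data instead of reinterpreting the buffer in place, so it trades some load time and memory for not needing "unsafe".

```rust
/// Hypothetical helper (not part of lm.rs): decode little-endian f32
/// values from raw bytes using only safe Rust.
/// Returns None if the input length is not a multiple of 4.
pub fn decode_f32_le(bytes: &[u8]) -> Option<Vec<f32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            // chunks_exact(4) yields only full 4-byte slices, so the
            // indexing below cannot go out of bounds.
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

Because each value is assembled from individual bytes, the source buffer needs no particular alignment. Whether the extra copy is acceptable for large, memory-mapped model files is a separate question.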
Again, thanks and very impressive!
> best to write to a string buffer
It's best to call a user callback. That way logs can be, for example, displayed in a GUI.
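
A rough sketch of what that could look like, with every name here (`Model`, `set_log_callback`, `buffer_sink`) made up for illustration and not taken from the lm.rs API:

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical log sink: any closure that accepts a message.
type LogCallback = Box<dyn Fn(&str) + Send + Sync>;

/// Hypothetical stand-in for a library object; not the real lm.rs type.
pub struct Model {
    log: Option<LogCallback>,
}

impl Model {
    pub fn new() -> Self {
        Model { log: None }
    }

    /// The application decides where log messages go.
    pub fn set_log_callback<F>(&mut self, f: F)
    where
        F: Fn(&str) + Send + Sync + 'static,
    {
        self.log = Some(Box::new(f));
    }

    /// Forward a message to the callback; stay silent if none is set.
    fn log(&self, msg: &str) {
        if let Some(cb) = &self.log {
            cb(msg);
        }
    }

    /// Placeholder for real work; it never touches stdout itself.
    pub fn load(&self) {
        self.log("loading weights");
        self.log("done");
    }
}

/// The string-buffer idea expressed as a callback: returns a shared
/// buffer plus a closure that appends each message to it.
pub fn buffer_sink() -> (Arc<Mutex<Vec<String>>>, impl Fn(&str) + Send + Sync + 'static) {
    let buf = Arc::new(Mutex::new(Vec::new()));
    let writer = Arc::clone(&buf);
    (buf, move |msg: &str| writer.lock().unwrap().push(msg.to_string()))
}
```

A GUI would pass a closure that appends to a widget, and a CLI could pass `|m| eprintln!("{m}")`. The buffer approach from the parent comment is then just one possible callback.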