I wrote a blog post about my submission (currently the top handwritten one, at 36 parameters): https://alexlitzenberger.com/blog/building_a_minimal_transfo...
I ask this question as someone who can't do much more than confirm that your blog post is written in English by someone who knows math.
Does this result suggest that if we had N clever humans manually building an LLM, they might come up with something as smart as a frontier model, but potentially about 45 times smaller? (1644 / 36 ≈ 45.7, N = very large, time not specified)
I didn't look at all the details, but I wanted to see how you did the initial embedding, and I see you have a 14x5 matrix there. I guess when you are setting things by hand (rather than learning them), the definition of counting "parameters" is a bit unclear. One could say all of those entries are parameters, even if they are set in a straightforward way.
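To make the counting ambiguity concrete, here is a rough sketch in Python. The 14x5 shape is the one mentioned above; the values in the matrix are invented for illustration and are not taken from the blog post, and the three counting rules are just plausible candidates, not necessarily the rule the submission uses.

```python
ROWS, COLS = 14, 5  # embedding shape mentioned in the comment above

# Hypothetical hand-set embedding: these values are made up for illustration
# and do NOT come from the blog post.
embedding = [[0.0] * COLS for _ in range(ROWS)]
for i in range(ROWS):
    embedding[i][0] = float(i)  # column 0 encodes the token index
    embedding[i][1] = 1.0       # column 1 is a constant

cells = [v for row in embedding for v in row]

total_entries = len(cells)                            # rule 1: every cell is a parameter
nonzero_entries = sum(1 for v in cells if v != 0)     # rule 2: only nonzero cells count
distinct_values = len({v for v in cells if v != 0})   # rule 3: only distinct nonzero values count

print(total_entries, nonzero_entries, distinct_values)
```

Under the first rule the embedding alone has 14 * 5 = 70 entries, which is already more than 36, so the 36 figure presumably rests on a narrower definition of "parameter" than "every matrix entry".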