Thank you so much <3
Yes, I recently wrote https://github.com/samwho/llmwalk and had a similar experience with cache vs no cache. It’s so impactful.
Hopefully you can write the teased next article about how the feedforward and output layers work. This article was super helpful for getting a better understanding of how GPT-style LLMs work!