Hacker News

JohannaAlmeida, today at 2:27 PM

Yeah, autocomplete is an amazing use case. I needed a small transformer-based model that could fit on my weak consumer GPU.

So I needed to make fundamental architecture changes and do some KV cache tricks.

Then I had to prove with benchmarks that the new architecture was faster and that its perplexity was acceptable.
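
For anyone unfamiliar with the perplexity check, here is a minimal sketch of how it is usually computed: the exponential of the mean negative log-likelihood per token. The function name and the log-probability lists are made up for illustration; they are not my actual model outputs or benchmark numbers.

```python
import math

def perplexity(token_logprobs):
    """Return exp(mean negative log-likelihood) over the given tokens.

    token_logprobs: natural-log probabilities the model assigned to each
    true next token in an evaluation text (hypothetical input format).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)

# Illustrative values only: a model that gives every true token
# probability 0.25 versus one that gives every true token 0.2.
baseline_logprobs = [math.log(0.25)] * 4
modified_logprobs = [math.log(0.2)] * 4

print("baseline perplexity:", perplexity(baseline_logprobs))
print("modified perplexity:", perplexity(modified_logprobs))
```

Lower is better, so the comparison that matters is whether the modified architecture's perplexity stays close enough to the baseline on the same evaluation text.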