empath75 • today at 2:07 PM • 2 replies

Is this just for something like autocomplete? Because you are not going to get anything very useful out of a code-only training set.

JohannaAlmeida • today at 2:27 PM

Yeah, autocomplete is an amazing use case. I needed a small transformer-based model that could fit on my weak consumer GPU.

So I needed to make fundamental architecture changes and do some KV cache tricks.

And then prove with benchmarks that the new architecture was faster, and that perplexity was acceptable.
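
For anyone unfamiliar with the metric: perplexity is the exponential of the average negative log-probability the model assigns to the correct next token, so lower is better. A minimal sketch, assuming you already have per-token log-probabilities from an evaluation run (the `perplexity` helper and the sample values are hypothetical, not taken from the project discussed here):

```python
import math

def perplexity(token_log_probs):
    """Return exp(-mean(log p)) over the evaluated tokens.

    token_log_probs: natural-log probabilities the model assigned
    to each ground-truth token (hypothetical input format).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# A model that gives every correct token probability 0.25 is, on average,
# as uncertain as a uniform choice among 4 options: perplexity of about 4.
uniform_guess = [math.log(0.25)] * 8
print(perplexity(uniform_guess))
```

Comparing this number between the baseline and the modified architecture on the same held-out code is one way to check that a speedup did not cost too much quality.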

altruios • today at 3:06 PM

I think it's more a proof of concept: locally trained. It would take lots of resources/time to train something non-trivial.
