Hacker News

mark_l_watson last Sunday at 6:28 PM (2 replies)

Wow, Sebastian Raschka's blog articles are jewels - much appreciated.

I use the gpt-oss and Qwen3 models a lot (smaller models locally via Ollama and LM Studio) and commercial APIs for the full-size models.

For local model use, I get very good results with gpt-oss when I "over-prompt," that is, when I supply a larger amount of context information than I usually would. Qwen3 is simply awesome.
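The "over-prompting" workflow above can be sketched against Ollama's local HTTP API (`POST /api/generate` on port 11434). The helper names, the model tag, and the context snippets here are illustrative assumptions, not anything from the comment itself:

```python
# Sketch of "over-prompting" a local model served by Ollama.
# build_prompt and the example snippets are invented for illustration;
# the /api/generate endpoint and request shape are Ollama's documented API.
import json
import urllib.request

def build_prompt(question, context_snippets):
    """Prepend explicit context blocks before the question ("over-prompting")."""
    context = "\n\n".join(f"Context:\n{c}" for c in context_snippets)
    return f"{context}\n\nUsing only the context above, answer:\n{question}"

def ask_ollama(prompt, model="gpt-oss:20b", host="http://localhost:11434"):
    """Send one non-streaming generation request to a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{host}/api/generate", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

prompt = build_prompt(
    "What license applies to this project?",
    ["The project README states the code is MIT licensed.",
     "CONTRIBUTING.md asks contributors to sign off on the DCO."],
)
print(prompt)
# With an Ollama server running locally, you would then call:
# print(ask_ollama(prompt))
```

The point of the pattern is simply that smaller local models tend to answer better when the relevant facts are pasted in explicitly rather than left implicit.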

From the 1980s until about three years ago, I understood neural network models (GANs, recurrent networks, LSTMs, etc.) well enough to write implementations. I really miss the feeling that I could develop at least simpler LLMs on my own. I am slowly working through Sebastian Raschka's excellent book https://www.manning.com/books/build-a-large-language-model-f... but, to be honest, I will probably never finish it.


Replies

imtringued last Monday at 11:27 AM

For me it is the opposite. I'm shocked by how simple transformer-based models are and how small the architectural differences between the latest models are. Almost nothing has changed since late 2023.
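To illustrate the simplicity claim: the core operation shared by all these models, scaled dot-product self-attention, fits in a couple dozen lines of plain Python. This is a toy sketch for intuition (no batching, no masking, no multi-head split), not any particular model's implementation:

```python
# Toy scaled dot-product self-attention in pure Python.
# Matrices are lists of rows; shapes and weights are illustrative only.
import math

def softmax(xs):
    m = max(xs)                           # subtract max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def self_attention(X, Wq, Wk, Wv):
    """X: (seq, d_model) token embeddings; W*: projection matrices."""
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d = len(K[0])
    # Each score is a query-key dot product, scaled by sqrt(head dim).
    scores = [[sum(q * k for q, k in zip(qr, kr)) / math.sqrt(d) for kr in K]
              for qr in Q]
    weights = [softmax(row) for row in scores]   # rows sum to 1
    return matmul(weights, V)                    # weighted mix of value rows

# Two 2-d tokens, identity projections: output rows are softmax-weighted
# mixes of the input rows.
I2 = [[1.0, 0.0], [0.0, 1.0]]
out = self_attention(I2, I2, I2, I2)
print(out)
```

Everything layered on top of this in recent architectures (RoPE, grouped-query attention, SwiGLU feed-forward blocks) is a comparatively small variation on the same skeleton, which is the commenter's point.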

lvl155 last Sunday at 7:40 PM

He does an amazing job of keeping me up to date on this insanely fast-paced space.