
roadside_picnic, yesterday at 10:10 PM

What "actual reasoning" are you referring to? I believe you're making my point for me.

Speculative decoding requires the implementer to understand:

- How the initial prompt is processed by the LLM

- How to retrieve the probabilities of all previously observed tokens in the prompt (this also helps people understand things like the probability of the entire prompt, its entropy, etc.)

- How the logits are turned into a distribution over next tokens

- Precise details of the sampling process + the rejection sampling logic for comparing the two models

- How each step of the LLM runs under the hood as the response is generated.
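The sampling and rejection logic in the last two points can be sketched in a few lines. This is a toy version over explicit probability vectors (the function names and the tiny three-token vocabulary are mine, for illustration only, not any library's API):

```python
import random

rng = random.Random(0)

def sample(dist):
    # Sample an index from a discrete distribution (list of probabilities).
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(dist):
        cum += p
        if r < cum:
            return i
    return len(dist) - 1

def speculative_step(p_draft, p_target):
    """One token of the draft/verify rejection-sampling loop, given the
    draft and target models' next-token distributions."""
    # 1. The cheap draft model proposes a token.
    x = sample(p_draft)
    # 2. Accept it with probability min(1, p_target[x] / p_draft[x]).
    if rng.random() < min(1.0, p_target[x] / p_draft[x]):
        return x, True
    # 3. On rejection, resample from the renormalized residual
    #    max(0, p_target - p_draft); this makes the final output
    #    distributed exactly according to p_target.
    residual = [max(t - d, 0.0) for d, t in zip(p_draft, p_target)]
    z = sum(residual)
    return sample([r / z for r in residual]), False

p_draft = [0.7, 0.2, 0.1]
p_target = [0.4, 0.4, 0.2]
token, accepted = speculative_step(p_draft, p_target)
```

The speedup comes from the fact that the target model can verify several drafted tokens in a single forward pass, while the accept/residual rule guarantees the output distribution is unchanged.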

Hardly just plumbing, especially since, to my knowledge, there aren't many hand-holding tutorials on this topic. You have to really internalize what's going on and how it leads to a 2-5x speedup in inference.
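As a concrete example of the prompt-probability point above: once you can retrieve the probability the model assigned to each observed token, the log-probability and perplexity of the whole prompt fall out directly (a minimal sketch; the helper name is mine):

```python
import math

def prompt_stats(token_probs):
    # token_probs[i] = probability the model assigned to the i-th
    # prompt token, conditioned on the preceding tokens.
    log_p = sum(math.log(p) for p in token_probs)  # log P(prompt)
    avg_nll = -log_p / len(token_probs)            # mean negative log-likelihood
    perplexity = math.exp(avg_nll)
    return log_p, avg_nll, perplexity

# Toy run: three tokens, so log P = ln(0.5 * 0.25 * 0.5) = ln(1/16)
log_p, nll, ppl = prompt_stats([0.5, 0.25, 0.5])
```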

Building all of this yourself gives you a lot of visibility into how the model behaves and how "reasoning" emerges from the sampling process.

edit: Anyone who can implement speculative decoding also has the ability to inspect the reasoning steps of an LLM and run experiments such as rewinding its thought process and substituting a reasoning step to see how that changes the result. If you're just prompt hacking, you won't be able to run these kinds of experiments to understand exactly how the model is reasoning and what matters to it.
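At the token level, that rewinding experiment is just prefix surgery: truncate at the start of a reasoning step, splice in an alternative, and resume generation from the edited prefix. A model-agnostic sketch (the function name and span bookkeeping are illustrative, not from any library):

```python
def rewind_and_substitute(tokens, step_spans, step_idx, new_step):
    """Drop reasoning step `step_idx` and everything after it, then
    splice in an alternative step. In a real setup you would feed the
    returned prefix back into the model and continue decoding."""
    start, _ = step_spans[step_idx]
    return tokens[:start] + new_step

# Toy chain of thought, with (start, end) spans marking each step.
tokens = ["Q:", "what", "is", "2+2", "?", "S1:", "2+2=5", "A:", "5"]
step_spans = [(5, 7), (7, 9)]
edited = rewind_and_substitute(tokens, step_spans, 0, ["S1:", "2+2=4"])
```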


Replies

amelius, yesterday at 10:45 PM

But I can make a similar argument about a simple multiplication:

- You have to know how the inputs are processed.

- You have to left-shift one of the operands by 0, 1, ..., N-1 bits.

- Add those together, depending on the bits in the other operand.

- Use an addition tree to make the whole process faster.
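The shift-and-add scheme described above, as a direct sketch (real hardware would sum the partial products with an addition tree rather than this sequential loop):

```python
def shift_add_multiply(a, b, n_bits=8):
    """Unsigned multiply: for each set bit i of b, add (a << i)
    to the running result."""
    result = 0
    for i in range(n_bits):
        if (b >> i) & 1:        # bit i of b selects this partial product
            result += a << i    # a shifted left by i bits
    return result

assert shift_add_multiply(13, 11) == 143
```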

That doesn't mean knowing the above process gives you good insight into the concept of A*B and all the related math, and it certainly won't make you better at calculus.
