Their explanation for why their idea (SSD) might work - precision-exploration conflict hypothesis - is something adaptive decoding also tries to solve.
I've been wondering about adaptive decoding! It seems obvious to me that at some points during decoding (reasoning, "creative thinking") you would want a higher temperature, while at other points (emitting syntactically correct code, following a plan that was already established) you would want lower temperature.
I've been wondering about adaptive decoding! It seems obvious to me that at some points during decoding (reasoning, "creative thinking") you would want a higher temperature, while at other points (emitting syntactically correct code, following a plan that was already established) you would want lower temperature.