That would be an interesting paper actually; what is the optimal sampling technique given you only have access to the token outputs. Surely someone has already done it.