Even though big, dense models aren't fashionable anymore, they are perfect for speculative decoding (specdec), so it can be fun to see how much of a speedup is possible.
I can get about 20 tokens per second on the DGX Spark using llama-3.3-70B, with no loss in quality compared to the model you were benchmarking:
llama-server \
    --model llama-3.3-70b-instruct-ud-q4_k_xl.gguf \
    --model-draft llama-3.2-1b-instruct-ud-q8_k_xl.gguf \
    --ctx-size 80000 \
    --ctx-size-draft 4096 \
    --draft-min 1 \
    --draft-max 8 \
    --draft-p-min 0.65 \
    -ngl 999 \
    --flash-attn on \
    --parallel 1 \
    --no-mmap \
    --jinja \
    --temp 0.0 \
    -fit off
Specdec works well for code, so the prompt I used was "Write a React TypeScript demo".
prompt eval time = 313.70 ms / 40 tokens (7.84 ms per token, 127.51 tokens per second)
eval time = 46278.35 ms / 913 tokens (50.69 ms per token, 19.73 tokens per second)
total time = 46592.05 ms / 953 tokens
draft acceptance rate = 0.87616 (757 accepted / 864 generated)
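If you want to reproduce the run, here's a minimal client sketch (mine, not part of llama.cpp) that sends the same prompt to the server's OpenAI-compatible chat endpoint, assuming the default 127.0.0.1:8080 address:

import json
import urllib.request

# Ask the running llama-server for a completion; temperature 0 matches
# the --temp 0.0 default set on the server command line above.
req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps({
        "messages": [{"role": "user", "content": "Write a React TypeScript demo"}],
        "temperature": 0.0,
    }).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
print(body["choices"][0]["message"]["content"])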
The draft model cannot affect the quality of the output: the main model verifies every drafted token and rejects any it would not have produced itself. A good draft model makes token generation faster, and a bad one slows it down, but the output is the same as the main model's either way.
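To make that concrete, here's a rough sketch of the greedy (temperature 0) acceptance rule: the draft proposes a few tokens, the main model checks them and keeps only the prefix it agrees with, supplying its own token at the first disagreement. The toy target_next and draft_next functions below are hypothetical stand-ins for the two models, not anything from llama.cpp.

from typing import Callable, List

Token = str
Model = Callable[[List[Token]], Token]  # greedy: context -> next token

def speculative_step(target: Model, draft: Model,
                     context: List[Token], draft_max: int) -> List[Token]:
    """One round of greedy speculative decoding.

    The draft model proposes up to `draft_max` tokens; the target model
    verifies them and accepts only the prefix it agrees with, then supplies
    its own token at the first disagreement. The result is therefore
    identical to running the target model alone -- only the speed changes.
    """
    # 1. Draft a short continuation cheaply.
    proposed: List[Token] = []
    for _ in range(draft_max):
        proposed.append(draft(context + proposed))

    # 2. Verify with the target model (in llama.cpp this verification is
    #    a single batched forward pass, which is where the speedup comes from).
    accepted: List[Token] = []
    for tok in proposed:
        expected = target(context + accepted)
        if tok == expected:
            accepted.append(tok)        # draft agreed with the target
        else:
            accepted.append(expected)   # reject; use the target's own token
            break
    else:
        # Every draft token was accepted; the target still adds one more.
        accepted.append(target(context + accepted))
    return accepted

# Toy deterministic "models" (hypothetical stand-ins, not real LLMs).
TARGET_TEXT = "the quick brown fox jumps over the lazy dog".split()

def target_next(ctx: List[Token]) -> Token:
    return TARGET_TEXT[len(ctx)] if len(ctx) < len(TARGET_TEXT) else "<eos>"

def draft_next(ctx: List[Token]) -> Token:
    # Imperfect draft: gets one position wrong, so some drafts get rejected.
    return "red" if len(ctx) == 2 else target_next(ctx)

out: List[Token] = []
while len(out) < len(TARGET_TEXT):
    out.extend(speculative_step(target_next, draft_next, out, draft_max=4))
print(" ".join(out[:len(TARGET_TEXT)]))  # matches the target model exactly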