20 tokens per second for eval time is the killer here. It means you can't use this to process any meaningful amount of text.
A GPU typically processes close to 1000 tokens/s during eval.
I'm pretty sure eval time is token generation time, i.e. the phase where it's actually outputting new tokens. If you're getting a thousand per second on that, I'd love to know on what hardware.
The prompt is literally "why is the sky blue?" and consists of 7 tokens.
That's probably too small a sample for the timings to be taken seriously.
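To put rough numbers on why a 7-token prompt makes a poor benchmark, here is a minimal Python sketch. It assumes the measured time is real per-token work plus some fixed overhead. The 1 ms per token and 50 ms overhead are made-up figures for illustration, not measurements from this thread, and the function names are hypothetical.

```python
def tokens_per_second(n_tokens: int, elapsed_ms: float) -> float:
    """Throughput in tokens/s from a token count and elapsed milliseconds."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    return n_tokens * 1000.0 / elapsed_ms


def measured_rate(n_tokens: int, per_token_ms: float, overhead_ms: float) -> float:
    """Rate observed when a fixed overhead is added to the per-token work."""
    elapsed_ms = n_tokens * per_token_ms + overhead_ms
    return tokens_per_second(n_tokens, elapsed_ms)


# Assumed figures (illustrative only): 1 ms per token, 50 ms fixed overhead.
PER_TOKEN_MS = 1.0
OVERHEAD_MS = 50.0

short_rate = measured_rate(7, PER_TOKEN_MS, OVERHEAD_MS)     # 7-token prompt
long_rate = measured_rate(2000, PER_TOKEN_MS, OVERHEAD_MS)   # 2000-token prompt

print(f"7 tokens:    {short_rate:.1f} tokens/s")
print(f"2000 tokens: {long_rate:.1f} tokens/s")
```

Under these assumptions the same hardware looks several times slower on the 7-token prompt than on the long one, purely because the fixed overhead dominates a tiny sample. A longer prompt would give a prompt-processing number that is easier to trust.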