Fair point. There is a distinction between syntactic efficiency (C is terse) and task-completion efficiency (what the benchmark likely measured). If the tasks involved string manipulation, hash maps, JSON, etc. then C pays a massive token tax because you are implementing what other languages provide in stdlib. Python has dict and json.loads(), C has malloc and strcmp.
So: C tokenizes efficiently for equivalent logic, but stdlib poverty makes it expensive for typical benchmark tasks. Same applies to Factor/Forth, arguably worse.