I am pretty sure llamacpp have their own benchmarking binary that you can use.
llama-bench is part of the llama-cpp package, but from recent experimentation, the settings it is able to (or is documented to?) accept lag behind somewhat. Not sure whether it would accept all of the esoteric settings in the article?
llama-bench is part of the llama-cpp package, but from recent experimentation, the settings it is able to (or is documented to?) accept lag behind somewhat. Not sure whether it would accept all of the esoteric settings in the article?