I kept hacking together one-off eval tools to compare the outputs of different models, model configs, prompt versions, etc., and finally rolled it all up into a web app + downloadable app (kind of like Postman or Insomnia, but for AI).
It's free, holds your keys in localStorage, and makes direct calls to the APIs (unless there's a CORS issue). It's at https://evvl.ai if you want to try it.
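
For anyone curious what "keys in localStorage, direct calls" can look like, here is a rough TypeScript sketch of the general pattern. It is not the app's actual code: the `apiKey:<provider>` storage naming, the `{ model, prompt }` request body, the `X-Target-Url` header, and the idea of retrying through a proxy when the browser blocks the call are all assumptions made for illustration.

```typescript
// Minimal read-only view of a key-value store; window.localStorage satisfies it.
interface KeyStore {
  getItem(key: string): string | null;
}

interface PlannedRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Minimal fetch-like signature so the sketch does not depend on DOM typings.
type FetchLike<T> = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<T>;

// Build the request entirely on the client, reading the key from local storage.
// The "apiKey:<provider>" naming and the body shape are hypothetical.
function planRequest(
  store: KeyStore,
  provider: string,
  endpoint: string,
  model: string,
  prompt: string
): PlannedRequest {
  const apiKey = store.getItem(`apiKey:${provider}`);
  if (apiKey === null) {
    throw new Error(`No API key stored for ${provider}`);
  }
  return {
    url: endpoint,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, prompt }),
  };
}

// Call the provider directly first. In browsers, fetch rejects with a TypeError
// on network-level failures, which includes requests blocked by CORS, so that
// case is retried through a proxy (hypothetical; the proxy reads X-Target-Url).
async function sendWithFallback<T>(
  req: PlannedRequest,
  proxyUrl: string,
  fetchFn: FetchLike<T>
): Promise<T> {
  const init = { method: "POST", headers: req.headers, body: req.body };
  try {
    return await fetchFn(req.url, init);
  } catch (err) {
    if (!(err instanceof TypeError)) throw err;
    return fetchFn(proxyUrl, {
      ...init,
      headers: { ...req.headers, "X-Target-Url": req.url },
    });
  }
}
```

In a browser you would pass `window.localStorage` as the store and `fetch` as `fetchFn`. One thing to keep in mind with any fallback like this: once a request goes through a proxy, the key leaves the browser for that call, so the "direct calls only" property holds only on the non-CORS path.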