Thanks for the recs, we will look into adding some of these, maybe OCaml for variety. I'm not familiar with Racket.
Mistral Medium 3.5 is on there, but you will have to scroll down pretty far to find it (does not perform well): https://gertlabs.com/rankings?mode=oneshot_coding
Thanks for that, I hadn't scrolled down far enough.
Just want to be sure I'm reading the results correctly... When I compare GPT-5.5 with Mistral Medium 3.5, I see in the tables:
a) Mistral beats GPT in Java and C++
b) It's close for Rust
c) GPT-5.5 easily wins for Go, Javascript, Python and Typescript
Model choice really does appear to be language dependent (assuming I'm reading the results correctly).
What's going on with Qwen3.6 27b? Filtered to Python it comes out at the top of the list, which seems... well, unlikely.
Racket is a variety of Scheme that grew up as a teaching language, but now also has a few other notable niches as well.
Typed Racket is to Racket as TypeScript is to JavaScript: it adds some additional static checks to an otherwise dynamic language via gradual typing. This pair of languages might help begin answer the question "does gradual typing generally help LLMs, or does TypeScript outperform JavaScript for incidental reasons?".
Among Lisps, I'm most interested in seeing Clojure because it's a language I can see myself using with LLMs at work. But Typed Racket and Racket could make an especially interesting pair because of the gradual typing thing.
I'm not sure whether you want to include them in your project. The kind of selectivity you describe yourself as going for is hard for me, especially since I'm not the one doing the work. :)
PS: Aside from this benchmarking and comparison project: Racket is an interesting language and seems like a good place to start if you want to explore classic Scheme texts (Structure and Interpretation of Computer Programs, The Little Schemer, How to Design Programs) or newer ones that try to teach newer or more specialized ideas (e.g., The Little Typer). You may have to tweak the language a bit to stay faithful to some of those books, but that's something Racket is good at and there are already sources noting relevant differences online.
When a non-programmer in my life expressed curiosity about programming, we ended up starting HtDP together and it's been fun. I think Racket was a good choice for that.