I would love to see how they do with functional languages and especially Lisps here. I've noticed pretty good performance with Emacs Lisp relative to overall model strength, but I haven't used LLMs to write application code in any such languages.
It would also be interesting to see how Python compares to other languages in its niche (Ruby, Perl, Raku).
Thanks for putting this together! It's interesting.
I just did a side-by-side of Python vs. Raku with Claude Code for DSL use ... https://slangify.org if you are interested.
That's a good idea. Would you rather see Lisp or Scala? Any interest in Prolog? We are trying to be selective to keep the data concentrated, but we will eventually add a couple more, most likely to sample different programming paradigms.
I've noticed that with Clojure(Script), unless you specifically instruct models to keep nesting levels low, they can hit a point where they make a paren placement error and can't debug their way out of it. In my case, though, one model made the error and couldn't find what it had done, while a second model that I switched to was able to identify it and back it out. So I suspect this is a transient weakness in today's models, not something fundamental.
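To make the paren failure mode concrete, here is a minimal sketch in Python of the kind of mechanical check that can locate a misplaced delimiter. `find_unbalanced` is a hypothetical helper, and it only approximates the Clojure reader: it assumes `;` line comments, double-quoted strings, and backslash as an escape or character-literal prefix, and ignores everything else (regex literals, `#_` discard forms, and so on).

```python
from typing import List, Tuple

# Closing delimiter -> the opener it must match.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def find_unbalanced(src: str) -> List[Tuple[int, int, str]]:
    """Return (line, column, char) for each unmatched delimiter.

    Lines and columns are 1-based. Unmatched closers are reported in
    source order, followed by any openers that were never closed.
    """
    problems: List[Tuple[int, int, str]] = []
    stack: List[Tuple[str, int, int]] = []  # (opener, line, column)
    in_string = in_comment = escaped = False
    line, col = 1, 0

    for ch in src:
        if ch == "\n":
            line, col = line + 1, 0
            in_comment = False  # a ';' comment ends at end of line
            escaped = False
            continue
        col += 1
        if in_comment:
            continue
        if escaped:  # character after a backslash is never a delimiter
            escaped = False
            continue
        if ch == "\\":
            escaped = True
            continue
        if ch == '"':
            in_string = not in_string
            continue
        if in_string:
            continue
        if ch == ";":
            in_comment = True
        elif ch in OPENERS:
            stack.append((ch, line, col))
        elif ch in PAIRS:
            if stack and stack[-1][0] == PAIRS[ch]:
                stack.pop()
            else:
                problems.append((line, col, ch))

    # Anything left on the stack was opened but never closed.
    problems.extend((l, c, ch) for ch, l, c in stack)
    return problems


if __name__ == "__main__":
    broken = "(defn f [x]\n  (+ x 1)"
    for l, c, ch in find_unbalanced(broken):
        print(f"unmatched {ch!r} at line {l}, column {c}")
```

A check like this only reports where the nesting stops balancing, which is often not where the wrong paren was actually typed, so it narrows the search rather than fixing the form.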