I was thinking about FizzBuzz and thought it might be cool to benchmark various LLMs to see the highest number they could go before they got it wrong. FizzBuzz is cool because you can test whether the model's can generalize to any other game (divisors of 3 and 7 instead of 3 and 5 for example).
Fun, short and sweet experiment to run over the weekend, with some mildly interesting results :)