Interesting. So why do the models seem to handle deeply nested Lisp expressions just fine?
Probably because there's a ton of code that deals with nested parentheses across languages in the training data, and models have learned how to work around tokenization limitations, when it comes to parentheses.
Probably because there's a ton of code that deals with nested parentheses across languages in the training data, and models have learned how to work around tokenization limitations, when it comes to parentheses.