Is there evidence that modern LLMs identify parts of speech in an observable way? This explanation sounds more like how we did it in the 90s before deep learning took over.
https://arxiv.org/abs/1906.04341
https://arxiv.org/abs/1905.05950
https://en.wikiversity.org/wiki/Psycholinguistics/Models_of_...