Hacker News

retrac, yesterday at 6:27 PM (2 replies)

That presumes that languages with little morphology do not have equivalent structures at work elsewhere doing the same kind of heavy lifting.

One classic finding in linguistics is that languages with lots of morphology tend to have freer word order. Latin has lots of morphology, and you can move the verb or subject anywhere in the sentence and it's still grammatical. In a language like English, syntax, word order, and word choice take on the role that morphology plays elsewhere.

Inflected languages may indeed have more information encoded in each token. But the relative position of the tokens to each other also encodes information. And inflected languages appear to do this less.

Languages with richer morphology may also have smaller vocabularies. To be fair, this is a contested conjecture too. (It depends a lot on how you define a morpheme.) But the theory is that languages like Ojibwe or Sanskrit, with rich derivational morphologies and grammatical inflections, simply don't need a dozen words for different types of snow, or to describe thinking. A single morpheme with an almost infinite number of inflected forms can carry all the shades of meaning, where a less inflected language might use different morphemes to make the same distinctions.


Replies

adamzwasserman, yesterday at 8:14 PM

These are good points that sharpen the hypothesis. The word order question is interesting — positional encoding vs morphological encoding might have different computational properties for a parser.

One difference I'm betting on: morphological agreement is redundant (same information marked multiple times), while word order encodes information once. Redundancy aids error correction and may lower pattern extraction thresholds. But I'm genuinely uncertain whether that outweighs the structural information carried by strict word order.

Do you have intuitions on which would be "easier" for a statistical learner? Or pointers to relevant literature? The vocabulary size / morpheme count tradeoff is also something I hadn't fully considered as a confound.
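To make the redundancy point concrete, here is a toy sketch in Python. It is not a model of any real language or parser. It assumes a grammatical feature is a single bit, that each marking of it is corrupted independently with probability `flip_prob`, and that the listener decodes by majority vote. The function names and the 10% noise level are made up for illustration.

```python
import random

def transmit(bit, copies, flip_prob, rng):
    """Mark `bit` `copies` times on a noisy channel, then decode by majority vote."""
    # Each copy is flipped independently with probability `flip_prob`.
    received = [bit ^ (rng.random() < flip_prob) for _ in range(copies)]
    # Majority vote over the received copies (use an odd `copies` to avoid ties).
    return int(sum(received) * 2 > copies)

def error_rate(copies, flip_prob=0.1, trials=20000, seed=0):
    """Estimate how often the decoded feature differs from the intended one."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        bit = rng.randint(0, 1)
        if transmit(bit, copies, flip_prob, rng) != bit:
            errors += 1
    return errors / trials

if __name__ == "__main__":
    # copies=1 stands in for "encoded once"; copies=3 for triple agreement marking.
    for copies in (1, 3, 5):
        print(copies, error_rate(copies))
```

Under these assumptions the single marking is misread about 10% of the time, while three redundant markings are misread roughly 3% of the time (3 × 0.1² × 0.9 + 0.1³ = 0.028). It says nothing about the other half of the question, which is how much information strict word order carries that this toy channel ignores.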

pessimizer, yesterday at 7:06 PM

You saved me from posting this. Strict word order makes a lot of things easier that have to be done through morphology in the vulgar Latins.

> Languages with richer morphology may also have smaller vocabularies. To be fair, this is a contested conjecture too.

I agree with the criticism of this to an extent. A lot of it has seemed to me like it relies on thinking of English as a sort of normal, baseline language when it is actually very odd. It has so many vowels, and its syllables aren't open, so it has all of these weird little distinguishing consonant clusters at the ends of syllables. And when you compare it to a language conjugated with a bunch of suffixes, those suffixes gradually both make the words very long and add a bunch of sounds that can't be duplicated very often at the end of roots without causing confusion.

All of that together means that there's a lot more bandwidth for more words. English, even though it has a lot more words than other languages, doesn't have more precise words. Most of them are vague duplications, including duplicating most of Norman French just to have special, fancy versions of words that already existed. The strong emphasis on position in the grammar and the vast number of vowels also allow it to easily borrow words from other languages without a compelling reason.

I think all of that is enough to explain why English is such an outlier on vocabulary size, and I think you see something similar in other languages that share a subset of these features.
