Training data is quite literally weighted this way - a long Reddit response contributes hundreds of tokens to the training loss, while a brief one barely registers.
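A toy sketch of the effect, with made-up numbers (standard next-token training uses a per-token cross-entropy, so a response's pull on the gradient scales with its length, not its quality):

```python
# Invented numbers for illustration: two answers with identical per-token
# loss, differing only in length. The long one dominates the batch signal.

long_tokens = 400     # verbose, Reddit-style answer
short_tokens = 20     # terse answer saying the same thing
per_token_loss = 2.5  # assume both are equally "hard" per token

long_contrib = long_tokens * per_token_loss
short_contrib = short_tokens * per_token_loss

share = long_contrib / (long_contrib + short_contrib)
print(f"long answer's share of the training signal: {share:.0%}")  # 95%
```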
The same goes for "rules" - you train an LLM on trillions of tokens and then try to regulate its behavior with thousands. Compare that to a person in high school, where grading and feedback make up a much higher percentage of the training.
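Putting rough, assumed numbers on that imbalance:

```python
# Order-of-magnitude guesses, not measured figures.
pretraining_tokens = 10e12  # plausible scale for a frontier model's corpus
rule_tokens = 2_000         # a generous system prompt full of "rules"

ratio = pretraining_tokens / rule_tokens
print(f"{ratio:.0e} training tokens per rule token")  # 5e+09
```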
Not to mention that Reddit users reward "confident idiots". Look at where the thoughtful clarifying questions you'd expect in a human social setting end up (hint: downvoted until they disappear). Reddit users don't want to be asked questions; they want long responses they can nitpick. LLMs have no doubt picked up on that through training.