I thought this would be inherent in their training data? There are orders of magnitude more Reddit posts than scientific papers or encyclopedia-type sources. Although I suppose the latter have their own biases as well.
I'd expect LLMs' biases to originate from the companies' system prompts rather than the volume of training data that happens to align with those biases.