That is literally the thing the parent poster wants to avoid by running open models.
[edit] I was a little unfair: lack of access to training data is a bit of an issue (perhaps more so for analysis than for actual use, considering what it takes to train these models). I'm thankful that some of them are also distributed as base models, which should be relatively unbiased compared to what happens later during finetuning.