They should run the same test against a control baseline, such as a hosted open-source model, to see how much of the drift comes from the test itself.
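
As a rough sketch of the control-baseline idea: run the identical test suite on both the model under test and the control over the same period, then subtract the control's drift from the target's. Everything here is an assumption for illustration: the function names, the choice of "last run minus first run" as the drift measure, and the scores, which are made-up placeholders rather than real measurements.

```python
def drift(scores):
    """Change in score between the first and last run of the same test."""
    return scores[-1] - scores[0]


def control_adjusted_drift(target_scores, control_scores):
    """Drift in the target that is not also seen in the control baseline."""
    return drift(target_scores) - drift(control_scores)


# Hypothetical pass rates from repeated runs of one test suite (made-up numbers).
target_scores = [0.82, 0.80, 0.74]   # model under test
control_scores = [0.70, 0.69, 0.68]  # control baseline, e.g. a hosted open-source model

control_drift = drift(control_scores)
adjusted = control_adjusted_drift(target_scores, control_scores)

print(f"drift seen in the control (test noise): {control_drift:+.2f}")
print(f"target drift beyond the control:        {adjusted:+.2f}")
```

If the control moves about as much as the target, the change is more likely coming from the test than from the model being tested.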