> There are ways to robustly clean this up analytically but it is largely beyond the capabilities of current tech stacks.
Can you expand on that? Even just conceptually that sounds really hard. How would you know whether you're measuring genuine (unexpected) changes in the environment rather than the result of (possibly sophisticated and coordinated) deliberate manipulation?
If you don't trust your measurements, and you shouldn't because all physical-world measurements are proxies, the alternative is to find several unrelated and redundant proxies for "ground truth" and have them corroborate and correct each other. There are a lot of errors, bugs, noise, idiosyncratic behaviors, etc. degrading the data even ignoring intentional manipulation, so you really should be doing this anyway.
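As a minimal sketch of what "corroborate and correct each other" can mean in practice: a robust consensus (median plus MAD) across redundant proxies tolerates a minority of bad readings and flags the ones that disagree. The function name, sensor names, and the threshold `k` are all illustrative, not from any particular system.

```python
import statistics

def corroborate(readings, k=3.0):
    """Flag which proxy readings agree with the consensus of the others.

    readings: dict of proxy name -> value for the same physical quantity.
    Uses the median and MAD (median absolute deviation), a robust
    consensus that a minority of broken or manipulated proxies cannot
    drag around. Returns {name: True if consistent, False if outlier}.
    (Hypothetical helper; the threshold k is illustrative.)
    """
    values = list(readings.values())
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values) or 1e-9
    return {name: abs(v - med) / mad <= k for name, v in readings.items()}

# Four independent proxies for the same temperature; one is being spoofed.
flags = corroborate({"ir_cam": 21.4, "thermistor": 21.1,
                     "weather_api": 21.6, "spoofed": 35.0})
```

The point of the robust statistics (median/MAD rather than mean/stddev) is exactly the adversarial setting: a mean-based consensus can be pulled arbitrarily far by one corrupted proxy, a median-based one cannot.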
Stitching unrelated proxies and sensing modalities into a coherent data model is a spatiotemporal graph reconstruction problem. The join predicates require non-trivial inference algorithms if you want to avoid being buried in false positives. From this you can derive an estimate of ground truth and a model of uncertainty at a point in space and time.
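To make the "join predicate" idea concrete, here is one toy form it can take: two observations from unrelated sensors are linked only if they plausibly describe the same event in space and time, scored against each sensor's own noise model rather than a hard distance cutoff. This is a sketch under assumed field names (`x`, `y`, `t`, per-dimension sigmas) and an assumed 3-sigma gate; real join predicates are considerably more involved, which is where the false-positive problem comes from.

```python
import math

def join_score(a, b):
    """Mahalanobis-style distance in (x, y, t); lower = more likely the
    same event. Each observation carries its own standard deviations,
    so a noisy sensor gets a wider matching window than a precise one."""
    d2 = 0.0
    for dim in ("x", "y", "t"):
        sigma2 = a["sig"][dim] ** 2 + b["sig"][dim] ** 2
        d2 += (a[dim] - b[dim]) ** 2 / sigma2
    return math.sqrt(d2)

def should_join(a, b, gate=3.0):
    # Accept the edge only if the disagreement is within ~3 combined sigmas.
    return join_score(a, b) <= gate

# A camera detection and a radar return for (maybe) the same event:
cam = {"x": 10.0, "y": 5.0, "t": 100.0,
       "sig": {"x": 0.5, "y": 0.5, "t": 0.1}}
radar = {"x": 10.8, "y": 5.3, "t": 100.05,
         "sig": {"x": 1.0, "y": 1.0, "t": 0.05}}
```

Edges accepted this way become the links of the spatiotemporal graph; the per-sensor uncertainty that went into the gate is what later becomes the uncertainty model at each point.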
The model of uncertainty is dynamic and unpredictable. It is difficult to manipulate the measurement without producing data that falls outside the uncertainty model across every proxy by which someone might construct that uncertainty model. This is similar to how e.g. GPS spoofing is detected in military systems: all GPS updates must fit within a (classified) dynamic uncertainty model relative to INS, and if an update falls outside that model, the GPS signal is presumed compromised and its updates are ignored.
At the limit, this restricts manipulation to values within the uncertainty model. If you have a lot of unrelated proxies, you can make the window of uncertainty tight enough that manipulation becomes effectively impossible. At a minimum, the adversary would need to be able to manipulate every proxy and modality feeding your uncertainty model simultaneously.
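The "tight window" claim can be shown with basic statistics: fusing N independent unbiased proxies by inverse-variance weighting shrinks the combined sigma by roughly sqrt(N), so each added proxy narrows the envelope an undetected manipulation must fit inside. A minimal sketch, assuming Gaussian, independent proxies:

```python
def fuse(estimates):
    """Inverse-variance-weighted fusion of independent estimates.

    estimates: list of (value, sigma) pairs for the same quantity.
    Returns (fused_value, fused_sigma); fused_sigma shrinks as
    more independent proxies are stacked.
    """
    weights = [1.0 / (s ** 2) for _, s in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return value, total ** -0.5

# One proxy with sigma 4 vs four independent proxies with sigma 4 each:
_, sigma_one = fuse([(10.0, 4.0)])
_, sigma_four = fuse([(10.1, 4.0), (9.9, 4.0), (10.0, 4.0), (10.2, 4.0)])
```

With four equally good proxies the window halves (4.0 -> 2.0), and an adversary who controls only one of them can shift the fused estimate by at most its weight, roughly a quarter of the single-proxy effect here.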
These graph, spatial, and spatiotemporal algorithms scale very poorly on traditional data infrastructure, and these data models easily run into the petabytes if you are stacking multiple independent data sources.