>Any data of mine or my company's going over the wire to these models needs to stay verifiably private.
I don't think this is possible without running everything locally, with the data never leaving the machine (or possibly the local network) you control.
Interestingly enough, private inference is possible in theory, e.g. via oblivious inference protocols, but it's prohibitively slow in practice. You can also run a model inside a trusted execution environment. But again, too slow.
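To make the "possible in theory, slow in practice" point concrete, here's a toy Python sketch of the homomorphic-encryption flavor of oblivious inference (a textbook Paillier scheme with deliberately tiny, insecure parameters, not any production protocol): the client encrypts its inputs, and the server evaluates a linear layer on the ciphertexts without ever seeing the plaintext. Scale the modulus up to real key sizes and repeat this for every layer of a large model, and you can see where the slowdown comes from.

    # Toy sketch of homomorphically-encrypted inference, NOT a real protocol.
    # A Paillier-style additively homomorphic scheme lets the server compute
    # a linear layer y = sum(w_i * x_i) on *encrypted* inputs. The primes
    # here are tiny and insecure; real schemes use 2048+ bit moduli, which
    # is a big part of why this is prohibitively slow in practice.
    import math
    import random

    def keygen():
        p, q = 1789, 1861                   # toy primes (insecure)
        n = p * q
        lam = math.lcm(p - 1, q - 1)
        g = n + 1                           # standard simple choice of g
        # mu = L(g^lam mod n^2)^-1 mod n, where L(u) = (u - 1) // n
        x = pow(g, lam, n * n)
        mu = pow((x - 1) // n, -1, n)
        return (n, g), (lam, mu)

    def encrypt(pub, m):
        n, g = pub
        r = random.randrange(1, n)          # randomizer, coprime to n w.h.p.
        return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

    def decrypt(pub, priv, c):
        n, _ = pub
        lam, mu = priv
        x = pow(c, lam, n * n)
        return ((x - 1) // n * mu) % n

    pub, priv = keygen()
    weights = [3, 1, 4]                     # server's plaintext model weights
    inputs = [2, 7, 1]                      # client's private inputs

    enc_inputs = [encrypt(pub, x) for x in inputs]   # client encrypts locally

    # Server computes Enc(sum(w_i * x_i)) without seeing the inputs, using
    # Enc(a) * Enc(b) = Enc(a + b) and Enc(a)^k = Enc(k * a), all mod n^2.
    n = pub[0]
    acc = encrypt(pub, 0)
    for w, c in zip(weights, enc_inputs):
        acc = (acc * pow(c, w, n * n)) % (n * n)

    # Only the client can decrypt the result.
    assert decrypt(pub, priv, acc) == sum(w * x for w, x in zip(weights, inputs))

Even this three-weight dot product needs modular exponentiations per multiply; real oblivious-inference protocols mix tricks like this with secret sharing and garbled circuits for the nonlinear layers, which is where most of the cost lives.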
Once someone else knows, it's no longer a secret.
Without diving in too technically, there is an additional domain of "verifiability" relevant to AI these days.
Using cryptographic primitives and a hardware root of trust (including GPU trusted execution, which NVIDIA now supports even over NVLink), you can attest to certain compute operations, one of which might be confidential inference.
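As a rough illustration of what "attest to compute operations" means, here's a minimal Python sketch where a software Ed25519 key stands in for the hardware root of trust (all the names are hypothetical; a real deployment verifies a vendor-issued certificate chain and a standardized evidence format like a TEE quote, not a hand-rolled JSON report): the enclave measures the model and code it actually loaded, signs those measurements, and the client checks the signature and compares the hashes against what it expected.

    # Minimal sketch of remote attestation; the hardware root of trust is
    # simulated with a software Ed25519 key. Hypothetical names throughout.
    import hashlib
    import json
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
    from cryptography.exceptions import InvalidSignature

    # --- inside the TEE: measure what was loaded, then sign the report ---
    device_key = Ed25519PrivateKey.generate()  # stand-in for a fused hardware key
    model_bytes = b"...model weights..."
    report = json.dumps({
        "model_hash": hashlib.sha256(model_bytes).hexdigest(),
        "code_hash": hashlib.sha256(b"...inference server binary...").hexdigest(),
    }, sort_keys=True).encode()
    signature = device_key.sign(report)

    # --- verifier side: check the signature, then check the measurements ---
    def verify(report, signature, device_pub, expected_model_hash):
        try:
            device_pub.verify(signature, report)   # rooted in hardware in real life
        except InvalidSignature:
            return False
        claims = json.loads(report)
        return claims["model_hash"] == expected_model_hash

    ok = verify(report, signature, device_key.public_key(),
                hashlib.sha256(model_bytes).hexdigest())
    print("attestation verified:", ok)

The point is that the signing key never leaves the hardware, so a valid signature over the measurements is evidence of what code and weights were actually running, not just what the provider claims.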
My company, EQTY Lab, and others like Edgeless Systems or Tinfoil are working hard in this space.