That sounds so expensive it's hard to see it making money. You'd be processing a 2 fps video stream for each customer. That's a huge amount of data.
And all that is for the chance to occasionally detect that someone's seen an ad in the background of a stream? Do any platforms even let a streamer broadcast an NFL game like the example given?
I don't think they mean that kind of streamer - the idea is that the Roku TV can tell you're watching an ad even if it's on Amazon Prime, Apple TV, YouTube, Twitch, wherever, and associate the ad viewing with your Roku account to potentially sell that data somehow?
That way they aren't cut out of the loop when you use a different service to watch something, and they still get their 'cut'.
I assume these systems are calculating an on-device perceptual hash, so not that much data needs to get flown back to the mothership.
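Something like a difference hash (dHash) would do it - a minimal sketch in Python, purely illustrative since the actual on-device scheme isn't public:

```python
# Minimal dHash sketch (assumes Pillow and numpy). A 64-bit hash per frame
# means the TV sends back 8 bytes instead of a multi-megabyte screenshot.
import numpy as np
from PIL import Image

def dhash(image: Image.Image, size: int = 8) -> int:
    """Difference hash: 64 bits capturing the coarse structure of a frame."""
    # Shrink to (size+1) x size grayscale; fine detail is deliberately lost.
    gray = np.asarray(image.convert("L").resize((size + 1, size)), dtype=np.int16)
    # One bit per pixel: is it brighter than its right-hand neighbor?
    bits = (gray[:, 1:] > gray[:, :-1]).flatten()
    return int("".join("1" if b else "0" for b in bits), 2)

# "frame.png" stands in for a periodic screen grab.
print(hex(dhash(Image.open("frame.png"))))
```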
Confirming how many people actually saw the ad is worth big bucks. No one wants to pay for ads they can't confirm, and publishers can make up impressions - if you can catch a publisher making up numbers you might get a huge discount or loads of money back.
That's the thing about scaling: you offload the work to the "client" (the TV in this case) and make it do the work. It need not send back more than a simple identifier or string in an API call (of course they'll send more). So they get to use a little bit of your electricity and your TV's processing power to collect data on you and make money, with relatively little required from them beyond some infra to handle the requests, which they would have had anyway to collect the telemetry that makes them money.
Client-side processing like this is legitimate and an excellent way to scale, it just hits a little different when it's being used for something that isn't serving you, the user.
source: backend developer
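To make that concrete, here's roughly what the call from the TV could look like - endpoint and field names are invented, the point is that the payload is tiny:

```python
# Hypothetical report call - endpoint, fields, and values are all made up.
import json
from urllib import request

payload = json.dumps({
    "device_id": "abc123",             # per-TV identifier (hypothetical)
    "frame_hash": "9f2c41d07e88a1b3",  # the on-device fingerprint
    "ts": 1700000000,                  # when the frame was seen
}).encode()  # well under 100 bytes on the wire

req = request.Request(
    "https://telemetry.example.com/v1/frames",  # invented endpoint
    data=payload,
    headers={"Content-Type": "application/json"},
)
request.urlopen(req)  # the TV's CPU already did the expensive part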
Not necessarily - it can be done on-device: the screenshot hashed, the results deduplicated and accumulated over time, then compressed and sent off in a neat package. It'd still be a huge amount of data when you add it all up, but not too different from the volume that e.g. web analytics produces.
Then server-side the hash is matched to a program or ad and the data accumulated and reduced even further before ending up in someone's analytics dashboard.
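A sketch of that on-device dedup/accumulate step (all names invented; assuming consecutive frames of the same content hash identically, which a perceptual hash makes likely):

```python
# Run-length encode consecutive identical hashes, then gzip the batch:
# a half-hour show collapses to a handful of (hash, start, count) records.
import gzip
import json
import time

class HashAccumulator:
    def __init__(self) -> None:
        self.runs = []  # each entry: [frame_hash, first_seen_ts, frame_count]

    def observe(self, frame_hash: str) -> None:
        """Called per captured frame; extends the current run or starts a new one."""
        if self.runs and self.runs[-1][0] == frame_hash:
            self.runs[-1][2] += 1
        else:
            self.runs.append([frame_hash, int(time.time()), 1])

    def flush(self) -> bytes:
        """Compress everything accumulated so far into one small upload."""
        batch = gzip.compress(json.dumps(self.runs).encode())
        self.runs = []
        return batch
```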
Are there video "thumbprints" like there are for audio (used by SoundHound etc.), i.e. a compressed set of features that can reliably be linked to unique content? I would expect that's possible, and a much faster lookup at 2 frames a second. If so, "your device is taking a snapshot every 30 seconds" sounds a lot worse than what's actually happening (not defending it - it's still something I hope can be legislated away - something can be bad and still be exaggerated by media).
You only need to grab a few pixels or regions of the screen to fingerprint it. They know what the stream is and can process it once centrally if needed.
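As a toy illustration of the few-pixels idea (coordinates arbitrary, and the values are quantized so minor encoding noise doesn't flip the hash - a real system would be far more robust):

```python
# Sample a handful of fixed screen positions and hash the (quantized) colors.
import hashlib
import numpy as np
from PIL import Image

SAMPLE_POINTS = [(100, 60), (640, 360), (1180, 60), (100, 660), (1180, 660)]

def sparse_fingerprint(path: str) -> str:
    px = np.asarray(Image.open(path).convert("RGB"))
    # Quantize each channel to 4 levels to tolerate compression noise.
    sampled = bytes(int(v) // 64 for x, y in SAMPLE_POINTS for v in px[y, x])
    return hashlib.sha256(sampled).hexdigest()[:16]

print(sparse_fingerprint("frame.png"))
```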
The actual screenshot isn't sent; a hash is generated from the screenshot and compared against a library of known screenshots of ads/shows/etc for similarity.
Not super tough to pull off. I was experimenting with FAISS a while back and indexed screenshots of the entire Seinfeld series. I was able to take an input screenshot (or Seinfeld meme, etc) and pinpoint the specific episode and approx timestamp it was from.
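For anyone curious, a rough reconstruction of that kind of experiment (the exact setup above isn't known; this uses raw 32x32 thumbnails where a real index would use learned embeddings or perceptual hashes):

```python
# Index one tiny grayscale vector per frame, then nearest-neighbor an input
# image back to its source frame. Paths and the naming scheme are hypothetical.
import glob
import faiss
import numpy as np
from PIL import Image

DIM = 32 * 32  # 32x32 grayscale thumbnail, flattened

def vec(path: str) -> np.ndarray:
    img = Image.open(path).convert("L").resize((32, 32))
    return (np.asarray(img, dtype=np.float32) / 255.0).flatten()

# e.g. frames/S05E14_0615.png = episode S05E14, 615 seconds in.
paths = sorted(glob.glob("frames/*.png"))
index = faiss.IndexFlatL2(DIM)
index.add(np.stack([vec(p) for p in paths]))

# Query with a screenshot or meme; the nearest frame names episode/timestamp.
_, ids = index.search(vec("meme.png").reshape(1, -1), 1)
print("closest frame:", paths[ids[0][0]])
```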
Attribution is very painful and advertisers will pay lots of money to close that loop.
Is it? I don’t think you need particularly high fidelity to fingerprint ads/programs.
It's hashed on the TV, then they compare hashes in aggregate.
I used to work for an OTT DSP adtech company, i.e. a company that bid on TV ad spots in real time. The bidding platform was handling millions of requests per second, and we were one of the smaller fish in the sea. This system is very real: your TV is watching what you're watching. I built the attribution pipeline, which is what this is. If you go buy a product from one of these ads, this is how they track (attribute) it. Not to be alarmist, but you have zero privacy.
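For the curious, attribution is conceptually just a join between ad exposures and later conversions for the same (hashed) identifier - a simplified last-touch sketch, with all field names and the lookback window invented:

```python
# Last-touch attribution, heavily simplified: credit a purchase to the most
# recent ad exposure for the same identifier within the lookback window.
# Real pipelines do this on stream processors at millions of events/sec.
from datetime import timedelta

LOOKBACK = timedelta(days=7)

def attribute(exposures, purchases):
    """Both args: lists of dicts with an 'id_hash' and a datetime 'ts'."""
    last_seen = {}   # id_hash -> most recent exposure so far
    attributed = []
    events = [(e["ts"], "exposure", e) for e in exposures]
    events += [(p["ts"], "purchase", p) for p in purchases]
    for _, kind, rec in sorted(events, key=lambda t: t[0]):
        if kind == "exposure":
            last_seen[rec["id_hash"]] = rec
        else:
            e = last_seen.get(rec["id_hash"])
            if e and rec["ts"] - e["ts"] <= LOOKBACK:
                attributed.append((rec, e))  # purchase credited to that ad
    return attributed
```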