Hacker News

jamesblonde, today at 1:27 PM (9 replies)

I gave a talk at PyData Berlin on how to build your own TikTok recommendation algorithm. The TikTok personalized recommendation engine is the world's most valuable AI. It's TikTok's differentiation. It updates recommendations within 1 second of you clicking, which is within human-perceivable latency. If your AI recommender has poor feature freshness, it will be perceived as slow, not intelligent, no matter how good the recommendations are.

TikTok's recommender is partly built on European technology (Apache Flink for real-time feature computation), along with Kafka and distributed model training infrastructure. The Monolith paper is misleading in suggesting that 'online training' is the key. It is not. The key is that your clicks are made available as features for predictions in less than 1 second. You need a per-event stream processing architecture for this (like Flink; Feldera would be my modern choice as an incremental streaming engine).

* https://www.youtube.com/watch?v=skZ1HcF7AsM

* Monolith paper - https://arxiv.org/pdf/2209.07663
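
To make the feature-freshness point concrete, here is a minimal Python sketch of per-event feature updates. It is a toy in-memory stand-in, not Flink, Feldera, or anything TikTok actually runs. The names `ClickEvent` and `PerEventFeatureStore` and the 60-second window are assumptions made up for illustration.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ClickEvent:
    user_id: str
    category: str
    ts: float  # event time in seconds (hypothetical schema)


class PerEventFeatureStore:
    """Toy in-memory stand-in for a per-event streaming feature pipeline."""

    def __init__(self, window_s: float = 60.0):
        self.window_s = window_s
        # user_id -> deque of (ts, category), oldest first
        self._clicks = defaultdict(deque)

    def _evict(self, q, now: float) -> None:
        # Drop clicks that have fallen out of the sliding window.
        while q and now - q[0][0] > self.window_s:
            q.popleft()

    def on_event(self, ev: ClickEvent) -> None:
        # State is updated as each event arrives, not when a batch closes,
        # so the very next feature read already sees this click.
        q = self._clicks[ev.user_id]
        q.append((ev.ts, ev.category))
        self._evict(q, ev.ts)

    def features(self, user_id: str, now: float) -> dict:
        # Per-category click counts within the window, as of `now`.
        q = self._clicks[user_id]
        self._evict(q, now)
        counts = defaultdict(int)
        for _, category in q:
            counts[category] += 1
        return dict(counts)
```

A click recorded with `on_event` is visible to the next `features` call immediately, which is the property the comment argues matters more than online training.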


Replies

bobek, today at 6:09 PM

It is not only the recommender, though. These guys [1] seem to be able to react pretty quickly without creating addicts along the way ;(

[1] https://recombee.com

dmix, today at 2:17 PM

I noticed YouTube Shorts also seems to update the feed based on how long you watched the last video. If you're scrolling quickly and then stop to watch a dog video for long enough, the next one is likely to be another dog video.
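
One way such behavior could arise, sketched in Python purely as a guess: treat the watched fraction of each video as an interest signal and decay older signals. Nothing here is YouTube's actual logic. The functions `update_interest` and `next_category`, the decay factor of 0.5, and the `"trending"` fallback are all hypothetical.

```python
def update_interest(scores, category, watch_s, video_len_s, decay=0.5):
    """Decay existing scores, then add the watched fraction for `category`."""
    # Fraction of the video that was watched, capped at 1.0 (rewatches ignored).
    completion = min(watch_s / video_len_s, 1.0) if video_len_s > 0 else 0.0
    # Older signals count for less after every new video.
    updated = {c: s * decay for c, s in scores.items()}
    updated[category] = updated.get(category, 0.0) + completion
    return updated


def next_category(scores, default="trending"):
    """Pick the category with the highest current interest score."""
    return max(scores, key=scores.get) if scores else default
```

Skipping two videos after a second or two and then watching most of a dog video leaves "dogs" with by far the highest score, so `next_category` would return it.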

vjerancrnjak, today at 3:09 PM

Flink is too slow for this.

If by features you mean tracking state per user, that can also be tracked insanely fast with Redis, without Flink.

If you're saying they don't have to load data to update the state, I don't see how these states could be massive enough to require in-memory updates, and if they are, you could just do in-memory updates without Flink.

Similarly, any consumer will have to deal with batches of users and pipelining.

Flink is just a bottleneck.

If they actually use Flink for this, it's not the moat.
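
As a rough illustration of the "in-memory updates without Flink" idea, here is a Python sketch of a consumer that folds a batch of events into per-user state held in a plain dict. The function name `apply_batch` and the `(user_id, category)` event shape are assumptions for this example. A Redis-backed variant would swap the dict for pipelined writes, which is not shown here.

```python
from collections import Counter, defaultdict


def apply_batch(state, events):
    """Fold a batch of (user_id, category) events into per-user counters.

    `state` maps user_id -> Counter of category counts and is updated in place.
    """
    # Group the batch by user first so each user's state is touched once.
    grouped = defaultdict(Counter)
    for user_id, category in events:
        grouped[user_id][category] += 1

    # Apply one merged delta per user.
    for user_id, delta in grouped.items():
        state.setdefault(user_id, Counter()).update(delta)
    return state
```

Grouping before applying is the batching the comment refers to: with a remote store, it is what would turn many small writes into one write per user per batch.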

lsuresh, today at 5:35 PM

Thanks for the Feldera shoutout, Jim.

For anyone else, if you want to try out Feldera and incremental view maintenance (IVM) for feature engineering (it gives you perfect offline-online parity), you can start here: https://docs.feldera.com/use_cases/fraud_detection/

miohtama, today at 3:13 PM

TikTok's differentiation is its userbase of all the teenagers in the world.

ryanjshaw, today at 2:20 PM

Great insight. Any thoughts on RisingWave?

Jamesbeam, today at 2:21 PM

[flagged]