I've been using Llama models to identify cookie notices on websites, for the purpose of adding ...

antonok • 01/21/2025 • 4 replies

I've been using Llama models to identify cookie notices on websites, for the purpose of adding filter rules to block them in EasyList Cookie. Otherwise, this is normally done by, essentially, manual volunteer reporting.

Most cookie notices turn out to be pretty similar, HTML/CSS-wise, and then you can grab their `innerText` and filter out false positives with a small LLM. I've found the 3B models have decent performance on this task, given enough prompt engineering. They do fall apart slightly around edge cases like less common languages or combined cookie notice + age restriction banners. 7B has a negligible false-positive rate without much extra cost. Either way these things are really fast and it's amazing to see reports streaming in during a crawl with no human effort required.
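A minimal sketch of that filtering step, for illustration only: it takes the `innerText` of candidate elements, asks a model a yes/no question, and keeps the ones the model accepts. The prompt wording, the `MAX_CHARS` limit, and the `ask_model` callable are all assumptions made for this example; they are not taken from the cookiemonster repository, whose actual prompt is linked below.

```python
from typing import Callable, Iterable

# Hypothetical prompt, written for this sketch (not the project's real prompt).
PROMPT_TEMPLATE = (
    "You are reviewing text extracted from a web page element.\n"
    "Answer with a single word, YES or NO: is this a cookie consent notice?\n\n"
    "Text:\n{text}\n"
)

# Assumed cap on how much of the element's text is sent to the model.
MAX_CHARS = 2000


def build_prompt(inner_text: str) -> str:
    """Collapse whitespace, truncate, and wrap the text in the prompt."""
    normalized = " ".join(inner_text.split())
    return PROMPT_TEMPLATE.format(text=normalized[:MAX_CHARS])


def parse_verdict(reply: str) -> bool:
    """Treat the reply as positive only if its first word is 'yes'."""
    words = reply.strip().lower().split()
    return bool(words) and words[0].strip(".,!:") == "yes"


def filter_cookie_notices(
    candidates: Iterable[str],
    ask_model: Callable[[str], str],
) -> list[str]:
    """Return the candidate texts that the model classifies as cookie notices.

    `ask_model` is a hypothetical callable that sends a prompt to whatever
    small LLM is in use (for example a locally served 3B or 7B model) and
    returns its text reply.
    """
    return [
        text
        for text in candidates
        if text.strip() and parse_verdict(ask_model(build_prompt(text)))
    ]
```

Passing the model in as a plain callable keeps the sketch independent of any particular inference server, and makes it easy to swap a 3B model for a 7B one when the edge cases above start to matter.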

Code is at https://github.com/brave/cookiemonster. You can see the prompt at https://github.com/brave/cookiemonster/blob/main/src/text-cl....

Replies

GardenLetter27 • 01/22/2025

It's funny that this is even necessary though - that great EU innovation at work.

bazmattaz • 01/22/2025

This is so cool, thanks for sharing. I can imagine it’s not technically possible (yet?), but it would be cool if this could simply be run as a browser extension rather than in a Docker container.

rpastuszak • 01/22/2025

Tangentially related, I worked on something similar, using LLMs to find and skip sponsored content in YT videos:

https://butter.sonnet.io/

binarysneaker • 01/22/2025

Maybe it could also send automated petitions to the EU to undo cookie consent legislation, and reverse some of the enshittification.
