I wonder if this would have been useful

willwade • yesterday at 1:55 PM • 3 replies

I wonder if this would have been useful: https://github.com/microsoft/presidio - it's heavy but looks really good. There is a lite version.

Replies

shaoz • yesterday at 7:48 PM

I've used it. Lots of false positives out of the box; you need to do a ton of tuning or pair it with a transformer/BERT model, but at that point it's basically the same thing as the OP's project.

threecheese • yesterday at 5:37 PM

Looks like it uses Google's LangExtract, which uses only LLMs for NLP, while OP is using a small NER model that runs locally.

winchester6788 • yesterday at 3:11 PM

Full of false positives, though. But definitely good for some types of entities and regexes.
