Hacker News

willwade, yesterday at 1:55 PM (3 replies)

I wonder if this would have been useful: https://github.com/microsoft/presidio - it's heavy but looks really good. There is a lite version.


Replies

shaoz, yesterday at 7:48 PM

I've used it. Lots of false positives out of the box; you need to do a ton of tuning or pair it with a transformer/BERT model, but at that point it's basically the same thing as the OP's project.

threecheese, yesterday at 5:37 PM

Looks like it uses Google's LangExtract, which uses only LLMs for NLP, while OP is using a small NER model that runs locally.

winchester6788, yesterday at 3:11 PM

Full of false positives, though. But definitely good for some types of entities and regexes.