Hacker News

dang, today at 12:36 AM

Alas, yes, at least for now. Seems like an LLM could be good at finding them though. A regex is probably too crude.


Replies

wizzwizz4, today at 1:24 AM

The old lesson from the Wizard of Oz experiment says that a regular expression probably isn't too crude, if you're willing to take the time to design it. Though you could probably get away with running a regex golf algorithm (e.g. https://nbviewer.org/url/norvig.com/ipython/xkcd1313.ipynb) over two lists: the matching titles (the "winners"), and the "losers", formed as the union of some list of non-matching-but-close titles (chosen to get good discrimination) with some list of way-off titles (to avoid overfitting). (You could treat the whole HN title database, other than the ones you've identified, as losers, but that risks hardcoding the absence of a post you accidentally missed, and would also take slightly longer – though Peter Norvig's first algorithm takes time linear in the number of losers, so it might not be too expensive. I don't know how expensive his improved versions are, given large lists of losers: https://nbviewer.org/url/norvig.com/ipython/xkcd1313-part2.i.... Better algorithms are surely available.)
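For anyone who wants to try this, here is a rough sketch of a greedy regex golf pass in the spirit of the first notebook linked above. It is not Norvig's code: the function names, the `max_len` cutoff, the `4 * hits - len(part)` scoring weight, and the example titles are all my own assumptions, and it skips his wildcard ("dotified") parts entirely. It only shows the shape of the idea: collect short literal pieces of the winners, throw away any piece that matches a loser (the one step whose cost grows with the loser list), then keep picking the piece that covers the most remaining winners.

```python
import re


def candidate_parts(winners, max_len=5):
    """Escaped literal substrings of each winner (up to max_len chars),
    plus each whole winner anchored with ^...$ as a guaranteed fallback."""
    parts = set()
    for title in winners:
        parts.add("^" + re.escape(title) + "$")
        for start in range(len(title)):
            last = min(start + max_len, len(title))
            for end in range(start + 1, last + 1):
                parts.add(re.escape(title[start:end]))
    return parts


def regex_golf(winners, losers, max_len=5):
    """Greedily build an alternation that matches every winner and no loser.

    Assumption: the score 4 * hits - len(part) is an arbitrary trade-off
    between coverage and pattern length, chosen for illustration only.
    """
    remaining = set(winners)
    # Filter once against the losers; this is the pass over the loser list.
    pool = sorted(
        part
        for part in candidate_parts(remaining, max_len)
        if not any(re.search(part, loser) for loser in losers)
    )
    chosen = []
    while remaining:
        def score(part):
            hits = sum(1 for title in remaining if re.search(part, title))
            if hits == 0:
                return (float("-inf"), part)
            return (4 * hits - len(part), part)

        best = max(pool, key=score, default=None)
        covered = {
            title for title in remaining
            if best is not None and re.search(best, title)
        }
        if not covered:
            # Happens when some title is in both lists.
            raise ValueError("no loser-free part covers the remaining winners")
        chosen.append(best)
        remaining -= covered
    return "|".join(chosen)


# Made-up titles, purely for illustration.
winners = [
    "Ask HN: Who is hiring? (May 2024)",
    "Ask HN: Who is hiring? (June 2024)",
]
near_misses = [
    "Ask HN: Who wants to be hired? (May 2024)",
    "Show HN: A hiring tracker",
]
way_off = ["Rust 1.80 released", "Why I still use plain text"]

pattern = regex_golf(winners, near_misses + way_off)
print(pattern)
```

The near-miss list is what forces the result to be specific; the way-off list just keeps it from latching onto something that happens to be absent from the near misses. A real run would want the wildcard parts and a smarter search, as the second notebook explores.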