Hacker News

apical_dendrite, yesterday at 2:29 AM

I'm assuming that the goal of the Bloom filter is to prevent the model from producing output that infringes copyright, rather than to hide that the text is in the training data.

In that case, the model would lose the ability to provide relatively brief quotes from copyrighted sources in its answers, which is a really helpful feature when doing research. A brief quote from a copyrighted text, particularly for a transformative purpose like commentary, is perfectly fine under copyright law.