logoalt Hacker News

sippeangelotoday at 6:50 PM4 repliesview on HN

With all respect to Mozilla, "respects robots.txt" makes this effectively DoA. AI agents are a form of user agent like any other when initiated by a human, no matter the personal opinion of the content publisher (unlike the egregious automated /scraping/ done for model training).


Replies

MrTravisBtoday at 7:24 PM

This is a valid perspective. Since this is an emerging space, we are still figuring out how to show up in a healthy way for the open web.

We recognize that the balance between content owners and the users or developers accessing that content is delicate. Because of that, our initial stance is to default to respecting websites as much as possible.

That said, to be clear on our implementation: we currently only respond to explicit blocks directed at the Tabstack user agent. You can read more about how this works here: https://docs.tabstack.ai/trust/controlling-access

mossTechniciantoday at 7:32 PM

I agree with you in spirit, but I find it hard to explain that distinction. What's the difference between mass web scraping and an automated tool using this agent? The biggest differences I assume would be scope and intent... But because this API is open for general development, it's difficult to judge the intent and scope of how it could be used.

show 1 reply
ugh123today at 6:55 PM

100%

observationisttoday at 7:29 PM

Exactly. robots.txt with regards to AI is not a standard and should be treated like the performative, politicized, ideologically incoherent virtue signalling that it is.

There are technical improvements to web standards that can and should be made that doesn't favor adtech and exploitative commercial interests over the functionality, freedom, and technically sound operation of the internet