stevage • 10/02/2024 • 3 replies

It's problematic to base this analysis on your own ad-hoc categorisation of whether a user is a bot or not, since you have no way of validating that categorisation. If it's wrong, then all the analysis is wrong.

I noticed in particular this:

> In late 2022, bot comments really took off... around the same time ChatGPT was first widely available.

But remember that one aspect of the categorisation is:

> Did you know ChatGPT generated comments have a higher frequency of words like game-changer? Bot comments also contained characters not easily typeable, like em-dash, or the product’s name verbatim even when it’s very long or contains characters like ™ in the name.

So... he categorises users as bots if they behave like ChatGPT, and then thinks he has found something interesting when the number of users who behave like that goes up after ChatGPT was released. But it's also possible that there were already lots of bots before then; they just used different software that behaves differently, so he doesn't detect them.
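A toy model can make this circularity concrete. The numbers below are invented for illustration and are not taken from the article: the true number of bot comments stays flat at 100 per month, and only the share produced by ChatGPT-style software changes after a hypothetical release month. A detector that recognises only ChatGPT-style text still reports a sharp "rise" in bots.

```python
# Toy model with assumed, illustrative numbers (not from the article).
TRUE_BOTS_PER_MONTH = 100  # bot comments per month, held constant on purpose
RELEASE_MONTH = 6          # hypothetical month when ChatGPT-style tooling appears


def chatgpt_share(month: int) -> float:
    """Assumed fraction of bot comments that read as ChatGPT-generated."""
    if month < RELEASE_MONTH:
        return 0.0  # earlier bots use other software with a different style
    # Ramp up by 20 percentage points per month, capped at 100%.
    return min(1.0, 0.2 * (month - RELEASE_MONTH + 1))


def detected_bots(month: int) -> int:
    """A detector keyed to ChatGPT style only sees the ChatGPT-style share."""
    return round(TRUE_BOTS_PER_MONTH * chatgpt_share(month))


if __name__ == "__main__":
    print("month  true_bots  detected_bots")
    for month in range(12):
        print(f"{month:>5}  {TRUE_BOTS_PER_MONTH:>9}  {detected_bots(month):>13}")
```

The detected series goes from 0 to 100 while the true series never moves, so a rise in detections alone cannot distinguish "more bots" from "bots switched tools".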

Replies

kelnos • 10/02/2024

True, but if his categorization of ChatGPT-using bots is correct, I think it's at least notable that the rise in ChatGPT-generated comments was (and is) real. And if that categorization is correct, it's notable that, even if he's undercounting all bots (including those not using ChatGPT), bot-generated comments have far outstripped comments written by real people.

Of course, as you say, that's quite a few "ifs". If the assumptions I'm making don't hold, neither does the conclusion.
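The chain of "ifs" can be written as a short inequality. The symbols are labels introduced only for this sketch: N is the total number of comments, B the true number of bot comments, H the true number of human comments, and D the number flagged as ChatGPT-generated. The key assumption is that the detector has no false positives, so undercounting bots can only inflate the apparent human count.

```latex
\begin{aligned}
N &= B + H, \qquad D \le B \quad \text{(assumed: no false positives)} \\
N - D &= H + (B - D) \;\ge\; H \\
D > N - D \;&\Longrightarrow\; B \;\ge\; D \;>\; N - D \;\ge\; H
\end{aligned}
```

If the detector does produce false positives, D ≤ B no longer holds and the conclusion B > H does not follow.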


sublimefire • 10/02/2024

The post starts with a prompt injection test. The premise is backed by evidence. Suggest an alternative categorisation; otherwise your comment seems to be made in bad faith and is unhelpful.


throwaway48476 • 10/02/2024

Such statistical methods can be accurate for determining whether a comment section is full of bots, but much less accurate for determining whether any one particular comment is from a bot.
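A small simulation illustrates the aggregate-versus-individual gap. All rates below are assumptions chosen for illustration (10% bots, a classifier that flags 80% of bots and wrongly flags 10% of humans). When the error rates are known, the Rogan–Gladen estimator corrects the raw flag rate back to the true prevalence, yet in expectation only about 0.08 / 0.17 ≈ 47% of individually flagged comments are actually bots.

```python
import random

# Assumed, illustrative parameters (not from the thread or the article).
PREVALENCE = 0.10   # true fraction of comments written by bots
SENSITIVITY = 0.80  # P(flagged | bot)
SPECIFICITY = 0.90  # P(not flagged | human)


def simulate(n: int, seed: int = 0) -> tuple[list[bool], list[bool]]:
    """Return (is_bot, flagged) lists for n synthetic comments."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    is_bot, flagged = [], []
    for _ in range(n):
        bot = rng.random() < PREVALENCE
        # Bots are flagged with prob. SENSITIVITY, humans with 1 - SPECIFICITY.
        flag = rng.random() < (SENSITIVITY if bot else 1 - SPECIFICITY)
        is_bot.append(bot)
        flagged.append(flag)
    return is_bot, flagged


def rogan_gladen(observed_rate: float, sens: float, spec: float) -> float:
    """Correct a raw flag rate for known classifier error rates."""
    return (observed_rate + spec - 1) / (sens + spec - 1)


def precision(is_bot: list[bool], flagged: list[bool]) -> float:
    """Fraction of flagged comments that really are bots."""
    hits = sum(b for b, f in zip(is_bot, flagged) if f)
    return hits / sum(flagged)


if __name__ == "__main__":
    is_bot, flagged = simulate(200_000)
    raw = sum(flagged) / len(flagged)
    print(f"raw flag rate:        {raw:.3f}")
    print(f"corrected prevalence: {rogan_gladen(raw, SENSITIVITY, SPECIFICITY):.3f}")
    print(f"precision per flag:   {precision(is_bot, flagged):.3f}")
```

The correction only works because the error rates are assumed known; with an unvalidated detector, as in the original post, neither number can be trusted.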
