• Lvxferre [he/him]@mander.xyz · 2 days ago

    [OP, sorry for the harsh words. They’re directed at the text and not towards you.]

    To be blunt, this “essay” is a pile of shit. It’s so bad, so very bad, that I gave up dissecting it. Instead I’ll list the idiocies (read: fallacies, disingenuous arguments) it’s built upon:

    • Nirvana idiocy (fallacy): “unless it’s perfect then it’s useless lol lmao”.
    • Begging the question: the essay simply assumes that being trained on (verbatim) “the entire corpus of human output”, with enough money thrown at it, will “magically” make AI output indistinguishable from human-generated content. It won’t.
    • Straw man: if the author is going to distort the GPTZero FAQ to double down on the nirvana idiocy, they should at least clip the quote further, so the distortion isn’t so obvious. There’s a bloody reason the FAQ focuses on punishment.

    Note the nirvana fallacy is so prevalent that, once you try to remove it, the text puffs into nothing. The whole text is built upon it. (I’m glad the people developing anti-spam systems don’t take that idiocy seriously, otherwise our mailboxes would be even worse than they already are.)

    • Bronzebeard@lemmy.zip · 2 days ago

      If you can create a tool that accurately identifies what is AI generated, then you’ve just created a tool that can be used to train AI to trick it.

      This is essentially how many types of models are already trained.
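      A minimal sketch of that adversarial setup (GAN-style), assuming that’s what “using the detector to train the generator” would look like; the networks and data below are toy placeholders in PyTorch, not any real detection tool:

      ```python
      # Toy adversarial loop: a "detector" (discriminator) learns to separate
      # real samples from generated ones, and the generator learns to fool it.
      # All shapes and data are placeholders for illustration only.
      import torch
      import torch.nn as nn

      DIM = 16     # dimensionality of the toy "content" vectors
      NOISE = 8    # size of the generator's noise input

      generator = nn.Sequential(nn.Linear(NOISE, 32), nn.ReLU(), nn.Linear(32, DIM))
      detector = nn.Sequential(nn.Linear(DIM, 32), nn.ReLU(), nn.Linear(32, 1))

      g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
      d_opt = torch.optim.Adam(detector.parameters(), lr=1e-3)
      loss_fn = nn.BCEWithLogitsLoss()

      for step in range(200):
          real = torch.randn(64, DIM) + 2.0          # stand-in for "human" samples
          fake = generator(torch.randn(64, NOISE))   # stand-in for "AI" samples

          # 1) Train the detector to label real as 1 and fake as 0.
          d_opt.zero_grad()
          d_loss = loss_fn(detector(real), torch.ones(64, 1)) + \
                   loss_fn(detector(fake.detach()), torch.zeros(64, 1))
          d_loss.backward()
          d_opt.step()

          # 2) Train the generator to make the detector label its output "real".
          g_opt.zero_grad()
          g_loss = loss_fn(detector(fake), torch.ones(64, 1))
          g_loss.backward()
          g_opt.step()
      ```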

      • Lvxferre [he/him]@mander.xyz · 2 days ago

        The core argument of the text isn’t even an arms-race argument like yours. It’s basically “if you can’t get it 100% accurate then it’s pointless lol lmao”. That’s simply the nirvana fallacy, on the same level of idiocy as saying “unless you can live forever, you might as well die as a baby”.


        With that out of the way, addressing your argument separately: the system doesn’t need to be 100% accurate, or perfectly future-proof, to still be useful. It’s fine if you get some false positives and negatives, or if you need to keep improving it to account for newer models evading detection.

        Accuracy requirements depend a lot on the purpose. For example (rough numbers sketched after this list):

        • If you’re using the system to detect AI “writers” and automatically permaban them, you need damn high accuracy: probably 99.9%, perhaps even higher.
        • If you’re using it to detect AI “writers” and then manually reviewing their submissions before banning them, the accuracy can be lower, say 90%.
        • If you aren’t banning anyone, just triaging what you will and won’t read, then 75% accuracy is probably enough.
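        To put those thresholds in perspective, here is a back-of-the-envelope sketch; the accuracy figures are just the ones listed above, treated as assumptions:

        ```python
        # Illustrative only: what the accuracy tiers above cost per 1000 submissions.
        def wrong_calls_per_1000(accuracy: float) -> float:
            """Expected number of misclassified submissions out of 1000."""
            return 1000 * (1 - accuracy)

        scenarios = [
            ("auto-permaban",   0.999, "innocent users banned outright"),
            ("manual review",   0.90,  "submissions a human needlessly re-reads"),
            ("personal filter", 0.75,  "posts you skip or read by mistake"),
        ]

        for purpose, accuracy, consequence in scenarios:
            print(f"{purpose}: ~{wrong_calls_per_1000(accuracy):.0f} wrong calls "
                  f"per 1000 submissions ({consequence})")
        ```

        The same error rate costs wildly different things depending on what the verdict triggers, which is why the acceptable accuracy differs so much between use cases.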

        I’m also unsure it’s as simple as using the detection tool to “train” the generative tool. I often notice LLMs spouting nonsense that the same model can call out as nonsense afterwards; this hints that generating content with certain attributes is harder than detecting whether some content lacks them.