• Bronzebeard@lemmy.zip · 2 days ago

    If you can create a tool that accurately identifies what is AI-generated, then you’ve just created a tool that can be used to train an AI to trick it.

    This is essentially how many types of models are already trained (rough sketch below).
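
    Roughly, in the GAN-style adversarial setup: a detector (discriminator) learns to flag generated samples, and the generator is trained directly against the detector’s output, so the better the detector, the better the generator gets at fooling it. A toy sketch (the networks, data, and numbers below are placeholders, not a real text detector):

```python
import torch
import torch.nn as nn

# Toy networks: 2-D "samples" stand in for real content.
generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
detector  = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(detector.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(32, 2) + 3.0        # stand-in for "human-made" samples
    fake = generator(torch.randn(32, 8))   # "AI-generated" samples

    # 1) Train the detector to tell real from fake.
    d_opt.zero_grad()
    d_loss = loss_fn(detector(real), torch.ones(32, 1)) + \
             loss_fn(detector(fake.detach()), torch.zeros(32, 1))
    d_loss.backward()
    d_opt.step()

    # 2) Train the generator so the detector calls its output "real" --
    #    the detector's own signal is what teaches the generator to evade it.
    g_opt.zero_grad()
    g_loss = loss_fn(detector(fake), torch.ones(32, 1))
    g_loss.backward()
    g_opt.step()
```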

    • Lvxferre [he/him]@mander.xyz · 2 days ago

      The core argument of the text isn’t even an arms race, like yours. It’s basically “if you can’t get it 100% accurate then it’s pointless lol lmao”. That’s simply a nirvana fallacy; it’s on the same level of idiocy as saying “unless you can live forever, you might as well die as a baby”.

      With that out of the way, addressing your argument separately: the system doesn’t need to be 100% accurate, or perfectly future-proof, to still be useful. It’s fine if it produces some false positives and negatives, or if it needs further improvement to account for newer models evading detection.

      Accuracy requirements depend a lot on the purpose; see the back-of-the-envelope sketch after the list. For example:

      • If you’re using the system to detect AI “writers” and automatically permaban them, then you need damn high accuracy. Probably 99.9%, or perhaps even higher.
      • If you’re using it to detect AI “writers” and then manually reviewing their submissions before banning them, then the accuracy can be lower, like 90%.
      • If you aren’t banning anyone, just triaging what you will / won’t read, then 75% accuracy is probably enough.
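
      Back-of-the-envelope, with made-up numbers for post volume and loosely reading “1 − accuracy” as the rate at which human posts get wrongly flagged, the same thresholds look like this:

```python
# Made-up numbers: 10,000 genuinely human submissions, and "1 - accuracy"
# loosely read as the rate at which human posts get wrongly flagged as AI.
human_posts = 10_000
false_flag_rate = {
    "auto-permaban":   0.001,  # 99.9% accuracy
    "manual review":   0.10,   # 90%: a human catches the mistakes before any ban
    "personal triage": 0.25,   # 75%: worst case, you skip a post you'd have liked
}

for purpose, rate in false_flag_rate.items():
    wrongly_flagged = human_posts * rate
    print(f"{purpose}: ~{wrongly_flagged:.0f} of {human_posts} human posts wrongly flagged")
```

      Even ten wrong permabans per ten thousand posts is a lot of angry humans, which is why the fully automated case needs the strictest accuracy, while the triage case can tolerate being wrong a quarter of the time.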

      I’m also unsure it’s as simple as using the detection tool to “train” the generative tool. I often notice LLMs spouting nonsense that the same model can call out as nonsense afterwards; this hints that generating content with certain attributes is more complex than detecting that a piece of content lacks them.