lichess.org

Detecting engine use and false positives

A while back I reported some suspect behavior and it was quickly acted on. Since then I've been thinking about how sure we can be about identifying engine use, and what the chances are of "false positives", i.e. the tests saying someone used an engine when they actually didn't.

I won't lay out my back-of-envelope math here, but making some plausible assumptions I came up with a ballpark figure of 1 in 100,000 that in some shortish game a player's moves would exactly match the moves made by some given engine.

That seems pretty conclusive, but if there are 10 different engines that we compare against, and 20,000 games a day played on the site, that would mean we'd expect to see a couple of games a day where someone purely by chance made the exact same moves as some engine.
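The arithmetic behind that estimate can be sketched as follows. This is only a rough illustration using the figures quoted above (1 in 100,000 per engine, 10 engines, 20,000 games a day). The function name is made up for this example, and it assumes every game-versus-engine comparison is an independent trial with the same match probability.

```python
def expected_chance_matches(p_match, n_engines, games_per_day):
    """Expected number of games per day that match some engine purely by chance.

    Assumption (illustrative only): each game/engine comparison is an
    independent trial with the same probability of an exact match.
    """
    return p_match * n_engines * games_per_day


expected = expected_chance_matches(
    p_match=1 / 100_000,   # ballpark guess from the post
    n_engines=10,          # number of engines compared against
    games_per_day=20_000,  # games played per day, as assumed in the post
)
print(f"Expected chance matches per day: {expected:.1f}")
```

With these inputs the result is about 2 per day, which is the "couple of games a day" mentioned above.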

I don't know what the actual detection procedures are here, but I thought I'd raise the question.

This type of thing comes up a lot with medical tests for rare conditions, where the small chance of a test result being wrong is outweighed by the very large proportion of people tested who don't have the condition, meaning that many of the results that come out positive are actually for people who are perfectly healthy.

Similarly, if cheating is rare, a fair proportion of people flagged by a detection test are liable to be innocent.

I don't know if these types of considerations have been thought about, so I thought I'd mention them.
One correction: there are not 20,000 games played per day on lichess, but 120,000.

To answer your question, we don't decide to mark a cheater on a single game, and we have other indicators (like move times and window switching).
Yeah, a single game could be a false positive, but it's unlikely if three or four separate indications of cheating (other than perfect moves) are also present. I am no mathematician, but perhaps you could work out the false positive rate of four different criteria all being satisfied in a 70 move game?

You're vastly oversimplifying the criteria, and I expect the ballpark figure to be closer to one in billions.

But as Thibault said, not a single game is enough, you'd have to work out the probability of it happening over several games.
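As a rough sketch of how quickly the odds shrink over several games: if each game had the same chance of a coincidental match and the games were independent, the chances would simply multiply. Both of those are assumptions made for illustration, not a description of the site's actual procedure, and the function name is hypothetical.

```python
def chance_all_games_match(p_single_game, n_games):
    """Probability that n_games all produce a chance match.

    Assumption (illustrative only): games are independent and share
    the same per-game probability of a coincidental match.
    """
    return p_single_game ** n_games


# Using the 1-in-100,000 per-game guess from earlier in the thread.
for n in (1, 2, 3):
    p = chance_all_games_match(1 / 100_000, n)
    print(f"{n} game(s): about 1 in {1 / p:.0e}")
```

In practice games by the same player are not independent (a strong player will match engine moves more often in every game), so the true figure would be less extreme than this simple product suggests.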
Good to know.

The numbers are "for instance". The chances of the test suite giving a false positive are pretty hard to know; 1 in 100,000 is a plausible guess only. The general point is that a test or suite of tests that seems very reliable in an individual case can still throw up lots of false positives when it's used on a large population, like 120k games per day!

The worry is that if it's hard to quantify the risk of a false positive then it's equally hard to say what the chances are that someone who tested positive really has the condition. For example, if the chance of a false positive is 1 in 120k, you'd expect to see about one a day; if it's 1 in 120m, only about one every three years. So it's fairly conceivable that there could be several false positives per day.
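To see how sensitive the outcome is to the unknown false positive rate, here is a small sketch using the 120k games per day figure. The function and constant names are invented for illustration, and it treats every game as one independent test.

```python
GAMES_PER_DAY = 120_000  # figure quoted earlier in the thread


def days_between_false_positives(false_positive_rate, games_per_day=GAMES_PER_DAY):
    """Average number of days between false positives at a given per-game rate.

    Assumption (illustrative only): each game is one independent test.
    """
    return 1 / (false_positive_rate * games_per_day)


for label, rate in [("1 in 120k", 1 / 120_000), ("1 in 120m", 1 / 120_000_000)]:
    days = days_between_false_positives(rate)
    print(f"{label}: one false positive every {days:,.0f} day(s), "
          f"roughly {days / 365:.1f} years")
```

The second case works out to about 1,000 days, a little under three years.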

But if you look at multiple games, I think that is going to be pretty reliable.
Some notes for anyone else that wants to think through the math and estimate the probabilities involved (if only for fun!)...

- Long games are very unlikely to give accidental matches. The probability goes down roughly exponentially with the number of moves.

- The 1 in 100k might apply to an 18-22 move game or so. Some of the moves are book, some are obvious or forced, so there may only be 5 or 6 where it's fair to compare.

- In many positions the engine choice will be one of a few reasonable options that most players would likely consider, so chances of matching a single move on those are say 1/3 or 1/4.

- In a few positions during the game the engine move is one which a normal player is very unlikely to consider or choose. But in a typical position there might only be 10-20 moves that aren't ridiculous, e.g. not putting something en prise, pointlessly retreating etc. So even making random beginner moves, you might have as much as 1/10 chance of matching the engine move.

- Paradoxically, weak players might match these "computer-like" moves more often. By definition they're moves that typical players wouldn't consider, but someone playing more randomly might, without understanding what the point of them might be.

- Multiply up all the move-by-move probabilities to get the chance of matching the whole game.

- Life gets a whole lot more complicated if you think people don't follow the engine for every single move, but just in a few important positions. If they do that, I suspect it will be very hard to detect.
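The "multiply up" step from the notes above can be sketched like this. The list of per-move probabilities is entirely hypothetical: six comparable moves, three of them "reasonable options" at 1/4 and three "computer-like" at 1/10, treated as independent.

```python
import math


def chance_of_matching_game(per_move_probs):
    """Chance of matching the engine on every compared move.

    Assumption (illustrative only): moves are independent, so the
    per-move probabilities can simply be multiplied together.
    """
    return math.prod(per_move_probs)


# Hypothetical short game after discarding book and forced moves.
per_move = [1 / 4, 1 / 4, 1 / 4, 1 / 10, 1 / 10, 1 / 10]
p = chance_of_matching_game(per_move)
print(f"Chance of matching all {len(per_move)} moves: about 1 in {1 / p:,.0f}")
```

These guesses give roughly 1 in 64,000, the same ballpark as the 1 in 100k figure. Changing a couple of the per-move guesses shifts the answer a lot, and each extra compared move multiplies in another factor, which is why long games are so unlikely to match by accident.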
Answer my question, a player plays about 10 games. All games with time control about 20 or less minutes and ALL games have

0 inaccuracies
0 blunders
and
0 mistakes
with ALL games having less than 10 centipawn loss per move on average.

Is this player cheating?

I would say 99.9 percent this player is cheating and that is good enough for the ban hammer. The 0.1 percent who are not cheating would represent players who are IM or GM level which is very rare. In this case the IM or GM should show identity proof if they are accused of cheating.
And this is assuming a player has only played 10 games and no more.

If a player plays 10,000 games there is a much better chance that some 10 of those games will show perfect play.

And window switching isn't a reliable factor to determine cheating because someone could have another computer on the table and enter the moves into a chess engine on that other computer.
Even one game could be enough if it had enough moves in it, 10 games almost certainly. But that's assuming the moves match an engine's choice exactly.

If the moves are simply close enough in evaluation so as to not be classed as inaccuracies it's hard to say anything for sure.

The important thing to be aware of is that if you have a test that is literally "99.9% accurate", when you apply that test to 2,000 people you will get 2 false positives. If only 0.1% of people were actually really cheating, the most likely outcome from testing is that 4 people will be flagged up, but 2 of them will be innocent.
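The numbers in that example can be checked with a short sketch. It assumes, as the example implicitly does, that every real cheater is caught (a detection rate of 100%). The function and parameter names are invented for illustration.

```python
def flagged_breakdown(population, cheat_rate, false_positive_rate, detection_rate=1.0):
    """Split the people flagged by a test into real cheaters and innocents.

    Assumption (illustrative only): detection_rate=1.0 means every
    real cheater is flagged.
    """
    cheaters = population * cheat_rate
    innocents = population - cheaters
    true_positives = cheaters * detection_rate
    false_positives = innocents * false_positive_rate
    # Share of flagged people who really cheated.
    share_guilty = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, share_guilty


tp, fp, share = flagged_breakdown(
    population=2_000,
    cheat_rate=0.001,           # 0.1% of people actually cheat
    false_positive_rate=0.001,  # "99.9% accurate"
)
print(f"Real cheaters flagged: {tp:.1f}")
print(f"Innocent people flagged: {fp:.1f}")
print(f"Share of flagged people who cheated: {share:.0%}")
```

So roughly two cheaters and two innocent people are flagged, and only about half of those flagged actually cheated.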

Like I said, this kind of issue comes up a lot with medical diagnoses, with DNA evidence in criminal trials and such like. It's hard to grasp the logic of the math unless you're trained in it, and people's instincts about the odds are generally way off.

It sounds like the site takes a reasonable amount of care to investigate properly though.
