Spot the Bot: FAQ & Methodology

Learn more about how this game was built and the research behind it.

Why did we build this game?

The most important message we hope players take away is that it takes a neural net to catch a neural net. This game is an exercise designed to help readers appreciate just how sophisticated modern frontier Large Language Models (LLMs) have become at scientific writing.

To investigate how far this has already gone, we recently published a preprint: Fine-Grained Detection of AI-Generated Writing in the Biomedical Literature (PDF). It documents how AI-generated text is already infiltrating the biomedical literature, often in surprising ways, and why this is set to become a much bigger problem in the near future.

What did our research find?

Using Pangram, a transformer-based AI detector optimized for adversarial paraphrasing, we analyzed full-length biomedical research articles from 13 major journals. We detected six papers that appeared to be fully AI-generated. Further contact tracing from these papers revealed a clear clustering of generative AI usage, with multiple papers from the same labs containing extensive AI-generated segments.

More broadly, our fine-grained detection mapped who is using AI, where, and when. Papers published from 2021 to 2024 showed almost no detectable AI-generated text, whereas 12.4% of manuscripts published in 2025 contained at least one localized passage classified as AI-written. Geographically, AI usage skews heavily toward non-native English speakers: 32% of papers originating from South Korean institutions and 26% from Chinese institutions contained AI-generated passages, compared to 7.4% from U.S. institutions.
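
As a rough illustration of what "contained at least one localized passage classified as AI-written" means as a statistic, the sketch below rolls passage-level detector labels up to a per-group share of papers. It is a minimal sketch under stated assumptions, not the analysis code from the preprint: the function name prevalence_by_group and the input layout are hypothetical, and the example data is a made-up toy set, not our results.

```python
from collections import defaultdict

def prevalence_by_group(papers):
    """Share of papers per group with at least one AI-flagged passage.

    `papers` is an iterable of (group, passage_flags) pairs, where `group`
    is e.g. a publication year or country, and `passage_flags` is a list of
    booleans (True = the detector classified that passage as AI-written).
    This layout is an assumption made for illustration.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for group, passage_flags in papers:
        totals[group] += 1
        # A paper counts once, no matter how many passages are flagged.
        if any(passage_flags):
            flagged[group] += 1
    return {group: flagged[group] / totals[group] for group in totals}

# Toy data only: three imaginary papers, not real findings.
toy_papers = [
    (2024, [False, False, False]),
    (2025, [False, True, False]),
    (2025, [False, False]),
]
shares = prevalence_by_group(toy_papers)
```

Counting each paper at most once is the point of the sketch: a single flagged passage is enough to place a paper in the numerator, which is why these figures describe localized AI use rather than fully generated papers.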

How were the synthetic texts generated for the game?

For this game, we selected 58 of the most famous and impactful papers from the biomedical literature. For each paper, we tasked frontier AI models (such as Claude, ChatGPT, and Gemini) with generating a synthetic version of one section: the Abstract, the Introduction, or the Discussion. (Note: For the actual positive control dataset in our research paper, we used random, less well-known papers to ensure the AI hadn't heavily memorized them.)

To generate these texts, we provided the AI with the original manuscript minus the section we wanted it to rewrite. We also provided the exact word count of the missing section so the AI could match its length. Here is the prompt we used, shown verbatim for one model and section (the wording was adjusted slightly for the others):

"Help me rework this manuscript for submission to a scientific journal. Based on the title of the attached file, you should be able to use the "-" delimiter to find the year and journal that this paper was originally submitted to. This is the main body of the manuscript with introduction, results, and discussion. Help me rewrite a scientific abstract. Word count should be roughly equal to the number in square brackets after the # Abstract header, plus or minus 20% creative freedom. Keep the style of the prose as similar as you can to the attached text. Save the result to a text file with the prefix 'Claude_abstract_' appended to the beginning of the filename in the papers/synthetic_Abstract subfolder"

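The input preparation described above (withholding one section and passing along its word count) can be sketched roughly as follows. This is a minimal illustration, not necessarily the exact pipeline used: it assumes the manuscript is plain text with Markdown-style "# Section" headers, as the prompt's mention of the "# Abstract" header suggests, and the function names withhold_section and within_tolerance are hypothetical.

```python
import re

def withhold_section(manuscript: str, section: str) -> tuple[str, int]:
    """Drop the body of `section` and put its word count in the header.

    Returns the redacted manuscript and the word count of the removed text.
    Assumes top-level headers look like "# Abstract" (an assumption).
    """
    kept, removed = [], []
    in_target = False
    found = False
    for line in manuscript.splitlines():
        header = re.match(r"^#\s+(.+?)\s*$", line)
        if header:
            in_target = header.group(1).lower() == section.lower()
            if in_target:
                found = True
                kept.append(None)  # placeholder for the annotated header
                continue
        if in_target:
            removed.append(line)
        else:
            kept.append(line)
    if not found:
        raise ValueError(f"section not found: {section}")
    word_count = len(" ".join(removed).split())
    annotated = f"# {section} [{word_count}]"
    redacted = "\n".join(annotated if line is None else line for line in kept)
    return redacted, word_count

def within_tolerance(generated: str, target: int, tolerance: float = 0.2) -> bool:
    """Check the 'plus or minus 20%' length allowance from the prompt."""
    return abs(len(generated.split()) - target) <= tolerance * target

example = "# Abstract\none two three\nfour\n# Introduction\nintro text here\n"
redacted, count = withhold_section(example, "Abstract")
```

In this sketch the model would receive `redacted`, in which the Abstract header carries the bracketed count, and `within_tolerance` could be used afterwards to confirm that a generated section stays within the requested length.
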
How does the game work?

When you play, the game randomly selects a passage from our database. It then shows you, with equal probability, either the genuine, original human-written text or the synthetic, AI-generated version. Your guesses are recorded, allowing us to track aggregate results and see just how easily these models can fool human scientists!
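
A single round can be sketched as below. This is an illustrative sketch rather than the game's actual source: the Passage and GameLog types, their field names, and the deal_round function are all hypothetical, and the real game may store and score guesses differently.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Passage:
    paper_id: str
    human_text: str  # original, human-written section
    ai_text: str     # synthetic, AI-generated counterpart

@dataclass
class GameLog:
    # Each entry: (paper_id, shown_is_ai, guessed_ai)
    guesses: list = field(default_factory=list)

    def record(self, paper_id: str, shown_is_ai: bool, guessed_ai: bool) -> None:
        self.guesses.append((paper_id, shown_is_ai, guessed_ai))

    def accuracy(self):
        """Fraction of correct guesses, or None if nothing was recorded."""
        if not self.guesses:
            return None
        correct = sum(1 for _, shown, guessed in self.guesses if shown == guessed)
        return correct / len(self.guesses)

def deal_round(passages, rng: random.Random):
    """Pick a random passage, then flip a fair coin for which version to show."""
    passage = rng.choice(passages)
    shown_is_ai = rng.random() < 0.5
    text = passage.ai_text if shown_is_ai else passage.human_text
    return passage, shown_is_ai, text

# Usage with a toy database of one passage.
rng = random.Random(0)
database = [Passage("paper-1", "human version", "ai version")]
log = GameLog()
passage, shown_is_ai, text = deal_round(database, rng)
log.record(passage.paper_id, shown_is_ai, guessed_ai=True)
```

Choosing the passage first and the version second keeps the two random decisions independent, so over many rounds each passage is shown in its human and AI forms about equally often.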