AI Chatbots Still Easily Detected Online Due to Overly Polite Tone
The next time you encounter an unusually polite reply on social media, you might want to look twice. It could be an AI model trying (and failing) to blend in with the crowd.
On Wednesday, researchers from the University of Zurich, University of Amsterdam, Duke University, and New York University released a study revealing that AI models remain easily distinguishable from humans in social media conversations, with an overly friendly emotional tone serving as the most persistent giveaway. The research, which tested nine open-weight models across Twitter/X, Bluesky, and Reddit, found that the team's automated classifiers detected AI-generated replies with 70 to 80 percent accuracy.
The study introduces what the authors call a “computational Turing test” to assess how closely AI models approximate human language. Instead of relying on subjective human judgment about whether text sounds authentic, the framework uses automated classifiers and linguistic analysis to identify specific features that distinguish machine-generated from human-authored content.
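The article doesn't spell out how those classifiers are built. As a rough sketch of the general approach, the fragment below trains a toy AI-versus-human detector on surface linguistic features; the TF-IDF features, logistic regression model, and example replies are illustrative assumptions, not the study's actual pipeline.

```python
# A minimal sketch, not the paper's pipeline: TF-IDF word/bigram features
# plus logistic regression standing in for the study's linguistic analysis.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: replies labeled 1 = AI-generated, 0 = human.
replies = [
    "Thank you so much for sharing this, what a wonderful perspective!",
    "That's a great point, and I really appreciate the thoughtful discussion.",
    "lol no. that's not even close to how it works",
    "ugh, this take again? hard pass.",
]
labels = [1, 1, 0, 0]

detector = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # unigram and bigram surface features
    LogisticRegression(),
)
detector.fit(replies, labels)

# Score a new reply; the study's classifiers reached 70-80 percent accuracy.
print(detector.predict(["So glad you posted this, truly inspiring!"]))
```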
“Even after calibration, LLM outputs remain clearly distinguishable from human text, particularly in affective tone and emotional expression,” the researchers wrote. The team, led by Nicolò Pagan at the University of Zurich, tested various optimization strategies, from simple prompting to fine-tuning, but found that deeper emotional cues persist as reliable tells that a reply was authored by an AI chatbot rather than a human.
The toxicity tell
In the study, researchers tested nine large language models: Llama 3.1 8B, Llama 3.1 8B Instruct, Llama 3.1 70B, Mistral 7B v0.1, Mistral 7B Instruct v0.2, Qwen 2.5 7B Instruct, Gemma 3 4B Instruct, DeepSeek-R1-Distill-Llama-8B, and Apertus-8B-2509.
When prompted to generate replies to real social media posts from actual users, the AI models struggled to match the casual negativity and spontaneous emotional expression common in human social media posts; their toxicity scores were consistently lower than those of authentic human replies across all three platforms.
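The article doesn't name the toxicity scorer the team applied. A minimal sketch of that kind of comparison, assuming the open-source Detoxify model as the scorer and made-up example replies:

```python
# Sketch of the comparison behind the finding; the study's actual scorer
# is not named here, so the open-source Detoxify model is assumed.
from detoxify import Detoxify

human_replies = ["ugh, this take again? hard pass."]
ai_replies = ["Thank you so much for sharing this, what a wonderful perspective!"]

scorer = Detoxify("original")  # pretrained toxicity classifier

human_tox = scorer.predict(human_replies)["toxicity"]
ai_tox = scorer.predict(ai_replies)["toxicity"]

# The study found AI replies scored consistently lower on toxicity.
print(f"mean human toxicity: {sum(human_tox) / len(human_tox):.3f}")
print(f"mean AI toxicity:    {sum(ai_tox) / len(ai_tox):.3f}")
```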
To counter this deficiency, the researchers attempted optimization strategies (including providing writing examples and context retrieval) that reduced structural differences like sentence length or word count, but variations in emotional tone persisted. “Our comprehensive calibration tests challenge the assumption that more sophisticated optimization necessarily yields more human-like output,” the researchers concluded.
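As a hedged illustration of that calibration step, the sketch below assembles a prompt that supplies a user's past replies as style examples alongside the post to answer; the helper function and prompt wording are hypothetical, not the researchers' actual setup.

```python
# Hypothetical sketch of the "writing examples + context" calibration the
# researchers describe; the exact prompt format they used is not given here.
def build_calibrated_prompt(user_history: list[str], post: str) -> str:
    # Present the user's past replies as style examples for the model to match.
    examples = "\n".join(f"- {reply}" for reply in user_history)
    return (
        "You are replying on social media. Match the style, tone, and length "
        "of these past replies from the same user:\n"
        f"{examples}\n\n"
        f"Post to reply to: {post}\n"
        "Reply:"
    )

prompt = build_calibrated_prompt(
    user_history=["nah that's wrong", "source? lol"],
    post="Scientists say coffee is good for you again.",
)
print(prompt)  # could be fed to any of the open-weight models tested
```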