Anthropic’s Claude substantially outperformed competing artificial intelligence models in identifying and countering antisemitic content, according to a comprehensive assessment released by the Anti-Defamation League.
The ADL’s inaugural AI Index ranked six leading language models on their ability to detect hateful narratives, with Claude earning a score of 80 out of 100—far exceeding OpenAI’s ChatGPT at 57 points. DeepSeek followed with 50, Google’s Gemini scored 49, Meta’s Llama placed at 31, and Elon Musk’s Grok lagged significantly at 21.
The disparity underscores a critical gap in AI safety as these systems become increasingly integrated into mainstream platforms. The test encompassed more than 25,000 interactions conducted between August and October 2025, examining how each model responded to antisemitic conspiracies, anti-Zionist tropes, and extremist narratives across 37 distinct subcategories.
“Claude surpassed all other LLMs in the assessment and demonstrated an exceptional ability to detect and respond to anti-Zionist and anti-Jewish narratives across a variety of prompt types,” the ADL said in its findings.
All Models Failed — Just Some Less Than Others
The research reveals that despite Claude’s relative strength, all six models exhibited concerning vulnerabilities. Each system struggled to consistently identify anti-Jewish bias, refute false allegations against Zionists, and counter extremist rhetoric—suggesting the challenge of embedding values into AI systems remains formidable across the industry.
The testing methodology involved probing models with carefully crafted prompts designed to elicit problematic responses. Researchers evaluated not just whether models recognized harmful content but how effectively they responded with factually accurate counter-narratives. The results suggest that generic safety training produces uneven outcomes when applied to nuanced forms of hatred and conspiracy.
Claude’s superior performance may reflect Anthropic’s focus on constitutional AI principles, which embed specific values into model training. The approach appears more effective at navigating the complexities of antisemitic rhetoric than other methods employed by competitors.
Still, Claude’s score of 80 indicates room for continued refinement. The ADL noted that anti-Zionist narratives and extremist content posed particular challenges for all models tested, suggesting these remain frontier problems in AI safety.
Safety as a Competitive Advantage
The findings carry implications for platforms deploying these systems in content moderation and community safety contexts. As generative AI increasingly mediates online discourse, the ability to recognize and counter hate speech becomes a product differentiator—and a public trust issue.
Industry observers noted the report’s timing: concerns about AI-generated propaganda have intensified as these tools become more sophisticated and accessible. The ADL’s index provides measurable benchmarks that could incentivize developers to prioritize safety improvements targeting antisemitic and extremist content.
For Anthropic, the results offer validation of its approach to AI safety at a moment when the company competes intensely with larger rivals. OpenAI’s ChatGPT remains the most widely used consumer AI chatbot, yet Claude’s demonstrated strength in this critical domain may resonate with institutions and individuals prioritizing responsible AI deployment.
The ADL indicated plans to expand the index periodically as models evolve and new systems emerge. The organization views the benchmark as part of ongoing efforts to address how artificial intelligence can either amplify or mitigate societal harms.