• Andrej Karpathy coined ‘jagged intelligence’ to describe AI that passes bar exams but stumbles on basic arithmetic—a pattern now shaping the AGI debate.
  • Google DeepMind CEO Demis Hassabis says current AI’s inconsistency is proof that artificial general intelligence is still years away.
  • Salesforce claims it improved enterprise AI task success from 19% to 88% by training models in simulated environments to smooth the ‘jaggedness.’

AI can write legal briefs, debug code, and ace graduate-level exams. It also can’t reliably count the number of r’s in “strawberry.” That gap between brilliance and basic incompetence has a name now—and it’s reshaping how the industry talks about where this technology actually stands.

Andrej Karpathy, the former Tesla AI director and one of the most influential voices in machine learning, coined the term ‘jagged intelligence’ to describe the phenomenon: AI models that perform at superhuman levels on complex tasks while failing at ones a child could handle. The phrase caught fire after the New York Times published a deep dive into what it means for the industry’s trajectory.

The concept isn’t new in practice—anyone who’s used ChatGPT has watched it hallucinate a fact after writing a flawless essay. But giving it a name has crystallized a debate that was previously scattered across Twitter threads and academic papers. The question is no longer whether AI is smart. It’s whether AI’s intelligence is smooth enough to trust.

Why ‘Jagged Intelligence’ Has the Industry Nervous

Google DeepMind CEO Demis Hassabis told reporters in February that current AI’s inconsistency is the clearest evidence that artificial general intelligence remains years away. “We don’t have systems that are reliable across the board,” Hassabis said, arguing that true AGI would need smooth, consistent performance—not jagged peaks and valleys.

Yoshua Bengio, the Turing Award-winning researcher whose work laid the foundations of modern AI, went further. He warned that jagged intelligence isn’t just an academic curiosity but a societal risk. When AI systems are deployed in healthcare, finance, or criminal justice, a model that’s 99% accurate and 1% catastrophically wrong isn’t safe. It’s a liability with plausible deniability.

The problem is structural. Large language models learn patterns from training data, not rules. They can pattern-match their way through a law school exam because they’ve absorbed millions of legal documents. But ask them to solve a novel logic puzzle that requires step-by-step reasoning, and the pattern-matching falls apart. The intelligence isn’t wrong so much as uneven, in ways that are hard to predict.
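
The contrast is easy to make concrete. The letter-counting task from the article’s opening is a one-line rule for a conventional program, and the program gets it right every time precisely because it applies an explicit rule rather than reproducing patterns that resemble counting:

```python
# A conventional program applies an exact, explicit rule.
# A language model has no such rule, only statistical associations
# learned from text, which is why it can miscount.
def count_letter(text: str, letter: str) -> int:
    return text.lower().count(letter.lower())

print(count_letter("strawberry", "r"))  # 3, every time
```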

This connects to a problem Frontierbeat has covered before: over 100 AI-generated hallucinations were found in papers accepted to NeurIPS, one of the field’s top conferences. The researchers who submitted them didn’t notice. The reviewers who accepted them didn’t either. Jagged intelligence doesn’t just mean the AI makes mistakes—it means humans can’t always tell when it has.

Salesforce’s Bet: Train the Jaggedness Away

Not everyone thinks jagged intelligence is a permanent condition. Salesforce has been quietly building what it calls “eVerse”—simulated environments that act as digital twins of real-world business scenarios. The idea is to train AI agents in these sandboxes before deployment, exposing them to edge cases that reveal weaknesses.

Silvio Savarese, Salesforce’s chief scientist, described pure LLM performance as “fundamentally inconsistent or jagged” and argued that learning from experience—in simulation—is the way forward. Early results are striking: Salesforce reported improving enterprise task success rates from 19% to 88% by training agents in synthetic environments that replicated messy real-world conditions like thick accents, poor connections, and distracted callers.
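
Salesforce hasn’t published eVerse’s internals, so code can only gesture at the shape of the approach: generate synthetic scenarios with injected edge cases, measure where the agent breaks, and feed those failures back into training. The sketch below is entirely hypothetical; the `Scenario` fields, the perturbation list, and the toy `run_agent` are invented for illustration and come from nothing Salesforce has released.

```python
import random
from dataclasses import dataclass

@dataclass
class Scenario:
    request: str        # the customer's request
    perturbation: str   # edge case injected into the call
    expected: str       # what a correct resolution looks like

# Edge cases echoing the messy conditions Salesforce describes.
PERTURBATIONS = ["thick_accent", "poor_connection", "distracted_caller", "none"]

def make_scenarios(requests: list[str], n: int) -> list[Scenario]:
    """Pair base requests with randomly injected edge cases."""
    return [Scenario(random.choice(requests),
                     random.choice(PERTURBATIONS), "resolved")
            for _ in range(n)]

def run_agent(s: Scenario) -> str:
    """Toy stand-in for the agent under test: reliable on clean
    calls, unreliable whenever noise is injected."""
    if s.perturbation == "none":
        return "resolved"
    return random.choice(["resolved", "escalated"])

def success_rate(scenarios: list[Scenario]) -> float:
    wins = sum(run_agent(s) == s.expected for s in scenarios)
    return wins / len(scenarios)

scenarios = make_scenarios(["reset my password", "dispute a charge"], 1000)
print(f"success rate: {success_rate(scenarios):.0%}")
```

The measurement half is the easy part; Salesforce’s claim is that closing the loop, retraining on the failures the simulator surfaces, is what moved the success rate.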

The approach mirrors how autonomous vehicle companies test self-driving software—running billions of simulated miles before putting cars on real roads. The difference is that Salesforce is applying it to customer service calls and insurance billing. It’s less glamorous than robotaxis, but the market is arguably larger.

There’s also a more speculative development: an Analytics India Magazine piece asked whether OpenAI’s latest model improvements have quietly addressed jagged intelligence, pointing to better performance on tasks that previously stumped its models. OpenAI hasn’t publicly claimed victory on the problem, but the benchmarks are moving.

The Bigger Question: Is Smooth Intelligence Even Possible?

The deeper implication of jagged intelligence is philosophical. If current AI architectures are fundamentally pattern-matchers—if they learn correlations rather than rules—then maybe smooth, reliable intelligence across all domains isn’t achievable with transformers alone. Maybe the jaggedness isn’t a bug to be fixed. Maybe it’s a feature of the approach.

Karpathy himself has suggested that the path to more consistent AI might require hybrid systems: neural networks combined with symbolic reasoning, or architectures that can explicitly verify their own outputs. The industry’s current bet—scaling up transformer models with more data and compute—has produced the jagged peaks. Whether it can fill the valleys remains an open question.
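
Karpathy has pointed at a direction, not a blueprint, but the simplest form of an output-verifying system is easy to sketch: wrap the model in a deterministic checker and only trust answers the checker confirms. In the hypothetical sketch below, `ask_model` is a stand-in for any LLM call, deliberately wrong so the verifier has something to catch:

```python
def ask_model(question: str) -> str:
    """Hypothetical stand-in for an LLM call; returns a wrong
    answer here so the verification step is exercised."""
    return "2"

def count_rs_verified(word: str, retries: int = 3) -> int:
    """Generate-then-verify: trust the model's answer only if a
    deterministic check confirms it; otherwise use the exact rule."""
    truth = word.count("r")        # the symbolic verifier
    for _ in range(retries):
        claimed = int(ask_model(f"How many r's in '{word}'?"))
        if claimed == truth:
            return claimed
    return truth                   # fall back to the rule

print(count_rs_verified("strawberry"))  # 3, guaranteed by the verifier
```

The catch, and the reason this is a sketch rather than a solution, is that a deterministic verifier only exists for tasks with checkable rules; for open-ended outputs like legal briefs, there is nothing exact to fall back on.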

For now, the practical takeaway is simpler. Every enterprise deploying AI in 2026 is gambling that its use case falls on one of the peaks. The valleys are someone else’s problem, until they aren’t. Salesforce’s jump from 19% to 88% suggests the gap can be narrowed. Whether it can be closed entirely is the $100 billion question the industry hasn’t answered yet.
