Amazon Kills Internal AI Leaderboard After Employees Gamed It—By Pointing Agents at Pointless Tasks

Mari del Valle

1 week ago

In Brief

Amazon reportedly removed an internal AI leaderboard called “Kirorank” after employees gamed it with pointless AI agent tasks.
Senior Vice President Dave Treadwell told staff “please don’t use AI just for the sake of using AI,” calling the dashboard well-intentioned but costly.
Amazon now tracks “normalized deployments” of AI code that actually provides value instead of raw consumption metrics.

Amazon is pulling an internal AI ranking system, the Financial Times reports, after employees inflated their scores through meaningless AI usage and drove up cloud costs in the process.

The so-called “Kirorank” dashboard scored employees based on their activity on Amazon’s Kiro developer platform. Some workers started pointing AI agents at pointless tasks just to climb the rankings.

“Please don’t use AI just for the sake of using AI,” Senior Vice President Dave Treadwell reportedly told staff. The dashboard was built with “good intentions,” he said, but ended up creating extra costs.

The timing is awkward. Amazon has set a target of getting more than 80 percent of its developers to use AI on a weekly basis and plans to spend around $200 billion in 2026, mostly on AI infrastructure. As the company doubles down on Anthropic and builds agentic AI into its ad business, internal metrics showing inflated usage complicate its narrative of AI efficiency.

When Gamification Backfires on AI

The Kirorank system measured developer activity through Amazon’s internal Kiro developer platform, turning LLM usage into a competitive scoreboard. The problem: when you tie compensation or recognition to token counts, you get token consumption, not valuable output.

The same pattern reportedly showed up at Meta, where employees chased similar AI usage scores. Across Big Tech’s trillion-dollar AI buildout, companies are discovering that raw adoption metrics create perverse incentives: teams deploy agents just to move numbers, not to solve problems.

Instead of raw token consumption, Amazon now tracks “normalized deployments,” meaning AI-generated code that gets deployed and actually provides value. Amazon’s internal goal of getting 80 percent of developers to use AI weekly remains unchanged, but the metrics used to track that goal are being redesigned from scratch.
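
To make the contrast concrete, here is a minimal Python sketch of how the two kinds of metric can rank the same people differently. It is not Amazon's method: the report does not say how “normalized deployments” are calculated. The field names, the sample numbers, and the choice to count deployed changes that were not later reverted are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DevActivity:
    # All fields are hypothetical; the article does not describe Kirorank's data model.
    name: str
    tokens_consumed: int    # raw usage, the kind of number a usage leaderboard rewards
    changes_deployed: int   # AI-assisted changes that reached production
    changes_reverted: int   # deployed changes that were later rolled back


def rank_by_tokens(devs):
    """Usage-style ranking: whoever burns the most tokens comes first."""
    ordered = sorted(devs, key=lambda d: d.tokens_consumed, reverse=True)
    return [d.name for d in ordered]


def surviving_deployments(dev):
    """Assumed stand-in for 'deployed and provides value': shipped and not reverted."""
    return dev.changes_deployed - dev.changes_reverted


def rank_by_deployments(devs):
    """Outcome-style ranking: whoever shipped the most surviving changes comes first."""
    ordered = sorted(devs, key=surviving_deployments, reverse=True)
    return [d.name for d in ordered]


# Made-up sample data. "bob" points agents at busywork and ships almost nothing.
devs = [
    DevActivity("alice", tokens_consumed=2_000_000, changes_deployed=9, changes_reverted=1),
    DevActivity("bob", tokens_consumed=50_000_000, changes_deployed=1, changes_reverted=0),
    DevActivity("carol", tokens_consumed=5_000_000, changes_deployed=4, changes_reverted=0),
]

token_ranking = rank_by_tokens(devs)
deploy_ranking = rank_by_deployments(devs)

print("By tokens:     ", token_ranking)
print("By deployments:", deploy_ranking)
```

In this made-up data, the heaviest token consumer tops the usage ranking and falls to last place once only surviving deployments count. An outcome metric can still be gamed, for example by splitting work into many trivial changes, so the sketch shows the direction of the change rather than a complete fix.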

FAQ

What was Amazon’s Kirorank leaderboard?

Kirorank was an internal dashboard on Amazon’s Kiro developer platform that scored employees based on their AI usage. It ranked teams and individuals by how much they deployed AI agents and consumed AI tokens.

Why did Amazon remove it?

Employees started gaming the system by assigning AI agents to meaningless tasks solely to inflate their Kirorank scores, driving up cloud costs without delivering value. Senior VP Dave Treadwell reportedly said the dashboard was built with good intentions but ended up creating extra costs.

What is Amazon replacing Kirorank with?

Amazon now tracks “normalized deployments,” which measure AI-generated code that gets deployed and actually provides value, instead of raw token consumption or agent activity counts.

Did other companies have similar problems?

Yes. Meta reportedly faced similar issues where employees chased AI usage scores without producing meaningful results. The pattern highlights a broader challenge across Big Tech: tying compensation or recognition to AI token counts creates perverse incentives.