Frontierbeat

Kimi K2.6 Hits GPT-5.4 and Claude Opus 4.6 Benchmark Scores—With 300-Agent Swarms

[Image: network visualization of 300 AI agents as glowing interconnected nodes working in parallel]

Moonshot AI has released Kimi K2.6 as a fully open-weight model, and the numbers are surprisingly competitive. According to The Decoder, K2.6 scores 54.0 on HLE with Tools, 58.6 on SWE-Bench Pro, and 83.2 on BrowseComp—putting it squarely in the same performance class as GPT-5.4 and Claude Opus 4.6. Gemini 3.1 Pro is also in that range, though K2.6’s real differentiator is scale of agentic execution.

The headline feature is Agent Swarm, which can run up to 300 sub-agents simultaneously, each taking up to 4,000 steps. The system automatically decomposes tasks and delegates to specialized agents for web research, document analysis, and writing, with runs meant to produce finished outputs like websites, slide decks, and spreadsheets.
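Moonshot hasn't published Agent Swarm's internals, but the pattern described above (decompose a task, fan it out to parallel sub-agents, cap each agent's step count) can be sketched with Python's standard `concurrent.futures`. The decomposition, the agent logic, and the function names here are purely illustrative; only the 300-agent and 4,000-step ceilings come from the article.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_AGENTS = 300   # concurrency ceiling cited for Agent Swarm
MAX_STEPS = 4_000  # per-agent step budget cited for Agent Swarm

def run_agent(subtask: str, max_steps: int = MAX_STEPS) -> str:
    """Toy stand-in for a sub-agent: works until done or out of steps."""
    for step in range(max_steps):
        # A real agent would call tools or the model here; this toy
        # version finishes on its first step.
        return f"done: {subtask} (step {step})"
    return f"budget exhausted: {subtask}"

def decompose(task: str) -> list[str]:
    """Toy decomposition into the specializations the article names."""
    phases = ("web research", "document analysis", "writing")
    return [f"{phase} for {task!r}" for phase in phases]

def swarm(task: str) -> list[str]:
    """Fan sub-tasks out to a bounded pool and collect the results."""
    subtasks = decompose(task)
    with ThreadPoolExecutor(max_workers=min(MAX_AGENTS, len(subtasks))) as pool:
        return list(pool.map(run_agent, subtasks))

results = swarm("build a landing page")
```

The interesting engineering in a real system is in `decompose` and in merging results into a finished artifact; the fan-out itself is the easy part.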

Moonshot also introduced “claw groups,” where multiple humans and agents collaborate as a team: K2.6 coordinates, distributes tasks based on each member’s strengths, and intervenes when an agent stalls. That orchestration is built in, whereas in most other agent frameworks it’s a separate engineering layer.
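Moonshot hasn't described how that intervention works, but the basic coordinator behavior (track each agent's last progress report, pull work back from agents that go quiet) is a simple pattern. Everything in this sketch, including the class, the timeout, and the task strings, is hypothetical.

```python
import time

STALL_TIMEOUT = 5.0  # seconds without progress before intervening (illustrative)

class Coordinator:
    """Toy coordinator: records each agent's last heartbeat and
    reclaims a task when its agent goes quiet past the timeout."""

    def __init__(self) -> None:
        self.last_seen: dict[str, float] = {}
        self.assignments: dict[str, str] = {}

    def assign(self, agent: str, task: str) -> None:
        self.assignments[agent] = task
        self.last_seen[agent] = time.monotonic()

    def heartbeat(self, agent: str) -> None:
        self.last_seen[agent] = time.monotonic()

    def reassign_stalled(self) -> list[tuple[str, str]]:
        """Return (agent, task) pairs reclaimed from stalled agents."""
        now = time.monotonic()
        stalled = [a for a, t in self.last_seen.items()
                   if now - t > STALL_TIMEOUT]
        for a in stalled:
            del self.last_seen[a]
        return [(a, self.assignments.pop(a)) for a in stalled]

coord = Coordinator()
coord.assign("agent-1", "draft the spreadsheet")
coord.assign("agent-2", "research competitors")
coord.last_seen["agent-1"] -= 10  # simulate 10 s of silence from agent-1
stalled = coord.reassign_stalled()  # only agent-1's task is reclaimed
```

In a real claw group the reclaimed tasks would presumably be redistributed to other agents or escalated to a human; this sketch stops at detection.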

Open Weights with an Attribution Clause

The model is available on kimi.com in chat and agent mode, through Kimi Code, via API, and as a direct download from Hugging Face (where Qwen 3.6 recently pushed the open-weight frontier). K2.6 is natively multimodal and can spin up full websites with database connections directly from prompts, handling basic full-stack tasks including user sign-ups and session management.
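Moonshot's API has historically followed the OpenAI-compatible chat-completions format. Assuming K2.6 is served the same way, a request body might look like the following; the endpoint URL and model identifier are assumptions, not confirmed values, so check Moonshot's API documentation before use.

```python
import json

# Assumed endpoint and model id -- verify against Moonshot's API docs.
API_URL = "https://api.moonshot.ai/v1/chat/completions"

payload = {
    "model": "kimi-k2.6",  # hypothetical model identifier
    "messages": [
        {"role": "system", "content": "You are a full-stack coding assistant."},
        {"role": "user", "content": "Scaffold a site with sign-up and sessions."},
    ],
    "temperature": 0.6,
}
body = json.dumps(payload)

# An actual call would look something like:
#   requests.post(API_URL,
#                 headers={"Authorization": f"Bearer {api_key}"},
#                 data=body)
```

Because the format is OpenAI-compatible, existing client libraries and agent frameworks that speak that protocol should only need the base URL and model name swapped.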

According to Moonshot’s official blog, K2.6 handles full-stack work beyond just front-end generation. The model uses 1 trillion parameters with attention optimizations to keep inference costs manageable, per SiliconANGLE.

The licensing is the twist: a modified MIT license that is permissive with one condition. Commercial deployments exceeding 100 million monthly active users or $20 million in monthly revenue must visibly credit “Kimi K2.6” in the UI. It’s a middle ground between full permissiveness and copyleft, designed to preserve brand attribution at scale.
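The attribution clause reduces to a simple either/or predicate over the two thresholds the article cites. This is an illustration of the clause as described, not legal advice, and the function name is made up.

```python
MAU_THRESHOLD = 100_000_000      # 100M monthly active users
REVENUE_THRESHOLD = 20_000_000   # $20M monthly revenue (USD)

def attribution_required(monthly_active_users: int,
                         monthly_revenue_usd: float) -> bool:
    """Per the modified MIT terms as described: exceeding either
    threshold triggers the visible 'Kimi K2.6' credit in the UI."""
    return (monthly_active_users > MAU_THRESHOLD
            or monthly_revenue_usd > REVENUE_THRESHOLD)

attribution_required(5_000_000, 1_000_000)    # small deployment -> False
attribution_required(150_000_000, 500_000)    # crosses MAU threshold -> True
```

Note that either threshold alone is sufficient: a low-revenue consumer app with enormous reach is covered just as much as a high-revenue app with few users.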

The Open-Weight Gap Is Closing

K2.6 puts an open-weight model within striking distance of frontier proprietary scores on coding benchmarks. The 58.6 SWE-Bench Pro result in particular suggests that open models no longer trail the closed frontier by wide margins; they’re now within a few points.

Where open-weight models traditionally required distillation from larger closed models or suffered from inference inefficiency, K2.6 appears to have closed that gap through architectural optimizations and scaling. The question is no longer whether open models can match benchmark scores; it’s whether they can deliver the same reliability and tooling ecosystem at commercial scale.

The model is out now. Teams wanting self-hosted agentic AI without vendor lock-in now have a credible option.
