
- MiMo-V2.5-Pro compiles SysY to RISC-V in 4.3 hours (110/110 tests passed on Koopa IR, 103/103 on RISC-V backend).
- It builds a complete 8,192-line video editing desktop app in 11.5 hours from prompts alone: multi-track timeline, cross-fades, audio mixing, and export.
- Token efficiency: scores a 64% Pass@3 on ClawEval at ~70K tokens per trajectory, roughly 40–60% fewer than GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro.
Xiaomi’s MiMo-V2.5-Pro is now in public beta, and it is not just another incremental LLM release. According to Xiaomi’s internal benchmarks, the model reaches frontier-tier coding intelligence while spending dramatically fewer tokens than its competitors: per the official MiMo blog, MiMo-V2.5-Pro hits a ClawEval Pass@3 of 64% using just ~70K tokens per trajectory, roughly 40–60% fewer tokens than Claude Opus 4.6, GPT-5.4, and Gemini 3.1 Pro at comparable performance levels.
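The efficiency claim can be sanity-checked with simple arithmetic. If ~70K tokens is 40–60% fewer than the competition, the implied competitor budgets fall roughly between 117K and 175K tokens per trajectory (these competitor figures are inferred from the percentages, not published numbers):

```python
# Back-of-the-envelope check of the token-efficiency claim.
# Only mimo_tokens and the reduction percentages come from the post;
# the competitor budgets below are implied, not published.
mimo_tokens = 70_000

for reduction in (0.40, 0.60):
    # If mimo_tokens is `reduction` fewer than a competitor's budget,
    # then competitor * (1 - reduction) == mimo_tokens.
    competitor = mimo_tokens / (1 - reduction)
    print(f"{reduction:.0%} fewer -> competitor uses ~{competitor:,.0f} tokens")
```

So matching MiMo's score at "40–60% more tokens" means the other models spend on the order of 1.7x to 2.5x as many tokens per working solution.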
But the real story is what MiMo-V2.5-Pro can do when left to run autonomously over long horizons. Researchers gave it Peking University’s SysY compiler course project, a complete compiler written in Rust, from lexer to RISC-V backend, that normally takes CS students weeks. MiMo finished it in 4.3 hours of continuous autonomous work, passing every Koopa IR code-generation test (110/110) and every RISC-V backend test (103/103) before moving on to performance optimization.
From Compilers to Video Apps: Agentic Capabilities Scale
MiMo’s agentic leap becomes clearer on more open-ended tasks. Prompted to build a functional video editor, the model produced a complete desktop application with 8,192 lines of code across 1,868 tool calls over 11.5 hours. The app features multi-track timeline editing, clip trimming, cross-fades, audio mixing, and an export pipeline, all showcased in a demo with an AI voice-over generated by MiMo-V2-TTS.
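Xiaomi has not published MiMo's harness, but a long-horizon build of this kind typically reduces to a simple loop: the model proposes a tool call, the harness executes it, and the observation is fed back until the model declares the task done. A minimal sketch, with all names (`call_model`, `execute_tool`, the action schema) hypothetical rather than MiMo APIs:

```python
# Hypothetical sketch of a long-horizon agentic loop; not MiMo's actual harness.
def run_agent(task, call_model, execute_tool, max_calls=2000):
    """Drive a model through repeated tool calls until it reports completion."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_calls):
        action = call_model(history)           # model chooses the next tool call
        if action["type"] == "done":
            return action["result"]            # model declares the task finished
        observation = execute_tool(action)     # run the tool, capture its output
        history.append({"role": "tool", "content": observation})
    raise RuntimeError("tool-call budget exhausted")
```

Under this framing, the headline numbers (1,868 tool calls, 11.5 hours) are just the loop's iteration count and wall-clock time, and per-trajectory token spend is the sum of model calls over the whole history.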
In analog circuit EDA, MiMo-V2.5-Pro designed an FVF-LDO regulator from scratch in TSMC 180nm CMOS, iterating in an ngspice simulation loop with Claude Code as the harness. Within about an hour of closed-loop operation (calling the simulator, reading waveforms, tweaking parameters), it brought all six metrics (phase margin, load regulation, line regulation, PSRR, quiescent current, dropout voltage) within specification simultaneously, a task that typically requires graduate-level EDA expertise.
Token Efficiency + Open Source = Competitive Pressure
These results matter because Xiaomi isn’t claiming agentic capabilities in a vacuum: it pairs them with a Token Plan that delivers significantly lower compute cost per trajectory. The combination of frontier performance and token efficiency positions MiMo as a credible alternative to Claude Code and OpenAI’s o3 for autonomous software-engineering workloads.
Critically, MiMo-V2.5-Pro will be open-sourced. According to Ars Technica, a release under an open-source license could shift the balance for developers who want frontier-tier agentic capabilities without proprietary lock-in, especially those building autonomous coding agents and long-running software engineering harnesses.
MiMo-V2.5-Pro runs today across Xiaomi’s API Platform and AI Studio under the mimo-v2.5-pro model tag with no pricing change. The public beta is live; the open-source release is coming soon. For developers building agentic coding systems, the new benchmark isn’t just raw scores—it’s how many tokens it takes to reach a working solution.
Xiaomi just demonstrated that frontier-grade autonomous coding can be both competent and cost-efficient. The open-sourced weights will be the next test of whether those claims hold up outside the company’s own harness.
