• DeepSeek dropped V4 Pro and V4 Flash just 12 hours after OpenAI’s GPT-5.5 announcement.
  • V4 Pro costs $3.48 per million output tokens—roughly 1/20th of Claude Opus 4.7 pricing.
  • The release marks China’s formal re-entry into the open-source model wars amid tightening US chip restrictions.

DeepSeek didn’t waste any time. Less than half a day after Sam Altman unveiled GPT-5.5 in a surprise livestream, the Hangzhou-based lab dropped its most ambitious release yet: DeepSeek-V4 Pro and V4 Flash. The models arrived open-sourced, fully documented, and priced at levels that make American competitors look like luxury goods.

The timing was theatrical. DeepSeek’s official X account began posting technical threads at 11 PM Eastern—hours after OpenAI had finished its San Francisco press event. V4 Pro, a 1.6 trillion parameter Mixture-of-Experts model with 49 billion active parameters, now sits at the top of the open-weight benchmarks. V4 Flash, its leaner 284 billion parameter sibling, ships with 13 billion active and costs pennies on the dollar.

“Creativity loves constraints,” wrote UW researcher Yuchen Jin on X. He was referring to how Chinese labs keep training world-class models despite severe hardware limitations. DeepSeek’s own announcement noted something telling: “Due to constraints in high-end compute capacity, the current service capacity for Pro is very limited.” Their solution? Wait for 950 Huawei “supernodes” coming online later this year. If that happens at scale, DeepSeek expects Pro pricing to drop “significantly.”

The model is also available for testing in DeepSeek’s official chat UI. The interface will feel familiar to anyone who has used ChatGPT, Gemini, Claude, MiniMax, Xiaomi’s AI Studio, Z.AI, Mistral’s Le Chat, or any of the other chat interfaces around.


Why DeepSeek V4 Changes the Pricing Game

The numbers border on insulting. V4 Flash costs $0.14 per million input tokens and $0.28 per million output—roughly one-fifth the price of Gemini 3.1 Flash Lite. For perspective, Saoud Rizwan, CEO of coding assistant Cline, calculated that if Uber had used DeepSeek V4 instead of Claude Opus 4.7 for its 2026 AI budget, the company would have stretched its spending from four months to seven years. The comparison is brutal: where OpenAI’s GPT-5.5 costs $30 per million tokens, DeepSeek’s Pro model runs at $3.48.
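
Readers who want to sanity-check these comparisons can reproduce the arithmetic directly. The sketch below hard-codes the per-million-token output prices quoted above; the dictionary keys are illustrative labels, not official API model identifiers.

```python
# Per-million-token output prices as quoted in the article, in USD.
# Keys are illustrative labels, not official API model identifiers.
PRICES_PER_M_OUTPUT = {
    "deepseek-v4-flash": 0.28,
    "deepseek-v4-pro": 3.48,
    "gpt-5.5": 30.00,
}

def output_cost(model: str, tokens: int) -> float:
    """USD cost to generate `tokens` output tokens with `model`."""
    return PRICES_PER_M_OUTPUT[model] / 1_000_000 * tokens

# Example: 10 million output tokens under each model.
for model in PRICES_PER_M_OUTPUT:
    print(f"{model:>18}: ${output_cost(model, 10_000_000):,.2f}")
```

At these prices, 10 million output tokens on V4 Flash cost less than a cup of coffee; the same volume on GPT-5.5 runs to $300.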

The architecture explains part of this efficiency. DeepSeek V4 introduces what it calls “Token-wise compression + DSA (DeepSeek Sparse Attention),” a novel attention mechanism that reduces compute and memory costs while maintaining a 1 million token context window. That’s eight times the 128K window of V3.2, their previous flagship. In real terms, V4 can ingest entire codebases, multi-hour transcripts, or complex agent workflows in a single pass.
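
A back-of-envelope sketch shows why sparse attention is the enabling trick at this scale: dense attention scores every token pair, while a sparse scheme attends to a fixed budget of tokens per query. The per-query budget below is a made-up illustrative number; DeepSeek has not published DSA’s actual sparsity parameters.

```python
# Why sparse attention matters at a 1M-token context: dense attention
# computes O(n^2) pairwise scores, while a scheme that attends to only
# k tokens per query computes O(n * k). The value of k is illustrative,
# not DeepSeek's actual DSA configuration.
def dense_attention_ops(n: int) -> int:
    """Pairwise score computations for full attention over n tokens."""
    return n * n

def sparse_attention_ops(n: int, k: int) -> int:
    """Score computations when each query attends to k tokens."""
    return n * k

n = 1_000_000   # 1M-token context window
k = 2_048       # hypothetical per-query attention budget
print(dense_attention_ops(n) // sparse_attention_ops(n, k))
```

Even with this generous hypothetical budget, the sparse scheme does hundreds of times fewer score computations at the full context length, which is the difference between a 1M window being a spec-sheet number and being affordable to serve.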

The agentic capabilities merit particular attention. DeepSeek V4 was benchmarked against real-world coding tasks on GDPval-AA, an evaluation designed to measure how models perform on practical engineering workflows rather than canned academic puzzles. V4 Pro Reasoning (Max) scored 1554 Elo points, clearing GLM-5.1 (1535), MiniMax-M2.7 (1514), and Kimi K2.6 (1484). V4 Flash Reasoning (Max) hit 1388, well ahead of DeepSeek’s own V3.2 despite being drastically smaller. The gap between V3.2 (1203) and V4 Pro represents a 351-point Elo uplift. That’s not incremental improvement. That’s a category jump.
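
To put those Elo figures in perspective, the standard Elo expected-score formula converts a rating gap into a head-to-head win probability. This assumes GDPval-AA uses conventional Elo scaling with a 400-point base, which the article does not confirm, so treat the result as rough intuition rather than a benchmark claim.

```python
# Standard Elo expected-score formula. Assumes conventional 400-point
# scaling, which GDPval-AA is not confirmed to use.
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under standard Elo assumptions."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

# V4 Pro Reasoning (Max) at 1554 vs. the previous flagship V3.2 at 1203
print(round(expected_win_rate(1554, 1203), 3))  # ≈ 0.883
```

Under those assumptions, the new flagship would beat its predecessor almost nine times out of ten, which is what a “category jump” looks like in rating terms.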

Artificial Analysis, which evaluated the models, noted a quirk in the data: V4 Flash actually scored higher at “High Effort” settings (1414) than at “Max Effort” (1388), suggesting the model has untapped efficiency headroom. The lab also confirmed V4 Pro as the largest open-weights model released to date, surpassing Kimi K2.6 in both total and active parameter counts. The model ships mostly in FP4 precision, putting total size at roughly 865GB—comparable to Kimi’s ~500GB but with nearly double the parameter count.
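
The size figure roughly checks out against the parameter count. A quick estimate, assuming a dense checkpoint stored at pure FP4 (4 bits per parameter):

```python
# Rough checkpoint-size estimate from parameter count and precision.
def model_size_gb(params: float, bits_per_param: float) -> float:
    """Approximate checkpoint size in GB (decimal) for a dense model."""
    return params * bits_per_param / 8 / 1e9

print(model_size_gb(1.6e12, 4))  # 800.0 GB at pure FP4
```

Pure FP4 lands at 800GB; the quoted ~865GB is consistent with “mostly FP4,” i.e. some tensors kept at higher precision, though that breakdown is our reading rather than anything DeepSeek has published.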


The Global Context: Sanctions, Symbiosis, and a Strange New Arms Race

DeepSeek’s release lands amid a tightening web of US restrictions. Continuing a trajectory the Biden administration began, the Trump administration has pushed aggressive GPU export controls through 2025 and 2026, attempting to starve Chinese AI labs of the high-end silicon required for frontier model training. The result? Chinese researchers have become extraordinarily efficient at squeezing performance from older, sanctioned hardware, and increasingly from Huawei’s domestic Ascend chips.

The irony runs rich in both directions. American companies have quietly adopted Chinese techniques. Cursor, the AI coding tool that reportedly rejected a $60 billion acquisition offer from SpaceX, is widely understood to run on a Kimi fine-tune. Trump’s own science advisor warned last week that China is engaged in “massive-scale” distillation of American AI models—effectively using US outputs to train domestic systems. Yet the dependency flows both ways: OpenAI and Anthropic engineers study Chinese MoE architectures, attention optimizations, and training recipes published openly on ArXiv and HuggingFace.

Xiaomi just released MiMo 2.5 Pro. Z.AI and MiniMax have dropped SOTA models in recent weeks. The pace suggests a fundamental shift: Chinese labs have become pace-setters, not followers. DeepSeek’s V4 announcement specifically flagged integrations with Claude Code, OpenClaw, and OpenCode, agents built by American and European developers that now run on Chinese foundation models.

While all this unfolded, OpenAI was launching GPT-5.5, its most capable model yet, positioned as a unified “super-app” for chat, reasoning, agents, and multimodal work. The pricing—$30 per million tokens—wasn’t an accident. OpenAI is betting that performance premium can sustain its infrastructure costs. DeepSeek’s counter-argument? That same $30 buys you roughly 107 million tokens of V4 Flash output. The market share question becomes existential: if quality gaps narrow while cost gaps widen, what exactly are American customers paying for?

For enterprises, the implications are immediate. A solopreneur running agents on V4 Flash could process roughly 3.5 million output tokens for under a dollar. Large corporations evaluating proprietary API dependencies face a genuine strategic question: is the vendor lock-in worth 10x or 20x pricing premiums when open alternatives achieve parity? Big Tech’s answer so far has been governance, security, and support guarantees—intangible advantages that sound thinner when the cost differential approaches an order of magnitude.

DeepSeek also announced that its older models will be fully retired after July 24th, 2026. The deepseek-chat and deepseek-reasoner endpoints, which previously routed to V3 variants, will become inaccessible. The company wants the ecosystem consolidated on V4’s architecture, which supports both thinking and non-thinking modes alongside that 1M context window.
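
For teams still calling the retiring aliases, a thin resolution layer makes the cut-over a one-line change per alias. The V4 model identifiers below are placeholders we invented for illustration; check DeepSeek’s API documentation for the real names before the July 2026 deadline.

```python
# Map V3-era model aliases, retiring after July 24th, 2026, to V4
# successors. The V4 identifiers here are hypothetical placeholders,
# not confirmed API model names.
RETIRED_ALIASES = {
    "deepseek-chat": "deepseek-v4-flash",    # placeholder V4 name
    "deepseek-reasoner": "deepseek-v4-pro",  # placeholder V4 name
}

def resolve_model(name: str) -> str:
    """Rewrite a retired alias to its V4 successor; pass others through."""
    return RETIRED_ALIASES.get(name, name)

print(resolve_model("deepseek-chat"))  # placeholder successor name
```

Routing every API call through a function like this means the eventual identifier change touches one dictionary instead of every call site.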

For everyday users, the practical difference is model availability. Where GPT-5.5 and Claude Opus remain gated behind subscriptions or enterprise contracts, V4 Flash runs on DeepSeek’s API today, at prices that don’t require CFO sign-off. The capability gap between the cheapest tier and the bleeding edge has never been narrower. The geopolitical gap, meanwhile, has never been more consequential. The 950 Huawei supernodes are scheduled for the second half of 2026.
