- Tencent released Hy3, a 295B-parameter MoE model with only 21B parameters active at a time, making it cheaper to run than a comparably sized dense model.
- Hy3 outperformed DeepSeek-V3 on math (76.28% on MATH, 95.37% on GSM8K) and coding benchmarks.
- The weights are openly available on HuggingFace, ModelScope, and GitCode under a custom community license, with vLLM and SGLang deployment support.
Tencent dropped the Hy3 preview yesterday: a 295-billion-parameter AI model that only activates 21 billion of them at any given time. It’s a Mixture-of-Experts architecture, the company announced on HuggingFace, which means the model has 192 different “expert” sub-networks but only consults the top 8 for each token. Think of it as a hospital where 192 specialists are on call, but the AI triage nurse only pages the eight most relevant ones for each patient’s symptoms.
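Tencent hasn’t published a routing walkthrough, but top-k expert gating generally follows a standard pattern. Here’s a minimal, deliberately unoptimized sketch with placeholder layer sizes; only the 192-experts / top-8 numbers come from the announcement:

```python
import torch
import torch.nn.functional as F

# Minimal top-k MoE routing sketch. This is NOT Tencent's code: the
# hidden size and bare-Linear "experts" are placeholders; only the
# 192-expert / top-8 figures come from the announcement.
NUM_EXPERTS, TOP_K, D_MODEL = 192, 8, 1024

router = torch.nn.Linear(D_MODEL, NUM_EXPERTS)    # the "triage nurse"
experts = torch.nn.ModuleList(
    torch.nn.Linear(D_MODEL, D_MODEL) for _ in range(NUM_EXPERTS)
)

def moe_forward(x: torch.Tensor) -> torch.Tensor:
    """Route each token to its top-8 experts and mix their outputs."""
    scores = router(x)                            # (n_tokens, 192)
    weights, idx = scores.topk(TOP_K, dim=-1)     # keep the 8 best scores
    weights = F.softmax(weights, dim=-1)          # normalize over those 8
    out = torch.zeros_like(x)
    for t in range(x.size(0)):                    # per token...
        for s in range(TOP_K):                    # ...only 8 experts ever run
            out[t] += weights[t, s] * experts[int(idx[t, s])](x[t])
    return out

tokens = torch.randn(4, D_MODEL)                  # 4 example token vectors
print(moe_forward(tokens).shape)                  # torch.Size([4, 1024])
```

Only 8 of the 192 expert weight matrices ever touch a given token, which is exactly where the gap between 21B active and 295B total parameters comes from.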
This approach matters because compute costs scale with active parameters, not total size. Hy3’s 295B total is less than half of DeepSeek-V3’s 671B, and its 21B active footprint is little more than half of the 37B DeepSeek activates per token. On the MATH benchmark, Hy3 scored 76.28% versus DeepSeek’s 59.37%. On GSM8K, it hit 95.37% against 88.15%. These aren’t marginal gains; they’re the kind of jumps that typically require doubling model size, not cutting active compute by nearly half.
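That math is easy to sanity-check using the common rule of thumb that a transformer forward pass costs roughly 2 FLOPs per active parameter per token (a generic approximation, not a figure from either lab):

```python
# Per-token forward-pass cost under the ~2-FLOPs-per-active-parameter
# rule of thumb. The approximation is generic; only the active-parameter
# counts come from the two models' reported specs.
FLOPS_PER_PARAM = 2

hy3_active = 21e9                         # Hy3: 295B total, 21B active
dsv3_active = 37e9                        # DeepSeek-V3: 671B total, 37B active

hy3_cost = FLOPS_PER_PARAM * hy3_active   # ~4.2e10 FLOPs per token
dsv3_cost = FLOPS_PER_PARAM * dsv3_active # ~7.4e10 FLOPs per token

print(f"Hy3 needs ~{hy3_cost / dsv3_cost:.0%} of DeepSeek-V3's per-token compute")
# -> Hy3 needs ~57% of DeepSeek-V3's per-token compute
```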
The timing places Hy3 squarely in China’s open-source AI arms race. Alibaba released Qwen3.6 last week with a similar MoE architecture (35B total, 3B active). Google’s Gemma 4 dropped earlier this month. The pattern is clear: Chinese labs are shipping MoE models faster than their American counterparts, and the efficiency gains are stacking up. Hy3 shows Tencent isn’t just keeping pace; it’s trying to leapfrog.
Why Hy3’s Mixture-of-Experts Architecture Changes the Math
The technical details matter here. Hy3 adds a Multi-Token Prediction layer (3.8B extra parameters), offers a 256K context window, and supports BF16 precision. It was trained on rebuilt infrastructure that Tencent claims produced better reasoning, coding, and agent capabilities than its previous models. On LiveCodeBench-v6, a notoriously hard coding benchmark, Hy3 scored 34.86% versus DeepSeek’s 29.31% and Alibaba’s 27.43%.
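For orientation, loading an open-weight checkpoint in BF16 with Hugging Face transformers generally looks like the sketch below. The tencent/Hy3 repo id is a guess rather than a confirmed path, and a 295B checkpoint has to be sharded across several GPUs:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Generic BF16 loading sketch with Hugging Face transformers.
# "tencent/Hy3" is a HYPOTHETICAL repo id; check the actual model card.
MODEL_ID = "tencent/Hy3"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,   # BF16, the precision the release supports
    device_map="auto",            # shard ~295B of weights across available GPUs
)

prompt = "Prove that the square root of 2 is irrational."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```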
The agent scores are arguably more interesting than the academic benchmarks. Hy3 posted competitive results on SWE-bench Verified (a test of whether AI can fix real GitHub issues), BrowseComp (web navigation), and WideSearch (information retrieval). Tencent built internal evaluation sets called Hy-Backend and Hy-SWE Max to test the model in actual development pipelines: real IDE workflows, real backend tasks, real user debugging sessions.
That focus on practical deployment shows in the release. Tencent shipped vLLM and SGLang integration from day one, complete with OpenAI-compatible API endpoints. The weights are available on HuggingFace, ModelScope, and GitCode. There’s no waiting list, no application process, no enterprise sales funnel. The company wants developers running inference within hours, not weeks.
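Concretely, that day-one support means serving should follow vLLM’s usual pattern. A sketch, with the model id assumed rather than confirmed:

```python
# Launch the server with vLLM's standard CLI (model id is an assumption):
#   vllm serve tencent/Hy3 --tensor-parallel-size 8
# Then talk to it through the OpenAI-compatible endpoint vLLM exposes:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="tencent/Hy3",
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
)
print(response.choices[0].message.content)
```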
Open-Source AI’s Geographic Shift
The licensing is worth noting. Tencent released Hy3 under a custom “Tencent Hy Community” license: not Apache 2.0, not MIT, but something company-specific. It allows commercial use and modification, though the exact restrictions will matter for anyone building a product on top of it. The license file is on GitHub alongside the model cards, datasets, and training configurations.
What Hy3 represents more broadly is the geographic redistribution of open-source AI leadership. Three years ago, the open-weight leaderboard was dominated by American and European labs: Meta’s Llama series, Mistral’s models, Stability’s diffusion variants. Today, the models grabbing headlines ship from Hangzhou, Shenzhen, and Beijing. The technical quality gap has compressed to the point where benchmark differences are measured in single-digit percentages, not chasms.
Tencent’s AI group has been quieter than Alibaba’s PR machine and DeepSeek’s viral moments. Hy3 suggests it has been building in the background. The Qiuzhen College Math PhD qualifying exam result mentioned in the announcement, where Hy3 performed well on Tsinghua’s graduate-level math test, carries symbolic weight. It’s the Chinese domestic equivalent of acing the Putnam or scoring well on the IMO. These aren’t standardized tests designed to be gamed by AI; they’re filtration mechanisms for identifying the country’s most mathematically promising students.
Tencent hasn’t disclosed training compute costs, data sources, or the architecture’s energy footprint. Those omissions are standard; almost no lab publishes full training budgets anymore. But the inference efficiency claims are verifiable: anyone with a node of A100s can load the weights and run their own benchmarks. Because only 21B parameters fire per token, the model requires substantially less compute per forward pass than DeepSeek-V3 or AI21’s Jamba at equivalent quality levels.
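The hardware bar for “load the weights” is worth spelling out. A quick back-of-envelope estimate, assuming unquantized BF16 weights at 2 bytes per parameter (quantized builds would shrink the number):

```python
# Back-of-envelope weight memory, assuming unquantized BF16 (2 bytes per
# parameter). Note that an MoE model keeps ALL experts resident in memory;
# the 21B "active" figure lowers compute per token, not weight storage.
total_params = 295e9
bytes_per_param = 2                               # BF16
weight_gb = total_params * bytes_per_param / 1e9  # ~590 GB of weights alone

a100_gb = 80
print(f"~{weight_gb:.0f} GB -> at least {weight_gb / a100_gb:.0f} A100-80GB GPUs")
# -> ~590 GB -> at least 7 A100-80GB GPUs (a full 8-GPU node in practice,
#    before KV cache and activations)
```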
Whether Hy3 becomes the default choice for Chinese-language AI workloads remains to be seen. The open weights are available now. Tencent is taking questions on GitHub and its HuggingFace discussions page.
