Hume AI Open-Sources TADA, a Speech Model That Runs 5x Faster and Can’t Hallucinate Words

Hume AI's TADA speech model uses a one-to-one text-to-audio alignment that makes hallucinations structurally impossible — and runs five times faster than rivals on hardware small enough to fit in a phone.

Hume AI has open-sourced TADA, a text-to-speech model built on a fundamentally different architecture than existing systems — one that produces zero transcription hallucinations, runs more than five times faster than comparable LLM-based alternatives, and is light enough to run on a smartphone without cloud infrastructure.

The key to all three properties is the same design decision. Every existing LLM-based speech synthesis system faces a mismatch problem: a second of audio requires roughly 12 to 75 audio frames, while the corresponding text might occupy only 2 to 3 tokens.

Systems have to manage long audio sequences against much shorter text sequences, which means longer context windows, more memory consumption, more opportunities for the model to lose track of what it’s supposed to say — and occasional hallucinations where words get skipped or invented.

TADA (Text-Acoustic Dual Alignment) eliminates the mismatch by encoding audio into continuous acoustic vectors that match text one-to-one, token for token. Every autoregressive step produces exactly one text token and one audio frame, moving in lockstep.
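The mechanics can be illustrated with a toy decode loop. This is a conceptual sketch, not Hume's actual code: `synthesize_frame` stands in for the real acoustic model, and the point is only that the loop structurally pairs each text token with exactly one acoustic frame, so the audio stream can never skip or invent a word.

```python
# Illustrative sketch of lockstep text-audio decoding (hypothetical code,
# not TADA's implementation): every autoregressive step emits exactly one
# text token and one acoustic frame, so the two sequences cannot drift.
from dataclasses import dataclass

@dataclass
class Step:
    text_token: str            # the token being spoken at this step
    acoustic_frame: list       # continuous acoustic vector for the same step

def synthesize_frame(token, history):
    # Stand-in for the real acoustic model: returns a dummy 8-dim vector.
    return [0.0] * 8

def lockstep_decode(text_tokens):
    """One audio frame per text token: skipping or inserting a word would
    require breaking the 1:1 pairing, which this loop cannot do."""
    steps = []
    for tok in text_tokens:
        frame = synthesize_frame(tok, steps)
        steps.append(Step(tok, frame))
    return steps

steps = lockstep_decode(["Hello", ",", "world"])
# Exactly one frame per token, by construction.
```

Contrast this with an LLM-plus-codec pipeline, where dozens of audio frames are generated per token and the model must keep attending back to the text to stay on script.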

700 Seconds of Audio in a Standard Context Window, Zero Hallucinations

The practical consequences of that design choice are measurable. TADA generates audio at a real-time factor of 0.09, operating at 2 to 3 frames per second versus the 12 to 75 frames per second of conventional approaches. A standard 2,048-token context window accommodates about 70 seconds of audio in conventional systems; TADA stretches that to roughly 700 seconds — making long-form narration and extended conversations viable without chunking or context resets.
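The context-window arithmetic checks out from the numbers above. A quick back-of-the-envelope calculation (frame rates are approximate midpoints from the ranges the article cites):

```python
# Rough check of the context-window claim using the article's figures.
context = 2048          # tokens in a standard context window

conventional_fps = 30   # conventional codecs: ~12-75 audio frames per second
tada_fps = 3            # TADA: ~2-3 frames per second, one per text token

conventional_seconds = context / conventional_fps   # ~68 s, near the ~70 s cited
tada_seconds = context / tada_fps                   # ~683 s, near the ~700 s cited
```

The tenfold stretch comes entirely from the frame rate: at 2 to 3 frames per second, each context slot buys far more wall-clock audio than it does at conventional rates.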

In tests across more than 1,000 samples from LibriTTS-R, the model produced zero content hallucinations. In human evaluations on expressive, long-form speech, TADA scored 4.18 out of 5 for speaker similarity and 3.78 for naturalness, placing second overall — ahead of several systems trained on significantly more data.

The release includes two model sizes: a 1 billion-parameter English-only model and a 3 billion-parameter multilingual version covering seven additional languages, both based on Llama and both published on GitHub and Hugging Face under the MIT license.

Open-Source and MIT-Licensed, but Not Yet Ready to Replace ElevenLabs

The smaller model is designed for on-device deployment, which matters for applications where sending audio to cloud APIs creates latency, cost, or privacy exposure. Hume has announced plans to expand to a 7 billion-parameter version and add domain-specific fine-tuning datasets for verticals like customer support.

The voice AI market is currently dominated by proprietary commercial systems: ElevenLabs charges per character, and OpenAI's voice capabilities are gated behind its API and product tiers. TADA does not automatically displace those offerings; a naturalness score of 3.78 out of 5 means there is still a quality gap, and the model can drift on very long texts.

But the architecture’s elimination of hallucinations by construction—not by post-processing or filtering, but because the one-to-one mapping makes skipping or inserting words structurally impossible—addresses a failure mode that makes existing systems unsuitable for regulated applications like medical transcription or legal dictation.

Open-sourcing the full stack under MIT means any developer can fine-tune it for those contexts without Hume’s involvement.
