Getting numbers and emails right is one of those deceptively simple details that separates demo-ready models from systems that actually work in production.

In real customer calls, numbers and emails are everywhere: phone verification, account IDs, appointment reminders, invoices, claims, and support tickets.
When a voice model misreads a digit, skips a dash, or collapses an email into noise, the conversation breaks. Customers lose trust. Agents waste time. And in industries like healthcare, insurance, and logistics, one wrong character can derail an entire workflow.
That’s why email and number pronunciation isn’t a minor detail; it’s a core reliability benchmark.
We tested leading voice models using scripts designed to expose common edge cases in email, number, and alphanumeric pronunciation.
Here’s the sample script, along with audio examples that highlight how each model handles email pronunciation:
Send documents to finance@euro-payments.eu.
Please reply to support_v2@client-help.eu.
Please email support-team@berlin-cloud.de.
The invoice was issued by billing@saas-europe.io.
All approvals go through ops@global-services.eu.
Minimax & Deepgram - Fully accurate
Both models preserved dots, dashes, and letter-by-letter spelling with 100% accuracy.
ElevenLabs - Mostly correct
Strong overall quality, but often dropped dashes or confused them with underscores when reading multiple emails in sequence. When it works, it sounds great. When it doesn’t, clarity drops quickly.
Cartesia - Misses dashes
Consistently skipped dashes, flattening email structure. Underscores and dots were usually correct, but reliability was still limited.
Speechmatics - Worst performer
Omitted both dashes and underscores, making emails difficult to understand.
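To make the scoring bar concrete: we counted an email as fully correct only when every character and separator was voiced, letter by letter. The Python sketch below illustrates that target rendering; the function and symbol map are our own illustrative choices, not any vendor's API, and production TTS front ends use far richer normalization rules.

```python
# Illustrative only: expand an email address into the letter-by-letter
# spoken form we scored against, voicing "dot", "dash", "underscore",
# and "at" explicitly.
SYMBOLS = {".": "dot", "-": "dash", "_": "underscore", "@": "at"}

def spoken_email(address: str) -> str:
    words = []
    for ch in address:
        # Voice separators by name; spell letters and digits individually.
        words.append(SYMBOLS.get(ch, ch.upper()))
    return " ".join(words)

print(spoken_email("support_v2@client-help.eu"))
# S U P P O R T underscore V 2 at C L I E N T dash H E L P dot E U
```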
The rankings shift when it comes to number pronunciation. In our test script, we evaluated how each model performs on pure numbers as well as mixed alphanumeric strings. Here’s the sample script, along with the audio samples:
You can contact Milan support at +39 02 7719 3041.
Your reference number is REF-8821932 for this case.
Please quote REF-390112 when emailing support.
Deepgram - Best overall
Perfect accuracy on numbers and mixed strings. The only minor issue was sometimes reading the “+” sign aloud unnecessarily.
Minimax & Cartesia - Strong second
Occasionally skipped dashes between letters and numbers, but otherwise very solid.
ElevenLabs - Mostly great
Strong performance, but sometimes read letter groups as words (e.g., “REF” instead of “R-E-F”).
Speechmatics - Worst performer
Read phone numbers as single values and merged “REF” with the number.
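The same idea applies to mixed alphanumeric strings: the rendering we scored against spells letter groups out, voices dashes, and reads digits one at a time. Below is a minimal Python sketch of that convention; the function name and tokenization are hypothetical, for illustration only.

```python
import re

def spoken_reference(ref: str) -> str:
    """Illustrative: render "REF-8821932" as "R E F dash 8 8 2 1 9 3 2"."""
    parts = []
    # Tokenize into letter runs, digit runs, and dashes.
    for token in re.findall(r"[A-Za-z]+|\d+|-", ref):
        if token == "-":
            parts.append("dash")
        else:
            parts.extend(token.upper())  # spell every character individually
    return " ".join(parts)

print(spoken_reference("REF-8821932"))  # R E F dash 8 8 2 1 9 3 2
```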
Currency pronunciation is another edge case. In our test script, only ElevenLabs correctly interpreted “CAD” as “Canadian”; the other models mispronounced it as “cat” or spelled it out as “C-A-D”.
Sample script:
This service costs $97.50 CAD monthly.
The refund issued was $412.80 CAD.
Audio samples: Deepgram, Minimax, Cartesia, ElevenLabs, Speechmatics.
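One way to sidestep this class of error is to expand currency codes in the text before it ever reaches the model. The sketch below is a minimal Python illustration under that assumption; the code-to-name map and the regex are our own, and converting the digits themselves into words is left to the TTS front end.

```python
import re

# Assumed mapping for illustration; extend as needed.
CURRENCY_NAMES = {"CAD": "Canadian dollars", "USD": "US dollars", "EUR": "euros"}

def expand_currency(text: str) -> str:
    def repl(match: re.Match) -> str:
        whole, cents, code = match.groups()
        name = CURRENCY_NAMES.get(code, code)
        return f"{whole} {name} and {cents} cents"
    # Match amounts like "$97.50 CAD"; digit-to-word conversion is
    # assumed to happen later in the TTS pipeline.
    return re.sub(r"\$(\d+)\.(\d{2})\s+([A-Z]{3})", repl, text)

print(expand_currency("This service costs $97.50 CAD monthly."))
# This service costs 97 Canadian dollars and 50 cents monthly.
```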
Voice AI rarely fails in obvious ways. It fails through small, repeated inaccuracies that quietly erode trust, and getting numbers and emails right is exactly the kind of deceptively simple detail that separates demo-ready models from systems that hold up in production.
As voice AI matures, accuracy on these fundamentals will become one of the most important benchmarks for evaluation.