Real-time voice translation that keeps the conversation going.
Voxlingo translates voice to voice in real time across 40+ languages. Built for European enterprises that operate across borders, benchmarked #2 globally on French-centric translation pairs, and deployable wherever your conversations need to stay.
The gap that compounds quietly
Your customers don't all speak your agents' language
A French insurer handles claims in Polish, Romanian, and Portuguese. A German contact center routes English, Italian, and Turkish calls every day. A Belgian utility supports French, Dutch, and Arabic. Multilingual hiring is expensive. Interpreter services run €0.80–€2.00 per minute. Call abandonment climbs when caller and agent don't share a language.
Your translation API is generic. Your conversations aren't
Most translation APIs were trained on English-paired data and tuned for English-pivot translation. French → English → Polish works passably. French → Polish directly, with a real-time conversation's vocabulary, accent, and pace, doesn't. Generic models translate adequately. They don't translate exceptionally on the pairs your users actually speak.
Voice-to-voice is hard. Most vendors don't ship it well
Translation is one part of the problem. ASR is another. TTS is a third. Real-time streaming, with sub-second perceived latency, is a fourth. Stitching all four into a single conversational experience that doesn't feel like a phone tree is what separates Voxlingo from products that promise "live translation" but break the moment your caller stops speaking textbook sentences.
Speech to speech, in one streaming pipeline
Capture
The caller's voice is captured in real time over SIP, WebRTC, or the Voxlingo SDK. Voxist's ASR identifies the language in under 100 ms and begins transcribing in under 200 ms, streaming words as they are recognized rather than waiting for the end of an utterance.
Translate
The streaming transcript flows into VoxTranslate, our in-house translation engine, ranked COMET #2 globally across 20 French-centric EU language pairs in independent benchmarks. The engine handles disfluencies, accents, technical terminology, and code-switching — the things real conversations actually do.
Speak
A natural neural TTS voice speaks the translation in the target language, with prosody and pacing that match the source speaker. Voice preservation (translating in the speaker's own voice) is on the roadmap for late 2026. End-to-end perceived latency: under one second.
Deploy
Voxlingo runs in three configurations: as a SaaS API for developers, as a managed deployment inside a Voxlive contact center, or as a fully on-premise stack including the translation models. Cloud, sovereign, or air-gapped — your choice, your data, your perimeter.
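To make the capture, translate, speak flow concrete, here is a minimal sketch of a streaming pipeline built from chained Python generators. It is an illustration only, not Voxlingo's implementation or SDK: every name in it (`Segment`, `transcribe`, `translate`, `synthesize`, `speech_to_speech`, `TOY_GLOSSARY`) is hypothetical, and each stage is a stub standing in for a real ASR, MT, or TTS model.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Segment:
    """A piece of recognized or translated text, tagged with its language."""
    text: str
    lang: str


def transcribe(audio_chunks: Iterable[bytes], source_lang: str) -> Iterator[Segment]:
    """Stub ASR: pretends each audio chunk decodes to one UTF-8 word."""
    for chunk in audio_chunks:
        yield Segment(text=chunk.decode("utf-8"), lang=source_lang)


# Toy lookup table standing in for a translation model (hypothetical data).
TOY_GLOSSARY = {("fr", "pl"): {"bonjour": "dzień dobry", "merci": "dziękuję"}}


def translate(segments: Iterable[Segment], target_lang: str) -> Iterator[Segment]:
    """Stub MT: translates segment by segment, passing unknown words through."""
    for seg in segments:
        table = TOY_GLOSSARY.get((seg.lang, target_lang), {})
        yield Segment(text=table.get(seg.text, seg.text), lang=target_lang)


def synthesize(segments: Iterable[Segment]) -> Iterator[bytes]:
    """Stub TTS: emits labeled bytes where a real system would emit audio."""
    for seg in segments:
        yield f"[{seg.lang}] {seg.text}".encode("utf-8")


def speech_to_speech(audio_chunks: Iterable[bytes], source_lang: str,
                     target_lang: str) -> Iterator[bytes]:
    """Chain the three stages; nothing runs until the caller pulls output."""
    return synthesize(translate(transcribe(audio_chunks, source_lang), target_lang))
```

Because each stage is a generator, output for the first chunk is available before the second chunk has been read. That is the property a streaming pipeline relies on to surface words as they are recognized instead of at the end of an utterance.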
Built to do the hard things well
Real-time voice-to-voice translation
streaming ASR, MT, and TTS, integrated into a single pipeline with sub-second perceived latency.
40+ languages, 1600+ pairs
production-grade depth on European languages, growing coverage of Asian and African languages. Results for each pair are benchmarked and published on the Voxist leaderboard.
COMET #2 globally on French-centric pairs
beating DeepL in 17/20 pairs, beating GPT-4o in 18/20 pairs, sitting 0.0025 COMET points behind Google overall. Independent benchmark, public methodology.
Conversation-mode features
disfluency handling, code-switching detection, technical-domain vocabularies (legal, medical, finance, technical), context preservation across turns.
Live caption mode
when audio output isn't appropriate (meetings, events, broadcast), render the translation as a synchronized live transcript.
Voice preservation roadmap
by late 2026, Voxlingo will translate in the original speaker's voice using Voxist's TTS voice-cloning research. Today, a natural neutral voice is used in the target language.
Deployable on-premise
including the translation models. One of the very few real-time speech translation products that don't require a cloud round-trip.
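The "40+ languages, 1600+ pairs" figure can be sanity-checked with a little arithmetic. The sketch below assumes pairs are counted directionally, so French → German and German → French count separately, and that every language pairs with every other one. Both are assumptions rather than statements from Voxlingo, and the function names are illustrative.

```python
def directed_pairs(n_languages: int) -> int:
    """Number of ordered (source, target) pairs among n distinct languages."""
    return n_languages * (n_languages - 1)


def min_languages_for(pairs: int) -> int:
    """Smallest language count whose directed pairs reach the given total."""
    n = 2
    while directed_pairs(n) < pairs:
        n += 1
    return n


print(directed_pairs(40))       # 1560
print(directed_pairs(41))       # 1640
print(min_languages_for(1600))  # 41
```

Under those assumptions, exactly 40 languages would give 1,560 pairs, so 1,600 or more implies at least 41 languages, which is consistent with "40+".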
COMET #2 globally on French-centric EU pairs
In independent COMET benchmarking across 20 French-centric EU language pairs, Voxlingo's translation engine ranks #2 globally — ahead of DeepL, GPT-4o, Claude, and EuroLLM variants. Voxlingo beats DeepL in 17 of 20 pairs and GPT-4o in 18 of 20 pairs, sitting only 0.0025 COMET points behind the global #1.
| Pair | Voxlingo rank | DeepL rank | Margin (COMET) |
|---|---|---|---|
| French → German | 4th | 7th | +0.0038 |
| German → French | 4th | 7th | +0.0038 |
| French → Polish | 3rd | 7th | +0.0036 |
| French → Spanish | 3rd | 8th | +0.0036 |
| French → Hungarian | 4th | 7th | +0.0040 |
Four things, every time
Sub-second perceived latency, end to end
Voxlingo's streaming pipeline — ASR, MT, TTS — runs under one second of perceived latency end-to-end, on real conversations with real accents and real disfluencies. The pipeline is Voxist all the way through: no third-party round-trips, no API hops, no quality cliff when a sentence trails off.
Specialized, not generic
Voxlingo is built for European languages, with French at the core. Most translation APIs were trained on English-paired data and translate via English pivot. Voxlingo is trained on direct pairs — French ↔ German, French ↔ Polish, French ↔ Hungarian — which is why the COMET delta over DeepL is most visible on those pairs.
Sovereign by default
Voxlingo is one of the very few real-time voice translation products with a fully on-premise deployment option, including the translation models. Cloud, sovereign (OVHcloud, Scaleway), or air-gapped. GDPR-native. EU AI Act-ready. For regulated industries — healthcare, defense, government, finance — this is the combination that doesn't exist anywhere else.
Outcomes you can measure
Voxlingo deployments inside Voxlive contact centers report sub-second perceived latency, parity with human-interpreter intelligibility, and a 60–80% reduction in interpreter spend within the first quarter. Every claim is from a real customer or a public benchmark.
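As a rough illustration of what a 60 to 80% reduction could mean in euros, the sketch below applies that range to the €0.80 to €2.00 per-minute interpreter rates quoted earlier. The 10,000 minutes per month is a hypothetical volume chosen for the example, the function name is illustrative, and the calculation ignores the cost of Voxlingo itself, so treat it as a back-of-the-envelope estimate, not a pricing model.

```python
from decimal import Decimal


def savings_range(minutes_per_month: int, rate_eur_per_minute: str,
                  low_cut: str = "0.60", high_cut: str = "0.80") -> tuple[Decimal, Decimal]:
    """Return (low, high) monthly savings in EUR for a given interpreter rate.

    Decimal avoids binary floating-point rounding on currency amounts.
    """
    spend = Decimal(minutes_per_month) * Decimal(rate_eur_per_minute)
    return spend * Decimal(low_cut), spend * Decimal(high_cut)


# Hypothetical volume: 10,000 interpreted minutes per month.
low, high = savings_range(10_000, "0.80")
print(f"At EUR 0.80/min: {low:.2f} to {high:.2f} saved per month")

low, high = savings_range(10_000, "2.00")
print(f"At EUR 2.00/min: {low:.2f} to {high:.2f} saved per month")
```

With these inputs, the baseline spend is €8,000 to €20,000 per month, and the estimated savings run from €4,800 at the low end to €16,000 at the high end.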
A short, honest comparison
| Voxlingo | DeepL Voice | KUDO | Wordly | |
|---|---|---|---|---|
| Real-time voice-to-voice | ✅ (with human) | |||
| COMET #2 globally on French-centric pairs | Not benchmarked | N/A | N/A | |
| On-premise deployment | — | — | — | |
| Built in Europe | — | |||
| 40+ languages, growing | 40+ | 200+ (human) | 60+ | |
| Voice preservation (roadmap) | Late 2026 | Late 2026 | Human only | — |
One platform, six products, one flywheel
Voxlingo integrates as a native capability inside the Voxlive contact center, running in the agent's earpiece or as a fully translated agent-customer channel.
Capture an expert interview in French; query the resulting knowledge graph in Polish or Arabic. Voxlingo handles the cross-language retrieval inside Voxcept.
Record a meeting in a mixed-language environment; Voxlingo translates the transcript on demand into any of the supported languages.
Voxlingo is exposed as a developer-grade translation API at voxist.com/api. Same auth, same SDKs, transparent EUR pricing.
Questions, answered
How accurate is Voxlingo on European languages?
What's the end-to-end latency?
Can Voxlingo run on-premise?
Will the translated voice sound like the original speaker?
Which languages does Voxlingo support?
How does Voxlingo compare to DeepL Voice?
How does Voxlingo compare to KUDO or Wordly?
Does Voxlingo work with our SIP / WebRTC / call platform?
Can Voxlingo handle accents, code-switching, and technical vocabulary?
Is the voxlingo.com mobile app the same product?
Can Voxlingo translate sign language?
Run your multilingual operations on European AI.
English & French · EU-hosted · no audio used for model training