Real-Time Voice Translation

Real-time voice translation that keeps the conversation going.

Voxlingo translates voice to voice in real time across 40+ languages. Built for European enterprises that operate across borders, benchmarked #2 globally on French-centric translation pairs, and deployable wherever your conversations need to stay.

Voice Translation · Live
Français: Pouvez-vous confirmer le délai de livraison ?
文 ⇄ A · 0.8s
Deutsch: Können Sie den Liefertermin bestätigen?
voice → voice · COMET #2

The problem

The gap that compounds quietly

The pain

Your customers don't all speak your agents' language

A French insurer handles claims in Polish, Romanian, and Portuguese. A German contact center routes English, Italian, and Turkish calls every day. A Belgian utility supports French, Dutch, and Arabic. Multilingual hiring is expensive. Interpreter services run €0.80–€2.00 per minute. Call abandonment climbs when caller and agent don't share a language.

The cost

Your translation API is generic. Your conversations aren't

Most translation APIs were trained on English-paired data and tuned for English-pivot translation. French → English → Polish works passably. French → Polish directly, with a real-time conversation's vocabulary, accent, and pace, doesn't. Generic models translate adequately. They don't translate exceptionally on the pairs your users actually speak.

With Voxist

Voice-to-voice is hard. Most vendors don't ship it well

Translation is one part of the problem. ASR is another. TTS is a third. Real-time streaming, with sub-second perceived latency, is a fourth. Stitching four AI systems into a single conversational experience that doesn't feel like a phone tree is what separates Voxlingo from a product that promises "live translation" but breaks the moment your caller stops speaking textbook sentences.

How it works

Speech to speech, in one streaming pipeline

1. Capture

The caller's voice is captured in real time over SIP, WebRTC, or the Voxlingo SDK. Voxist's ASR identifies the language in under 100 ms and begins transcribing in under 200 ms, streaming words as they are recognized rather than waiting for the end of an utterance.

2. Translate

The streaming transcript flows into VoxTranslate, our in-house translation engine, ranked COMET #2 globally across 20 French-centric EU language pairs in independent benchmarks. The engine handles disfluencies, accents, technical terminology, and code-switching — the things real conversations actually do.

3. Speak

A natural neural TTS voice speaks the translation in the target language, with prosody and pacing that match the source speaker. Voice preservation (translating in the speaker's own voice) is on the roadmap for late 2026. End-to-end perceived latency: under one second.

4. Deploy

Voxlingo runs in three configurations: as a SaaS API for developers, as a managed deployment inside a Voxlive contact center, or as a fully on-premise stack including the translation models. Cloud, sovereign, or air-gapped — your choice, your data, your perimeter.
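
The first three stages above can be pictured as chained generators, each consuming the previous stage's output as it arrives. The sketch below is illustrative only: `Segment`, `recognize`, `translate`, `speak`, and `pipeline` are hypothetical names, not the Voxlingo SDK, and every stage is a stub (strings stand in for audio, a toy phrasebook stands in for the translation engine).

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Segment:
    """A piece of text moving through the pipeline, tagged with its language."""
    lang: str
    text: str


def recognize(audio_chunks: Iterable[str], source_lang: str) -> Iterator[Segment]:
    """Stand-in for streaming ASR: emits one segment per incoming chunk.

    Each "audio chunk" is already a string here, so nothing is really recognized.
    """
    for chunk in audio_chunks:
        yield Segment(source_lang, chunk)


def translate(segments: Iterable[Segment], target_lang: str,
              phrasebook: dict[str, str]) -> Iterator[Segment]:
    """Stand-in for MT: looks each segment up in a toy phrasebook."""
    for seg in segments:
        yield Segment(target_lang, phrasebook.get(seg.text, seg.text))


def speak(segments: Iterable[Segment]) -> Iterator[str]:
    """Stand-in for TTS: yields a text label where real code would yield audio."""
    for seg in segments:
        yield f"[{seg.lang} audio] {seg.text}"


def pipeline(audio_chunks: Iterable[str], source_lang: str, target_lang: str,
             phrasebook: dict[str, str]) -> Iterator[str]:
    """Chain the three stages lazily so output can start before input ends."""
    recognized = recognize(audio_chunks, source_lang)
    translated = translate(recognized, target_lang, phrasebook)
    return speak(translated)
```

Because every stage is lazy, requesting the first output pulls only the first chunk through the chain. That is the property a streaming design relies on to start speaking before the caller has finished.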

Capabilities

Built to do the hard things well

Real-time voice-to-voice translation

Streaming ASR, MT, and TTS, integrated into a single pipeline with sub-second perceived latency.

40+ languages, 1600+ pairs

Production-grade depth on European languages, growing coverage on Asian and African languages. Each pair is benchmarked and published on the Voxist leaderboard.

COMET #2 globally on French-centric pairs

Ahead of DeepL on 17 of 20 pairs and GPT-4o on 18 of 20, and 0.0025 COMET points behind Google overall. Independent benchmark, public methodology.

Conversation-mode features

Disfluency handling, code-switching detection, technical-domain vocabularies (legal, medical, finance, technical), context preservation across turns.

Live caption mode

When audio output isn't appropriate (meetings, events, broadcast), render the translation as a synchronized live transcript.

Voice preservation roadmap

By late 2026, Voxlingo will translate in the original speaker's voice using Voxist's TTS voice-cloning research. Today, a natural neutral voice is used in the target language.

Deployable on-premise

Includes the translation models. Voxlingo is one of the very few real-time speech translation products that doesn't require a cloud round-trip.
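
For the live caption mode listed above, one common way to carry a synchronized transcript is a WebVTT caption file. The sketch below assumes translated segments arrive as `(start_seconds, end_seconds, text)` tuples; that shape, the function names, and the choice of WebVTT are assumptions for illustration, not a documented Voxlingo output format.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as a WebVTT timestamp, hh:mm:ss.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: list[tuple[float, float, str]]) -> str:
    """Build a minimal WebVTT document from (start, end, text) cues.

    Cue text is written as-is; real code would need to reject or escape
    text containing "-->" and handle multi-line cues.
    """
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)
```

A player or meeting client that understands WebVTT can then display each translated line over the interval in which the source sentence was spoken.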

Proof

COMET #2 globally on French-centric EU pairs

In independent COMET benchmarking across 20 French-centric EU language pairs, Voxlingo's translation engine ranks #2 globally — ahead of DeepL, GPT-4o, Claude, and EuroLLM variants. Voxlingo beats DeepL in 17 of 20 pairs and GPT-4o in 18 of 20 pairs, sitting only 0.0025 COMET points behind the global #1.

#2 globally · French-centric pairs
17/20 pairs ahead of DeepL
18/20 pairs ahead of GPT-4o
40+ languages supported

Pair                   Voxlingo   DeepL   Margin
French → German        4th        7th     +0.0038
German → French        4th        7th     +0.0038
French → Polish        3rd        7th     +0.0036
French → Spanish       3rd        8th     +0.0036
French → Hungarian     4th        7th     +0.0040

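Head-to-head figures such as "17 of 20 pairs" and per-pair margins come down to comparing two systems' scores pair by pair. A minimal sketch, assuming higher COMET is better and each system has one score per language pair; `head_to_head` is a hypothetical helper, and any scores passed to it here are placeholders, not benchmark data.

```python
from statistics import mean


def head_to_head(scores_a: dict[str, float],
                 scores_b: dict[str, float]) -> tuple[int, int, float]:
    """Compare two systems on the language pairs they both cover.

    Returns (pairs where A scores higher, pairs compared, mean margin A - B).
    """
    shared = sorted(scores_a.keys() & scores_b.keys())
    if not shared:
        raise ValueError("the two systems share no language pairs")
    margins = [scores_a[pair] - scores_b[pair] for pair in shared]
    wins = sum(1 for margin in margins if margin > 0)
    return wins, len(shared), mean(margins)
```

Called with two dictionaries keyed by pair name, it returns the win count, the number of pairs compared, and the average margin, which is the shape of claim reported above.
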
What makes it Voxist

Four things, every time

Latency

Sub-second perceived latency, end to end

Voxlingo's streaming pipeline — ASR, MT, TTS — runs under one second of perceived latency end-to-end, on real conversations with real accents and real disfluencies. The pipeline is Voxist all the way through: no third-party round-trips, no API hops, no quality cliff when a sentence trails off.
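
One way to reason about the sub-second target is as a budget on time to first audio. The sketch below assumes that, in a streaming chain, each stage starts as soon as the previous one emits its first output, so the delays that add up are the stages' time-to-first-output values rather than their full processing times. Only the ASR figure (under 200 ms) comes from this page; the MT and TTS numbers are placeholders, and the function names are hypothetical.

```python
BUDGET_MS = 1000  # "under one second" of perceived latency, from the text above


def time_to_first_audio_ms(first_output_ms: dict[str, int]) -> int:
    """Approximate delay before translated audio starts playing.

    Sums each stage's time to first output, on the assumption that the
    stages run as an overlapping streaming chain.
    """
    return sum(first_output_ms.values())


def within_budget(first_output_ms: dict[str, int],
                  budget_ms: int = BUDGET_MS) -> bool:
    """True if the summed first-output delays stay under the budget."""
    return time_to_first_audio_ms(first_output_ms) < budget_ms


# ASR uses the "under 200 ms" bound from this page; MT and TTS are placeholders.
example_stages = {"asr": 200, "mt": 250, "tts": 300}
```

With these placeholder values the chain reaches first audio in 750 ms, inside the one-second budget; a stage that waits for a full sentence before emitting anything would show up here as a much larger first-output figure.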

Languages

Specialized, not generic

Voxlingo is built for European languages, with French at the core. Most translation APIs were trained on English-paired data and translate via English pivot. Voxlingo is trained on direct pairs — French ↔ German, French ↔ Polish, French ↔ Hungarian — which is why the COMET delta over DeepL is most visible on those pairs.

Sovereignty

Sovereign by default

Voxlingo is one of the very few real-time voice translation products with a fully on-premise deployment option, including the translation models. Cloud, sovereign (OVHcloud, Scaleway), or air-gapped. GDPR-native. EU AI Act-ready. For regulated industries — healthcare, defense, government, finance — this is the combination that doesn't exist anywhere else.

Outcomes

Outcomes you can measure

Voxlingo deployments inside Voxlive contact centers report sub-second perceived latency, parity with human-interpreter intelligibility, and a 60–80% reduction in interpreter spend within the first quarter. Every claim is from a real customer or a public benchmark.
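
To put the interpreter-spend figure in concrete terms, the sketch below combines two numbers from this page: interpreter rates of €0.80 to €2.00 per minute and a 60 to 80% reduction. The monthly call volume is a hypothetical input, the function names are illustrative, and amounts are kept in integer cents to avoid rounding drift.

```python
RATE_CENTS_PER_MIN = (80, 200)  # €0.80 to €2.00 per minute, from this page
REDUCTION_PERCENT = (60, 80)    # 60 to 80% reduction, from this page


def spend_range_cents(minutes: int) -> tuple[int, int]:
    """Lowest and highest interpreter spend for a given number of minutes."""
    low_rate, high_rate = RATE_CENTS_PER_MIN
    return minutes * low_rate, minutes * high_rate


def savings_range_cents(minutes: int) -> tuple[int, int]:
    """Smallest and largest saving: low rate at 60%, high rate at 80%."""
    low_spend, high_spend = spend_range_cents(minutes)
    low_pct, high_pct = REDUCTION_PERCENT
    return low_spend * low_pct // 100, high_spend * high_pct // 100


def euros(cents: int) -> str:
    """Format integer cents as a euro amount."""
    return f"€{cents // 100:,}.{cents % 100:02d}"
```

For a hypothetical 10,000 interpreted minutes a month, `spend_range_cents(10_000)` gives a baseline of €8,000 to €20,000, and `savings_range_cents(10_000)` gives a saving of €4,800 to €16,000.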

How it compares

A short, honest comparison

Voxlingo · DeepL Voice · KUDO · Wordly
Real-time voice-to-voice · ✅ (with human)
COMET #2 globally on French-centric pairs · Not benchmarked · N/A · N/A
On-premise deployment
Built in Europe
40+ languages, growing · 40+ · 200+ (human) · 60+
Voice preservation (roadmap) · Late 2026 · Late 2026 · Human only

Works with

One platform, six products, one flywheel

Voxlingo integrates as a native capability inside the Voxlive contact center, running in the agent's earpiece or as a fully translated agent-customer channel.

Capture an expert interview in French; query the resulting knowledge graph in Polish or Arabic. Voxlingo handles the cross-language retrieval inside Voxcept.

Record a meeting in a mixed-language environment; Voxlingo translates the transcript on demand into any of the supported languages.

Voxlingo is exposed as a developer-grade translation API at voxist.com/api. Same auth, same SDKs, transparent EUR pricing.

Compliance & trust
GDPR-native · EU AI Act ready · SecNumCloud roadmap · SOC 2 Type II & ISO 27001 (in progress) · HDS-hosted · MiFID II call recording compliance · On-premise option · Air-gapped option

FAQ

Questions, answered

How accurate is Voxlingo on European languages?
Voxlingo ranks COMET #2 globally on French-centric EU pairs in independent benchmarking, beating DeepL on 17 of 20 pairs and GPT-4o on 18 of 20. Detailed scores by pair are published on the Voxist translation leaderboard.
What's the end-to-end latency?
Under one second of perceived latency, end-to-end, on real conversations. The ASR begins transcribing in under 200 ms, translation streams as the transcript arrives, and TTS renders in parallel.
Can Voxlingo run on-premise?
Yes — including the translation models, the ASR, and the TTS. Voxlingo is one of the very few real-time speech translation products with this option. Sovereign cloud (OVHcloud, Scaleway) and air-gapped deployment are also supported.
Will the translated voice sound like the original speaker?
Today, no: a natural neutral voice is used in the target language. Voice preservation, where the translation is rendered in the original speaker's own voice, is on the roadmap for late 2026 and uses Voxist's in-house TTS voice-cloning research.
Which languages does Voxlingo support?
40+, with production-grade depth on European languages. French ↔ all major EU languages (German, Spanish, Italian, Portuguese, Dutch, Polish, Czech, Hungarian), and English ↔ the same set. Arabic, Russian, Turkish, Mandarin, and Japanese are also supported. The full matrix is on the Voxlingo language pair pages.
How does Voxlingo compare to DeepL Voice?
DeepL Voice and Voxlingo are the two leading European real-time voice translation products. Voxlingo outperforms DeepL on French-centric COMET benchmarking (17 of 20 pairs), runs sub-second end-to-end, and offers on-premise deployment that DeepL does not. DeepL has stronger brand recognition and a deeper Microsoft Teams integration today. See the full comparison.
How does Voxlingo compare to KUDO or Wordly?
KUDO and Wordly are event-translation platforms — large conferences, hybrid events, broadcasts. Voxlingo is a real-time conversational translation product, optimized for one-to-one and small-group voice translation in contact-center and business-communication contexts. Buyers evaluating KUDO against Voxlingo are usually buying for different jobs.
Does Voxlingo work with our SIP / WebRTC / call platform?
Yes. Voxlingo exposes SIP, WebRTC, and gRPC interfaces, plus SDKs for Python, Node, Go, Rust, Java, and .NET. Native integration with Voxlive contact center; documented integration with Cisco Webex (via Mobility Services Platform), Microsoft Teams, Zoom, Genesys Cloud, and NICE CXone.
Can Voxlingo handle accents, code-switching, and technical vocabulary?
Yes — these are the things the engine was specifically trained on. Code-switching (a caller switching from French to English mid-sentence) is detected automatically. Domain vocabularies (legal, medical, finance, technical) can be tuned per customer.
Is the voxlingo.com mobile app the same product?
The mobile app is a demonstration surface designed to let users experience the technology firsthand. It runs on the same translation engine as the enterprise product, but with daily limits, no SLA, no integration support, no on-premise option, and no domain tuning. For business use, the enterprise product is the right entry point.
Can Voxlingo translate sign language?
Not today. Sign language translation is a different technical problem — it requires gesture recognition, not speech recognition — and it's not on the Voxlingo roadmap. KUDO offers human-interpreter sign language coverage if that's the requirement.

Run your multilingual operations on European AI.

Book a 30-minute demo

English & French · EU-hosted · no audio used for model training