< 500ms latency in real time

Speak your language.
Let them hear it in theirs.

VoxTwin translates your voice in real time during video calls and plays back your message in a voice that sounds like you. Ideal for interviews, meetings, and international conversations, with an AI assistant that helps you as you speak.

View plans → How it works

✓ Zoom, Meet, Teams ✓ Multiple languages ✓ Cancel anytime

VoxTwin — Active session ● Live

Latency: 340ms Phrases: 12 Session: 00:08:42 🤖 Assistant ON

YOU

Original:

"Tengo experiencia en microservicios con Docker y Kubernetes."

Translation:

"I have experience in microservices with Docker and Kubernetes."

STT: 280ms · Translation: 60ms · SPEC HIT

VISITOR

Original:

"Can you describe a time you solved a race condition?"

Translation:

"¿Puedes describir una vez que resolviste una condición de carrera?"

STT: 310ms · Translation: 55ms

🤖 AI ASSISTANT [10:23:41]

Listening: "Can you describe a time you solved a race condition?"

A race condition occurs when two threads access the same resource without synchronization. I resolved it using a mutex (threading.Lock in Python) to serialize access to the shared cache. I also considered using a queue (queue.Queue) which is thread-safe by design...
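The sample answer above mentions threading.Lock. As a minimal sketch of that idea (the SharedCache class, its methods, and the "user:42" key are hypothetical names for illustration, not part of VoxTwin), a mutex can serialize access to a shared cache like this:

```python
import threading


class SharedCache:
    """Hypothetical shared cache whose updates are guarded by a mutex."""

    def __init__(self):
        self._lock = threading.Lock()
        self._hits = {}

    def record_hit(self, key):
        # The read-modify-write below is not atomic on its own,
        # so the lock lets only one thread perform it at a time.
        with self._lock:
            self._hits[key] = self._hits.get(key, 0) + 1

    def hits(self, key):
        with self._lock:
            return self._hits.get(key, 0)


def run_workers(cache, workers=8, per_worker=1000):
    """Start several threads that all update the same cache entry."""

    def work():
        for _ in range(per_worker):
            cache.record_hit("user:42")

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cache.hits("user:42")
```

Because every update happens while the lock is held, the final count equals workers × per_worker. The queue.Queue alternative mentioned in the answer would hand updates to a single consumer thread instead of locking around each one.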

How does it work?

Three steps, without interrupting your video call

Speak naturally

Join your video call as usual. VoxTwin listens to your voice and the other participant's at the same time, without interrupting or modifying the call.

Instant translation

Your voice is translated into the other participant's language in under 500ms. The app learns the conversation context and improves accuracy as the session progresses.

They hear you as yourself

The translation plays back in your own voice timbre through the video call. Your contact hears you in their language without noticing that you're using a translator.

Everything you need

Built specifically for video calls on Windows

⚡

Bidirectional translation

Translates your voice into the other participant's language and theirs into yours, simultaneously. Both participants understand each other naturally, without waiting or interruptions.

🎙️

Your voice, in any language

The translation doesn't sound like a generic computer voice; it sounds like you. The other participant won't notice that you're using a translator.

🤖

Real-time AI Assistant

An AI assistant monitors the conversation and suggests answers or context in your language while the call is happening. Ideal for interviews or technical meetings.

🧠

Improves with context

The longer the session, the better it understands the topic. VoxTwin remembers the terms used and maintains translation consistency from start to finish.

📋

Post-call summary

When you close the session, you automatically receive a conversation summary, key points discussed, and a bilingual glossary of the most relevant terms.

🔧

No-code configuration

Configure the language pair, voice type, and preferences directly from the app interface. No config files or technical knowledge needed.

Compatible with: 📹 Zoom 📹 Google Meet 📹 Microsoft Teams 📹 Discord 📹 Skype · Windows 10/11

🔒 Privacy by design

Your data, your full control

VoxTwin uses your own API keys from voice and AI providers — not ours. Here's why that's an advantage, not an inconvenience.

🔒

Your keys never go through us

Your API keys are stored only on your computer. We have no access to them, so we cannot read or leak them. If we shut down tomorrow, your credentials are still yours.

💰

Pay for what you actually use

You pay each provider directly (Deepgram, DeepL, ElevenLabs), with no middlemen or hidden markup. Most have free tiers sufficient for normal sessions.

🔓

No platform lock-in

Your keys work with any other application. You're not tied to VoxTwin: if you find something better in the future, you can migrate in seconds without losing anything.

What does it actually cost?

For normal use (2–3 sessions per week of about 30 minutes each), free tiers cover everything:

✓ Deepgram: 12,000 min/year free
✓ DeepL: 500,000 chars/month free
✓ ElevenLabs: 10,000 chars/month free
→ Anthropic / OpenAI: optional, for AI assistant
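As a rough check of the speech-recognition figure, the arithmetic can be sketched as follows. The function names are illustrative, and the audio_streams parameter is an assumption: it models the case where your audio and the other participant's audio are each counted separately, which depends on how the provider actually bills.

```python
WEEKS_PER_YEAR = 52
DEEPGRAM_FREE_MIN_PER_YEAR = 12_000  # free allowance quoted in the list above


def annual_stt_minutes(sessions_per_week, minutes_per_session, audio_streams=1):
    """Minutes of audio sent to speech recognition over one year.

    audio_streams is an assumption: 1 if only one stream is counted,
    2 if both participants' audio is counted separately.
    """
    return sessions_per_week * minutes_per_session * WEEKS_PER_YEAR * audio_streams


def fits_free_tier(minutes, quota=DEEPGRAM_FREE_MIN_PER_YEAR):
    """True when the yearly usage stays within the quoted free allowance."""
    return minutes <= quota


one_stream = annual_stt_minutes(3, 30)                     # 4680
two_streams = annual_stt_minutes(3, 30, audio_streams=2)   # 9360

print(one_stream, fits_free_tier(one_stream))
print(two_streams, fits_free_tier(two_streams))
```

At three 30-minute sessions per week, both cases stay under the 12,000 minutes quoted for Deepgram. The character-based allowances (DeepL, ElevenLabs) depend on how much is actually said, so they are not estimated here.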

Estimated cost for normal use

$0 – $5 /month

With provider free tiers

See step-by-step setup guide →

Simple, transparent plans

Cancel anytime. No commitments.

Basic

Free

Forever

✓ Real-time translation
✓ High-accuracy speech recognition
✓ System audio capture
✗ Real-time AI assistant
✗ Post-call AI summary
✗ Voice cloning

Create free account →

Frequently asked questions

VoxTwin works with any video call app that lets you select the input microphone. Audio is routed through VB-Cable (covered in the setup instructions), which acts as a virtual microphone.

VoxTwin supports all languages available in the configured providers. Deepgram supports 30+ languages for STT, and DeepL/Claude/OpenAI cover the world's major languages. The language pair is configured in the app before each session.

An internet connection is required, because STT, translation, and TTS are cloud services. A stable connection of at least 5 Mbps is sufficient. Data consumption is minimal, mainly compressed real-time audio.

With the Complete plan, you set up your cloned voice on ElevenLabs (you need an ElevenLabs account and a few minutes of recording). The app uses that voice ID so the translation is spoken in your timbre. The cloning process is external: in VoxTwin you only enter your voice ID.

To cancel, go to your user dashboard → "My Plan" → "Cancel subscription". The cancellation is registered immediately, and you keep access until the end of the paid period. No penalties, no questions asked.

You need your own API keys: VoxTwin uses your keys, not ours. This means your credentials never go through our servers, you pay the provider directly without markup, and you're not locked into our platform. From the Settings window, the app guides you step by step through getting each key. See why it works this way →

Speak your language. Let them hear it in theirs.