Push-to-Talk Translation in Meetings: A Practical Guide

Always-on translation sounds like the dream until you sit through the third meeting where the tool solemnly translates your throat-clearing, your partner asking about dinner, and the coffee machine gurgling in the next room. Somebody on the call politely ignores the German synthesis of "do we have oat milk." You promise yourself you will never do that again.

Push-to-Talk is what you switch to after that meeting. You hold a key, you speak, you release. Nothing goes into the room until you mean it. The rest of the time, you are listening — captions flowing on the side panel, your mic contributing silence. It is the same model radio operators have used since the 1920s for the same reason: intentional transmission beats hot-mic chaos every single time.

This is a practitioner's guide to getting the PTT workflow right inside Gaavala. Not the marketing pitch, but the actual mechanics, the keyboard habits, the voice engine trade-offs, and the little things that make the difference between "this is uncanny" and "this is just how I take meetings now."

When Push-to-Talk Beats Timed Speak

Gaavala has two ways to drive Speak Mode. Timed Speak commits a fixed window of speech on a schedule — great for monologues, demos where you know you will be talking continuously, or scripted pitches where you want to batch-translate in rhythm. Push-to-Talk commits on release — great for everything that looks like a real conversation.
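To make the difference concrete, here is a minimal sketch of the two commit rules applied to the same stream of timestamped words. It is an illustration only, not Gaavala's implementation: the `Word` shape, both function names, and the 5-second window are assumptions made up for the example.

```python
from typing import Dict, List, Tuple

Word = Tuple[float, str]  # (seconds since session start, word); hypothetical shape


def timed_speak_commits(words: List[Word], window_s: float) -> List[str]:
    """Group words into fixed windows; every window commits on schedule."""
    buckets: Dict[int, List[str]] = {}
    for t, w in words:
        buckets.setdefault(int(t // window_s), []).append(w)
    return [" ".join(buckets[k]) for k in sorted(buckets)]


def push_to_talk_commits(words: List[Word], holds: List[Tuple[float, float]]) -> List[str]:
    """Commit one utterance per key hold; words outside every hold are dropped."""
    commits = []
    for press, release in holds:
        spoken = [w for t, w in words if press <= t < release]
        if spoken:
            commits.append(" ".join(spoken))
    return commits


words = [(0.5, "ahem"), (2.0, "the"), (2.4, "price"), (5.2, "is"),
         (5.6, "fixed"), (9.0, "oat"), (9.3, "milk?")]

print(timed_speak_commits(words, window_s=5.0))
# ['ahem the price', 'is fixed oat milk?']
print(push_to_talk_commits(words, holds=[(1.8, 6.0)]))
# ['the price is fixed']
```

The timed windows split the sentence at an arbitrary boundary and pick up the throat-clearing and the oat milk. The single hold commits only what was said while the key was down.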

Reach for PTT when:

Timed Speak still wins when you are clearly driving the meeting end-to-end — a sales pitch, a product demo, a one-way announcement. For everything else, PTT is the default I reach for.

Setup

The first time you configure PTT takes about four minutes. Do this before your meeting, not during.

  1. Install the extension. Add Gaavala to Chrome. The install drops the icon in your toolbar.
  2. Sign in. Click the icon, open the side panel, sign in with Google or Microsoft. The OAuth round-trip lands you back in the side panel authenticated.
  3. Upgrade to Pro. PTT is a Pro-only feature and is NOT included in the free trial. The one-time free trial (5 minutes of transcription, no credit card, never resets) covers Caption Mode (live captions) only. Speak Mode — which is where PTT lives — requires a direct upgrade to Pro at $24.99/month. If you are still on the free trial, you will see the upgrade CTA in the side panel.
  4. Open Speak Mode settings. In the side panel, find the Speak Mode section. You will see a toggle, a mode selector (PTT / Timed Speak / Toggle), a voice engine picker, and a key binding field.
  5. Assign your PTT key. Click the key binding field, press the key you want. I use the right Option key on macOS because nothing else in my normal workflow touches it. Avoid Space, avoid Enter, avoid Cmd/Ctrl combos. More on this later.
  6. Pick an output voice. Soniox studio voices are the Pro default — 28 voices across all 60 languages, nothing to configure. Kokoro gives you a set of clean on-device English voices. ElevenLabs, if you have a key, gives you cloned-voice options. The Soniox default is fine for the first test.
  7. Do a dry run. Before you join a meeting, test it standalone. Open any page, hold your key, say "testing one two three" in your source language, release. The side panel should show the transcript and you should hear the translated synthesis through your speakers. If that works end-to-end, you are ready.

The Workflow in a Live Meeting

Here is what a real PTT session looks like. I will assume you are joining a Zoom call with a client who speaks German, and you are answering in English that Gaavala will render into German for them.

  1. Join the meeting. Zoom tab opens, you land in the call. Unmute yourself in Zoom itself: Gaavala's output goes into the tab, and you have to be unmuted in the meeting for the other side to hear it.
  2. Open the Gaavala side panel. Click the extension icon or use your shortcut. The side panel opens next to the Zoom tab.
  3. Start the capture. With the Zoom tab focused, press Start in the side panel. Gaavala grabs the meeting audio instantly through Chrome's tab capture — no share prompt, no picker.
  4. Pick languages. In the side panel, set the source language to English (your speaking language) and the target to German (what the room should hear). The listening side flips these automatically so incoming German comes back as English captions.
  5. Confirm Speak Mode is in PTT. The PTT button should show the idle state — a small ring, no glow.
  6. Listen. The client speaks. English captions roll in the side panel. You read, you think, you formulate your reply.
  7. Hold the PTT key. As soon as you press, the button blooms into a coral pulse ring. That is the visual confirmation that your mic is hot and Gaavala is listening to you specifically.
  8. Speak. One sentence, maybe two. Natural pace. Watch the ring pulse with your voice.
  9. Release the key. The pulse snaps back to idle. A beat later your translated voice plays into the Zoom tab and the client hears it in German. Expect two to four seconds with ElevenLabs; Kokoro's one-to-two-second turnaround applies only when the output language is English.
  10. Listen again. Repeat. You can release and immediately start listening to their reply. There is no mode switch, no menu.

The rhythm you will develop is: listen, think, hold, speak, release, listen. It becomes muscle memory inside the first meeting.
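The hold, speak, release cycle can be pictured as a two-state machine. The sketch below is a hedged illustration of that idea, not the extension's code: the class, its method names, and the `hear` callback for speech-to-text fragments are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class PttState(Enum):
    IDLE = "idle"            # small ring, no glow
    CAPTURING = "capturing"  # pulse ring, mic is hot


@dataclass
class PushToTalk:
    state: PttState = PttState.IDLE
    _buffer: List[str] = field(default_factory=list)

    def key_down(self) -> None:
        # Ignore repeated key-down events while the key is already held.
        if self.state is PttState.IDLE:
            self.state = PttState.CAPTURING
            self._buffer.clear()

    def hear(self, text: str) -> None:
        """Accept a speech-to-text fragment; kept only while the key is held."""
        if self.state is PttState.CAPTURING:
            self._buffer.append(text)

    def key_up(self) -> Optional[str]:
        """Release commits whatever was captured, even a partial sentence."""
        if self.state is not PttState.CAPTURING:
            return None
        self.state = PttState.IDLE
        utterance = " ".join(self._buffer).strip()
        self._buffer.clear()
        return utterance or None


ptt = PushToTalk()
ptt.hear("do we have oat milk")   # key not held: dropped
ptt.key_down()
ptt.hear("The price")
ptt.hear("is fixed.")
commit = ptt.key_up()
print(commit)  # The price is fixed.
```

Nothing reaches the commit unless it arrived between `key_down` and `key_up`, which is the whole point of the workflow. The returned string is what would then go on to translation and synthesis.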

Voice Engine Choice

Speak Mode has three voice engines and they make meaningfully different trade-offs.

Soniox studio voices (Pro default). The same engine that powers your captions also speaks — 28 studio voices covering all 60 languages Gaavala translates, on a managed key with nothing to configure. Synthesis goes browser-direct to Soniox using the same short-lived keys as captions, so it just works out of the box.

Kokoro (on-device). An 82M-parameter neural TTS model that runs locally in the extension's offscreen document. Latency is the star of the show here: you get one to two seconds from release to audible output because nothing leaves your machine for synthesis. Synthesis itself needs no network connection and costs nothing on top of your subscription. The catch: Kokoro's speak-side output is English-only right now. That is perfect if you are speaking into English-speaking rooms, or if you speak intermediate (B-level) English and want a cleaner delivery, but it means you cannot use Kokoro to output German or Japanese.

ElevenLabs (cloud, BYOK). You bring your own ElevenLabs API key, paste it into Gaavala settings, and you get multilingual synthesis across 29 languages plus voice cloning if you have an ElevenLabs Pro seat. Quality is outstanding. Latency is two to four seconds because the synthesis round-trips through ElevenLabs' servers. Good network makes this feel fine; bad wifi turns it into an awkward pause game.

Rule of thumb: the Soniox default already covers every language Gaavala translates, so start there. If the room needs English and you want fully on-device synthesis, use Kokoro. Switch to ElevenLabs for the calls where a cloned voice on the other side earns the latency tax.
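Written out as a decision function, the rule of thumb looks roughly like this. It is a sketch under stated assumptions: the function name, its parameters, and the engine labels are invented for illustration, and it does not check whether ElevenLabs supports the target language.

```python
def pick_voice_engine(
    target_language: str,
    *,
    want_cloned_voice: bool = False,
    want_on_device: bool = False,
    has_elevenlabs_key: bool = False,
) -> str:
    """Return 'soniox', 'kokoro', or 'elevenlabs' following the rule of thumb."""
    # A cloned voice is the one case worth the extra cloud latency.
    if want_cloned_voice and has_elevenlabs_key:
        return "elevenlabs"
    # Kokoro's speak-side output is English-only.
    if want_on_device and target_language == "en":
        return "kokoro"
    # The Pro default covers every language Gaavala translates.
    return "soniox"


print(pick_voice_engine("de"))                        # soniox
print(pick_voice_engine("en", want_on_device=True))   # kokoro
print(pick_voice_engine("de", want_on_device=True))   # soniox
```

Note the third call: asking for on-device synthesis with a German target still falls back to the default, because Kokoro cannot produce German.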

Real-World Scenarios

A few patterns I have watched people use PTT for. These are not hypotheticals — these are the meetings PTT was built for.

Tips From Actual Usage

Small habits that make PTT feel good instead of awkward.

When PTT Feels Laggy

If the gap between release and output feels longer than it should, walk through this list in order before blaming the tool.

  1. Network. If you are on ElevenLabs, every utterance round-trips through the cloud. Airport wifi or a hotel guest network will add seconds. Tether to your phone, or switch to Kokoro for the rest of the call if the room needs English.
  2. Wrong engine for the call. If you picked ElevenLabs but the room is English-speaking, you are paying the latency tax for no reason. Switch to Kokoro.
  3. Captured the wrong tab. If you pressed Start while a different tab was focused, Gaavala is listening to the wrong audio source, which confuses STT and delays the next cycle. End the session, focus the meeting tab, and press Start again.
  4. Chrome GPU pressure. Other tabs running WebGL, video playback, or heavy canvas work will starve Kokoro of GPU cycles. Close the Figma file, the second YouTube tab, the Google Earth window you forgot about. Gaavala benchmarks much better with a clean Chrome.
  5. Laptop thermal throttling. After 90 minutes of meetings, a hot laptop will slow Kokoro inference enough to notice. Plugged-in, on a cooling pad, with good airflow — you get consistent latency for hours.
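The checklist is ordered, so the first match is the one to fix first. Here is a minimal sketch of that triage. The `Session` fields are hypothetical self-reported observations, not values Gaavala exposes, and scoping the GPU and thermal checks to Kokoro is my reading of the list above.

```python
from typing import NamedTuple


class Session(NamedTuple):
    """Hypothetical snapshot of a laggy session; every field is self-reported."""
    engine: str                    # "soniox", "kokoro", or "elevenlabs"
    room_needs_english: bool
    network_is_flaky: bool
    meeting_tab_was_focused: bool  # when Start was pressed
    heavy_gpu_tabs_open: bool
    laptop_is_hot: bool


def first_lag_suspect(s: Session) -> str:
    """Return the first matching cause, in the same order as the checklist."""
    checks = [
        (s.engine == "elevenlabs" and s.network_is_flaky,
         "network: tether to your phone, or use Kokoro if the room needs English"),
        (s.engine == "elevenlabs" and s.room_needs_english,
         "wrong engine: switch to Kokoro"),
        (not s.meeting_tab_was_focused,
         "wrong tab: end the session, focus the meeting tab, press Start again"),
        (s.engine == "kokoro" and s.heavy_gpu_tabs_open,
         "GPU pressure: close WebGL, video, and canvas-heavy tabs"),
        (s.engine == "kokoro" and s.laptop_is_hot,
         "thermal throttling: plug in and improve airflow"),
    ]
    for matched, advice in checks:
        if matched:
            return advice
    return "no obvious cause from the checklist"


print(first_lag_suspect(Session("elevenlabs", True, False, True, False, False)))
# wrong engine: switch to Kokoro
print(first_lag_suspect(Session("kokoro", True, False, True, True, True)))
# GPU pressure: close WebGL, video, and canvas-heavy tabs
```

The second call matches both the GPU and the thermal check, and reports the GPU one because it comes earlier in the list.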

Privacy Note

Your microphone audio goes browser-direct to Soniox over a WebSocket. It does not pass through Gaavala servers. The Speak Mode side follows the same architecture as Caption Mode: we mint a short-lived Soniox key, the extension opens its own connection, and the audio never touches us. Kokoro synthesis happens locally in the offscreen document. Soniox studio-voice synthesis — the Pro default — uses the same short-lived-key, browser-direct pattern as captions; the text goes straight from your browser to Soniox. ElevenLabs synthesis, if you use it, goes browser-direct to ElevenLabs with your own key — again, not through our backend. If you want the full architectural breakdown, the meeting audio privacy post walks through every hop.

This matters for PTT specifically because people assume mic capture is where the privacy risk lives. In Gaavala's model, the mic stream is the least-processed, most-direct path in the whole system.
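One way to check the claim is to write each data path down as a list of hops and ask which paths include the Gaavala backend. The table below is a sketch built from the description above, not from Gaavala's code. The hop labels are illustrative, and the key-minting entry assumes the browser requests the short-lived key from the backend.

```python
from typing import Dict, List

# Each stream mapped to the parties it passes through, in order.
HOPS: Dict[str, List[str]] = {
    "mint short-lived Soniox key": ["browser", "gaavala-backend"],  # assumed request path
    "mic audio for speech-to-text": ["browser", "soniox"],          # WebSocket
    "Soniox studio-voice synthesis": ["browser", "soniox"],
    "Kokoro synthesis": ["browser"],                                # offscreen document, local
    "ElevenLabs synthesis": ["browser", "elevenlabs"],              # your own API key
}


def streams_touching(party: str) -> List[str]:
    """List every stream whose path includes the given party."""
    return sorted(name for name, hops in HOPS.items() if party in hops)


print(streams_touching("gaavala-backend"))
# ['mint short-lived Soniox key']
```

Under these assumptions the only thing that reaches the backend is the key request; audio and synthesis text never appear on that path.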

Keyboard Shortcut Etiquette

The single most common setup mistake is picking a PTT key that fights with the meeting app. A short list of keys to avoid and why:

Good choices on macOS: right Option, right Cmd, or a function key you have remapped. On Windows: right Alt, right Ctrl, or a dedicated side key on a gaming mouse. The best PTT keys are the ones you can hold without moving your other hand off the keyboard, and that nothing else in your stack fights for.
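As a quick sanity check on a candidate binding, the avoid rules from the setup section (no Space, no Enter, no Cmd/Ctrl combos) fit in a few lines. This is an illustrative sketch, not the extension's validation logic: the function name is hypothetical, and key names follow the DOM `KeyboardEvent.code` convention (for example `AltRight`, `MetaRight`).

```python
from typing import List, Optional

AVOID_ALONE = {"Space", "Enter"}
COMBO_MODIFIERS = ("Meta", "Control")  # Cmd on macOS, Ctrl on Windows


def binding_problem(keys: List[str]) -> Optional[str]:
    """Return a reason to reject the binding, or None if it looks safe."""
    if not keys:
        return "no key assigned"
    for key in keys:
        if key in AVOID_ALONE:
            return f"{key} is on the avoid list"
    # A lone right Cmd or right Ctrl is fine; the problem is modifier + key combos.
    if len(keys) > 1 and any(k.startswith(COMBO_MODIFIERS) for k in keys):
        return "Cmd/Ctrl combos are on the avoid list"
    return None


print(binding_problem(["AltRight"]))           # None
print(binding_problem(["Space"]))              # Space is on the avoid list
print(binding_problem(["MetaLeft", "KeyK"]))   # Cmd/Ctrl combos are on the avoid list
```

A single modifier held on its own passes, which matches the recommendation of right Option, right Cmd, right Alt, or right Ctrl.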

FAQ

Can other meeting participants tell it is AI? Kokoro sounds clearly synthetic but clean. ElevenLabs with a cloned voice is indistinguishable from a human in most short exchanges — people in the call will almost never clock it unless you tell them. The bigger tell is rhythm: if you hold and release in a steady pattern, listeners notice the silences between your turns more than the voice itself.

Does it work if I am muted in the meeting? No. Gaavala's Speak Mode output plays into the tab's audio pipeline, and the meeting app sends the tab's audio through your own mic channel. If you are muted in Zoom or Teams, the output gets muted with everything else. Unmute yourself in the meeting app, let Gaavala handle the rest.

Can I use PTT and Timed Speak in the same session? You pick one mode at a time. Switching between them takes one click in the side panel and is instant — no restart, no re-share. A common pattern is to start a demo in Timed Speak while you are presenting slides, then switch to PTT when the Q&A starts.

Battery impact on a laptop? Kokoro runs on CPU and sips GPU. Expected battery draw is noticeable but not dramatic — think "watching a 1080p YouTube video" level. ElevenLabs is actually lighter on battery because the synthesis happens in the cloud. The mic capture itself is trivial. A full day of back-to-back meetings on a MacBook Air runs the battery down faster than it would with no translation, but not alarmingly so.

What happens if I release the key mid-sentence? The utterance commits as-is. Whatever STT had captured up to the moment of release gets translated and synthesized. If you cut yourself off mid-word, you will hear a truncated output. The fix is to hold until you complete the thought. Early releases are the most common new-user mistake and the one that fixes itself after half a dozen turns.

Try It

If you have been looking for a way to contribute confidently in cross-language meetings without surrendering control of when your voice enters the room, this is the workflow. Install Gaavala free (one-time 5-minute trial, no credit card, never resets), then upgrade to Pro to enable PTT. Pro is $24.99/month with no lock-in — no trial period, just a direct upgrade when you are ready.

Add Gaavala to Chrome

Install, sign in, upgrade to Pro, assign a key, do one dry run. Your next client call is where the difference shows up.

