Google's New AI Voices Add Nuance and Control to Workspace Videos
Google Workspace is introducing a significant upgrade to video narration. Starting April 15, 2026, users of Google Vids will have access to a new text-to-speech (TTS) model, Gemini 3.1 Flash TTS, which gives them fine-grained control over how narration is delivered. The system supports 30 distinct voices, each able to speak 24 languages. This is a notable expansion that now includes Arabic, Hindi, Russian, and Thai, among others.
The model’s defining feature is its responsiveness to audio tags embedded directly in a script. A creator can instruct the AI to [whisper] a line, [shout] the next, or insert a [laugh] that sounds contextually appropriate. Tone and pacing can therefore shift mid-sentence, moving synthetic speech closer to a directed performance. Google DeepMind has described it as its most controllable TTS model to date.
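The article does not document the tag syntax beyond the three examples above. As a minimal sketch, assuming tags are written as bracketed words placed inline in the script text (as in [whisper]), the Python helpers below compose a tagged narration script and flag any tag outside a known set before the script is sent for synthesis. The function names and the `KNOWN_TAGS` set are hypothetical illustrations, not part of Google Vids or the Gemini API, and the real list of supported tags may be much larger.

```python
import re

# Hypothetical: only the three tags named in the article. The full set the
# model supports is not listed in the source, so this is an assumption.
KNOWN_TAGS = {"whisper", "shout", "laugh"}

# Assumed tag syntax: a lowercase word (or words) in square brackets.
TAG_PATTERN = re.compile(r"\[([a-z_ ]+)\]")


def tag_line(text: str, tag: str) -> str:
    """Prefix a line of narration with a bracketed audio tag, e.g. '[whisper] ...'."""
    if tag not in KNOWN_TAGS:
        raise ValueError(f"unknown tag: {tag!r}")
    return f"[{tag}] {text}"


def find_tags(script: str) -> list[str]:
    """Return the audio tags in the order they appear in the script."""
    return TAG_PATTERN.findall(script)


def unknown_tags(script: str) -> list[str]:
    """Return tags not in KNOWN_TAGS, so a script can be checked before synthesis."""
    return [t for t in find_tags(script) if t not in KNOWN_TAGS]


if __name__ == "__main__":
    script = " ".join([
        tag_line("Quarterly numbers are in.", "whisper"),
        tag_line("We beat our target!", "shout"),
        "And yes, the coffee machine is fixed [laugh].",
    ])
    print(script)
    print(find_tags(script))               # ['whisper', 'shout', 'laugh']
    print(unknown_tags(script + " [sing]"))  # ['sing']
```

Keeping the tags as plain text inside the script mirrors how the article describes them: the direction travels with the words, so a single string carries both content and delivery.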
Beyond Workspace, the model is available in preview to developers through the Gemini API and to enterprises on Vertex AI. It also ships with a safeguard: every audio clip carries a SynthID watermark that identifies it as AI-generated. For businesses, the applications are immediate. Marketing teams can produce locally nuanced audio for global campaigns, and internal training modules can feature engaging, expressive narration without recording sessions.
The launch fits Google's broader strategy for Workspace. It follows recent additions to Google Vids such as AI video generation, turning Vids into a more comprehensive content studio. It also signals a shift in which AI-generated audio becomes a flexible authoring tool, one that could change how teams communicate through video.
Source: Webpronews