Algorithm Features
Live Transcription
Zoom's Live Transcription feature serves as a core technology enabling Zoom's comprehensive suite of AI workplace solutions, including many AI Companion features. By converting real-time audio into structured, searchable text data, Live Transcription creates the foundational layer that powers intelligent meeting summaries, automated action item extraction, sentiment analysis, and advanced collaboration workflows across the entire Zoom Workplace ecosystem.
When Live Transcription is enabled, live audio from sessions such as meetings, webinars, or phone calls is transmitted to Zoom's automatic speech recognition service, which converts the speech to text and distributes the dynamic transcript to applicable participants' Zoom Workplace apps. Participants can view the transcript in real time, access segments through closed captioning, or leverage AI Companion to ask live, in-meeting questions. After the session ends, if the transcript is retained, it can be utilized further by AI Companion for transcript-dependent features such as Meeting Summaries, post-meeting queries, and context-aware follow-ups.
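The sketch below is a simplified model of this flow, not Zoom's implementation: audio chunks are passed to a stand-in ASR service, the recognized segments are fanned out to participants, and the accumulated transcript is retained for later use. All names here (AsrClient, TranscriptSegment, run_live_transcription) are hypothetical.

```python
# Hedged sketch of the live-transcription data flow described above.
# None of these names correspond to actual Zoom APIs.

from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class TranscriptSegment:
    speaker: str
    text: str
    timestamp_ms: int


class AsrClient:
    """Stand-in for a cloud automatic speech recognition (ASR) service."""

    def transcribe(self, audio_chunk: bytes, speaker: str, timestamp_ms: int) -> TranscriptSegment:
        # A real service would return recognized speech; here we fake it.
        return TranscriptSegment(speaker, f"<recognized {len(audio_chunk)} bytes>", timestamp_ms)


def run_live_transcription(audio_stream: Iterable[Tuple[bytes, str, int]],
                           participants: List[str]) -> List[TranscriptSegment]:
    """Convert live audio to text and fan the segments out to participants."""
    asr = AsrClient()
    transcript: List[TranscriptSegment] = []
    for chunk, speaker, ts in audio_stream:
        segment = asr.transcribe(chunk, speaker, ts)
        transcript.append(segment)  # retained for post-meeting features
        for participant in participants:
            # In the real product this would update each participant's
            # Zoom Workplace app (live transcript panel / closed captions).
            print(f"[to {participant}] {segment.speaker}: {segment.text}")
    return transcript


if __name__ == "__main__":
    fake_stream = [(b"\x00" * 320, "Alice", 0), (b"\x00" * 320, "Bob", 1000)]
    retained = run_live_transcription(fake_stream, participants=["Alice", "Bob"])
    # If retained, the transcript can later feed AI Companion features.
    print(f"Retained {len(retained)} segments for post-meeting use.")
```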
Live Transcription operates independently of large language models, relying instead on Zoom's automatic speech recognition technology to convert audio to text. However, the structured text output generated by Live Transcription serves as critical input data for large language models that power AI Companion's advanced features, enabling these models to analyze conversation content and generate intelligent insights.
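As an illustration of that layering, the sketch below shows structured transcript text being assembled into a prompt for a large language model. This is not Zoom's implementation; build_summary_prompt and call_llm are placeholders, not Zoom or AI Companion APIs.

```python
# Illustrative sketch only: transcript text as input to a large language model,
# in the spirit of transcript-dependent features such as Meeting Summary.

from typing import List


def build_summary_prompt(transcript_lines: List[str]) -> str:
    """Join speaker-attributed transcript lines into a single LLM prompt."""
    joined = "\n".join(transcript_lines)
    return (
        "Summarize the following meeting transcript and list any action items.\n\n"
        + joined
    )


def call_llm(prompt: str) -> str:
    # Placeholder for a request to whichever large language model backs
    # the feature; here we simply report the prompt size.
    return f"<summary generated from a {len(prompt)}-character prompt>"


if __name__ == "__main__":
    lines = [
        "Alice: Let's ship the release on Friday.",
        "Bob: I'll update the documentation before then.",
    ]
    print(call_llm(build_summary_prompt(lines)))
```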

Refer to Zoom’s Support Center for more information on how Live Transcription powers features like AI Companion’s In-Meeting Questions and Meeting Summary, as well as non-AI Companion features such as Automated Captions.
Local Live Transcription
As of Zoom Workplace app version 6.5.3, users can enable local live transcription, which processes audio directly on their device rather than in the cloud. While this option is designed to provide enhanced privacy and reduced latency, transcripts generated through local processing cannot be utilized by AI Companion features, which require cloud-based transcript data to function.
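A minimal sketch of that trade-off is below, assuming a hypothetical use_local_transcription flag (this is not a Zoom configuration key): when local processing is chosen, the transcript never reaches the cloud, so it is unavailable to AI Companion.

```python
# Hedged sketch of the local-versus-cloud decision described above.
# The flag and function names are hypothetical.


def transcribe_locally(audio: bytes) -> str:
    # On-device ASR: the audio and transcript never leave the device.
    return "<text produced on-device>"


def transcribe_in_cloud(audio: bytes) -> str:
    # Cloud ASR: the transcript is available to cloud features afterwards.
    return "<text produced by the cloud service>"


def handle_audio(audio: bytes, use_local_transcription: bool) -> dict:
    if use_local_transcription:
        text = transcribe_locally(audio)
        # Local transcripts stay on the device, so AI Companion features
        # that need cloud transcript data cannot consume them.
        return {"text": text, "available_to_ai_companion": False}
    text = transcribe_in_cloud(audio)
    return {"text": text, "available_to_ai_companion": True}


if __name__ == "__main__":
    print(handle_audio(b"\x00" * 320, use_local_transcription=True))
    print(handle_audio(b"\x00" * 320, use_local_transcription=False))
```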
Live Translation (Captions)
Zoom's Live Translation feature operates as an extension of the Live Transcription feature, where the original live transcript serves as the foundation for the translation process. The live transcript data is transmitted from the Live Transcription module to Zoom's live translation module, which processes the transcript in the detected source language and translates it into the user-requested target language(s). The translated transcripts are then returned to the live transcription service, which distributes the localized content to meeting participants through their Zoom Workplace apps, enabling real-time multilingual communication without interrupting the natural flow of conversation.
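The sketch below illustrates this flow in simplified form: each transcript line passes through language detection, is translated into every requested target language, and the results are distributed per participant. The detect_language and translate helpers are placeholders, not Zoom translation APIs.

```python
# Illustrative sketch of the caption-translation flow: the live transcript is
# the input, one translation is produced per requested target language, and
# the results are fanned back out to participants.

from typing import Dict, List


def detect_language(text: str) -> str:
    # A real module would detect the source language; assume English here.
    return "en"


def translate(text: str, source: str, target: str) -> str:
    # Placeholder machine-translation call.
    return f"[{source}->{target}] {text}"


def translate_caption(text: str, target_languages: List[str]) -> Dict[str, str]:
    """Translate one transcript line into every requested target language."""
    source = detect_language(text)
    return {lang: translate(text, source, lang) for lang in target_languages}


if __name__ == "__main__":
    captions = translate_caption("Let's begin the meeting.", ["es", "ja"])
    for lang, line in captions.items():
        # Each participant's Zoom Workplace app would show the caption in
        # the language that participant requested.
        print(lang, line)
```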


Refer to Zoom’s Support Center for more information on using translated captions.
Personalized Audio Isolation
Personalized Audio Isolation leverages a user’s voiceprint to differentiate their voice and suppress background noise, even in open environments. Users authorize Zoom to create a voiceprint, which captures the unique characteristics and nuances of their voice patterns. The user can also choose to upload a recording of their voice. The user’s voiceprint enables Zoom to intelligently filter and isolate the user's voice from ambient background audio detected by their microphone. The technology effectively suppresses environmental sounds such as coffee shop chatter, vacuum cleaners, barking dogs, or conversations from other people within microphone range, helping ensure that only the user's voice is prominently transmitted.
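The sketch below is a deliberately simplified, conceptual illustration of voiceprint-based filtering, assuming a toy embedding and a hypothetical similarity threshold; Zoom's actual audio isolation relies on far more sophisticated speech models. Frames that resemble the enrolled voiceprint are passed through, while non-matching frames are attenuated.

```python
# Conceptual sketch only: keep audio frames that match an enrolled voiceprint
# and attenuate everything else. The embed function and threshold are
# hypothetical stand-ins for learned speaker models.

import math
from typing import List


def embed(frame: List[float]) -> List[float]:
    # Placeholder "voice embedding": just normalize the frame to unit energy.
    energy = sum(x * x for x in frame) or 1.0
    return [x / math.sqrt(energy) for x in frame]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def isolate_voice(frames: List[List[float]], voiceprint: List[float],
                  threshold: float = 0.8, attenuation: float = 0.05) -> List[List[float]]:
    """Pass frames matching the voiceprint; strongly attenuate everything else."""
    output = []
    for frame in frames:
        if cosine_similarity(embed(frame), voiceprint) >= threshold:
            output.append(frame)  # user's voice: keep
        else:
            output.append([x * attenuation for x in frame])  # background: suppress
    return output


if __name__ == "__main__":
    enrolled = embed([0.2, 0.4, 0.4, 0.1])  # derived from an enrollment recording
    mixed = [[0.2, 0.4, 0.4, 0.1],          # resembles the enrolled user
             [0.9, -0.8, 0.7, -0.9]]        # background chatter
    print(isolate_voice(mixed, enrolled))
```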
Refer to Zoom’s Support Center for more information on using Personalized Audio Isolation.