Microsoft Teams users will soon be able to use cloned versions of their voices to speak and translate conversations in real time, as the company unveils its new, AI-powered Interpreter tool.
Announced at the annual Microsoft Ignite conference and reported by TechCrunch, the new feature allows users to create digital replicas of their voices that can then be used to translate their speech into various languages. "Imagine being able to sound just like you in a different language. Interpreter in Teams provides real-time speech-to-speech translation during meetings, and you can opt to have it simulate your speaking voice for a more personal and engaging experience," wrote Microsoft CMO Jared Spataro in a blog post shared with the publication.
The feature will only be available to Microsoft 365 subscribers, and will launch initially for English, French, German, Italian, Japanese, Korean, Portuguese, Mandarin Chinese, and Spanish.
Microsoft's Interpreter has the potential to make the business of remote work and digital socialization more accessible to a wider array of non-English speakers, though it's not yet as dynamic as a live, human translator. And beyond its express application, the tool raises even more questions about security and technological bias.
A recent study found that the popular AI-powered transcription tool Whisper, which is also used in Microsoft's cloud computing programs, was rife with hallucinations, including inventing content or phrases when translating patient information in the medical field. This was especially true for patients with speech disorders like aphasia. The previously hyped Humane AI pin, advertised for its live translation abilities, turned out to be an inconsistent digital alternative to human translation. Addressing similar concerns for Teams' Interpreter, Microsoft told TechCrunch: "Interpreter is designed to replicate the speaker’s message as faithfully as possible without adding assumptions or extraneous information. Voice simulation can only be enabled when users provide consent via a notification during the meeting or by enabling ‘Voice simulation consent’ in settings."
The technology could have immense implications in the accessibility space, with notable figures like U.S. Representative Jennifer Wexton amplifying the use of personalized high-tech voice cloning for people with atypical speech. But it has also prompted concerns about nonconsensual deepfake uses and the potential for the tech to be a tool in the arsenal of scammers. Powerful AI speech cloning tech (Microsoft's is reportedly impressively human-like) has raised ethical concerns, with Microsoft's own CEO calling for stronger guardrails and AI governance in the face of increasing celebrity deepfakes.
Still, the buzz around voice cloning, bolstered by the AI craze, has only grown among the industry's innovators, adding to previous investments in AI speech-to-text translation. Last year, Apple announced its Personal Voice feature, a machine learning tool that creates a synthesized version of a user's voice that can be used in live text-to-speech situations, like FaceTime, and was advertised as an accessibility feature. Microsoft unveiled its own Personal Voice feature around the same time, powered by its Azure AI and available in 90 languages.