What Is the AI Speech Model That Supports Nearly 30 Languages?

ElevenLabs has released Eleven Multilingual v2, a multilingual voice generation model that produces “emotionally rich” AI audio in nearly 30 languages. The release lets content producers localize audio for European, Asian, and Middle Eastern markets.

The research team studied human speech indicators for 18 months and developed new methods for detecting context, expressing emotions in speech generation, and synthesizing new, distinctive voices. The model automatically recognizes nearly 30 written languages and generates voice in them with an unprecedented level of authenticity when text is entered into the ElevenLabs text-to-speech platform.

A cloned or synthetic voice retains the speaker’s distinctive characteristics, such as their native accent, in every language it speaks. The same voice can now be used to narrate material in 28 different languages.
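As a rough illustration of reusing one voice across languages, the sketch below builds text-to-speech requests for three lines of game dialogue with a single voice ID. It is not taken from ElevenLabs's announcement: the endpoint path, the `xi-api-key` header, and the `eleven_multilingual_v2` model ID reflect the public API as commonly documented and should be checked against the current API reference. `YOUR_VOICE_ID`, `YOUR_API_KEY`, and the sample sentences are placeholders. No language parameter is set, on the assumption that the model detects the language from the text as described above.

```python
import json
import urllib.request

# Assumed endpoint and model ID; verify against the current API reference.
API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"
MODEL_ID = "eleven_multilingual_v2"


def build_tts_request(text, voice_id, api_key):
    """Build (but do not send) a text-to-speech request for one line of text."""
    payload = {"text": text, "model_id": MODEL_ID}
    return urllib.request.Request(
        url=f"{API_BASE}/{voice_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, voice_id, api_key):
    """Send the request and return the raw audio bytes (needs a real key)."""
    request = build_tts_request(text, voice_id, api_key)
    with urllib.request.urlopen(request) as response:
        return response.read()


# Hypothetical dialogue lines; the same voice ID is reused for every language.
dialogue = {
    "en": "Welcome back, traveler.",
    "de": "Willkommen zurück, Reisender.",
    "ja": "おかえりなさい、旅人さん。",
}
requests_by_lang = {
    lang: build_tts_request(text, "YOUR_VOICE_ID", "YOUR_API_KEY")
    for lang, text in dialogue.items()
}
```

Only the text differs between the three requests; the voice and model stay fixed. Calling `synthesize` with a valid key and voice ID would return the audio for one line, which could then be written to a file.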

The launch follows the platform’s rollout of professional voice cloning to all creators. That update, released alongside improved security and protections, lets users make a digital replica of their voice that is practically indistinguishable from the original.

In addition to the existing languages (English, Polish, German, Spanish, French, Italian, Hindi, and Portuguese), the new model supports Chinese, Korean, Dutch, Turkish, Swedish, Indonesian, Filipino, Japanese, Ukrainian, Greek, Czech, Finnish, Romanian, Danish, Bulgarian, Malay, Slovak, Croatian, Classical Arabic, and Tamil.

ElevenLabs has confirmed that the platform is exiting beta today, following the introduction of new features and ongoing enhancements. The change marks a milestone in the company’s commitment to serving its more than 1 million users worldwide with dependable, state-of-the-art tools.

ElevenLabs is also working on a method that will enable users to collaborate with AI to create new audio through the platform.

Adding text-to-speech in many languages to visual content makes that content more accessible to people with visual impairments or additional learning needs. Some examples are as follows:

The multilingual speech generation tool opens up new possibilities for indie game developers and publishers to translate game experiences and audio content for international audiences, allowing them to connect with players and listeners in their languages without sacrificing quality or accuracy.
Similarly, schools now have the resources to provide students with timely access to high-quality, native-speaker audio content in target languages, improving students’ listening and pronunciation abilities and meeting a variety of instructional preferences within their international student body.

By lowering the time and expense needed to produce high-quality audio in numerous languages, ElevenLabs is assisting businesses and creators in producing more original and accessible content that is understandable by people of all backgrounds and languages.