Multilingual speech projects have significantly advanced language technology and the promotion of linguistic diversity around the world. These efforts use AI language models to recognize and produce speech across a wide variety of languages, often spanning thousands of distinct linguistic communities. By applying cutting-edge techniques, such as drawing on atypical data sources and self-supervised speech representation learning, multilingual speech initiatives aim to remove barriers so that people can communicate, learn, and access information in their native languages.
Meta has decided to release MMS as an open-source project
With the Massively Multilingual Speech (MMS) project, Meta has unveiled one of its most impressive achievements in AI language models, distinguishing itself from the crowd of simple ChatGPT imitations. MMS surpasses the capabilities of its forerunners by identifying more than 4,000 spoken languages and transcribing and synthesizing speech in over 1,100 of them. Instead of keeping this innovation to itself, Meta has opted to open-source MMS and invite researchers to build on its foundation. By doing so, Meta hopes to help preserve linguistic diversity and to promote collaborative research in the area.
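The open-sourced checkpoints are, in practice, loadable through Hugging Face's transformers library. Below is a minimal sketch of the language-identification side, using the facebook/mms-lid-126 checkpoint named on the public model cards; the 16 kHz mono input requirement follows those cards, and the `identify_language` helper is our own illustrative wrapper, not an official Meta example.

```python
# Minimal sketch: language identification with an MMS-LID checkpoint.
# Assumes the public "facebook/mms-lid-126" Hugging Face model card;
# input audio must be 16 kHz mono float samples.
import torch
from transformers import AutoFeatureExtractor, Wav2Vec2ForSequenceClassification

model_id = "facebook/mms-lid-126"
extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_id)

def identify_language(waveform, sampling_rate=16_000):
    """Return the most likely ISO language code for a mono waveform."""
    inputs = extractor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    lang_id = int(torch.argmax(logits, dim=-1))
    return model.config.id2label[lang_id]
```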
Traditional text-to-speech and speech recognition models demand extensive training on huge audio datasets, complete with thorough transcription labels that machine learning algorithms can learn from. Many endangered languages, mostly those spoken outside of industrialized countries, lack such data, putting them at risk of being left behind. Recognizing this, Meta turned to an unusual source: translated religious texts. Recordings of texts such as the Bible exist in a wide variety of languages and have already been studied closely in text-based language translation research.
Meta improved the usability of this data further by training an alignment model to match the audio recordings to their text, and by using the wav2vec 2.0 model for self-supervised speech representation learning. The combination of unusual data sources and self-supervised speech modeling produced impressive results: in comparison tests, MMS achieved half the word error rate of OpenAI's Whisper while covering 11 times as many languages.
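To make the wav2vec 2.0 foundation concrete, here is a hedged transcription sketch built on the facebook/mms-1b-all checkpoint from the public Hugging Face model cards. The adapter-switching calls (set_target_lang, load_adapter) follow the published MMS documentation; the choice of French ("fra") and the greedy decoding strategy are illustrative assumptions, not Meta's evaluation setup.

```python
# Minimal sketch: speech recognition with the MMS "all languages" checkpoint.
# A small per-language adapter is swapped in at runtime; "fra" (French) is an
# arbitrary example code, and `waveform` is assumed to be 16 kHz mono audio.
import torch
from transformers import AutoProcessor, Wav2Vec2ForCTC

model_id = "facebook/mms-1b-all"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

def transcribe(waveform, lang="fra"):
    """Greedy CTC transcription of 16 kHz mono audio in the given language."""
    # Load the lightweight adapter weights for the target language.
    processor.tokenizer.set_target_lang(lang)
    model.load_adapter(lang)
    inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Greedy CTC decoding: take the most likely token at each frame.
    ids = torch.argmax(logits, dim=-1)[0]
    return processor.decode(ids)
```

Because only the small adapter changes per language, the same one-billion-parameter backbone serves every supported language, which is what makes coverage at this scale practical.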
By making MMS available as an open-source research project, Meta hopes to buck the alarming trend of technology eroding linguistic diversity, with support frequently restricted to the 100 or so languages favored by the largest IT companies. Meta envisions a world where assistive technology, text-to-speech, and even virtual and augmented reality let people communicate and learn in their native tongues, keeping languages all over the world alive and vital.
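For completeness, the text-to-speech side of the release follows the same pattern. The sketch below assumes the per-language MMS-TTS (VITS) checkpoints listed on Hugging Face, using the English variant facebook/mms-tts-eng; the example sentence and the output filename are placeholders.

```python
# Minimal sketch: text-to-speech with a per-language MMS-TTS (VITS) checkpoint.
# "facebook/mms-tts-eng" is the English variant from the public model cards;
# other languages use the same naming pattern with their ISO code.
import torch
import scipy.io.wavfile
from transformers import AutoTokenizer, VitsModel

model_id = "facebook/mms-tts-eng"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = VitsModel.from_pretrained(model_id)

inputs = tokenizer("Speech technology for every language.", return_tensors="pt")
with torch.no_grad():
    waveform = model(**inputs).waveform[0]

# The model config carries the output sampling rate, so it need not be hard-coded.
scipy.io.wavfile.write("output.wav", rate=model.config.sampling_rate,
                       data=waveform.numpy())
```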