Thanks to advances in speech and natural language processing, there is hope that one day you will be able to ask your virtual assistant what the best salad ingredients are. It is already possible to ask your home gadget to play music or to switch on by voice command, a feature found in many devices.
It is a different story if you speak Moroccan, Algerian, Egyptian, Sudanese, or any of the other dialects of Arabic, which vary so greatly from region to region that some are mutually unintelligible. If your mother tongue is Arabic, Finnish, Mongolian, Navajo, or any other language with a high level of morphological complexity, you may feel left out.
These complexities spurred Ahmed Ali to look for a solution. He is a Chief Engineer in the Arabic Language Technology Group at the Qatar Computing Research Institute (QCRI), part of Qatar Foundation’s Hamad Bin Khalifa University, and the founder of Arabic Speech, a “community that exists to facilitate Arabic speech science and speech technology.”
Ali became fascinated by the idea of talking to cars, appliances, and gadgets many years ago, while at IBM. “Can we build a machine capable of understanding different dialects: an Egyptian pediatrician to automate a prescription, a Syrian teacher to help children get the key parts of their lessons, or a Moroccan chef to describe the best couscous recipe?” he asks. However, the algorithms that power these machines cannot sift through the roughly 30 varieties of Arabic, let alone make sense of them. Today, most speech recognition tools work only in English and a handful of other languages.
The coronavirus pandemic has further fueled an already intensifying reliance on voice technologies, as natural language processing has helped people comply with stay-at-home guidelines and physical distancing measures. And while we already use voice commands to help with e-commerce purchases and manage our households, the future holds still more applications.
Millions of people worldwide use massive open online courses (MOOCs) for their open access and unlimited participation. Speech recognition is one of the key features of MOOCs: students can search within specific areas of a course’s spoken content and enable translation through subtitles. In university classrooms, speech technology allows lectures to be displayed as text as the words are spoken.
According to a recent article in Speech Technology Magazine, the voice and speech recognition market is projected to reach $26.8 billion by 2025, as millions of customers and organizations around the world come to rely on voice bots not only to interact with their devices or vehicles but also to improve customer service, drive health-care innovations, and improve accessibility and inclusion for people with hearing, speech, or motor impairments.
In a 2019 survey, Capgemini predicted that by 2022, more than two in three customers would choose a voice assistant over a visit to a store or bank branch. That share could justifiably rise, given the home-based, physically distanced life and commerce that the pandemic has imposed on the world for more than a year and a half.
Nevertheless, these devices fail to serve vast parts of the world. For the speakers of those 30 varieties of Arabic, who number in the millions, this is a substantial missed opportunity.
Arabic for machines
English- or French-speaking voice bots are far from perfect. Yet teaching machines to understand Arabic is particularly tricky, for several reasons. Here are three commonly recognized challenges:
- Lack of diacritics. Arabic dialects are vernacular, that is, primarily spoken. Most of the available text is non-diacritized, meaning that it lacks marks, comparable to the acute (´) or grave (`) accents, that indicate the sound values of letters. It is therefore difficult to determine where the vowels go.
- Lack of resources. There is a dearth of labeled data for the different Arabic dialects. Collectively, they lack the standardized orthographic rules that dictate how a language is written, including rules for spelling, hyphenation, punctuation, and emphasis. These resources are crucial for training computer models, and their scarcity has hindered the development of Arabic speech recognition.
- Morphological complexity. Arabic speakers engage in a great deal of code switching. For example, in parts of North Africa once colonized by the French, such as Morocco, Algeria, and Tunisia, the dialects include many borrowed French words. As a result, there are a large number of so-called out-of-vocabulary words, which speech recognition technologies cannot make sense of because they are not Arabic.
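To make the first and third challenges concrete, here is a minimal Python sketch. It is an illustration only, not the method used by QCRI or any other group: the function names `strip_diacritics` and `oov_rate` are hypothetical, and the choice to treat Unicode code points U+064B through U+0652 as "the" short-vowel marks is a simplifying assumption. The first function shows how little is left for a model to go on once vowel marks are absent; the second measures what share of a transcript's words fall outside a given vocabulary.

```python
# Illustrative sketch only; names and the mark range are assumptions,
# not taken from any specific speech recognition system.

# Arabic short-vowel and related marks (fathatan .. sukun) in Unicode.
ARABIC_MARKS = {chr(cp) for cp in range(0x064B, 0x0652 + 1)}


def strip_diacritics(text: str) -> str:
    """Return text with Arabic short-vowel marks removed.

    This mimics how most everyday Arabic text is written: consonants
    and long vowels only, with the short vowels left for the reader
    (or the machine) to infer.
    """
    return "".join(ch for ch in text if ch not in ARABIC_MARKS)


def oov_rate(tokens: list[str], vocabulary: set[str]) -> float:
    """Return the fraction of tokens that are not in the vocabulary."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    unknown = sum(1 for token in tokens if token not in vocabulary)
    return unknown / len(tokens)


if __name__ == "__main__":
    # "kataba" (he wrote), fully vowelled: kaf+fatha, ta+fatha, ba+fatha.
    vowelled = "\u0643\u064E\u062A\u064E\u0628\u064E"
    bare = strip_diacritics(vowelled)
    print(len(vowelled), "code points before,", len(bare), "after")

    # A toy code-switched utterance against a toy vocabulary.
    utterance = ["a", "b", "merci", "c"]
    print("OOV rate:", oov_rate(utterance, {"a", "b", "c"}))
```

Without the marks, the same three consonants can be read as several different words, which is the ambiguity a recognizer has to resolve. In the toy utterance, the single borrowed word is the only one missing from the vocabulary, so a quarter of the tokens are out of vocabulary.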
“But the field is moving at lightning speed,” Ali said. It is a collaborative effort among many researchers to make it move even faster. Ali’s Arabic Language Technology lab is leading the Arabic Speech project to bring together Arabic translations with the dialects native to each region. For example, Arabic dialects can be divided into four regional groups: North African, Egyptian, Gulf, and Levantine. However, because dialects do not respect borders, the division can become as fine-grained as one dialect per city; for example, a native Egyptian speaker can tell an Alexandrian’s dialect apart from that of a fellow citizen from Aswan (1,000 kilometers away on the map).
Building a tech-savvy future for all
At this point, machines are about as accurate as human transcribers, thanks in large part to advances in deep neural networks, a subfield of machine learning in artificial intelligence that relies on algorithms inspired by how the human brain works, biologically and functionally. Until recently, however, speech recognition was somewhat hacked together. The technology has a history of relying on separate modules for acoustic modeling, building pronunciation lexicons, and language modeling, all of which need to be trained separately. More recently, researchers have been training models that convert acoustic features directly into text transcriptions, potentially optimizing all parts for the end task.
Even with these advances, Ali still cannot give voice commands to most devices in his native Arabic. “It’s 2021, and I still can’t talk to a lot of machines in my dialect,” he commented. “I mean, now I have a device that can understand my English, but machine recognition of multi-dialect Arabic speech has not yet happened.”
Making this happen is the focus of Ali’s work, which has culminated in the first transformer for recognizing Arabic speech and its dialects, one that has achieved hitherto unmatched performance. Dubbed the QCRI Advanced Transcription System, the technology is currently being used by the broadcasters Al-Jazeera, DW, and the BBC to transcribe online content.
There are several reasons why Ali and his team have been able to build this speech engine now. First and foremost, he said, “All dialects need to have resources. We need to build resources to be able to train the model.” Advances in computer processing mean that computationally intensive machine learning now takes place on graphics processing units, which can rapidly process and display complex graphics. As Ali said, “We have a great architecture. We have good modules and we have data that represents reality.”
Researchers at QCRI and Kanari AI recently developed models that can achieve human parity on Arabic broadcast news. The system demonstrates the impact of subtitling Al Jazeera’s daily reports. While the human error rate (HER) for English is about 5.6%, studies have shown that the Arabic HER is significantly higher and can reach 10%, owing to the morphological complexity of the language and the lack of standard orthographic rules in dialectal Arabic. Thanks to deep learning and recent advances in end-to-end architectures, the Arabic speech recognition engine manages to outperform native speakers on broadcast news.
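The article does not say how these error rates were measured. A common convention in speech recognition, assumed here purely for illustration, is the word error rate: the number of word substitutions, deletions, and insertions needed to turn a transcript into the reference, divided by the number of words in the reference. The Python sketch below computes it with a standard edit-distance table; the function name `word_error_rate` is hypothetical and this is not the evaluation code of QCRI or Kanari AI.

```python
# Illustrative sketch of word error rate (WER); an assumed metric,
# not the evaluation code behind the figures quoted above.


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference length.

    Both arguments are whitespace-separated transcripts. The edit
    distance is computed over words using dynamic programming, keeping
    only the previous row of the table in memory.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] = edits needed to turn the first i-1 reference words
    # into the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]  # turning i reference words into nothing: i deletions
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(
                min(
                    previous[j] + 1,                      # deletion
                    current[j - 1] + 1,                   # insertion
                    previous[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous = current

    return previous[-1] / len(ref_words)


if __name__ == "__main__":
    reference = "a b c d"
    machine_output = "a x c d"
    print(f"WER: {word_error_rate(reference, machine_output):.2%}")
```

Under this convention, a human transcriber and a machine can be scored against the same reference, which is what makes a claim of "human parity" testable. The lack of standard spelling in dialectal Arabic complicates this, since two correct spellings of the same word would count as an error.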
While speech recognition for Modern Standard Arabic seems to work well, researchers at QCRI and Kanari AI are engrossed in testing the boundaries of dialectal processing and are achieving great results. Since no one speaks Modern Standard Arabic at home, attention to dialect is what we need to enable our voice assistants to understand us.
This content was written by a member of the Qatar Computing Research Institute, Hamad Bin Khalifa University, Qatar Foundation. It was not written by the editorial staff of MIT Technology Review.