Mark Lippert | Executive Vice President, North American Public Affairs, Samsung Electronics North America | Samsung website
Samsung continues to innovate in the field of mobile AI experiences, and the latest development is Galaxy AI's support for 16 languages. This allows users to enhance their language skills even while offline, courtesy of on-device translation features such as Live Translate, Interpreter, Note Assist, and Browsing Assist. But how does AI language development occur? This article delves into the challenges encountered during mobile AI development and how they were surmounted.
The journey begins in Indonesia, where we learn about the initial steps of teaching a new language to AI. The team at Samsung R&D Institute Indonesia (SRIN) explains that high-quality, relevant data is crucial for successful AI. "Each language demands a different way to process this, so we dive deep to understand the linguistic needs and the unique conditions of our country," says Junaidillah Fadlil, head of AI at SRIN, whose team recently added Bahasa Indonesia support to Galaxy AI. He further elaborates that local language development should be driven by insight and science.
Galaxy AI's features like Live Translate perform three core processes: automatic speech recognition (ASR), machine translation (MT), and text-to-speech (TTS). Each process requires a distinct set of information.
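To make the three-stage flow concrete, here is a minimal Python sketch of how ASR, MT, and TTS could be chained. It is an illustration only, not Samsung's implementation: the class `SpeechTranslationPipeline`, the toy stage functions, and the one-entry lexicon are all hypothetical stand-ins for real trained models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechTranslationPipeline:
    """Hypothetical wrapper that chains the three stages named in the article."""
    recognize: Callable[[bytes], str]    # ASR: audio -> source-language text
    translate: Callable[[str], str]      # MT: source text -> target-language text
    synthesize: Callable[[str], bytes]   # TTS: target text -> audio

    def run(self, audio: bytes) -> bytes:
        source_text = self.recognize(audio)
        target_text = self.translate(source_text)
        return self.synthesize(target_text)


# Toy stand-ins so the sketch runs end to end. A real system would call
# trained models here; these functions only move strings around.
def toy_asr(audio: bytes) -> str:
    # Pretend the "audio" bytes are already the spoken words.
    return audio.decode("utf-8")


TOY_LEXICON = {"selamat pagi": "good morning"}  # assumed example entry


def toy_mt(text: str) -> str:
    # Look up the whole phrase; fall back to the input if it is unknown.
    return TOY_LEXICON.get(text.lower(), text)


def toy_tts(text: str) -> bytes:
    # Pretend encoding the text is the same as synthesizing speech.
    return text.encode("utf-8")


pipeline = SpeechTranslationPipeline(toy_asr, toy_mt, toy_tts)
result = pipeline.run(b"Selamat pagi")
```

Keeping each stage behind its own function signature mirrors the point made above: each process consumes a different kind of data, so each can be trained and swapped independently.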
For ASR, extensive recordings of speech in various environments paired with accurate text transcriptions are needed. Muchlisin Adi Saputra, ASR lead at SRIN, emphasizes the importance of capturing authentic sounds from diverse environments such as traffic or malls.
Data sources also play a significant role in this process. "We need to keep up to date with the latest slang and how it is used, and mostly we find it on social media!" Saputra adds.
MT demands translation training data. Muhamad Faisal, MT lead at SRIN, explains that translating Bahasa Indonesia is challenging due to its extensive use of contextual and implicit meanings, which rely on social and situational cues.
Lastly, TTS requires recordings that cover a range of voices and tones. Harits Abdurrohman, TTS lead at SRIN, explains that good voice recordings that cover all the phonemes the AI model requires can do half the job.
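One way to reason about phoneme coverage is as a set-cover problem: pick recording scripts until every required phoneme appears at least once. The sketch below uses a simple greedy heuristic. This is an assumed technique chosen for illustration, not a method the SRIN team describes, and the phoneme symbols and script names are made-up placeholders rather than a real Bahasa Indonesia inventory.

```python
def greedy_script_selection(
    candidates: dict[str, set[str]], required: set[str]
) -> tuple[list[str], set[str]]:
    """Greedily pick scripts until the required phonemes are covered.

    Returns the chosen script names and any phonemes still uncovered.
    """
    remaining = set(required)
    chosen: list[str] = []
    while remaining and candidates:
        # Pick the script that covers the most still-missing phonemes.
        best = max(candidates, key=lambda name: len(candidates[name] & remaining))
        gained = candidates[best] & remaining
        if not gained:
            break  # no candidate adds anything new
        chosen.append(best)
        remaining -= gained
    return chosen, remaining


# Placeholder data: each hypothetical script maps to the phonemes it contains.
scripts = {
    "line_a": {"a", "i", "k", "s"},
    "line_b": {"u", "ng", "k"},
    "line_c": {"a", "u"},
}
required_phonemes = {"a", "i", "u", "k", "s", "ng", "ny"}

chosen, missing = greedy_script_selection(scripts, required_phonemes)
```

In this toy data, the phoneme left in `missing` signals that another script would have to be written before recording begins.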
The SRIN team collaborated with linguistics experts to plan for this vast data requirement. "This challenge requires creativity, resourcefulness and expertise in both Bahasa Indonesia and machine learning," Fadlil reflects. He credits Samsung's philosophy of open collaboration as instrumental in achieving their goals.
Working with other Samsung Research centers worldwide, the SRIN team was able to quickly adopt best practices and overcome the complexities of establishing data targets. This collaboration also fostered cultural exchange and understanding.
Fadlil concludes by expressing pride in their achievements and reaffirms their commitment to refining their models and improving output quality. He views this expansion as a reflection of their values and respect for cultural identities through language.
In the next part of this series, we will explore how an AI was built to accommodate diverse dialects in Jordan's Arabic language project.