Even Google says it's tough to translate Filipino

Filipino is among the four languages that have presented quite a challenge to the engineers behind Google's Translate app, the Internet giant admitted Friday.

Eugene Weinstein and Pedro Moreno of the Google Speech Team said Google needed creative solutions to crack the four tongues.

"Although we’ve been working on speech recognition for several years, every new language requires our engineers and scientists to tackle unique challenges. Our most recent additions - Croatian, Filipino, Ukrainian, and Vietnamese - required creative solutions to reflect how each language is used across devices and in everyday conversations," they said in a blog post.

In the case of Filipino, they said the language presented "interesting challenges" because Filipinos often mix several languages in daily life.

Such a practice, called code switching, "complicates the design of pronunciation, language, and acoustic models," they said.

The engineers eventually decided to reflect the "reality of daily language use in our speech recognizer design."

"If users mix several languages, our recognizers should do their best in modeling this behavior. Hence our Filipino voice search system, while mainly focused on the Filipino language, also allows users to mix in English terms," they said.

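To make the idea of a recognizer that tolerates mixed-language input concrete, here is a minimal sketch that tags each word of an utterance against a merged Filipino and English vocabulary. The word lists and the function names `tag_languages` and `is_code_switched` are hypothetical and chosen only for illustration; the article does not describe how Google's system actually represents its vocabulary.

```python
# Toy vocabularies (assumed for illustration); a real recognizer would use
# full pronunciation lexicons and statistical language models instead.
FILIPINO_WORDS = {"saan", "ang", "pinakamalapit", "na", "gusto", "ko"}
ENGLISH_WORDS = {"restaurant", "traffic", "download", "email"}


def tag_languages(utterance):
    """Label each word as Filipino ("fil"), English ("en"), or unknown ("unk")."""
    tags = []
    for word in utterance.lower().split():
        if word in FILIPINO_WORDS:
            tags.append((word, "fil"))
        elif word in ENGLISH_WORDS:
            tags.append((word, "en"))
        else:
            tags.append((word, "unk"))
    return tags


def is_code_switched(utterance):
    """Return True when the utterance contains both Filipino and English words."""
    languages = {lang for _, lang in tag_languages(utterance)}
    return {"fil", "en"} <= languages
```

With these toy lists, `is_code_switched("Saan ang pinakamalapit na restaurant")` is true because the last word comes from the English list, while `is_code_switched("saan ang")` is false.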
Meanwhile, they said they had to take tones into consideration in Vietnamese.

One simple technique is to model each tone and vowel combination directly in Google's lexicons, but doing so enlarges the set of sound units the recognizer has to handle.
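A rough sketch of why combining tones with vowels adds complexity: pairing every vowel with every tone multiplies the number of units to model. The example below uses the 12 vowel letters of the Vietnamese alphabet and the six tones as stand-ins; a real lexicon works with phonemes rather than letters, and the function name `toneme_inventory` is an assumption made for this illustration.

```python
from itertools import product

# The 12 vowel letters of the Vietnamese alphabet (a stand-in for vowel phonemes).
VOWEL_LETTERS = ["a", "ă", "â", "e", "ê", "i", "o", "ô", "ơ", "u", "ư", "y"]
# The six tones, written without diacritics for readability.
TONES = ["ngang", "huyen", "sac", "hoi", "nga", "nang"]


def toneme_inventory(vowels, tones):
    """Build one combined unit per (vowel, tone) pair."""
    return [f"{vowel}_{tone}" for vowel, tone in product(vowels, tones)]


tonemes = toneme_inventory(VOWEL_LETTERS, TONES)
print(len(VOWEL_LETTERS), len(tonemes))  # 12 72
```

Under these assumptions the vowel inventory grows from 12 units to 72 once tone is folded in.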

"As a result we had to come up with special algorithms to handle the increased complexity. Additionally, Vietnamese is a heavily diacritized language, with tone markers on a majority of syllables," they said.

The solution was a special diacritic restoration algorithm "which enables us to present properly formatted text to our users in the majority of cases," they said.
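The article does not describe how Google's restoration algorithm works. As an illustration only, one simple baseline is to remember, for each undiacritized spelling, the diacritized form seen most often in a reference text. The function names and the tiny corpus below are hypothetical, and this baseline ignores context, which a production system would presumably need.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_diacritics(text):
    """Remove tone and vowel marks; the letter đ has no Unicode decomposition, so map it by hand."""
    text = text.replace("đ", "d").replace("Đ", "D")
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_restorer(corpus_words):
    """Map each plain spelling to its most frequent diacritized form in the corpus."""
    counts = defaultdict(Counter)
    for word in corpus_words:
        counts[strip_diacritics(word)][word] += 1
    return {plain: forms.most_common(1)[0][0] for plain, forms in counts.items()}


def restore(text, restorer):
    """Replace each known plain word with its stored form; leave unknown words unchanged."""
    return " ".join(restorer.get(word, word) for word in text.split())


# Toy corpus, assumed for illustration: "việt" appears more often than "viết".
restorer = build_restorer(["tiếng", "việt", "việt", "viết", "tiếng"])
```

Here `restore("tieng viet", restorer)` gives `"tiếng việt"`. Because the lookup picks the single most frequent form, it would also turn an intended "viết" into "việt", which is one reason a scheme like this can only be right in the majority of cases.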

Weinstein and Moreno said they use Google's distributed large-scale neural network learning infrastructure - the one that learned to spontaneously discover cats on YouTube.

"By partitioning the gigantic parameter set of the model, and by evaluating each partition on a separate computation server, we’re able to achieve unprecedented levels of parallelism in training acoustic models," they said.

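The partitioning idea can be sketched on a single toy layer: split the rows of a weight matrix into chunks, let a separate worker compute the outputs for each chunk, and stitch the results back together. In this sketch threads stand in for the separate computation servers, and the function names are assumptions made for illustration; it shows the general principle, not Google's infrastructure.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_rows(weights, num_parts):
    """Split the rows of a weight matrix into at most num_parts contiguous chunks."""
    size = -(-len(weights) // num_parts)  # ceiling division
    return [weights[i:i + size] for i in range(0, len(weights), size)]


def evaluate_partition(rows, inputs):
    """Compute the outputs owned by one partition: one dot product per row."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in rows]


def parallel_layer(weights, inputs, num_parts=2):
    """Evaluate every partition in its own worker, then concatenate the outputs in order."""
    parts = partition_rows(weights, num_parts)
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        results = list(pool.map(evaluate_partition, parts, [inputs] * len(parts)))
    return [value for chunk in results for value in chunk]


weights = [[1, 0], [0, 1], [1, 1], [2, -1]]
print(parallel_layer(weights, [3, 4]))  # [3, 4, 7, 2]
```

Each worker needs only its own slice of the parameters plus the shared inputs, which is what lets a very large model be spread across many machines.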
However, they said the technology will become more accurate only as more people use Google's speech recognition products.

"These new neural network technologies will help us bring you lots of improvements and many more languages in the future," they said. — TJD, GMA News
