Translating One Word at a Time

07/07/05
Two-way voice translation – known as one of engineering’s ‘holy grails’ – is taking shape at USC, where a team is turning English phrases into Persian and back, though several obstacles remain.
By Eric Mankin
Shrikanth Narayanan, an associate professor of electrical engineering, led the USC Viterbi School of Engineering group that developed Transonics.

Three years of work by a USC interdisciplinary team has created a functional two-way voice translation system that allows an English-speaking doctor to talk to a Persian-speaking patient.

The Transonics Spoken Dialog Translator turns a doctor’s spoken English questions into spoken Persian and translates patients’ spoken Persian replies into English.

Shrikanth Narayanan led the USC Viterbi School of Engineering group that developed Transonics. A member of the team recently presented a report on the system at the Association for Computational Linguistics conference in Ann Arbor, Mich.

“Fluent two-way machine voice translation is one of the holy grails of engineering,” said Narayanan, an associate professor of electrical engineering, computer science and linguistics who directs the Speech Analysis and Interpretation Laboratory in the Integrated Media Systems Center.

“We are years away from perfecting it, but we think the choices we have made about how to go about creating such a system are working. We hope to have something that will be useful in emergency rooms or ambulances within two years or so.”

The existing system, funded by two grants from the Defense Advanced Research Projects Agency totaling $3.8 million, is a result of research in information technology, supplemented by observation of patient-doctor dynamics in numerous bilingual interaction sessions staged for the project.

“Two-way voice translation involves combining at least three highly imperfect existing disciplines, with the errors multiplying at every stage,” Narayanan said.

The disciplines are:

• Text translation: Taking a written text in one language and translating it into another. Machine translation systems developed by researchers Kevin Knight and Daniel Marcu in the Viterbi School’s Information Sciences Institute consistently rank among the world’s best – but still make frequent grammatical and other errors. Marcu and Knight developed a specialized system specifically for use in Transonics.

• Spoken-word recognition: Narayanan’s specialty. Being able to reliably recognize a large number of different single words, in a variety of regional or foreign accents, is a difficult problem. Recognizing a wide variety of words informally spoken in a noisy, chaotic environment (emergency room, ambulance) adds another level of difficulty.

• Extra-verbal communication: Humans speak with words and intonations. A rising tone at the end of a sentence to express a question is difficult for a machine to assess. Nonsense syllables (um, uh, ah, er), catchphrases (you know, like) and exclamations (Wow! Hey!) in utterances are easy for humans to decode or ignore, but major stumbling blocks for machines. The insights of David Traum of the USC Institute for Creative Technologies in dialog management are aiding in this area by narrowing the range of possibilities and bringing context and previous exchanges into the computer’s decision-making.
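Narayanan’s point about errors multiplying at every stage can be illustrated with a short calculation. The sketch below assumes the three stages fail independently of one another, which is a simplification, and the accuracy figures are hypothetical values chosen for illustration, not measurements from Transonics.

```python
from math import prod


def end_to_end_accuracy(stage_accuracies):
    """Chance that every stage gets an utterance right, assuming independent stages."""
    return prod(stage_accuracies)


# Hypothetical per-stage accuracies (illustrative only, not Transonics results).
stages = {
    "spoken_word_recognition": 0.9,
    "text_translation": 0.9,
    "extra_verbal_interpretation": 0.9,
}

overall = end_to_end_accuracy(stages.values())
print(f"end-to-end accuracy: {overall:.3f}")  # prints "end-to-end accuracy: 0.729"
```

Under these assumed numbers, three stages that are each right nine times out of 10 yield a chain that is right less than three times out of four, which is why small improvements in any one stage matter.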

Teaching computers to detect human emotions in speech is a major focus of researchers at the USC Speech Analysis and Interpretation Laboratory under the direction of Narayanan and his colleague, USC research assistant professor Panos Georgiou.

“We can take advantage of using essentially pre-fabricated sentences in many cases by trying to understand and paraphrase what is being communicated instead of doing exact word-for-word translation,” Narayanan said.

The system also uses the human ability to read text as a bridge over some of the worst problems of speech recognition and machine translation, by letting users choose among alternative candidate messages displayed on screen.

The Transonics interface runs on a laptop computer using the Linux operating system. Doctor and patient both wear headphones with attached microphones. A small keypad connected to the computer speeds and simplifies certain routine commands – switching from doctor to patient mode, for example.

When a doctor asks a question, the speech recognition software captures it – but hedges its bets by displaying its best guess about what was said plus a range of options.

When the doctor chooses the most appropriate option (some of the most frequently used phrases can be put in a quick-access “ready menu”), the result is a spoken Persian question in the earphones of the patient.

The same process then takes place in reverse.
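The confirm-before-speaking flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Transonics implementation: the `Hypothesis` class, the function names, the confidence scores and the stand-in `translate` callable are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str     # what the recognizer thinks was said
    score: float  # hypothetical confidence, higher is better


def build_choices(hypotheses, max_options=3):
    """Return the best guess first, followed by alternates in descending score order."""
    ranked = sorted(hypotheses, key=lambda h: h.score, reverse=True)
    return [h.text for h in ranked[:max_options]]


def confirm(choices, selected_index):
    """Return the sentence the user picked from the displayed options."""
    if not 0 <= selected_index < len(choices):
        raise IndexError("selection is outside the displayed options")
    return choices[selected_index]


def translate(sentence):
    """Stand-in for the real translation and speech output stages."""
    return f"<Persian rendering of: {sentence}>"


# Hypothetical recognizer output for one spoken question.
heard = [
    Hypothesis("do you have a pen", 0.30),
    Hypothesis("do you have pain", 0.60),
    Hypothesis("do you have rain", 0.10),
]

choices = build_choices(heard)
chosen = confirm(choices, 0)  # the doctor accepts the top guess
print(translate(chosen))
```

Only the sentence the speaker has confirmed is passed on for translation, so a recognition error can be caught by a human before it propagates to the next stage.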

Narayanan said much of the success of the interface grows directly out of analysis of a large database of some 300 English-speaking-doctor/Persian-speaking-patient dialogs created by USC medical students and Iranian-heritage USC students and Los Angeles residents.

“Rather than imagining what people might say, we analyzed what people did say,” he explained, adding that recordings of the encounters were used to train and tune the system.

USC linguistics Ph.D. candidate Shadi Ganjavi played a key role in setting up these encounters. “We are grateful to her and to the large Persian-speaking community in Los Angeles,” Narayanan said.

The system contains about 23,000 English and 9,000 Persian words, a disproportion that exists because relatively little has so far been done in machine translation of Persian (a language also called Farsi), either written or spoken.

For Narayanan, one of the striking things that emerged during the process was the dependence of the system, in its current state, on the ability of users to recognize its limits and weaknesses, and work within them.

The team has created an elaborate user manual, and as with any system, reading the manual improves performance a great deal.

In addition to the previously mentioned researchers and institutions, Malibu-based HRL Laboratories has worked with USC on the project. HRL personnel included USC alumni Robert Belvin and Howard Neely.

Usability testing and interface design contributions were made by Scott Millward, a postdoctoral scientist at IMSC. USC electrical engineering graduate students Emil Ettellaie, Dagen Wang, Ananthakrishnan Shankar and Murtaza Bulut also contributed, as did Sudeep Ghande, who presented the paper.

For more information on the system, visit http://sail.usc.edu/transonics.