USC Annenberg Online Journalism Review

The Net Translation Minefield
Computers don't translate languages very well

Douglas Adams, author of the cult classic science fiction book series "The Hitchhiker's Guide to the Galaxy," defines a Babel fish as a small, leechlike creature that, when placed in one's ear, allows a person to understand all spoken languages (including extraterrestrial tongues).

Unfortunately for current residents of Earth, not only have humans not discovered the existence of this mythical animal, we have not been able to teach our computers to translate languages very well, either.

Experts point to many potential pitfalls when explaining the limits of natural language processing, which means trying to get computers to do intelligent things with human language. Words such as "bank" or "bat," which have multiple, unrelated meanings, tend to trip up machines because they cannot reason about context to judge which meaning is correct.

"Ambiguous words are the bane of natural language processing, and machine translation is no exception," said Philip Resnik, an associate professor at the University of Maryland at College Park.

Resnik, who has a joint appointment with the university's Department of Linguistics and the Institute for Advanced Computer Studies, said another translation minefield involves transferring content from one language to another: an adjective usually precedes a noun in English ("an American car"), but generally follows a noun in French and Spanish ("une voiture américaine" and "un coche americano").

"In English, you would say 'I like the book,' but the same sentence in Spanish, 'me gusta el libro,' translates literally to 'the book pleases me.' The expression is backwards, so the transfer of content is not direct," Resnik said.

In an effort to bridge these communication gaps, researchers have spent decades attempting to create rules that computers can follow. Extensive bilingual dictionaries have been built.
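To make the idea of dictionary-plus-rules translation concrete, here is a minimal Python sketch. It is purely illustrative and is not how any system named in this article works: the three-word `LEXICON`, its part-of-speech tags and the function `translate_en_to_es` are all hypothetical, and the single rule covers only the adjective-noun reordering described above.

```python
# Toy rule-based transfer: a hypothetical mini-dictionary plus one reordering rule.
# Illustrative only; real systems need far larger dictionaries and many more rules.

# Hypothetical bilingual dictionary: English word -> (Spanish word, part of speech)
LEXICON = {
    "an": ("un", "DET"),
    "american": ("americano", "ADJ"),
    "car": ("coche", "NOUN"),
}


def translate_en_to_es(sentence):
    """Translate word by word, then move each adjective after the noun it precedes."""
    # Unknown words raise KeyError; this sketch makes no attempt to handle them.
    tagged = [LEXICON[word.lower()] for word in sentence.split()]
    i = 0
    while i < len(tagged) - 1:
        # Rule: English ADJ NOUN becomes Spanish NOUN ADJ.
        if tagged[i][1] == "ADJ" and tagged[i + 1][1] == "NOUN":
            tagged[i], tagged[i + 1] = tagged[i + 1], tagged[i]
            i += 2  # skip past the pair just swapped
        else:
            i += 1
    return " ".join(word for word, _ in tagged)


print(translate_en_to_es("an American car"))  # prints "un coche americano"
```

Even this tiny example hints at why the approach is laborious: the sketch ignores gender agreement, words with several meanings and any sentence whose structure differs more deeply than a swapped word pair, such as "me gusta el libro."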

Machine translation systems exist for widely used language pairs such as English with French or Spanish and, to a lesser extent, English with Chinese, Resnik said. Today's systems can often translate text between these languages with "reasonable" quality, but sometimes with "terrible" results, he said.

"High quality still is a very hard problem, even for these language pairs. With less common languages, the technology for translation often simply doesn't exist," said Resnik.

Ed Hovy, senior project leader at the Intelligent Systems Division of the Information Sciences Institute at the University of Southern California, said machine translation between English and the Romance languages -- French, Spanish and Italian -- gives "readable" quality output. For languages that are more distant, such as English and Korean, the results are much less spectacular.

"Every second sentence, you can somewhat understand it," Hovy said.

"There is only one system that gives perfect quality translations, and that is built for weather reports. It recognizes only 200 words."

Both Hovy and Resnik said the World Wide Web is an extremely valuable tool in the quest for more accurate machine translation systems.

"In the past two years, people realized we can use the billions of pages of text available online to create statistical programs that link words to words and structure to structure," said Hovy. "You take a sentence, do a Google search to find a translation. If you have a pair of documents, you try to teach translation programs to teach themselves structure, words and phrases."

"The bottleneck to this research had been obtaining parallel translated documents, and lots of them," said Resnik.

"... The Web is only getting bigger, and as language evolves by adding new words and new names, the Web evolves also."
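The statistical approach the researchers describe, linking words to words by studying parallel translated text, can be illustrated with a toy Python sketch. It is not the method used by Hovy, Resnik or any product mentioned here: the four-pair `PARALLEL` corpus and the function `learn_word_links` are hypothetical, and the scoring uses the Dice coefficient, a simple co-occurrence measure chosen here only for brevity.

```python
from collections import Counter
from itertools import product

# Hypothetical parallel corpus: (English, Spanish) sentence pairs.
PARALLEL = [
    ("the book", "el libro"),
    ("the car", "el coche"),
    ("a book", "un libro"),
    ("a car", "un coche"),
]


def learn_word_links(pairs):
    """Link each English word to the Spanish word it co-occurs with most
    consistently across sentence pairs, scored with the Dice coefficient."""
    en_count, es_count, joint = Counter(), Counter(), Counter()
    for en, es in pairs:
        en_words, es_words = set(en.split()), set(es.split())
        en_count.update(en_words)   # sentences containing each English word
        es_count.update(es_words)   # sentences containing each Spanish word
        joint.update(product(en_words, es_words))  # sentence pairs containing both

    links = {}
    for e in en_count:
        # Dice score: 2 * co-occurrences / (English count + Spanish count)
        links[e] = max(
            es_count,
            key=lambda s: 2 * joint[(e, s)] / (en_count[e] + es_count[s]),
        )
    return links


links = learn_word_links(PARALLEL)
print(links["book"])  # prints "libro"
```

No dictionary is supplied; the program works out that "book" goes with "libro" only because the two words keep turning up in the same sentence pairs. That is why a large supply of parallel documents matters so much to this line of research.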

Machine translation products are available in many language pairs at a wide variety of prices -- starting at $30 to $200 for simple home or small business software. Professional, single-user systems are available from $200 to $1,000, while professional, multi-user translation products that use client-server access cost $5,000 to $20,000.

At the top end of the product line are customized enterprise-level systems that operate over an intranet or an extranet. These cost $10,000 to $200,000 or more.

Companies that offer translation software include Systran, World Language Resources, IBM and SDL International.

Hovy and Resnik identified Systran as one of the long-time leaders in the industry. The company prices its consumer software by language pair. HTML documents for the Web can be translated in five pairs: English-French, English-Spanish, English-German, English-Italian and English-Portuguese. Systran charges $30 for one language pair or $49 for five pairs for downloadable software, or $69 for a CD-ROM version for five language pairs.

Systran's other consumer-level offering translates text on a home computer. At the same price tiers as the HTML product, it translates all of the above language pairs plus English-Greek, Greek-French and French-Spanish.

Company spokeswoman Reba Rosenbluth said Systran offers customized corporate solutions whose costs run much higher and depend on the language pairs needed. Systran's Web site says the company offers English-Japanese and English-Korean language pairs, plus translation from Chinese to English and Russian to English.

Los Angeles-based World Language Resources offers 565 translation products in 38 languages. A company spokesman said its wares range in price from $40 for a simple product with 20 dozen entries up to $1,000 for 800,000 entries.

"For simple languages like Spanish, you can get a good system for $90, but for a complex language like Arabic, it might be as much as $1,000," he said.

Berlitz, one of the best-known names in the language business, does not sell translation software because of its many limitations.

"We don't believe in it, because it doesn't work -- at least not now," said spokesman Kevin Sher. "If you use one of those programs to translate something from English to another language, then back to English, it doesn't make any sense."

 
