Machine Translation


Machine Translation (MT) is the task of automatically converting text in one natural language into another, preserving the meaning of the input and producing fluent text in the output language. While machine translation is one of the oldest subfields of artificial intelligence research, the recent shift toward large-scale empirical techniques has led to significant improvements in translation quality. The Stanford Machine Translation group's research interests lie in techniques that combine statistical methods with deep linguistic analysis. We currently have one of the world's best machine translation systems, which placed second in Arabic-to-English translation at the most recent NIST Open Machine Translation (OpenMT) Evaluation (2009).

Research in our group currently focuses on the following topics:

Better Training in MT

Determining the appropriate weights for a translation system's decoding model is usually done with Minimum Error Rate Training (MERT), a procedure that optimizes the system's performance on an automated measure of translation quality. In our lab, we have developed improved algorithms for performing MERT (Cer et al. 2008). We have also studied the consequences of training to different automated translation evaluation metrics. Surprisingly, we found that training to popular evaluation metrics based on word-sequence matching, such as BLEU, TER, and METEOR, did not seem to have a reliable impact on human preferences for the resulting translations (Cer et al. 2010). However, preliminary results suggest that training to our textual-entailment-based evaluation metric, which performs a deep semantic analysis of the translations being evaluated, may in fact produce better translation performance (Padó et al. 2009). We are continuing to investigate the feasibility and effectiveness of training to evaluation metrics that perform deeper semantic and syntactic analysis of the translations being evaluated.
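To make the line-search idea behind MERT concrete, the following is a minimal Python sketch of an exact one-dimensional search along a direction `d` from weights `w` over fixed n-best lists, in the spirit of Och's original procedure. It assumes, for simplicity, that each candidate carries its own additive loss (corpus-level BLEU is not additive, so real MERT tracks sufficient statistics instead), and it finds breakpoints by intersecting every pair of candidate score lines rather than computing an upper envelope. The names `line_search` and `best_index` are illustrative, not part of Phrasal or any other toolkit.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# A candidate translation: (feature vector, loss); lower loss is better.
# Assumption: losses are additive across sentences (a simplification).
Candidate = Tuple[Sequence[float], float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def best_index(nbest: List[Candidate], weights: Sequence[float]) -> int:
    """Index of the candidate the linear model would pick (its 1-best)."""
    return max(range(len(nbest)), key=lambda i: dot(weights, nbest[i][0]))


def line_search(nbests: List[List[Candidate]], w: Sequence[float],
                d: Sequence[float]) -> Tuple[float, float]:
    """Return (gamma, total_loss) minimizing total loss for weights w + gamma * d."""
    # Along the line, each candidate's model score is a + b * gamma.
    breakpoints = set()
    for nbest in nbests:
        lines = [(dot(w, f), dot(d, f)) for f, _ in nbest]
        for (a1, b1), (a2, b2) in combinations(lines, 2):
            if b1 != b2:  # parallel lines never cross
                breakpoints.add((a2 - a1) / (b1 - b2))
    # The 1-best choices can only change at a breakpoint, so one probe
    # per interval between breakpoints covers every distinct outcome.
    pts = sorted(breakpoints)
    if pts:
        probes = ([pts[0] - 1.0]
                  + [(x + y) / 2 for x, y in zip(pts, pts[1:])]
                  + [pts[-1] + 1.0])
    else:
        probes = [0.0]
    best = None
    for g in probes:
        wg = [wi + g * di for wi, di in zip(w, d)]
        total = sum(nb[best_index(nb, wg)][1] for nb in nbests)
        if best is None or total < best[1]:
            best = (g, total)
    return best


# One sentence, two candidates: the low-loss candidate wins once gamma > 1.
nbests = [[([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)]]
print(line_search(nbests, w=[1.0, 0.0], d=[0.0, 1.0]))  # (2.0, 0.0)
```

Enumerating all pairwise intersections is quadratic in the n-best size per sentence; it keeps the sketch short and exact, while a production implementation would sweep the upper envelope instead.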

Syntactic MT

Our work in syntactic machine translation aims to incorporate hierarchical and syntactic structure within phrase-based decoders such as Phrasal or Moses. This approach is motivated by the observation that the phrase-based and syntax-based paradigms have much to gain from each other: while phrase-based decoders offer speed and robustness, syntax-based systems generally yield better-formed translations. In (Galley and Manning, 2008), we improve a phrase reordering model by exploiting a hierarchical structure built in linear time as a by-product of standard phrase-based decoding. In (Galley and Manning, 2009), we extend our phrase-based decoder to build a dependency parse left-to-right in only quadratic time, whereas nearly all previous approaches that build target-language tree structures during decoding are asymptotically slower (cubic time or worse). Both papers show significant gains in machine translation performance.
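The difference between conditioning reordering on the previous phrase and on a hierarchical block can be sketched in a few lines of Python. The sketch below is an approximation written from the description above, not the published model: it classifies each phrase as monotone (M), swap (S), or discontinuous (D) with respect to the top of a stack of source-side blocks, and greedily merges the top two blocks whenever their source spans are adjacent (consecutive blocks are already contiguous on the target side under left-to-right decoding). The function names and the sentinel start block are illustrative assumptions.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # inclusive source-word indices of a phrase


def classify(prev: Span, s: int, e: int) -> str:
    """Orientation of source span [s, e] relative to the preceding block."""
    if prev[1] == s - 1:
        return "M"  # monotone: continues right after the previous block
    if prev[0] == e + 1:
        return "S"  # swap: sits right before the previous block
    return "D"      # discontinuous


def phrase_orientations(spans: List[Span]) -> List[str]:
    """Flat model: compare each phrase with the previous phrase only."""
    prev, out = (-1, -1), []  # (-1, -1) is a sentinel for the sentence start
    for s, e in spans:
        out.append(classify(prev, s, e))
        prev = (s, e)
    return out


def hierarchical_orientations(spans: List[Span]) -> List[str]:
    """Compare each phrase with the largest block built so far (shift-reduce style)."""
    stack, out = [(-1, -1)], []
    for s, e in spans:
        out.append(classify(stack[-1], s, e))
        stack.append((s, e))  # shift the new phrase
        # Reduce: merge the top two blocks while their source spans are adjacent.
        while len(stack) >= 2:
            (a_s, a_e), (b_s, b_e) = stack[-2], stack[-1]
            if a_e + 1 == b_s or b_e + 1 == a_s:
                stack[-2:] = [(min(a_s, b_s), max(a_e, b_e))]
            else:
                break
    return out


order = [(2, 3), (0, 1), (4, 5)]  # source spans in translation order
print(phrase_orientations(order))        # ['D', 'S', 'D']
print(hierarchical_orientations(order))  # ['D', 'S', 'M']
```

In this example the flat view labels the last phrase discontinuous, whereas the hierarchical view recognizes that [0,1] and [2,3] together form a block ending just before word 4, so the last phrase is monotone. Each phrase is shifted once and merged at most once, which keeps the bookkeeping linear in the number of phrases.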

Chinese MT

Our work also focuses on improving Chinese-to-English translation using deep source-side linguistic analysis. In our Chinese-English system, we train a classifier to categorize each occurrence of 的 (DE) according to its syntactic and semantic context. We use this classifier to preprocess MT data by explicitly labeling 的 constructions and by reordering phrases. Our Chinese-English system also uses typed dependencies identified in the source sentence to improve a lexicalized phrase reordering model. Finally, we have worked to improve the segmentation consistency of our Chinese word segmenter, a property that is often desirable in MT. All three components show significant gains in translation performance and are described in (Chang et al., 2009a), (Chang et al., 2009b), and (Chang et al., 2008), respectively.
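The preprocessing step of labeling 的 can be pictured as a token rewrite driven by a classifier. The Python sketch below is hypothetical throughout: the label names `AsB` and `relc` and the tiny verb-list heuristic stand in for the trained classifier and its actual label set, which are described in (Chang et al., 2009a) and are not reproduced here.

```python
from typing import Callable, List

# Hypothetical interface: classify(tokens, i) -> label for the 的 at position i.
Classifier = Callable[[List[str], int], str]


def label_de(tokens: List[str], classify: Classifier) -> List[str]:
    """Rewrite each 的 token as 的_<label> so the MT system sees distinct tokens."""
    return [f"{tok}_{classify(tokens, i)}" if tok == "的" else tok
            for i, tok in enumerate(tokens)]


# Toy stand-in for a trained classifier; labels and verb list are illustrative only.
TOY_VERBS = {"写", "买", "看"}


def toy_classify(tokens: List[str], i: int) -> str:
    # A verb right before 的 suggests a relative clause; otherwise call it possessive.
    return "relc" if i > 0 and tokens[i - 1] in TOY_VERBS else "AsB"


print(label_de(["他", "写", "的", "书"], toy_classify))  # ['他', '写', '的_relc', '书']
print(label_de(["我", "的", "书"], toy_classify))        # ['我', '的_AsB', '书']
```

Because each labeled variant becomes a separate source token, the phrase table and reordering model can learn different translations and word orders per class instead of averaging over all uses of 的.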

Arabic MT

Although Arabic-to-English translation quality has improved significantly in recent years, pervasive problems remain. One of them is the reordering of verb-initial clauses, especially matrix clauses, during translation. We have recently developed a high-precision Arabic subject detector that can be integrated into phrase-based translation pipelines (Green et al., 2009). A distinguishing feature of our work is the decision to influence decoding directly instead of reordering the Arabic input prior to translation. We have also created a state-of-the-art Arabic parser that can be used for a variety of MT tasks.
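Purely to illustrate what influencing decoding directly (rather than reordering the input) can look like, the following Python sketch defines a hypothetical decoder feature that counts detected subject spans whose words are not translated by consecutive phrases. This feature is an assumption made for illustration, not the one used by Green et al. (2009); the function names and span conventions (inclusive source-word indices, phrases listed in target order) are likewise illustrative.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # inclusive source-word indices


def overlaps(a: Span, b: Span) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def split_subject_count(subject_spans: List[Span], phrases: List[Span]) -> int:
    """Hypothetical feature: subjects covered by phrases that are not adjacent in output order.

    subject_spans: spans from a subject detector.
    phrases: source spans of a hypothesis's phrases, in target (output) order.
    """
    count = 0
    for subj in subject_spans:
        idx = [k for k, p in enumerate(phrases) if overlaps(subj, p)]
        # Consecutive phrase positions mean the subject is translated as one unit.
        if idx and idx[-1] - idx[0] + 1 != len(idx):
            count += 1
    return count


print(split_subject_count([(1, 2)], [(0, 0), (1, 1), (2, 3)]))  # 0
print(split_subject_count([(1, 2)], [(1, 1), (0, 0), (2, 3)]))  # 1
```

A decoder could add this count, times a tuned (presumably negative) weight, to each hypothesis score, so that hypotheses which move the subject NP as a unit, for example ahead of the verb, are preferred without rewriting the Arabic input.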

MT Evaluation

In (Padó et al., 2008), we developed a metric that evaluates MT output based on a set of deep linguistic features motivated by textual entailment, such as lexical-semantic incompatibility and argument structure overlap. As shown in the figure, our approach (1) constructs typed dependency graphs of the reference (or premise) and the hypothesis, (2) computes the highest-scoring alignment between the two sentences, (3) extracts roughly 70 syntactic and semantic features from the aligned reference-hypothesis pair, and (4) predicts a score by regression over a linear combination of all features. We compared this metric in two different settings against a combination metric built from four state-of-the-art scores (BLEU, NIST, TER, and METEOR). The combination metric outperforms the individual scores but is bested by the entailment-based metric, and combining the entailment and traditional features yields further improvements.
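Step (4) can be sketched as ordinary least-squares regression from per-pair feature vectors to human judgments. The Python below is a minimal, dependency-free illustration that assumes least squares as the fitting method and uses two made-up features per reference-hypothesis pair; the actual metric uses roughly 70 features and its own training setup, which this sketch does not reproduce.

```python
from typing import List


def solve(m: List[List[float]], b: List[float]) -> List[float]:
    """Solve m x = b by Gaussian elimination with partial pivoting."""
    n = len(m)
    aug = [row[:] + [bi] for row, bi in zip(m, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_linear(features: List[List[float]], scores: List[float]) -> List[float]:
    """Least-squares weights (last entry is the bias) via the normal equations."""
    a = [list(row) + [1.0] for row in features]  # append a bias column
    k = len(a[0])
    ata = [[sum(r[i] * r[j] for r in a) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(a, scores)) for i in range(k)]
    return solve(ata, aty)


def predict(weights: List[float], x: List[float]) -> float:
    """Metric score for one pair: linear combination of its features plus bias."""
    return sum(w * f for w, f in zip(weights, list(x) + [1.0]))


# Made-up features per pair, e.g. [alignment_score, mismatch_count] (hypothetical names).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [0.5, 2.5, -0.5, 1.5, 3.5]  # toy "human" scores, exactly 2*x1 - x2 + 0.5
w = fit_linear(X, y)
print([round(v, 6) for v in w])  # [2.0, -1.0, 0.5]
```

The normal equations are fine for a handful of features; with many correlated features a regularized or QR-based solver would be the more robust choice.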


NIST Evaluations

Our group has participated in two NIST Open MT evaluations. We submitted one Chinese-English system in 2008, which was ranked as the 8th best system (out of 20 institutions), and submitted one Arabic-English system in 2009, which was ranked as the 2nd best system (out of 13 institutions).

Descriptions of our NIST systems:


We have released Phrasal, the state-of-the-art phrase-based decoder developed by our group, as open source. It is implemented entirely in Java and uses a JNI interface to SRILM to represent language models efficiently in memory.


Accurate Non-Hierarchical Phrase-Based Translation
Michel Galley and Christopher D. Manning
NAACL 2010

The Best Lexical Metric for Phrase-Based Statistical MT System Optimization
Daniel Cer, Daniel Jurafsky, and Christopher D. Manning
NAACL 2010

Phrasal: A Toolkit for Statistical Machine Translation with Facilities for Extraction and Incorporation of Arbitrary Model Features
Daniel Cer, Michel Galley, Daniel Jurafsky, and Christopher Manning
NAACL Demo 2010

Improved Models of Distortion Cost for Statistical Machine Translation
Spence Green, Michel Galley, and Christopher D. Manning
NAACL 2010

Quadratic-Time Dependency Parsing for Machine Translation
Michel Galley and Christopher D. Manning

NP subject detection in verb-initial Arabic clauses
Spence Green, Conal Sathi, and Christopher D. Manning
MT Summit XII (2009), Third Workshop on Computational Approaches to Arabic Script-based Languages (CAASL3)

Stanford University's Arabic-to-English Statistical Machine Translation System for the 2009 NIST Evaluation
Michel Galley, Spence Green, Daniel Cer, Pi-Chuan Chang, and Christopher D. Manning
NIST 2009 Open Machine Translation Evaluation Workshop

Robust Machine Translation Evaluation with Entailment Features
Sebastian Padó, Michel Galley, Dan Jurafsky, and Chris Manning

Discriminative Reordering with Chinese Grammatical Relations Features
Pi-Chuan Chang, Huihsin Tseng, Dan Jurafsky, and Christopher D. Manning
NAACL 2009 Third Workshop on Syntax and Structure in Statistical Translation

Textual Entailment Features for Machine Translation Evaluation
Sebastian Padó, Michel Galley, Dan Jurafsky, and Christopher D. Manning
EACL 2009 Fourth Workshop on Statistical Machine Translation

Disambiguating "DE" for Chinese-English Machine Translation
Pi-Chuan Chang, Dan Jurafsky, and Christopher D. Manning
EACL 2009 Fourth Workshop on Statistical Machine Translation

Evaluating MT output with entailment technology
Sebastian Padó, Michel Galley, Christopher D. Manning, and Dan Jurafsky
AMTA 2008 Metrics MATR Workshop (Metrics for Machine Translation Challenge)

A Simple and Effective Hierarchical Phrase Reordering Model
Michel Galley and Christopher D. Manning
EMNLP 2008

Optimizing Chinese Word Segmentation for Machine Translation Performance
Pi-Chuan Chang, Michel Galley, and Christopher D. Manning
ACL 2008 Third Workshop on Statistical Machine Translation

Regularization and Search for Minimum Error Rate Training
Daniel Cer, Daniel Jurafsky, and Christopher D. Manning
ACL 2008 Third Workshop on Statistical Machine Translation

Stanford University's Chinese-to-English Statistical Machine Translation System for the 2008 NIST Evaluation
Michel Galley, Pi-Chuan Chang, Daniel Cer, Jenny R. Finkel, and Christopher D. Manning
NIST 2008 Open Machine Translation Evaluation Workshop

Extensions to HMM-based Statistical Word Alignment Models
Kristina Toutanova, H. Tolga Ilhan, and Christopher D. Manning
Conference on Empirical Methods in Natural Language Processing (EMNLP 2002)