The Pac-Man projects were developed for Berkeley's undergraduate artificial intelligence course (CS188). These projects have now been shared with faculty from over 30 universities worldwide, including Stanford, the University of Washington, and UT Austin.
While morphological agreement is generally considered a reflection of syntactic configurations, many agreement relations apply to adjacent words in a sentence. In this paper, we promote local agreement using a first-order Markov model over a rich set of labels, which express both syntactic categories and morphological features.
The particular approach we describe combines discriminative models for segmentation and tagging with a generative model to score tag sequences. All models can be trained on a monolingual treebank — the approach does not require tagging a large parallel corpus.
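To make the tag-sequence scoring concrete, here is a minimal sketch of a first-order (bigram) Markov model over composite morphosyntactic tags, with add-alpha smoothing. The function names and the tag encoding are my own illustration, not the paper's implementation.

import math
from collections import defaultdict

def train_bigram_tag_model(tagged_sentences, alpha=0.1):
    # Estimate first-order transition probabilities over composite tags,
    # e.g. "NOUN|Gender=Fem|Number=Sg", with add-alpha smoothing.
    counts = defaultdict(lambda: defaultdict(float))
    tagset = set()
    for sentence in tagged_sentences:
        tags = ["<S>"] + [tag for _, tag in sentence] + ["</S>"]
        tagset.update(tags)
        for prev, cur in zip(tags, tags[1:]):
            counts[prev][cur] += 1
    vocab_size = len(tagset)
    def log_prob(prev, cur):
        total = sum(counts[prev].values())
        return math.log((counts[prev][cur] + alpha) / (total + alpha * vocab_size))
    return log_prob

def score_tag_sequence(tags, log_prob):
    # Log-probability of a composite-tag sequence under the bigram model.
    padded = ["<S>"] + list(tags) + ["</S>"]
    return sum(log_prob(p, c) for p, c in zip(padded, padded[1:]))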
Spence conducted this research during a summer internship at Google.
Unsupervised constituent grammar induction is the task of automatically discovering the hierarchical syntactic structure of natural language sentences from text. In practice, probabilistic models of grammar induction have been applied only to short sentences. A standard dataset for the training and evaluation of grammar induction systems is WSJ10: the set of Wall Street Journal sentences from the Penn Treebank that contain 10 words or fewer, after removing punctuation. For practical applications of grammar induction, such as machine translation reordering, models must apply effectively to longer sentences.
This paper builds upon the Constituent-Context Model (CCM) by parameterizing its multinomial distributions as log-linear component models that include both fine-grained and general features. The general features encourage the model to assign similar probabilities to similar sequences. As a result, the model performs dramatically better on sentences up to 40 words in length.
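The key move is replacing each raw multinomial with a log-linear (softmax) parameterization over features. Here is a minimal sketch of that idea, assuming a feature function that maps each outcome to a vector mixing fine-grained exact-sequence indicators with general features shared across similar sequences; none of these names come from the paper.

import numpy as np

def log_linear_multinomial(outcomes, feature_fn, weights):
    # P(outcome) is proportional to exp(weights . features(outcome)).
    # Outcomes that share general features are pushed toward
    # similar probabilities.
    scores = np.array([weights @ feature_fn(o) for o in outcomes])
    scores -= scores.max()  # stabilize the exponentials
    probs = np.exp(scores)
    return probs / probs.sum()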
Dave conducted this research during a summer internship at Google.
Bilingual dictionaries are written to reflect the way in which one language is translated into another. In principle, a learning system should be able to reconstruct the contents of bilingual dictionaries by analyzing large corpora of monolingual and parallel (translated) text. Many previous publications have dealt with the problem of identifying a set of translations for a word. This paper addresses the problem of learning how to cluster a set of translations into synonymous subsets.
We describe a method to project a set of monolingual synonym clusters — such as those found in WordNet — onto a set of translations for a single word. We also explore new features for inducing synonym clusters based on corpus statistics.
Our experiments show a substantial improvement in the quality of induced synonym clusters by using bilingual features (e.g., how a word translates) in addition to monolingual distributional features (e.g., which other words tend to collocate with a word). I suspect that these translation features are useful for other applications that characterize words by their distributional similarity.
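As a sketch of how the two feature families might combine, consider the following. The even interpolation weight and the greedy single-link clustering are my simplifications for illustration, not the paper's method.

import numpy as np

def cosine(u, v):
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

def similarity(w, u, mono_vecs, bi_vecs, weight=0.5):
    # Interpolate monolingual distributional similarity (shared
    # co-occurring words) with bilingual similarity (shared translations).
    return (weight * cosine(mono_vecs[w], mono_vecs[u])
            + (1 - weight) * cosine(bi_vecs[w], bi_vecs[u]))

def cluster_translations(words, mono_vecs, bi_vecs, threshold=0.5):
    # Greedily group a word's translations into synonym sets: add each
    # word to the first cluster with a sufficiently similar member.
    clusters = []
    for w in words:
        for cluster in clusters:
            if any(similarity(w, u, mono_vecs, bi_vecs) >= threshold
                   for u in cluster):
                cluster.append(w)
                break
        else:
            clusters.append([w])
    return clusters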
Mohit conducted this research during a summer internship at Google.
Syntactic pre-ordering can lead to major improvements in translation quality for phrase-based systems. However, a supervised parser is required to analyze source-side sentences before either training or decoding.
In this paper, we train an unsupervised parser and corresponding reordering model that together perform as well as supervised pre-ordering and forest-to-string systems in English-Japanese translation. BLEU improves by 3.8 points over a phrase-based system with lexicalized reordering.
I'm excited about this idea, not only because it can potentially improve translation for source languages that don't have robust supervised parsers, but also because it provides a clear extrinsic evaluation for unsupervised parsers. If you've built grammar induction systems and wondered where to apply them, take a look at our paper.
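For readers unfamiliar with pre-ordering, the mechanics are easy to sketch: walk the source parse tree and decide at each node whether to swap its children, so that the source words come out in target-language order. The tree encoding and the swap predicate below are toys of my own, not our model.

def preorder(tree, should_swap):
    # Flatten a binary parse tree into a word sequence, swapping the
    # children of any node the reordering model flags.
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    ordered = [preorder(left, should_swap), preorder(right, should_swap)]
    if should_swap(tree):
        ordered.reverse()
    return ordered[0] + ordered[1]

# Toy English-to-Japanese example: move the verb after its object (SOV).
tree = ("I", (("ate", "sushi"), "."))
print(preorder(tree, lambda node: node == ("ate", "sushi")))
# ['I', 'sushi', 'ate', '.']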
Bob Moore and I spent some time analyzing how different regularizers affect accuracy and model sparsity in a learning setting that is common in NLP: multiclass classification with lots of training examples, but even more indicator features. In part-of-speech tagging, we see ~10^6 training examples and ~10^7 indicator features for predicting a label set of 45 tags.
Most researchers are familiar with the broad effects of regularization: L2-regularized log loss gives dense models, while L1-regularized log loss and hinge loss give sparse models. In comparing these techniques, Bob found some rather subtle interactions between regularization weight and model sparsity. Details are in the paper!
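To see the broad effect for yourself, here is a small experiment on synthetic data (not our tagging setup) contrasting the sparsity of L1- and L2-regularized logistic regression:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Many features, few of them informative: a caricature of the
# large indicator-feature sets common in NLP.
X, y = make_classification(n_samples=2000, n_features=500,
                           n_informative=20, random_state=0)

for penalty in ("l1", "l2"):
    clf = LogisticRegression(penalty=penalty, C=0.1,
                             solver="liblinear", max_iter=1000)
    clf.fit(X, y)
    nonzero = int(np.sum(np.abs(clf.coef_) > 1e-8))
    print(penalty, "nonzero weights:", nonzero, "of", clf.coef_.size)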