SEMPRE: Semantic Parsing with Execution

SEMPRE is a toolkit for training semantic parsers, which map natural language utterances to denotations (answers) via intermediate logical forms. Example applications include querying databases and programming via natural language.
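To make the utterance, logical form, and denotation pipeline concrete, here is a minimal toy sketch in Python. It is not SEMPRE's API, and none of these names come from SEMPRE: the lexicon, the tuple-based logical form, and the `parse` and `execute` functions are all hypothetical. The parser here is hand-written with a fixed vocabulary, whereas SEMPRE trains its parsers.

```python
from typing import Tuple

# Hypothetical logical form: (operator, left operand, right operand).
LogicalForm = Tuple[str, int, int]

# Toy lexicon, assumed for illustration only.
NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
OPERATORS = {"plus": "+", "times": "*"}


def parse(utterance: str) -> LogicalForm:
    """Map a three-word utterance such as 'three plus four' to a logical form."""
    left, op, right = utterance.lower().split()
    return (OPERATORS[op], NUMBERS[left], NUMBERS[right])


def execute(form: LogicalForm) -> int:
    """Execute a logical form to obtain its denotation (the answer)."""
    op, a, b = form
    return a + b if op == "+" else a * b


form = parse("three plus four")   # intermediate logical form
answer = execute(form)            # denotation
```

The point of the split is that the logical form is an explicit, executable intermediate step: the answer is obtained by running it, not by mapping text to an answer directly.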

SEMPRE has the following functionality:

Code

You can download all the code and documentation for SEMPRE from GitHub. To learn more about the system, walk through our tutorial.

Data

In our EMNLP 2013 paper, we created a new dataset, WebQuestions, which is released under the CC BY 4.0 license. Here are the train and test splits.

To make sure your evaluation on WebQuestions is identical to ours, you can download a package containing an evaluation script and the predictions of our ACL 2014 system.

In addition, we preprocessed the Free917 dataset (Cai & Yates, 2013) to work with our system. Here are the train and test splits.

Both datasets are provided in JSON format. WebQuestions contains 3,778 training examples and 2,032 test examples. Free917 contains 641 training examples and 276 test examples.
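As a quick sanity check after downloading a split, you might count the examples and see which fields they carry. The sketch below is in Python and assumes that a split is a JSON array of objects. The function name `summarize_examples` and the field names in the inline sample (`utterance`, `targetValue`) are illustrative assumptions, not a specification of the released files.

```python
import json
from collections import Counter


def summarize_examples(json_text):
    """Return (number of examples, {field name: how many examples have it})."""
    examples = json.loads(json_text)  # assumed: a JSON array of objects
    field_counts = Counter()
    for example in examples:
        field_counts.update(example.keys())
    return len(examples), dict(field_counts)


# Inline stand-in for a downloaded split; the field names are assumptions.
sample = """
[
  {"utterance": "what is the capital of france?", "targetValue": "Paris"},
  {"utterance": "who wrote hamlet?", "targetValue": "William Shakespeare"}
]
"""

count, fields = summarize_examples(sample)
print(count, fields)
```

For a real split, read the file contents into a string and pass them to `summarize_examples`. The count should match the figures above, for example 3,778 for the WebQuestions training split.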

On WebQuestions, each example contains three fields:

On Free917, each example contains two fields:

Papers

SEMPRE was used in the following papers:

Jonathan Berant, Andrew Chou, Roy Frostig, Percy Liang. Semantic Parsing on Freebase from Question-Answer Pairs. Empirical Methods in Natural Language Processing (EMNLP), 2013.
Jonathan Berant, Percy Liang. Semantic Parsing via Paraphrasing. Association for Computational Linguistics (ACL), 2014.

SEMPRE supports lambda DCS logical forms, the default representation used for querying Freebase:

Percy Liang. Lambda Dependency-Based Compositional Semantics. arXiv report.