Web Entity Extractor

This is the official home page of the paper:

Panupong Pasupat, Percy Liang. Zero-shot Entity Extraction from Web Pages. Association for Computational Linguistics (ACL), 2014.

Task

We consider the task of zero-shot entity extraction: given a natural language query and a web page as input, the system should output a list of the entities on the web page that correspond to the query.
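
To make the input/output contract concrete, the following is a toy sketch in Python. It is not the system described in the paper. The function name extract_entities, the tag-path grouping heuristic, and the query and page contents are all assumptions invented for illustration. The sketch groups text nodes by the chain of tags leading to them and returns the largest group. It ignores the query entirely, whereas a real system would use the query to decide which group of entities to return.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TagPathCollector(HTMLParser):
    """Group text nodes by the tag path (e.g. html/body/ul/li) leading to them."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.groups = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.groups["/".join(self.stack)].append(text)


def extract_entities(query, html):
    """Toy baseline (hypothetical): largest group of texts sharing a tag path.

    The query is accepted to mirror the task's inputs but is not used here.
    """
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    if not collector.groups:
        return []
    return max(collector.groups.values(), key=len)


# Hypothetical inputs, invented for illustration only.
QUERY = "hiking trails"
PAGE = """
<html><body>
  <h1>Trails</h1>
  <ul>
    <li>Ridge Trail</li>
    <li>Lake Loop</li>
    <li>Canyon Path</li>
  </ul>
  <p>Contact us</p>
</body></html>
"""

entities = extract_entities(QUERY, PAGE)
print(entities)  # ['Ridge Trail', 'Lake Loop', 'Canyon Path']
```

On this toy page the three list items share the tag path html/body/ul/li and form the largest group, so they are returned as the entity list. Pages with several competing lists are where the query becomes necessary, which this baseline does not attempt to handle.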

Code / Dataset

Our code is hosted on GitHub. Follow the instructions in the README file to get started.

We also release the OpenWeb dataset, which contains 2,773 examples of diverse queries and web pages. The dataset can be retrieved using the download-dependencies script in the code base, which also places the downloaded resources in the appropriate directories. Alternatively, the dataset can be downloaded directly from the links below.