edu.stanford.nlp.objectbank (Stanford JavaNLP API)

Interface Summary
Interface	Description
IteratorFromReaderFactory<T>	An IteratorFromReaderFactory is used to convert a java.io.Reader into an Iterator over the Objects of type T represented by the text in the java.io.Reader.

Class Summary
Class	Description
DelimitRegExIterator<T>	An Iterator that reads the contents of a Reader, splits them on the specified delimiter, and then passes each piece to a Function to produce Objects of type T.
DelimitRegExIterator.DelimitRegExIteratorFactory<T>
IdentityFunction<X>	An Identity function that returns its argument.
LineIterator<X>	An Iterator that returns a line of a file at a time.
LineIterator.LineIteratorFactory<X>
ObjectBank<E>	The ObjectBank class is designed to make it easy to change the format/source of data read in by other classes and to standardize how data is read in javaNLP classes.
ObjectBank.PathToFileFunction	This is handy for having getLineIterator return a collection of files for feeding into another ObjectBank.
ReaderIteratorFactory	A ReaderIteratorFactory provides a means of getting an Iterator which returns java.io.Readers over a Collection of input sources.
ResettableReaderIteratorFactory	Vends ReaderIterators which can always be rewound.
XMLBeginEndIterator<E>	A class which iterates over Strings occurring between the begin and end of a selected tag or tags.

Package edu.stanford.nlp.objectbank Description

The ObjectBank class is designed to make it easy to change the format/source of data read in by other classes and to standardize how data is read in javaNLP classes. This should make reuse of existing code (by non-authors of the code) easier because one has to just create a new ObjectBank which knows where to look for the data and how to turn it into Objects, and then use the new ObjectBank in the class. This will also make it easier to reuse code for reading in the same data.

An ObjectBank is a Collection of Objects. These objects are taken from input sources and then tokenized and parsed into the desired kind of Object. An ObjectBank requires a ReaderIteratorFactory and an IteratorFromReaderFactory. The ReaderIteratorFactory is used to get an Iterator over java.io.Readers which contain representations of the Objects. A ReaderIteratorFactory resembles a Collection that takes input sources and dispenses Iterators over java.io.Readers of those sources. An IteratorFromReaderFactory is used to turn a single java.io.Reader into an Iterator over Objects. The IteratorFromReaderFactory splits the contents of the java.io.Reader into Strings and then parses them into appropriate Objects.
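
The division of labour between the two factories can be sketched with plain JDK types. This is only an illustration under stated assumptions, not the Stanford implementation: MiniBank, its fields, toList and integersFrom are hypothetical names, with a Supplier standing in for the ReaderIteratorFactory and a Function standing in for the IteratorFromReaderFactory.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical stand-in for ObjectBank; it only mirrors the two-factory shape.
class MiniBank<T> implements Iterable<T> {
    // Role of the ReaderIteratorFactory: every call hands out fresh Readers.
    private final Supplier<Iterator<Reader>> readerSource;
    // Role of the IteratorFromReaderFactory: one Reader in, parsed objects out.
    private final Function<Reader, Iterator<T>> objectSource;

    MiniBank(Supplier<Iterator<Reader>> readerSource,
             Function<Reader, Iterator<T>> objectSource) {
        this.readerSource = readerSource;
        this.objectSource = objectSource;
    }

    @Override
    public Iterator<T> iterator() {
        return toList().iterator();
    }

    // Reads every source from scratch and collects the parsed objects eagerly.
    List<T> toList() {
        List<T> out = new ArrayList<>();
        Iterator<Reader> readers = readerSource.get();
        while (readers.hasNext()) {
            Iterator<T> objects = objectSource.apply(readers.next());
            while (objects.hasNext()) {
                out.add(objects.next());
            }
        }
        return out;
    }

    // Example wiring: in-memory texts as sources, one Integer per non-blank line.
    static MiniBank<Integer> integersFrom(List<String> texts) {
        Supplier<Iterator<Reader>> readers =
            () -> texts.stream().map(t -> (Reader) new StringReader(t)).iterator();
        Function<Reader, Iterator<Integer>> parse =
            r -> new BufferedReader(r).lines()
                    .map(String::trim)
                    .filter(line -> !line.isEmpty())
                    .map(Integer::valueOf)
                    .iterator();
        return new MiniBank<>(readers, parse);
    }
}
```

Swapping either argument changes where the data comes from or how it is parsed without touching code that merely iterates over the bank. For simplicity the sketch collects everything in memory and never closes its Readers, which a real implementation would need to handle.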

Example Usage:

You have a collection of files in the directory /u/nlp/data/gre/questions. Each file contains several Puzzle documents which look like:

 <puzzle>
 <preamble> some text </preamble>
 <question> some intro text
 <answer> answer1 </answer>
 <answer> answer2 </answer>
 <answer> answer3 </answer>
 <answer> answer4 </answer>
 </question>
 <question> another question
 <answer> answer1 </answer>
 <answer> answer2 </answer>
 <answer> answer3 </answer>
 <answer> answer4 </answer>
 </question>
 </puzzle>

First you need to build a ReaderIteratorFactory which will provide java.io.Readers over all the files in your directory:

 Collection c = new FileSequentialCollection("/u/nlp/data/gre/questions/", "", false);
 ReaderIteratorFactory rif = new ReaderIteratorFactory(c);
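
For readers without the library at hand, roughly the same step can be sketched with java.nio alone. The sketch assumes that the empty suffix matches every file and that false means subdirectories are not descended into; DirectoryReaders, filesIn and readersOver are hypothetical names and not part of the package.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper: lists the files of one directory and opens Readers lazily.
class DirectoryReaders {
    // Regular files directly inside dir whose names end with suffix, sorted by path.
    static List<Path> filesIn(Path dir, String suffix) {
        List<Path> files = new ArrayList<>();
        try (Stream<Path> entries = Files.list(dir)) {
            entries.filter(Files::isRegularFile)
                   .filter(p -> p.getFileName().toString().endsWith(suffix))
                   .forEach(files::add);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(files);
        return files;
    }

    // One Reader per file, opened only when next() is called.
    // UTF-8 is an assumption; the caller is responsible for closing each Reader.
    static Iterator<Reader> readersOver(List<Path> files) {
        Iterator<Path> paths = files.iterator();
        return new Iterator<Reader>() {
            @Override
            public boolean hasNext() {
                return paths.hasNext();
            }

            @Override
            public Reader next() {
                try {
                    return Files.newBufferedReader(paths.next(), StandardCharsets.UTF_8);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        };
    }
}
```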

Next you need to make an IteratorFromReaderFactory which will take the java.io.Readers vended by the ReaderIteratorFactory, split them up into documents (Strings), and then convert the Strings into Objects. In this case we want to keep everything between each pair of <puzzle> </puzzle> tags, so we would use a factory obtained from BeginEndIterator. You would also need to write a class which implements Appliable and whose apply method converts the String between the <puzzle> </puzzle> tags into a Puzzle object.

 public class PuzzleParser implements Appliable {
   // Converts the text found between <puzzle> and </puzzle> into a Puzzle.
   public Object apply(Object o) {
     String s = (String) o;
     ...
     Puzzle p = new Puzzle(...);
     ...
     return p;
   }
 }

Now to build the IteratorFromReaderFactory:

 IteratorFromReaderFactory rtif = BeginEndIterator.getFactory("<puzzle>", "</puzzle>", new PuzzleParser());
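
What such a begin/end iterator does can be sketched with the JDK's regex classes. This is a simplified illustration, not the library's code: BetweenTags and iterate are hypothetical names, the whole Reader is read into memory first, and the delimiters themselves are assumed to be dropped from each extracted String.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: extracts the text between begin/end markers and parses it.
class BetweenTags {
    static <T> Iterator<T> iterate(Reader in, String begin, String end,
                                   Function<String, T> parser) {
        // Read the whole input; line terminators are normalised to "\n".
        String text = new BufferedReader(in).lines().collect(Collectors.joining("\n"));

        // Quote the markers so they match literally; DOTALL lets ".*?" span lines,
        // and the reluctant quantifier stops at the first end marker.
        Pattern span = Pattern.compile(
            Pattern.quote(begin) + "(.*?)" + Pattern.quote(end), Pattern.DOTALL);

        List<T> out = new ArrayList<>();
        Matcher m = span.matcher(text);
        while (m.find()) {
            out.add(parser.apply(m.group(1)));
        }
        return out.iterator();
    }
}
```

Called with "<puzzle>", "</puzzle>" and a parser function, this yields one parsed object per puzzle element and ignores any text outside the tags. Nested occurrences of the same tag are not handled.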

Now, to create your ObjectBank you just give it the ReaderIteratorFactory and IteratorFromReaderFactory that you just created:

 ObjectBank puzzles = new ObjectBank(rif, rtif);

Now, if you get a new set of puzzles that are located elsewhere and formatted differently, you create a new ObjectBank for reading them in and use that ObjectBank instead, with only trivial changes to your code (or possibly none at all, if the ObjectBank is passed in through a constructor). Better still, if someone else wants to use your code to evaluate their puzzles, which are located elsewhere and formatted differently, they already know what they have to do to make your code work for them.
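
The constructor-based variant can be sketched as follows. PuzzleEvaluator and countPuzzles are hypothetical names, and Strings stand in for Puzzle objects so that the example stays self-contained; since an ObjectBank is a Collection, it could be passed wherever an Iterable is expected.

```java
// Hypothetical consumer: it depends only on "something iterable over puzzles",
// so the location and format of the data are decided entirely by the caller.
class PuzzleEvaluator {
    private final Iterable<String> puzzles;

    PuzzleEvaluator(Iterable<String> puzzles) {
        this.puzzles = puzzles;
    }

    // Placeholder for real evaluation logic: counts the puzzles supplied.
    int countPuzzles() {
        int n = 0;
        for (String ignored : puzzles) {
            n++;
        }
        return n;
    }
}
```

Pointing the evaluator at a different data set then means constructing it with a different bank, with no change to the class itself.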