public class ObjectBank<E>
extends java.lang.Object
implements java.util.Collection<E>, java.io.Serializable
An ObjectBank is a Collection of Objects. These objects are taken from input sources and then tokenized and parsed into the desired kind of Object. An ObjectBank requires a ReaderIteratorFactory and an IteratorFromReaderFactory. The ReaderIteratorFactory is used to get an Iterator over java.io.Readers which contain representations of the Objects. A ReaderIteratorFactory resembles a collection that takes input sources and dispenses Iterators over java.io.Readers of those sources. An IteratorFromReaderFactory is used to turn a single java.io.Reader into an Iterator over Objects. The IteratorFromReaderFactory splits the contents of the java.io.Reader into Strings and then parses them into appropriate Objects.
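The two-stage pipeline described above can be sketched with plain JDK types. This is not CoreNLP's implementation: the class name `MiniObjectBank` is hypothetical, a `Supplier<Iterator<Reader>>` stands in for the ReaderIteratorFactory, and a `Function<Reader, Iterator<E>>` stands in for the IteratorFromReaderFactory.

```java
import java.io.Reader;
import java.util.Collections;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical stand-in for ObjectBank: chains a source of Readers with a
// per-Reader parser and exposes the parsed objects as a lazy Iterable.
class MiniObjectBank<E> implements Iterable<E> {
  private final Supplier<Iterator<Reader>> readers;          // plays the ReaderIteratorFactory role
  private final Function<Reader, Iterator<E>> readerToItems;  // plays the IteratorFromReaderFactory role

  MiniObjectBank(Supplier<Iterator<Reader>> readers, Function<Reader, Iterator<E>> readerToItems) {
    this.readers = readers;
    this.readerToItems = readerToItems;
  }

  @Override
  public Iterator<E> iterator() {
    Iterator<Reader> sources = readers.get(); // ask for fresh Readers on every pass
    return new Iterator<E>() {
      private Iterator<E> current = Collections.emptyIterator();

      @Override
      public boolean hasNext() {
        // Move on to the next Reader until one yields an item or the sources run out.
        while (!current.hasNext() && sources.hasNext()) {
          current = readerToItems.apply(sources.next());
        }
        return current.hasNext();
      }

      @Override
      public E next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        return current.next();
      }
    };
  }
}
```

For example, a supplier that creates two `StringReader`s holding `"a\nb"` and `"c"`, combined with `r -> new BufferedReader(r).lines().iterator()`, yields `a`, `b`, `c` on every pass, because each call to iterator() obtains new Readers rather than reusing exhausted ones.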
A common way to use an ObjectBank is through the static getLineIterator method. In its simplest use, it returns an ObjectBank<String>, which implements Collection<String>. So, statements like these work:
for (String str : ObjectBank.getLineIterator(filename)) {
  System.out.println(str);
}
String[] strings = ObjectBank.getLineIterator(filename).toArray(new String[0]);
// The same, but decoding the file with an explicit character encoding
String[] gbStrings = ObjectBank.getLineIterator(filename, "GB18030").toArray(new String[0]);
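The encoding argument names the character set used to decode the file. As a JDK-only sketch of that step (the class `LineReading` and its `readLines` method are hypothetical, not CoreNLP API), the bytes are decoded with the named charset before the text is split into lines, so the same bytes read with a different charset give different Strings:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.List;
import java.util.stream.Collectors;

class LineReading {
  // Decode the stream with the named charset, then split the text into lines.
  static List<String> readLines(InputStream in, String encoding) {
    try (BufferedReader br = new BufferedReader(new InputStreamReader(in, Charset.forName(encoding)))) {
      return br.lines().collect(Collectors.toList());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```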
More complex uses of getLineIterator let you interpret each line of a file
as an object of arbitrary type via a transformer Function.
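As a JDK-only sketch of the transformer idea, assuming one object per line, each line read from a Reader can be passed through a java.util.function.Function. `TransformedLines.of` is a hypothetical helper, not the getLineIterator implementation:

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.util.Iterator;
import java.util.function.Function;

class TransformedLines {
  // Lazily map each line of the Reader through op, one line at a time.
  // The Reader is not closed here; the caller remains responsible for it.
  static <X> Iterator<X> of(Reader reader, Function<String, X> op) {
    return new BufferedReader(reader).lines().map(op).iterator();
  }
}
```

With `Integer::valueOf` as the function, the lines `3`, `14` and `15` come back as Integers. In CoreNLP itself, the corresponding entry point is the getLineIterator(filename, op) overload listed in the method summary.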
For more general uses with existing classes, you first construct a collection of sources, then a class that will make the objects of interest from instances of those sources, and then set up an ObjectBank that can vend those objects:
ReaderIteratorFactory rif = new ReaderIteratorFactory(Arrays.asList("file1", "file2", "file3"));
IteratorFromReaderFactory<Mention> corefIFRF = new MUCCorefIteratorFromReaderFactory(true);
// Parameterize the ObjectBank so the loop variable can be typed as Mention
for (Mention m : new ObjectBank<Mention>(rif, corefIFRF)) {
  ...
}
As an example of the general power of this class, suppose you have
a collection of files in the directory /u/nlp/data/gre/questions. Each file
contains several Puzzle documents which look like:
<puzzle>
  <preamble> some text </preamble>
  <question> some intro text
    <answer> answer1 </answer>
    <answer> answer2 </answer>
    <answer> answer3 </answer>
    <answer> answer4 </answer>
  </question>
  <question> another question
    <answer> answer1 </answer>
    <answer> answer2 </answer>
    <answer> answer3 </answer>
    <answer> answer4 </answer>
  </question>
</puzzle>
First you need to build a ReaderIteratorFactory which will provide java.io.Readers over all the files in your directory:
Collection c = new FileSequentialCollection("/u/nlp/data/gre/questions/", "", false);
ReaderIteratorFactory rif = new ReaderIteratorFactory(c);
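A rough JDK-only analogue of what this ReaderIteratorFactory hands out is one Reader per file in the directory, opened only when it is needed. The class `DirectoryReaders` is hypothetical: it lists only regular files directly inside the directory, in sorted order, assumes UTF-8, and does not model FileSequentialCollection's second and third arguments.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DirectoryReaders {
  // Regular files directly inside dir, sorted by path for a stable order.
  static List<Path> files(Path dir) {
    try (Stream<Path> s = Files.list(dir)) {
      return s.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Opens each file only when the iterator reaches it.
  static Iterator<Reader> readers(Path dir) {
    Iterator<Path> paths = files(dir).iterator();
    return new Iterator<Reader>() {
      @Override
      public boolean hasNext() {
        return paths.hasNext();
      }

      @Override
      public Reader next() {
        try {
          return Files.newBufferedReader(paths.next(), StandardCharsets.UTF_8);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      }
    };
  }
}
```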
Next you need to make an IteratorFromReaderFactory which will take the
java.io.Readers vended by the ReaderIteratorFactory, split them up into
documents (Strings) and
then convert the Strings into Objects. In this case we want to keep everything
between each set of <puzzle> </puzzle> tags so we would use a BeginEndTokenizerFactory.
You would also need to write a class which implements Function and whose apply method
converts the String between the <puzzle> </puzzle> tags into a Puzzle object.
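import java.util.function.Function;

public class PuzzleParser implements Function<String, Puzzle> {
  // Converts the text between one pair of <puzzle> </puzzle> tags into a Puzzle
  @Override
  public Puzzle apply(String s) {
    ...
    Puzzle p = new Puzzle(...);
    ...
    return p;
  }
}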
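To make the outline above concrete, here is a self-contained variant. The `Puzzle` class and its two fields are hypothetical stand-ins for your own type, and regular expressions are used on the assumption that the tags are well formed and not nested:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value class: keeps only the preamble and the answer strings.
class Puzzle {
  final String preamble;
  final List<String> answers;

  Puzzle(String preamble, List<String> answers) {
    this.preamble = preamble;
    this.answers = answers;
  }
}

class SimplePuzzleParser implements Function<String, Puzzle> {
  // DOTALL lets a tag's contents span several lines.
  private static final Pattern PREAMBLE = Pattern.compile("<preamble>(.*?)</preamble>", Pattern.DOTALL);
  private static final Pattern ANSWER = Pattern.compile("<answer>(.*?)</answer>", Pattern.DOTALL);

  @Override
  public Puzzle apply(String s) {
    Matcher m = PREAMBLE.matcher(s);
    String preamble = m.find() ? m.group(1).trim() : "";
    List<String> answers = new ArrayList<>();
    Matcher a = ANSWER.matcher(s);
    while (a.find()) {
      answers.add(a.group(1).trim()); // answers from all questions, in document order
    }
    return new Puzzle(preamble, answers);
  }
}
```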
Now to build the IteratorFromReaderFactory:
IteratorFromReaderFactory rtif = new BeginEndTokenizerFactory("<puzzle>", "</puzzle>", new PuzzleParser());
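What the BeginEndTokenizerFactory contributes is splitting a Reader's text into the documents found between the two tags. A minimal sketch of that splitting over a String, assuming literal, non-nested tags; `BeginEndSplitter` is hypothetical, and whether the real factory keeps the tags or trims whitespace is not specified here:

```java
import java.util.ArrayList;
import java.util.List;

class BeginEndSplitter {
  // Returns the text strictly between each begin tag and the next end tag.
  static List<String> split(String text, String begin, String end) {
    List<String> docs = new ArrayList<>();
    int from = 0;
    while (true) {
      int b = text.indexOf(begin, from);
      if (b < 0) {
        break; // no more documents
      }
      int start = b + begin.length();
      int e = text.indexOf(end, start);
      if (e < 0) {
        break; // unterminated document: ignored
      }
      docs.add(text.substring(start, e));
      from = e + end.length();
    }
    return docs;
  }
}
```

Text outside any tag pair, such as the `junk` prefix in `junk<puzzle> one </puzzle>`, is skipped, and each returned String is what a parser like PuzzleParser would receive.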
Now, to create your ObjectBank you just give it the ReaderIteratorFactory and IteratorFromReaderFactory that you just created:
ObjectBank puzzles = new ObjectBank(rif, rtif);
Now, if you get a new set of puzzles that are located elsewhere and formatted differently, you create a new ObjectBank for reading them in and use that ObjectBank instead, with only trivial changes to your code (or possibly none at all, if the ObjectBank is passed in through a constructor). Or even better, if someone else wants to use your code to evaluate their puzzles, which are located elsewhere and formatted differently, they already know what they have to do to make your code work for them.
Modifier and Type | Class and Description |
---|---|
static class | ObjectBank.PathToFileFunction: This is handy for having getLineIterator return a collection of files for feeding into another ObjectBank. |
Modifier and Type | Field and Description |
---|---|
protected IteratorFromReaderFactory<E> | ifrf |
protected ReaderIteratorFactory | rif |
Constructor and Description |
---|
ObjectBank(ReaderIteratorFactory rif, IteratorFromReaderFactory<E> ifrf): This creates a new ObjectBank with the given ReaderIteratorFactory and IteratorFromReaderFactory. |
Modifier and Type | Method and Description |
---|---|
boolean | add(E o): Unsupported Operation. |
boolean | addAll(java.util.Collection<? extends E> c): Unsupported Operation. |
void | clear() |
void | clearMemory(): If you are keeping the contents in memory, this will clear the memory, and they will be recomputed the next time iterator() is called. |
boolean | contains(java.lang.Object o): Can be slow, since it may iterate over the entire contents. |
boolean | containsAll(java.util.Collection<?> c): Can be slow, since it may iterate over the entire contents. |
static <X> ObjectBank<X> | getLineIterator(java.util.Collection<?> filesStringsAndReaders, java.util.function.Function<java.lang.String,X> op) |
static <X> ObjectBank<X> | getLineIterator(java.util.Collection<?> filesStringsAndReaders, java.util.function.Function<java.lang.String,X> op, java.lang.String encoding) |
static ObjectBank<java.lang.String> | getLineIterator(java.util.Collection<?> filesStringsAndReaders, java.lang.String encoding) |
static ObjectBank<java.lang.String> | getLineIterator(java.io.File file) |
static <X> ObjectBank<X> | getLineIterator(java.io.File file, java.util.function.Function<java.lang.String,X> op) |
static <X> ObjectBank<X> | getLineIterator(java.io.File file, java.util.function.Function<java.lang.String,X> op, java.lang.String encoding) |
static ObjectBank<java.lang.String> | getLineIterator(java.io.File file, java.lang.String encoding) |
static ObjectBank<java.lang.String> | getLineIterator(java.io.Reader reader) |
static <X> ObjectBank<X> | getLineIterator(java.io.Reader reader, java.util.function.Function<java.lang.String,X> op) |
static ObjectBank<java.lang.String> | getLineIterator(java.lang.String filename) |
static <X> ObjectBank<X> | getLineIterator(java.lang.String filename, java.util.function.Function<java.lang.String,X> op) |
static ObjectBank<java.lang.String> | getLineIterator(java.lang.String filename, java.lang.String encoding) |
boolean | isEmpty() |
java.util.Iterator<E> | iterator() |
void | keepInMemory(boolean keep): Tells the ObjectBank to store all of its contents in memory so that it doesn't have to be recomputed each time you iterate through it. |
boolean | remove(java.lang.Object o): Unsupported Operation. |
boolean | removeAll(java.util.Collection<?> c): Unsupported Operation. |
boolean | retainAll(java.util.Collection<?> c): Unsupported Operation. |
int | size(): Can be slow, since it may iterate over the entire contents. |
java.lang.Object[] | toArray() |
<T> T[] | toArray(T[] o): Can be slow, since it may iterate over the entire contents. |
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
protected ReaderIteratorFactory rif
protected IteratorFromReaderFactory<E> ifrf
public ObjectBank(ReaderIteratorFactory rif, IteratorFromReaderFactory<E> ifrf)
Parameters:
rif - The ReaderIteratorFactory from which to get Readers
ifrf - The IteratorFromReaderFactory which turns java.io.Readers into Iterators of Objects
public static ObjectBank<java.lang.String> getLineIterator(java.lang.String filename)
public static <X> ObjectBank<X> getLineIterator(java.lang.String filename, java.util.function.Function<java.lang.String,X> op)
public static ObjectBank<java.lang.String> getLineIterator(java.lang.String filename, java.lang.String encoding)
public static ObjectBank<java.lang.String> getLineIterator(java.io.Reader reader)
public static <X> ObjectBank<X> getLineIterator(java.io.Reader reader, java.util.function.Function<java.lang.String,X> op)
public static ObjectBank<java.lang.String> getLineIterator(java.io.File file)
public static <X> ObjectBank<X> getLineIterator(java.io.File file, java.util.function.Function<java.lang.String,X> op)
public static ObjectBank<java.lang.String> getLineIterator(java.io.File file, java.lang.String encoding)
public static <X> ObjectBank<X> getLineIterator(java.io.File file, java.util.function.Function<java.lang.String,X> op, java.lang.String encoding)
public static <X> ObjectBank<X> getLineIterator(java.util.Collection<?> filesStringsAndReaders, java.util.function.Function<java.lang.String,X> op)
public static ObjectBank<java.lang.String> getLineIterator(java.util.Collection<?> filesStringsAndReaders, java.lang.String encoding)
public static <X> ObjectBank<X> getLineIterator(java.util.Collection<?> filesStringsAndReaders, java.util.function.Function<java.lang.String,X> op, java.lang.String encoding)
public java.util.Iterator<E> iterator()
public void keepInMemory(boolean keep)
Parameters:
keep - Whether to keep contents in memory
public void clearMemory()
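keepInMemory and clearMemory together describe an opt-in cache over contents that are otherwise recomputed on every pass. A minimal sketch of that behavior, assuming the contents can be regenerated from a Supplier; `CachingBank` and its fields are hypothetical and are not ObjectBank's actual internals:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

class CachingBank<E> implements Iterable<E> {
  private final Supplier<Iterator<E>> source; // recomputes the contents from scratch
  private boolean keep = false;
  private List<E> cache = null;               // filled on the first pass while keep is true

  CachingBank(Supplier<Iterator<E>> source) {
    this.source = source;
  }

  void keepInMemory(boolean keep) {
    this.keep = keep;
    if (!keep) {
      cache = null; // hypothetical choice: turning caching off also frees the cache
    }
  }

  void clearMemory() {
    cache = null; // the next iterator() call recomputes and, if keep is set, re-caches
  }

  @Override
  public Iterator<E> iterator() {
    if (!keep) {
      return source.get();
    }
    if (cache == null) {
      List<E> all = new ArrayList<>();
      source.get().forEachRemaining(all::add);
      cache = all;
    }
    return cache.iterator();
  }
}
```

The trade-off is the usual one: caching avoids re-reading and re-parsing the sources on each pass at the cost of holding every object in memory.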
public boolean isEmpty()
Specified by: isEmpty in interface java.util.Collection<E>
public boolean contains(java.lang.Object o)
Specified by: contains in interface java.util.Collection<E>
public boolean containsAll(java.util.Collection<?> c)
Specified by: containsAll in interface java.util.Collection<E>
public int size()
Specified by: size in interface java.util.Collection<E>
public void clear()
Specified by: clear in interface java.util.Collection<E>
public java.lang.Object[] toArray()
Specified by: toArray in interface java.util.Collection<E>
public <T> T[] toArray(T[] o)
Specified by: toArray in interface java.util.Collection<E>
public boolean add(E o)
Specified by: add in interface java.util.Collection<E>
public boolean remove(java.lang.Object o)
Specified by: remove in interface java.util.Collection<E>
public boolean addAll(java.util.Collection<? extends E> c)
Specified by: addAll in interface java.util.Collection<E>
public boolean removeAll(java.util.Collection<?> c)
Specified by: removeAll in interface java.util.Collection<E>
public boolean retainAll(java.util.Collection<?> c)
Specified by: retainAll in interface java.util.Collection<E>
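Several Collection methods above are marked "Unsupported Operation" or "Can be slow", which is what one would expect from a read-only collection whose contents are produced lazily. A sketch of that pattern using java.util.AbstractCollection, assuming the contents are regenerated from a Supplier on each pass; `LazyReadOnlyCollection` is hypothetical. AbstractCollection already throws UnsupportedOperationException from add (and addAll delegates to add), and its contains, containsAll and toArray walk the iterator, so only iterator() and size() need writing:

```java
import java.util.AbstractCollection;
import java.util.Iterator;
import java.util.function.Supplier;

class LazyReadOnlyCollection<E> extends AbstractCollection<E> {
  private final Supplier<Iterator<E>> source;

  LazyReadOnlyCollection(Supplier<Iterator<E>> source) {
    this.source = source;
  }

  @Override
  public Iterator<E> iterator() {
    return source.get(); // a fresh pass over the underlying data each time
  }

  // No stored count: every call walks the whole sequence, hence "can be slow".
  @Override
  public int size() {
    int n = 0;
    for (Iterator<E> it = iterator(); it.hasNext(); it.next()) {
      n++;
    }
    return n;
  }

  // Cheaper than the inherited size() == 0 check: stops at the first element.
  @Override
  public boolean isEmpty() {
    return !iterator().hasNext();
  }
}
```

Because such a class is a real Collection, it also works with toArray(new String[0]) and enhanced for loops, as in the getLineIterator examples at the top of this page.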