readTokenFile
public static java.util.Map<java.lang.Integer,java.util.List<CoreLabel>> readTokenFile(java.lang.String filename,
Annotation novel)
The main output here is a token file such as data/tokens/dickens.oliver.tokens, which contains the original book with one token per line, annotated with part-of-speech, syntax, NER, coreference and other information. The file is tab-separated, with the following columns in order:
1. Paragraph id
2. Sentence id
3. Token id
4. Byte start
5. Byte end
6. Whitespace following the token (useful for pretty-printing the original text)
7. Syntactic head id (-1 for the sentence root)
8. Original token
9. Normalized token (for quotes etc.)
10. Lemma
11. Penn Treebank POS tag
12. NER tag (PERSON, NUMBER, DATE, DURATION, MISC, TIME, LOCATION, ORDINAL, MONEY, ORGANIZATION, SET, O)
13. Stanford basic dependency label
14. Within-quotation flag
15. Character id (all coreferent tokens share the same character id)
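As a rough illustration of this column layout, the Java sketch below parses one line of such a file into a record and groups the rows by sentence id. It is not the implementation of readTokenFile. It avoids the CoreLabel and Annotation types so that it stays self-contained, and the names TokenFileSketch, TokenRow, parseLine, groupBySentence and readRows are hypothetical. It also assumes that the file has no header row, that every line has exactly 15 columns, and that the numeric columns (ids, byte offsets, head id, character id) hold plain integers. The whitespace and within-quotation columns are kept as raw strings because their encoding is not specified here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of reading the tab-separated token file; not the real readTokenFile.
public class TokenFileSketch {

    /** One row of the token file; component order follows the 15-column list. */
    public record TokenRow(
            int paragraphId, int sentenceId, int tokenId,
            int byteStart, int byteEnd, String whitespaceAfter,
            int headId, String originalToken, String normalizedToken,
            String lemma, String posTag, String nerTag,
            String depLabel, String inQuotation, int characterId) {}

    /** Parses one tab-separated line; assumes integer-valued numeric columns. */
    public static TokenRow parseLine(String line) {
        // limit -1 keeps trailing empty fields instead of silently dropping them
        String[] f = line.split("\t", -1);
        if (f.length != 15) {
            throw new IllegalArgumentException("expected 15 columns, got " + f.length);
        }
        return new TokenRow(
                Integer.parseInt(f[0]), Integer.parseInt(f[1]), Integer.parseInt(f[2]),
                Integer.parseInt(f[3]), Integer.parseInt(f[4]), f[5],
                Integer.parseInt(f[6]), f[7], f[8], f[9], f[10], f[11],
                f[12], f[13], Integer.parseInt(f[14]));
    }

    /** Groups rows by sentence id, preserving file order; blank lines are skipped. */
    public static Map<Integer, List<TokenRow>> groupBySentence(List<String> lines) {
        Map<Integer, List<TokenRow>> bySentence = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isBlank()) {
                continue;
            }
            TokenRow row = parseLine(line);
            bySentence.computeIfAbsent(row.sentenceId(), k -> new ArrayList<>()).add(row);
        }
        return bySentence;
    }

    /** Reads a whole token file (assumed to have no header row). */
    public static Map<Integer, List<TokenRow>> readRows(Path path) throws IOException {
        return groupBySentence(Files.readAllLines(path));
    }
}
```

Grouping by sentence id is only one possible choice. The documented signature returns a Map<Integer, List<CoreLabel>>, but this section does not say what the integer key represents.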
Parameters:
filename -