public class StanfordCoreNLP extends AnnotationPipeline
java edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit -file document.txt
Modifier and Type | Class and Description |
---|---|
static class | StanfordCoreNLP.AnnotatorSignature: An annotator name and its associated signature. |
static class | StanfordCoreNLP.OutputFormat |
Modifier and Type | Field and Description |
---|---|
static java.lang.String | CUSTOM_ANNOTATOR_PREFIX |
static java.lang.String | DEFAULT_NEWLINE_IS_SENTENCE_BREAK |
static java.lang.String | DEFAULT_OUTPUT_FORMAT |
static java.util.Map<StanfordCoreNLP.AnnotatorSignature,Lazy<Annotator>> | GLOBAL_ANNOTATOR_CACHE: A global cache of annotators, so we don't have to re-create one if there's enough memory floating around. |
static java.lang.String | NEWLINE_IS_SENTENCE_BREAK_PROPERTY |
static java.lang.String | NEWLINE_SPLITTER_PROPERTY |
AnnotatorPool | pool: The annotator pool we should be using to get annotators. |
Fields inherited from class AnnotationPipeline: TIME
Fields inherited from interface Annotator: DEFAULT_REQUIREMENTS, STANFORD_CDC_TOKENIZE, STANFORD_CLEAN_XML, STANFORD_COLUMN_DATA_CLASSIFIER, STANFORD_COREF, STANFORD_COREF_MENTION, STANFORD_DEPENDENCIES, STANFORD_DETERMINISTIC_COREF, STANFORD_DOCDATE, STANFORD_ENTITY_MENTIONS, STANFORD_GENDER, STANFORD_KBP, STANFORD_LEMMA, STANFORD_LINK, STANFORD_MWT, STANFORD_NATLOG, STANFORD_NER, STANFORD_OPENIE, STANFORD_PARSE, STANFORD_POS, STANFORD_QUOTE, STANFORD_QUOTE_ATTRIBUTION, STANFORD_REGEXNER, STANFORD_RELATION, STANFORD_SENTIMENT, STANFORD_SSPLIT, STANFORD_TOKENIZE, STANFORD_TOKENSREGEX, STANFORD_TRUECASE, STANFORD_UD_FEATURES
Constructor and Description |
---|
StanfordCoreNLP(): Constructs a pipeline using as properties the properties file found in the classpath. |
StanfordCoreNLP(java.util.Properties props): Construct a basic pipeline. |
StanfordCoreNLP(java.util.Properties props, boolean enforceRequirements) |
StanfordCoreNLP(java.util.Properties props, boolean enforceRequirements, AnnotatorPool annotatorPool): Construct a CoreNLP with a custom Annotator Pool. |
StanfordCoreNLP(java.lang.String propsFileNamePrefix): Constructs a pipeline with the properties read from this file, which must be found in the classpath. |
StanfordCoreNLP(java.lang.String propsFileNamePrefix, boolean enforceRequirements) |
Modifier and Type | Method and Description |
---|---|
void | annotate(Annotation annotation): Run the pipeline on an input annotation. |
void | annotate(Annotation annotation, java.util.function.Consumer<Annotation> callback) |
void | annotate(CoreDocument document): Annotate the CoreDocument wrapper. |
static void | clearAnnotatorPool(): Call this if you are no longer using StanfordCoreNLP and want to release the memory associated with the annotators. |
void | conllPrint(Annotation annotation, java.io.Writer w): Displays the output of many annotators in CoNLL format. |
static java.util.function.BiConsumer<Annotation,java.io.OutputStream> | createOutputter(java.util.Properties properties, AnnotationOutputter.Options options): Create an outputter to be passed into processFiles(String, Collection, int, Properties, BiConsumer, BiConsumer, OutputFormat, boolean). |
static java.lang.String | ensurePrerequisiteAnnotators(java.lang.String[] annotators, java.util.Properties props): Take a collection of requested annotators, and produce a list of annotators such that all of the prerequisites for each of the annotators in the input are met. |
protected AnnotatorImplementations | getAnnotatorImplementations(): Get the implementation of each relevant annotator in the pipeline. |
static AnnotatorPool | getDefaultAnnotatorPool(java.util.Properties inputProps, AnnotatorImplementations annotatorImplementation): Construct the default annotator pool, and save it as the static annotator pool for CoreNLP. |
java.lang.String | getEncoding() |
static Annotator | getExistingAnnotator(java.lang.String name) |
java.util.Properties | getProperties(): Fetches the Properties object used to construct this Annotator. |
void | jsonPrint(Annotation annotation, java.io.Writer w): Displays the output of all annotators in JSON format. |
static void | main(java.lang.String[] args): This can be used just for testing or for command-line text processing. |
void | prettyPrint(Annotation annotation, java.io.OutputStream os): Displays the output of all annotators in a format easily readable by people. |
void | prettyPrint(Annotation annotation, java.io.PrintWriter os): Displays the output of all annotators in a format easily readable by people. |
protected static void | printHelp(java.io.PrintStream os, java.lang.String helpTopic): Prints the list of properties required to run the pipeline. |
Annotation | process(java.lang.String text): Runs the entire pipeline on the content of the given text passed in. |
void | processFiles(java.util.Collection<java.io.File> files, boolean clearPool, java.util.Optional<Timing> tim) |
void | processFiles(java.util.Collection<java.io.File> files, int numThreads, boolean clearPool, java.util.Optional<Timing> tim) |
void | processFiles(java.lang.String base, java.util.Collection<java.io.File> files, int numThreads, boolean clearPool, java.util.Optional<Timing> tim): Process a collection of files. |
protected static void | processFiles(java.lang.String base, java.util.Collection<java.io.File> files, int numThreads, java.util.Properties properties, java.util.function.BiConsumer<Annotation,java.util.function.Consumer<Annotation>> annotate, java.util.function.BiConsumer<Annotation,java.io.OutputStream> print, StanfordCoreNLP.OutputFormat outputFormat, boolean clearPool) |
protected static void | processFiles(java.lang.String base, java.util.Collection<java.io.File> files, int numThreads, java.util.Properties properties, java.util.function.BiConsumer<Annotation,java.util.function.Consumer<Annotation>> annotate, java.util.function.BiConsumer<Annotation,java.io.OutputStream> print, StanfordCoreNLP.OutputFormat outputFormat, boolean clearPool, java.util.Optional<StanfordCoreNLP> pipeline, java.util.Optional<Timing> tim): A common method for processing a set of files, used in both StanfordCoreNLP as well as StanfordCoreNLPClient. |
CoreDocument | processToCoreDocument(java.lang.String text): Runs the entire pipeline on the content of the given text passed in. |
protected static java.util.Collection<java.io.File> | readFileList(java.lang.String fileName) |
void | run() |
void | run(boolean clearPool) |
java.lang.String | timingInformation(): Return a String that gives detailed human-readable information about how much time was spent by each annotator and by the entire annotation pipeline. |
static boolean | usesBinaryTrees(java.util.Properties props): Determines whether the parser annotator should default to producing binary trees. |
void | xmlPrint(Annotation annotation, java.io.OutputStream os): Displays the output of all annotators in XML format. |
void | xmlPrint(Annotation annotation, java.io.Writer w): Wrapper around xmlPrint(Annotation, OutputStream). |
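Methods inherited from class AnnotationPipeline: addAnnotator, annotate, annotate, annotate, annotate, getTotalTime, requirementsSatisfied, requires
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Annotator: exactRequirements, unmount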
public static final java.util.Map<StanfordCoreNLP.AnnotatorSignature,Lazy<Annotator>> GLOBAL_ANNOTATOR_CACHE
public static final java.lang.String CUSTOM_ANNOTATOR_PREFIX
public static final java.lang.String NEWLINE_SPLITTER_PROPERTY
public static final java.lang.String NEWLINE_IS_SENTENCE_BREAK_PROPERTY
public static final java.lang.String DEFAULT_NEWLINE_IS_SENTENCE_BREAK
public static final java.lang.String DEFAULT_OUTPUT_FORMAT
public final AnnotatorPool pool
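GLOBAL_ANNOTATOR_CACHE is declared above as a map from an annotator signature to a lazily created annotator. The sketch below illustrates that general idea with standard-library types only. It is not CoreNLP's implementation: the class LazyCacheSketch, its nested Memo type and the getOrCreate method are hypothetical names, the key is simplified to a "name|signature" string, and the memory-sensitive behaviour hinted at in the field description is not modelled.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical illustration of a signature-keyed cache of lazily built values.
final class LazyCacheSketch<A> {

    // Minimal stand-in for a lazy value: the factory runs at most once.
    static final class Memo<T> implements Supplier<T> {
        private final Supplier<T> factory;
        private T value;
        private boolean computed;

        Memo(Supplier<T> factory) {
            this.factory = factory;
        }

        @Override
        public synchronized T get() {
            if (!computed) {
                value = factory.get();
                computed = true;
            }
            return value;
        }
    }

    private final Map<String, Memo<A>> cache = new ConcurrentHashMap<>();

    // Returns the cached value for (name, signature), building it on first use.
    A getOrCreate(String name, String signature, Supplier<A> factory) {
        String key = name + "|" + signature; // simplified key, an assumption of this sketch
        return cache.computeIfAbsent(key, k -> new Memo<>(factory)).get();
    }

    public static void main(String[] args) {
        LazyCacheSketch<String> cache = new LazyCacheSketch<>();
        AtomicInteger builds = new AtomicInteger();
        Supplier<String> factory = () -> "annotator#" + builds.incrementAndGet();

        cache.getOrCreate("tokenize", "lang=en", factory);
        cache.getOrCreate("tokenize", "lang=en", factory); // same signature: reused
        cache.getOrCreate("tokenize", "lang=fr", factory); // new signature: rebuilt
        System.out.println("factory calls: " + builds.get());
    }
}
```

The point of keying on the signature as well as the name is that two pipelines configured identically can share one annotator, while a changed configuration yields a fresh one.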
public StanfordCoreNLP()
public StanfordCoreNLP(java.util.Properties props)
public StanfordCoreNLP(java.util.Properties props, boolean enforceRequirements)
public StanfordCoreNLP(java.lang.String propsFileNamePrefix)
propsFileNamePrefix - Filename/resource name of properties file without extension
public StanfordCoreNLP(java.lang.String propsFileNamePrefix, boolean enforceRequirements)
public StanfordCoreNLP(java.util.Properties props, boolean enforceRequirements, AnnotatorPool annotatorPool)
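Several of the constructors above take a java.util.Properties object. As a minimal sketch, the helper below builds such an object with the same annotators key that the command-line example at the top of this page passes as -annotators. PipelineProps and forAnnotators are hypothetical names used only for illustration; the block needs only the standard library, and the constructor call itself is left as a comment because it requires CoreNLP on the classpath.

```java
import java.util.Properties;

// Hypothetical helper, not part of CoreNLP.
final class PipelineProps {

    // Joins the annotator names into the comma-separated "annotators" property.
    static Properties forAnnotators(String... annotators) {
        Properties props = new Properties();
        props.setProperty("annotators", String.join(",", annotators));
        return props;
    }

    public static void main(String[] args) {
        Properties props = forAnnotators("tokenize", "ssplit");
        System.out.println(props.getProperty("annotators")); // prints "tokenize,ssplit"

        // With CoreNLP available, the object would then be handed to the constructor:
        //   StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    }
}
```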
protected AnnotatorImplementations getAnnotatorImplementations()
The canonical use case for this is as an implementation of the Curator server, where the annotators make server calls rather than calling each annotator locally.
See also: AnnotatorImplementations
public java.util.Properties getProperties()
public java.lang.String getEncoding()
public static java.lang.String ensurePrerequisiteAnnotators(java.lang.String[] annotators, java.util.Properties props)
annotators - The annotators the user has requested.
public static void clearAnnotatorPool()
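ensurePrerequisiteAnnotators, documented above, is described as expanding a requested list so that every annotator's prerequisites are present. The sketch below shows one way such an expansion can work: a depth-first walk that emits each prerequisite before the annotator that needs it. It is not the library's code. The REQUIRES table is an assumed, abbreviated dependency map chosen for illustration, and PrerequisiteSketch and withPrerequisites are hypothetical names.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of prerequisite expansion; not CoreNLP's implementation.
final class PrerequisiteSketch {

    // Assumed dependency table, for illustration only.
    static final Map<String, List<String>> REQUIRES = Map.of(
            "tokenize", List.of(),
            "ssplit", List.of("tokenize"),
            "pos", List.of("tokenize", "ssplit"),
            "lemma", List.of("tokenize", "ssplit", "pos"));

    // Returns a comma-separated list in which prerequisites precede their dependents.
    static String withPrerequisites(String[] requested) {
        Set<String> ordered = new LinkedHashSet<>();
        for (String name : requested) {
            visit(name, ordered, new HashSet<>());
        }
        return String.join(",", ordered);
    }

    private static void visit(String name, Set<String> ordered, Set<String> inProgress) {
        if (ordered.contains(name)) {
            return; // already scheduled
        }
        if (!inProgress.add(name)) {
            throw new IllegalArgumentException("Cyclic prerequisite: " + name);
        }
        for (String dependency : REQUIRES.getOrDefault(name, List.of())) {
            visit(dependency, ordered, inProgress);
        }
        inProgress.remove(name);
        ordered.add(name);
    }

    public static void main(String[] args) {
        System.out.println(withPrerequisites(new String[] {"lemma"}));
        // prints "tokenize,ssplit,pos,lemma"
    }
}
```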
public static AnnotatorPool getDefaultAnnotatorPool(java.util.Properties inputProps, AnnotatorImplementations annotatorImplementation)
public static Annotator getExistingAnnotator(java.lang.String name)
public void annotate(CoreDocument document)
public void annotate(Annotation annotation)
annotate in interface Annotator
annotate in class AnnotationPipeline
annotation - The input annotation, usually a raw document
public void annotate(Annotation annotation, java.util.function.Consumer<Annotation> callback)
public static boolean usesBinaryTrees(java.util.Properties props)
public Annotation process(java.lang.String text)
text - The text to process
public CoreDocument processToCoreDocument(java.lang.String text)
text - The text to process
public void prettyPrint(Annotation annotation, java.io.OutputStream os)
annotation - Contains the output of all annotators
os - The output stream
public void prettyPrint(Annotation annotation, java.io.PrintWriter os)
annotation - Contains the output of all annotators
os - The output stream
public void xmlPrint(Annotation annotation, java.io.Writer w) throws java.io.IOException
annotation - The Annotation to print
w - The Writer to send the output to
java.io.IOException - If any IO problem
public void xmlPrint(Annotation annotation, java.io.OutputStream os) throws java.io.IOException
annotation - Contains the output of all annotators
os - The output stream
java.io.IOException - If any IO problem
public void jsonPrint(Annotation annotation, java.io.Writer w) throws java.io.IOException
annotation - Contains the output of all annotators
w - The Writer to send the output to
java.io.IOException - If any IO problem
public void conllPrint(Annotation annotation, java.io.Writer w) throws java.io.IOException
annotation - Contains the output of all annotators
w - The Writer to send the output to
java.io.IOException - If any IO problem
protected static void printHelp(java.io.PrintStream os, java.lang.String helpTopic)
os - PrintStream to print usage to
helpTopic - a topic to print help about (or null for general options)
public java.lang.String timingInformation()
println().
timingInformation in class AnnotationPipeline
protected static java.util.Collection<java.io.File> readFileList(java.lang.String fileName)
public static java.util.function.BiConsumer<Annotation,java.io.OutputStream> createOutputter(java.util.Properties properties, AnnotationOutputter.Options options)
Create an outputter to be passed into processFiles(String, Collection, int, Properties, BiConsumer, BiConsumer, OutputFormat, boolean).
properties - The properties file to use.
public void processFiles(java.lang.String base, java.util.Collection<java.io.File> files, int numThreads, boolean clearPool, java.util.Optional<Timing> tim) throws java.io.IOException
base - The base input directory to process from.
files - The files to process.
numThreads - The number of threads to annotate on.
clearPool - Whether or not to clear pool when process is done
java.io.IOException
protected static void processFiles(java.lang.String base, java.util.Collection<java.io.File> files, int numThreads, java.util.Properties properties, java.util.function.BiConsumer<Annotation,java.util.function.Consumer<Annotation>> annotate, java.util.function.BiConsumer<Annotation,java.io.OutputStream> print, StanfordCoreNLP.OutputFormat outputFormat, boolean clearPool) throws java.io.IOException
java.io.IOException
protected static void processFiles(java.lang.String base, java.util.Collection<java.io.File> files, int numThreads, java.util.Properties properties, java.util.function.BiConsumer<Annotation,java.util.function.Consumer<Annotation>> annotate, java.util.function.BiConsumer<Annotation,java.io.OutputStream> print, StanfordCoreNLP.OutputFormat outputFormat, boolean clearPool, java.util.Optional<StanfordCoreNLP> pipeline, java.util.Optional<Timing> tim) throws java.io.IOException
A common method for processing a set of files, used in both StanfordCoreNLP as well as StanfordCoreNLPClient.
base - The base input directory to process from.
files - The files to process.
numThreads - The number of threads to annotate on.
properties - The properties file to use during annotation. This should match the properties file used in the implementation of the annotate function.
annotate - The function used to annotate a document.
print - The function used to print a document.
outputFormat - The format used for printing out documents
clearPool - Whether or not to clear the pool when done
pipeline - the pipeline annotating the objects
tim - the Timing object for this annotation run
java.io.IOException - If any IO problem
public void processFiles(java.util.Collection<java.io.File> files, int numThreads, boolean clearPool, java.util.Optional<Timing> tim) throws java.io.IOException
java.io.IOException
public void processFiles(java.util.Collection<java.io.File> files, boolean clearPool, java.util.Optional<Timing> tim) throws java.io.IOException
java.io.IOException
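The static processFiles overloads above take an annotate function of type BiConsumer<Annotation,Consumer<Annotation>> and a print function of type BiConsumer<Annotation,OutputStream>. The sketch below shows how two functions of those shapes can be wired together: the annotate function receives a callback, and the callback prints the finished document. It is a single-threaded illustration using a generic document type D in place of Annotation. CallbackWiringSketch and processAll are hypothetical names, and the sketch makes no claim about how the library schedules work across numThreads.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical illustration of the annotate/print function shapes.
final class CallbackWiringSketch {

    // Annotates each document, then prints it from the completion callback.
    static <D> void processAll(List<D> docs,
                               BiConsumer<D, Consumer<D>> annotate,
                               BiConsumer<D, OutputStream> print,
                               OutputStream out) {
        for (D doc : docs) {
            annotate.accept(doc, finished -> print.accept(finished, out));
        }
    }

    public static void main(String[] args) {
        // Stand-in "annotator": marks the document, then signals completion.
        BiConsumer<StringBuilder, Consumer<StringBuilder>> annotate = (doc, callback) -> {
            doc.append(" [annotated]");
            callback.accept(doc);
        };
        // Stand-in "outputter": writes one line per document.
        BiConsumer<StringBuilder, OutputStream> print = (doc, os) -> {
            try {
                os.write((doc + "\n").getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        processAll(List.of(new StringBuilder("first"), new StringBuilder("second")),
                annotate, print, out);
        System.out.print(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Passing the callback into the annotate function, rather than calling print after it returns, is what would let an annotate implementation complete asynchronously.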
public void run() throws java.io.IOException
java.io.IOException
public void run(boolean clearPool) throws java.io.IOException
java.io.IOException
public static void main(java.lang.String[] args) throws java.io.IOException
Example usage:
java -mx6g edu.stanford.nlp.pipeline.StanfordCoreNLP properties
args - List of required properties
java.io.IOException - If IO problem
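The command line at the top of this page passes options as -annotators tokenize,ssplit -file document.txt, and main takes its properties from args. As a rough illustration of how flag/value pairs of that form can be turned into a java.util.Properties object, the sketch below uses a deliberately simplified parser. It is an assumption-laden stand-in, not the parser CoreNLP uses: ArgsSketch and parse are hypothetical names, and treating a flag without a value as "true" is a choice made for this sketch.

```java
import java.util.Properties;

// Hypothetical, simplified flag parser; the library's own handling may differ.
final class ArgsSketch {

    // Maps "-key value" pairs to properties; a flag with no value becomes "true".
    static Properties parse(String[] args) {
        Properties props = new Properties();
        for (int i = 0; i < args.length; i++) {
            if (!args[i].startsWith("-")) {
                continue; // this sketch ignores values that follow no flag
            }
            String key = args[i].substring(1);
            boolean hasValue = i + 1 < args.length && !args[i + 1].startsWith("-");
            props.setProperty(key, hasValue ? args[++i] : "true");
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = parse(new String[] {
                "-annotators", "tokenize,ssplit", "-file", "document.txt"});
        System.out.println(props.getProperty("annotators")); // prints "tokenize,ssplit"
        System.out.println(props.getProperty("file"));       // prints "document.txt"
    }
}
```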