StanfordCoreNLP (Stanford JavaNLP API)

java.lang.Object
- edu.stanford.nlp.pipeline.AnnotationPipeline
- - edu.stanford.nlp.pipeline.StanfordCoreNLP

All Implemented Interfaces:

Annotator
```
public class StanfordCoreNLP
extends AnnotationPipeline
```
A pipeline that takes in a string and returns various analyzed linguistic forms. The String is tokenized by a TokenizerAnnotator, and sequence-model-style annotators can then add things like lemmas, POS tags, and named entities. These are returned as a list of CoreLabels. Other analysis components build and store parse trees, dependency graphs, etc.

This class is designed to apply multiple Annotators to an Annotation. You first build up the pipeline by adding Annotators, then pass in the objects you wish to annotate and get back a fully annotated object. At the command line you can, for example, tokenize text with StanfordCoreNLP using a command like:
```
java edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit -file document.txt
```
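As a rough illustration of how flags like those in the command above relate to the `java.util.Properties` object accepted by the constructors, the sketch below turns `-key value` pairs into properties. The helper `argsToProperties` and the class name are hypothetical, written only for this example; this is not the library's own argument parser, and the assumption that each flag maps to a property of the same name is a simplification.

```java
import java.util.Properties;

public class ArgsToPropertiesSketch {

    /**
     * Hypothetical helper: turns "-key value" pairs into Properties.
     * A flag with no following value is stored as "true".
     */
    public static Properties argsToProperties(String... args) {
        Properties props = new Properties();
        for (int i = 0; i < args.length; i++) {
            if (!args[i].startsWith("-")) {
                continue; // stray values are ignored in this simplified sketch
            }
            String key = args[i].substring(1);
            boolean hasValue = i + 1 < args.length && !args[i + 1].startsWith("-");
            props.setProperty(key, hasValue ? args[++i] : "true");
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = argsToProperties(
            "-annotators", "tokenize,ssplit", "-file", "document.txt");
        System.out.println(props.getProperty("annotators")); // prints tokenize,ssplit
        System.out.println(props.getProperty("file"));       // prints document.txt
        // With CoreNLP on the classpath, a Properties object like this could be
        // passed to the StanfordCoreNLP(java.util.Properties) constructor.
    }
}
```

The sketch uses only the standard library, so it runs without CoreNLP installed.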
Please see the package-level javadoc for sample usage and a more complete description. The main entry point for the API is StanfordCoreNLP.process().

Implementation note: There are other annotation pipelines, but they don't extend this one. Look for classes that implement Annotator and have "Pipeline" in their name.
Author:

Jenny Finkel, Anna Rafferty, Christopher Manning, Mihai Surdeanu, Steven Bethard

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`StanfordCoreNLP.AnnotatorSignature` An annotator name and its associated signature.
`static class`	`StanfordCoreNLP.OutputFormat`

Field Summary

Fields
Modifier and Type	Field and Description
`static java.lang.String`	`CUSTOM_ANNOTATOR_PREFIX`
`static java.lang.String`	`DEFAULT_NEWLINE_IS_SENTENCE_BREAK`
`static java.lang.String`	`DEFAULT_OUTPUT_FORMAT`
`static java.util.Map<StanfordCoreNLP.AnnotatorSignature,Lazy<Annotator>>`	`GLOBAL_ANNOTATOR_CACHE` A global cache of annotators, so we don't have to re-create one if there's enough memory floating around.
`static java.lang.String`	`NEWLINE_IS_SENTENCE_BREAK_PROPERTY`
`static java.lang.String`	`NEWLINE_SPLITTER_PROPERTY`
`AnnotatorPool`	`pool` The annotator pool we should be using to get annotators.

Fields inherited from class edu.stanford.nlp.pipeline.AnnotationPipeline
TIME

Constructor Summary

Constructors
Constructor and Description
`StanfordCoreNLP()` Constructs a pipeline, taking its properties from the properties file found on the classpath.
`StanfordCoreNLP(java.util.Properties props)` Construct a basic pipeline.
`StanfordCoreNLP(java.util.Properties props, boolean enforceRequirements)`
`StanfordCoreNLP(java.util.Properties props, boolean enforceRequirements, AnnotatorPool annotatorPool)` Construct a CoreNLP with a custom Annotator Pool.
`StanfordCoreNLP(java.lang.String propsFileNamePrefix)` Constructs a pipeline with the properties read from this file, which must be found in the classpath.
`StanfordCoreNLP(java.lang.String propsFileNamePrefix, boolean enforceRequirements)`

Method Summary

Modifier and Type	Method and Description
`void`	`annotate(Annotation annotation)` Run the pipeline on an input annotation.
`void`	`annotate(Annotation annotation, java.util.function.Consumer<Annotation> callback)`
`void`	`annotate(CoreDocument document)` Annotate the CoreDocument wrapper.
`static void`	`clearAnnotatorPool()` Call this if you are no longer using StanfordCoreNLP and want to release the memory associated with the annotators.
`void`	`conllPrint(Annotation annotation, java.io.Writer w)` Displays the output of many annotators in CoNLL format.
`static java.util.function.BiConsumer<Annotation,java.io.OutputStream>`	`createOutputter(java.util.Properties properties, AnnotationOutputter.Options options)` Create an outputter to be passed into `processFiles(String, Collection, int, Properties, BiConsumer, BiConsumer, OutputFormat, boolean)`.
`static java.lang.String`	`ensurePrerequisiteAnnotators(java.lang.String[] annotators, java.util.Properties props)` Take a collection of requested annotators, and produce a list of annotators such that all of the prerequisites for each of the annotators in the input are met.
`protected AnnotatorImplementations`	`getAnnotatorImplementations()` Get the implementation of each relevant annotator in the pipeline.
`static AnnotatorPool`	`getDefaultAnnotatorPool(java.util.Properties inputProps, AnnotatorImplementations annotatorImplementation)` Construct the default annotator pool, and save it as the static annotator pool for CoreNLP.
`java.lang.String`	`getEncoding()`
`static Annotator`	`getExistingAnnotator(java.lang.String name)`
`java.util.Properties`	`getProperties()` Fetches the Properties object used to construct this Annotator.
`void`	`jsonPrint(Annotation annotation, java.io.Writer w)` Displays the output of all annotators in JSON format.
`static void`	`main(java.lang.String[] args)` This can be used just for testing or for command-line text processing.
`void`	`prettyPrint(Annotation annotation, java.io.OutputStream os)` Displays the output of all annotators in a format easily readable by people.
`void`	`prettyPrint(Annotation annotation, java.io.PrintWriter os)` Displays the output of all annotators in a format easily readable by people.
`protected static void`	`printHelp(java.io.PrintStream os, java.lang.String helpTopic)` Prints the list of properties required to run the pipeline.
`Annotation`	`process(java.lang.String text)` Runs the entire pipeline on the content of the given text passed in.
`void`	`processFiles(java.util.Collection<java.io.File> files, boolean clearPool, java.util.Optional<Timing> tim)`
`void`	`processFiles(java.util.Collection<java.io.File> files, int numThreads, boolean clearPool, java.util.Optional<Timing> tim)`
`void`	`processFiles(java.lang.String base, java.util.Collection<java.io.File> files, int numThreads, boolean clearPool, java.util.Optional<Timing> tim)` Process a collection of files.
`protected static void`	`processFiles(java.lang.String base, java.util.Collection<java.io.File> files, int numThreads, java.util.Properties properties, java.util.function.BiConsumer<Annotation,java.util.function.Consumer<Annotation>> annotate, java.util.function.BiConsumer<Annotation,java.io.OutputStream> print, StanfordCoreNLP.OutputFormat outputFormat, boolean clearPool)`
`protected static void`	`processFiles(java.lang.String base, java.util.Collection<java.io.File> files, int numThreads, java.util.Properties properties, java.util.function.BiConsumer<Annotation,java.util.function.Consumer<Annotation>> annotate, java.util.function.BiConsumer<Annotation,java.io.OutputStream> print, StanfordCoreNLP.OutputFormat outputFormat, boolean clearPool, java.util.Optional<StanfordCoreNLP> pipeline, java.util.Optional<Timing> tim)` A common method for processing a set of files, used in both `StanfordCoreNLP` as well as `StanfordCoreNLPClient`.
`CoreDocument`	`processToCoreDocument(java.lang.String text)` Runs the entire pipeline on the content of the given text passed in.
`protected static java.util.Collection<java.io.File>`	`readFileList(java.lang.String fileName)`
`void`	`run()`
`void`	`run(boolean clearPool)`
`java.lang.String`	`timingInformation()` Return a String that gives detailed human-readable information about how much time was spent by each annotator and by the entire annotation pipeline.
`static boolean`	`usesBinaryTrees(java.util.Properties props)` Determines whether the parser annotator should default to producing binary trees.
`void`	`xmlPrint(Annotation annotation, java.io.OutputStream os)` Displays the output of all annotators in XML format.
`void`	`xmlPrint(Annotation annotation, java.io.Writer w)` Wrapper around xmlPrint(Annotation, OutputStream).

Methods inherited from class edu.stanford.nlp.pipeline.AnnotationPipeline
addAnnotator, annotate, annotate, annotate, annotate, getTotalTime, requirementsSatisfied, requires

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface edu.stanford.nlp.pipeline.Annotator
exactRequirements, unmount

- Field Detail
  - GLOBAL_ANNOTATOR_CACHE
```
public static final java.util.Map<StanfordCoreNLP.AnnotatorSignature,Lazy<Annotator>> GLOBAL_ANNOTATOR_CACHE
```
    A global cache of annotators, so we don't have to re-create one if there's enough memory floating around.
  - CUSTOM_ANNOTATOR_PREFIX
```
public static final java.lang.String CUSTOM_ANNOTATOR_PREFIX
```
    See Also:
    
    Constant Field Values
  - NEWLINE_SPLITTER_PROPERTY
```
public static final java.lang.String NEWLINE_SPLITTER_PROPERTY
```
    See Also:
    
    Constant Field Values
  - NEWLINE_IS_SENTENCE_BREAK_PROPERTY
```
public static final java.lang.String NEWLINE_IS_SENTENCE_BREAK_PROPERTY
```
    See Also:
    
    Constant Field Values
  - DEFAULT_NEWLINE_IS_SENTENCE_BREAK
```
public static final java.lang.String DEFAULT_NEWLINE_IS_SENTENCE_BREAK
```
    See Also:
    
    Constant Field Values
  - DEFAULT_OUTPUT_FORMAT
```
public static final java.lang.String DEFAULT_OUTPUT_FORMAT
```
    See Also:
    
    Constant Field Values
  - pool
```
public final AnnotatorPool pool
```
    The annotator pool we should be using to get annotators.
- Constructor Detail
  - StanfordCoreNLP
```
public StanfordCoreNLP()
```
    Constructs a pipeline, taking its properties from the properties file found on the classpath.
  - StanfordCoreNLP
```
public StanfordCoreNLP(java.util.Properties props)
```
    Construct a basic pipeline. The Properties will be used to determine which annotators to create, and a default AnnotatorPool will be used to create the annotators.
  - StanfordCoreNLP
```
public StanfordCoreNLP(java.util.Properties props,
                       boolean enforceRequirements)
```
  - StanfordCoreNLP
```
public StanfordCoreNLP(java.lang.String propsFileNamePrefix)
```
    Constructs a pipeline with the properties read from this file, which must be found in the classpath.
    
    Parameters:
    
    propsFileNamePrefix - Filename/resource name of properties file without extension
  - StanfordCoreNLP
```
public StanfordCoreNLP(java.lang.String propsFileNamePrefix,
                       boolean enforceRequirements)
```
  - StanfordCoreNLP
```
public StanfordCoreNLP(java.util.Properties props,
                       boolean enforceRequirements,
                       AnnotatorPool annotatorPool)
```
    Construct a CoreNLP with a custom Annotator Pool.
- Method Detail
  - getAnnotatorImplementations
```
protected AnnotatorImplementations getAnnotatorImplementations()
```
    Get the implementation of each relevant annotator in the pipeline. The primary use of this method is to be overridden by subclasses of StanfordCoreNLP so that they can call different annotators that obey exactly the same contract as the default annotators.
    The canonical use case is an implementation of the Curator server, where the annotators make server calls rather than running each annotator locally.
    
    Returns:
    
    A class that specifies the actual implementation of each annotator used when creating the annotator pool. The canonical annotators are the defaults provided by AnnotatorImplementations.
  - getProperties
```
public java.util.Properties getProperties()
```
    Fetches the Properties object used to construct this Annotator.
  - getEncoding
```
public java.lang.String getEncoding()
```
  - ensurePrerequisiteAnnotators
```
public static java.lang.String ensurePrerequisiteAnnotators(java.lang.String[] annotators,
                                                            java.util.Properties props)
```
    Take a collection of requested annotators, and produce a list of annotators such that all of the prerequisites for each of the annotators in the input are met. For example, if the user requests lemma, ensure that pos is also run, because lemma depends on pos. As a side effect, this function puts the annotators in the proper order. Note that the result is not guaranteed to be a valid set of annotators, as properties passed to the annotators can change their requirements.
    
    Parameters:
    
    annotators - The annotators the user has requested.
    
    Returns:
    
    A sanitized annotators string with all prerequisites met.
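    The prerequisite expansion and ordering described above can be pictured as a depth-first walk over a dependency table. The following is a minimal sketch under stated assumptions, not the library's implementation: the class `PrerequisiteSketch`, the method `withPrerequisites`, and the `REQUIRES` table are all hypothetical, and the table lists only a few illustrative dependencies rather than the requirements the real annotators declare.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PrerequisiteSketch {

    // Assumed dependency table, for illustration only. The real requirements
    // are declared by each Annotator and can change with its properties.
    static final Map<String, List<String>> REQUIRES = Map.of(
        "tokenize", List.of(),
        "ssplit", List.of("tokenize"),
        "pos", List.of("tokenize", "ssplit"),
        "lemma", List.of("tokenize", "ssplit", "pos"));

    /** Returns a comma-separated list in which every prerequisite precedes its dependents. */
    public static String withPrerequisites(String[] requested) {
        Set<String> ordered = new LinkedHashSet<>();
        for (String name : requested) {
            visit(name, ordered);
        }
        return String.join(",", ordered);
    }

    // Depth-first visit: add all prerequisites first, then the annotator itself.
    // Assumes the table has no cycles.
    private static void visit(String name, Set<String> ordered) {
        if (ordered.contains(name)) {
            return;
        }
        for (String dep : REQUIRES.getOrDefault(name, List.of())) {
            visit(dep, ordered);
        }
        ordered.add(name);
    }

    public static void main(String[] args) {
        // prints tokenize,ssplit,pos,lemma
        System.out.println(withPrerequisites(new String[] {"lemma"}));
    }
}
```

    Because the sketch uses a fixed table, it cannot reflect the caveat above that properties may change an annotator's requirements.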
  - clearAnnotatorPool
```
public static void clearAnnotatorPool()
```
    Call this if you are no longer using StanfordCoreNLP and want to release the memory associated with the annotators.
  - getDefaultAnnotatorPool
```
public static AnnotatorPool getDefaultAnnotatorPool(java.util.Properties inputProps,
                                                    AnnotatorImplementations annotatorImplementation)
```
    Construct the default annotator pool, and save it as the static annotator pool for CoreNLP.
    
    See Also:
    
    constructAnnotatorPool(Properties, AnnotatorImplementations)
  - getExistingAnnotator
```
public static Annotator getExistingAnnotator(java.lang.String name)
```
  - annotate
```
public void annotate(CoreDocument document)
```
    Annotate the CoreDocument wrapper.
  - annotate
```
public void annotate(Annotation annotation)
```
    Run the pipeline on an input annotation. The annotation is modified in place.
    
    Specified by:
    
    annotate in interface Annotator
    
    Overrides:
    
    annotate in class AnnotationPipeline
    
    Parameters:
    
    annotation - The input annotation, usually a raw document
  - annotate
```
public void annotate(Annotation annotation,
                     java.util.function.Consumer<Annotation> callback)
```
  - usesBinaryTrees
```
public static boolean usesBinaryTrees(java.util.Properties props)
```
    Determines whether the parser annotator should default to producing binary trees. Currently there is only one condition under which this is true: the sentiment annotator is used.
  - process
```
public Annotation process(java.lang.String text)
```
    Runs the entire pipeline on the content of the given text passed in.
    
    Parameters:
    
    text - The text to process
    
    Returns:
    
    An Annotation object containing the output of all annotators
  - processToCoreDocument
```
public CoreDocument processToCoreDocument(java.lang.String text)
```
    Runs the entire pipeline on the content of the given text passed in.
    
    Parameters:
    
    text - The text to process
    
    Returns:
    
    A CoreDocument containing the output of all annotators
  - prettyPrint
```
public void prettyPrint(Annotation annotation,
                        java.io.OutputStream os)
```
    Displays the output of all annotators in a format easily readable by people.
    
    Parameters:
    
    annotation - Contains the output of all annotators
    
    os - The output stream
  - prettyPrint
```
public void prettyPrint(Annotation annotation,
                        java.io.PrintWriter os)
```
    Displays the output of all annotators in a format easily readable by people.
    
    Parameters:
    
    annotation - Contains the output of all annotators
    
    os - The PrintWriter to send the output to
  - xmlPrint
```
public void xmlPrint(Annotation annotation,
                     java.io.Writer w)
              throws java.io.IOException
```
    Wrapper around xmlPrint(Annotation, OutputStream). Added for backward compatibility.
    
    Parameters:
    
    annotation - The Annotation to print
    
    w - The Writer to send the output to
    
    Throws:
    
    java.io.IOException - If an I/O problem occurs
  - xmlPrint
```
public void xmlPrint(Annotation annotation,
                     java.io.OutputStream os)
              throws java.io.IOException
```
    Displays the output of all annotators in XML format.
    
    Parameters:
    
    annotation - Contains the output of all annotators
    
    os - The output stream
    
    Throws:
    
    java.io.IOException - If an I/O problem occurs
  - jsonPrint
```
public void jsonPrint(Annotation annotation,
                      java.io.Writer w)
               throws java.io.IOException
```
    Displays the output of all annotators in JSON format.
    
    Parameters:
    
    annotation - Contains the output of all annotators
    
    w - The Writer to send the output to
    
    Throws:
    
    java.io.IOException - If an I/O problem occurs
  - conllPrint
```
public void conllPrint(Annotation annotation,
                       java.io.Writer w)
                throws java.io.IOException
```
    Displays the output of many annotators in CoNLL format. (Only used by CoreNLPServlet.)
    
    Parameters:
    
    annotation - Contains the output of all annotators
    
    w - The Writer to send the output to
    
    Throws:
    
    java.io.IOException - If an I/O problem occurs
  - printHelp
```
protected static void printHelp(java.io.PrintStream os,
                                java.lang.String helpTopic)
```
    Prints the list of properties required to run the pipeline.
    
    Parameters:
    
    os - PrintStream to print usage to
    
    helpTopic - a topic to print help about (or null for general options)
  - timingInformation
```
public java.lang.String timingInformation()
```
    Return a String that gives detailed human-readable information about how much time was spent by each annotator and by the entire annotation pipeline. This String includes newline characters but does not end with one, and so it is suitable to be printed out with a println().
    
    Overrides:
    
    timingInformation in class AnnotationPipeline
    
    Returns:
    
    Human readable information on time spent in processing.
  - readFileList
```
protected static java.util.Collection<java.io.File> readFileList(java.lang.String fileName)
```
  - createOutputter
```
public static java.util.function.BiConsumer<Annotation,java.io.OutputStream> createOutputter(java.util.Properties properties,
                                                                                             AnnotationOutputter.Options options)
```
    Create an outputter to be passed into processFiles(String, Collection, int, Properties, BiConsumer, BiConsumer, OutputFormat, boolean).
    
    Parameters:
    
    properties - The properties file to use.
    
    Returns:
    
    A consumer that can be passed into the processFiles method.
  - processFiles
```
public void processFiles(java.lang.String base,
                         java.util.Collection<java.io.File> files,
                         int numThreads,
                         boolean clearPool,
                         java.util.Optional<Timing> tim)
                  throws java.io.IOException
```
    Process a collection of files.
    
    Parameters:
    
    base - The base input directory to process from.
    
    files - The files to process.
    
    numThreads - The number of threads to annotate on.
    
    clearPool - Whether or not to clear the annotator pool when processing is done
    
    Throws:
    
    java.io.IOException
  - processFiles
```
protected static void processFiles(java.lang.String base,
                                   java.util.Collection<java.io.File> files,
                                   int numThreads,
                                   java.util.Properties properties,
                                   java.util.function.BiConsumer<Annotation,java.util.function.Consumer<Annotation>> annotate,
                                   java.util.function.BiConsumer<Annotation,java.io.OutputStream> print,
                                   StanfordCoreNLP.OutputFormat outputFormat,
                                   boolean clearPool)
                            throws java.io.IOException
```
    Throws:
    
    java.io.IOException
  - processFiles
```
protected static void processFiles(java.lang.String base,
                                   java.util.Collection<java.io.File> files,
                                   int numThreads,
                                   java.util.Properties properties,
                                   java.util.function.BiConsumer<Annotation,java.util.function.Consumer<Annotation>> annotate,
                                   java.util.function.BiConsumer<Annotation,java.io.OutputStream> print,
                                   StanfordCoreNLP.OutputFormat outputFormat,
                                   boolean clearPool,
                                   java.util.Optional<StanfordCoreNLP> pipeline,
                                   java.util.Optional<Timing> tim)
                            throws java.io.IOException
```
    A common method for processing a set of files, used in both StanfordCoreNLP as well as StanfordCoreNLPClient.
    
    Parameters:
    
    base - The base input directory to process from.
    
    files - The files to process.
    
    numThreads - The number of threads to annotate on.
    
    properties - The properties file to use during annotation. This should match the properties file used in the implementation of the annotate function.
    
    annotate - The function used to annotate a document.
    
    print - The function used to print a document.
    
    outputFormat - The format used for printing out documents
    
    clearPool - Whether or not to clear the pool when done
    
    pipeline - the pipeline annotating the objects
    
    tim - the Timing object for this annotation run
    
    Throws:
    
    java.io.IOException - If an I/O problem occurs
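    The `annotate` and `print` parameters are plain functional interfaces, and their shapes suggest that printing is driven from the callback passed to the annotate function. The sketch below illustrates that composition with standard-library types only. It is an assumption drawn from the signatures, not the library's code: `Doc` is a hypothetical stand-in for Annotation, `processAll` is a hypothetical single-threaded loop, and file handling, threading, output formats, and timing are not modelled.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

public class CallbackPipelineSketch {

    /** Hypothetical stand-in for Annotation: a document that is annotated in place. */
    public static final class Doc {
        public final String text;
        public int tokenCount = -1;
        public Doc(String text) { this.text = text; }
    }

    /** Annotates each document, then prints it from the completion callback. */
    public static void processAll(List<Doc> docs,
                                  BiConsumer<Doc, Consumer<Doc>> annotate,
                                  BiConsumer<Doc, OutputStream> print,
                                  OutputStream out) {
        for (Doc doc : docs) {
            annotate.accept(doc, finished -> print.accept(finished, out));
        }
    }

    /** Runs the loop on one document and returns what was printed. */
    public static String demo() {
        // Toy "annotator": counts whitespace-separated tokens, then signals completion.
        BiConsumer<Doc, Consumer<Doc>> annotate = (doc, callback) -> {
            doc.tokenCount = doc.text.trim().split("\\s+").length;
            callback.accept(doc);
        };
        // Toy "outputter": writes the text and its token count as one tab-separated line.
        BiConsumer<Doc, OutputStream> print = (doc, os) -> {
            try {
                os.write((doc.text + "\t" + doc.tokenCount + "\n")
                    .getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        processAll(List.of(new Doc("Stanford is in California")), annotate, print, out);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(demo()); // prints the text, a tab, then 4
    }
}
```

    Passing the printer through a callback lets an annotate function finish its work on another thread before output is written, which is one plausible reason for the callback-style signature.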
  - processFiles
```
public void processFiles(java.util.Collection<java.io.File> files,
                         int numThreads,
                         boolean clearPool,
                         java.util.Optional<Timing> tim)
                  throws java.io.IOException
```
    Throws:
    
    java.io.IOException
  - processFiles
```
public void processFiles(java.util.Collection<java.io.File> files,
                         boolean clearPool,
                         java.util.Optional<Timing> tim)
                  throws java.io.IOException
```
    Throws:
    
    java.io.IOException
  - run
```
public void run()
         throws java.io.IOException
```
    Throws:
    
    java.io.IOException
  - run
```
public void run(boolean clearPool)
         throws java.io.IOException
```
    Throws:
    
    java.io.IOException
  - main
```
public static void main(java.lang.String[] args)
                 throws java.io.IOException
```
    This can be used just for testing or for command-line text processing. This runs the pipeline you specify on the text in the file that you specify and sends some results to stdout. The current code in this main method assumes that each line of the file is to be processed separately as a single sentence.
    Example usage:
    java -mx6g edu.stanford.nlp.pipeline.StanfordCoreNLP properties
    
    Parameters:
    
    args - List of required properties
    
    Throws:
    
    java.io.IOException - If an I/O problem occurs
