StripTagsProcessor (Stanford JavaNLP API)

java.lang.Object
- edu.stanford.nlp.process.AbstractListProcessor<Word,Word,L,F>
- - edu.stanford.nlp.process.StripTagsProcessor<L,F>

Type Parameters:

L - The type of the labels

F - The type of the features

All Implemented Interfaces:

DocumentProcessor<Word,Word,L,F>, ListProcessor<Word,Word>
```
public class StripTagsProcessor<L,F>
extends AbstractListProcessor<Word,Word,L,F>
```
A Processor whose process method deletes all SGML/XML/HTML tags (tokens starting with < and ending with >). Optionally, newlines can be inserted after the end of block-level tags to roughly mark where continuous text was broken up (this helps with finding sentence boundaries, for example).
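
The behavior described above can be illustrated with a small standalone sketch. It is not the library's implementation: it works on plain `String` tokens rather than `Word` objects, and the class `StripTagsSketch`, its helper methods, and the contents of `BLOCK_TAGS` are all hypothetical names chosen for illustration. It also assumes that a newline token is emitted for any block-level tag (opening or closing) and that consecutive newline tokens are collapsed into one; the real class may differ on both points.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class StripTagsSketch {

    // Hypothetical block-level tag names; the real blockTags set may differ.
    static final Set<String> BLOCK_TAGS = new HashSet<>(
            Arrays.asList("p", "div", "br", "li", "table", "h1", "h2", "blockquote"));

    // A token counts as a tag if it starts with '<' and ends with '>'.
    static boolean isTag(String token) {
        return token.length() >= 2 && token.startsWith("<") && token.endsWith(">");
    }

    // Extracts the lower-cased tag name, e.g. "</P>" -> "p", "<br/>" -> "br".
    static String tagName(String tag) {
        int start = 1;
        if (start < tag.length() && tag.charAt(start) == '/') {
            start++; // skip the slash of a closing tag
        }
        int end = start;
        while (end < tag.length() && Character.isLetterOrDigit(tag.charAt(end))) {
            end++;
        }
        return tag.substring(start, end).toLowerCase(Locale.ROOT);
    }

    // Drops tag tokens; optionally emits one "\n" token per run of block-level tags.
    static List<String> stripTags(List<String> tokens, boolean markLineBreaks) {
        List<String> out = new ArrayList<>();
        boolean lastWasNewline = false; // assumption: avoid consecutive newline tokens
        for (String token : tokens) {
            if (isTag(token)) {
                if (markLineBreaks && !lastWasNewline
                        && BLOCK_TAGS.contains(tagName(token))) {
                    out.add("\n");
                    lastWasNewline = true;
                }
            } else {
                out.add(token); // ordinary word: keep it
                lastWasNewline = false;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens =
                Arrays.asList("<p>", "Hello", "<b>", "world", "</b>", "</p>", "Bye");
        System.out.println(stripTags(tokens, false));
        System.out.println(stripTags(tokens, true));
    }
}
```

With `markLineBreaks` set to `false`, only the three ordinary words survive. With `true`, the sketch additionally places a newline token where `<p>` and `</p>` appeared, while the inline `<b>` tags are dropped without a trace.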

Author:

Christopher Manning, Sarah Spikes (sdspikes@cs.stanford.edu) (Templatization)

Field Summary

Fields
Modifier and Type	Field and Description
`static java.util.Set<java.lang.String>`	`blockTags` Block-level HTML tags that are rendered with surrounding line breaks.

Constructor Summary

Constructors
Constructor and Description
`StripTagsProcessor()` Constructs a new StripTagsProcessor that doesn't mark line breaks.
`StripTagsProcessor(boolean markLineBreaks)` Constructs a new StripTagsProcessor that marks line breaks as specified.

Method Summary

Modifier and Type	Method and Description
`boolean`	`getMarkLineBreaks()` Returns whether the output of the processor will contain newline words ("\n") at the end of block-level tags.
`static void`	`main(java.lang.String[] args)` For internal debugging purposes only.
`java.util.List<Word>`	`process(java.util.List<? extends Word> in)` Returns a new list containing the same words as `in`, except that tags are stripped.
`void`	`setMarkLineBreaks(boolean markLineBreaks)` Sets whether the output of the processor will contain newline words ("\n") at the end of block-level tags.

Methods inherited from class edu.stanford.nlp.process.AbstractListProcessor
processDocument, processLists

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - blockTags
```
public static final java.util.Set<java.lang.String> blockTags
```
    Block-level HTML tags that are rendered with surrounding line breaks.
- Constructor Detail
  - StripTagsProcessor
```
public StripTagsProcessor()
```
    Constructs a new StripTagsProcessor that doesn't mark line breaks.
  - StripTagsProcessor
```
public StripTagsProcessor(boolean markLineBreaks)
```
    Constructs a new StripTagsProcessor that marks line breaks as specified.
- Method Detail
  - getMarkLineBreaks
```
public boolean getMarkLineBreaks()
```
    Returns whether the output of the processor will contain newline words ("\n") at the end of block-level tags.
    
    Returns:
    
    Whether the output of the processor will contain newline words ("\n") at the end of block-level tags.
  - setMarkLineBreaks
```
public void setMarkLineBreaks(boolean markLineBreaks)
```
    Sets whether the output of the processor will contain newline words ("\n") at the end of block-level tags.
  - process
```
public java.util.List<Word> process(java.util.List<? extends Word> in)
```
    Returns a new list containing the same words as `in`, except that tags are stripped.
  - main
```
public static void main(java.lang.String[] args)
```
    For internal debugging purposes only.

Stanford NLP Group