public class ChineseLexicon extends BaseLexicon
Modifier and Type | Field and Description
---|---
boolean | useCharBasedUnknownWordModel
boolean | useGoodTuringUnknownWordModel
Fields inherited from class BaseLexicon: DEBUG_LEXICON, DEBUG_LEXICON_SCORE, flexiTag, NULL_ITW, nullTag, nullWord, op, rulesWithWord, seenCounter, smartMutation, smoothInUnknownsThreshold, tagIndex, tags, testOptions, trainOptions, useSignatureForKnownSmoothing, uwModel, uwModelTrainer, uwModelTrainerClass, wordIndex, words
Fields inherited from interface Lexicon: BOUNDARY, BOUNDARY_TAG, UNKNOWN_WORD
Constructor and Description
---
ChineseLexicon(Options op, ChineseTreebankParserParams params, Index&lt;java.lang.String&gt; wordIndex, Index&lt;java.lang.String&gt; tagIndex)
Modifier and Type | Method and Description
---|---
float | score(IntTaggedWord iTW, int loc, java.lang.String word, java.lang.String featureSpec) — Get the score of this word with this tag (as an IntTaggedWord) at this location.
Methods inherited from class BaseLexicon: addAll, addAll, addTagging, evaluateCoverage, examineIntersection, finishTraining, getBaseTag, getUnknownWordModel, incrementTreesRead, initializeTraining, initRulesWithWord, isKnown, isKnown, listToEvents, main, numRules, printLexStats, readData, ruleIteratorByWord, ruleIteratorByWord, ruleIteratorByWord, setUnknownWordModel, tagSet, train, train, train, train, train, train, trainUnannotated, trainWithExpansion, treeToEvents, tune, writeData
public final boolean useCharBasedUnknownWordModel
public final boolean useGoodTuringUnknownWordModel
public ChineseLexicon(Options op, ChineseTreebankParserParams params, Index<java.lang.String> wordIndex, Index<java.lang.String> tagIndex)
public float score(IntTaggedWord iTW, int loc, java.lang.String word, java.lang.String featureSpec)
Description copied from class: BaseLexicon

Implementation documentation:

Seen:
    c_W = count(W)
    c_TW = count(T,W)
    c_T = count(T)
    c_Tunseen = count(T) among new words in 2nd half
    total = count(seen words)
    totalUnseen = count("unseen" words)
    p_T_U = Pmle(T|"unseen")
    pb_T_W = P(T|W): if c_W > smoothInUnknownsThreshold, pb_T_W = c_TW/c_W;
             else (if not smart mutation) pb_T_W = Bayes prior smooth[1] with p_T_U
    p_T = Pmle(T)
    p_W = Pmle(W)
    pb_W_T = log(pb_T_W * p_W / p_T)   [Bayes rule]

Note that this doesn't really properly reserve mass to unknowns.

Unseen:
    c_TS = count(T,Sig|Unseen)
    c_S = count(Sig)
    c_T = count(T|Unseen)
    c_U = totalUnseen above
    p_T_U = Pmle(T|Unseen)
    pb_T_S = Bayes smooth of Pmle(T|S) with P(T|Unseen)   [smooth[0]]
    pb_W_T = log(P(W|T)), inverted
Specified by: score in interface Lexicon
Overrides: score in class BaseLexicon
Parameters:
iTW - An IntTaggedWord pairing a word and POS tag
loc - The position in the sentence. In the default implementation this is used only for unknown words to change their probability distribution when sentence-initial.
word - The word itself; useful so we don't have to look it up in an index
featureSpec - TODO
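The seen/unseen scoring arithmetic in the implementation documentation above can be sketched as a small standalone class. This is not Stanford NLP code: the class and method names are hypothetical, and the additive form (count + smooth * prior) / (total + smooth) is an assumption about what "Bayes prior smooth" means here.

```java
// Standalone sketch (assumed names, not Stanford NLP code) of the scoring
// arithmetic described in the implementation documentation.
public class LexiconScoreSketch {

    // Seen branch: pb_T_W is the MLE c_TW/c_W for frequent words; otherwise a
    // Bayes-prior smooth toward p_T_U, assumed here to take the additive form
    // (c_TW + smooth * p_T_U) / (c_W + smooth).
    static double pbTW(double cTW, double cW, double pTU,
                       double smoothInUnknownsThreshold, double smooth) {
        if (cW > smoothInUnknownsThreshold) {
            return cTW / cW;                           // pure MLE
        }
        return (cTW + smooth * pTU) / (cW + smooth);   // smoothed for rare words
    }

    // Unseen branch: pb_T_S = Bayes smooth of Pmle(T|Sig) with P(T|Unseen),
    // using the same assumed additive form with weight smooth[0].
    static double pbTS(double cTS, double cS, double pTU, double smooth0) {
        return (cTS + smooth0 * pTU) / (cS + smooth0);
    }

    // Bayes-rule inversion used by both branches:
    // pb_W_T = log(P(T|W) * P(W) / P(T)).
    static double invertToLogPWT(double pTgivenW, double pW, double pT) {
        return Math.log(pTgivenW * pW / pT);
    }
}
```

The inversion step is why the lexicon can store the easier-to-estimate P(T|W) yet return the log P(W|T) that the parser's generative model expects.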