edu.stanford.nlp.sequences
Class SeqClassifierFlags

java.lang.Object
  extended by edu.stanford.nlp.sequences.SeqClassifierFlags
All Implemented Interfaces:
Serializable

public class SeqClassifierFlags
extends Object
implements Serializable

Flags for sequence classifiers. Documentation for general flags and flags for NER can be found in the Javadoc of edu.stanford.nlp.ie.NERFeatureFactory. Documentation for the flags for Chinese word segmentation can be found in the Javadoc of edu.stanford.nlp.wordseg.ChineseSegmenterFeatureFactory.
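A SeqClassifierFlags object is initialized from a java.util.Properties object through the SeqClassifierFlags(Properties props) constructor listed below, so flags are typically written as key = value lines whose keys are the field names on this page. The sketch below uses only the JDK to build and inspect such a Properties object. The file names and numeric values are placeholders chosen for illustration. The CoreNLP call appears only as a comment because it needs the CoreNLP jar on the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class FlagPropsExample {
  // Illustrative properties-file text. Keys are field names from this page;
  // values (file names, sizes) are placeholders chosen for this sketch.
  static final String PROPS_TEXT = String.join("\n",
      "# comment lines start with '#'",
      "trainFile = train.tsv",
      "serializeTo = ner-model.ser.gz",
      "useClassFeature = true",
      "useWord = true",
      "maxNGramLeng = 6",
      "useQN = true",
      "QNsize = 25");

  /** Parses key = value lines with the standard java.util.Properties syntax. */
  static Properties load(String text) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text));
    } catch (IOException e) {
      throw new UncheckedIOException(e); // not expected for an in-memory reader
    }
    return props;
  }

  public static void main(String[] args) {
    Properties props = load(PROPS_TEXT);
    // With CoreNLP on the classpath, the flags would then be applied with:
    //   SeqClassifierFlags flags = new SeqClassifierFlags(props);
    System.out.println(props.getProperty("serializeTo") + " " + props.size()); // ner-model.ser.gz 7
  }
}
```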

IMPORTANT NOTE IF CHANGING THIS FILE: MAKE SURE TO ONLY ADD NEW VARIABLES AT THE END OF THE LIST OF VARIABLES (and not to change existing variables)! Otherwise, you will usually break all currently serialized classifiers! Search for "ADD VARIABLES ABOVE HERE" in the source.

Some general flags are described here:

Property Name | Type | Default Value | Description
useQN | boolean | true | Use Quasi-Newton (L-BFGS) to find the minimum. NOTE: This must be set to false when using other minimizers such as SGD.
QNsize | int | 25 | Number of previous iterations of Quasi-Newton to store (this increases memory use, but speeds convergence by letting the Quasi-Newton optimization approximate the second derivative more effectively).
QNsize2 | int | 25 | Number of previous iterations of Quasi-Newton to store when pruning features, after the first iteration (the first iteration uses QNsize).
useInPlaceSGD | boolean | false | Use SGD (tweaking weights in place) to find the minimum. This is more efficient than the old SGD, and faster to converge than Quasi-Newton when there is a very large number of samples. Implemented for CRFClassifier. NOTE: Remember to set useQN to false.
tuneSampleSize | int | -1 | If this number is greater than 0, specifies the number of samples to use for tuning (default is 1000).
SGDPasses | int | -1 | If this number is greater than 0, specifies the number of SGD passes over the entire training set to do before giving up (default is 50). Can be smaller if the sample size is very large.
useSGD | boolean | false | Use SGD to find the minimum (can be slow). NOTE: Remember to set useQN to false.
useSGDtoQN | boolean | false | Use SGD (the version selected by useInPlaceSGD or useSGD) for a certain number of passes (SGDPasses) and then switch to QN. This gives the quick initial convergence of SGD together with the desired convergence criterion of QN (there is some ramp-up time for QN). NOTE: Remember to set useQN to false.
evaluateIters | int | 0 | If this number is greater than 0, evaluates on the test set periodically while minimizing. Implemented for CRFClassifier.
evalCmd | String |  | If specified (and evaluateIters is set), runs the specified command-line command during evaluation (instead of the default CoNLL-like NER evaluation).
evaluateTrain | boolean | false | If specified (and evaluateIters is set), also evaluates on the training set (can be expensive).
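The NOTEs in the table above imply a few consistency rules: any SGD variant (useSGD, useInPlaceSGD, useSGDtoQN) needs useQN set to false, SGDPasses and tuneSampleSize fall back to 50 and 1000 when they are not greater than 0, and evalCmd/evaluateTrain only matter when evaluateIters is greater than 0. The checker below is a hypothetical helper, not part of CoreNLP, that applies those rules to a java.util.Properties object. Parsing booleans with Boolean.parseBoolean is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class OptimizerFlagCheck {
  // Fallbacks quoted in the table, used when SGDPasses / tuneSampleSize are not > 0.
  static final int DEFAULT_SGD_PASSES = 50;
  static final int DEFAULT_TUNE_SAMPLE_SIZE = 1000;

  static boolean bool(Properties p, String key, boolean dflt) {
    return Boolean.parseBoolean(p.getProperty(key, String.valueOf(dflt)).trim());
  }

  static int intVal(Properties p, String key, int dflt) {
    return Integer.parseInt(p.getProperty(key, String.valueOf(dflt)).trim());
  }

  /** SGDPasses if it is greater than 0, otherwise the documented default of 50. */
  static int effectiveSgdPasses(Properties p) {
    int v = intVal(p, "SGDPasses", -1);
    return v > 0 ? v : DEFAULT_SGD_PASSES;
  }

  /** tuneSampleSize if it is greater than 0, otherwise the documented default of 1000. */
  static int effectiveTuneSampleSize(Properties p) {
    int v = intVal(p, "tuneSampleSize", -1);
    return v > 0 ? v : DEFAULT_TUNE_SAMPLE_SIZE;
  }

  /** Returns warnings for flag combinations that the table's NOTEs caution against. */
  static List<String> check(Properties p) {
    List<String> warnings = new ArrayList<>();
    boolean useQN = bool(p, "useQN", true);
    for (String sgdFlag : new String[] {"useSGD", "useInPlaceSGD", "useSGDtoQN"}) {
      if (useQN && bool(p, sgdFlag, false)) {
        warnings.add(sgdFlag + " is true but useQN is still true");
      }
    }
    boolean evaluating = intVal(p, "evaluateIters", 0) > 0;
    boolean wantsEval = !p.getProperty("evalCmd", "").trim().isEmpty()
        || bool(p, "evaluateTrain", false);
    if (wantsEval && !evaluating) {
      warnings.add("evalCmd/evaluateTrain only take effect when evaluateIters > 0");
    }
    return warnings;
  }

  public static void main(String[] args) {
    Properties p = new Properties();
    p.setProperty("useInPlaceSGD", "true"); // useQN left at its default (true)
    p.setProperty("SGDPasses", "10");
    System.out.println(check(p));              // [useInPlaceSGD is true but useQN is still true]
    System.out.println(effectiveSgdPasses(p)); // 10
  }
}
```

Returning warnings instead of throwing keeps the sketch usable as a pre-flight check before handing the same Properties object to the classifier.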

Author:
Jenny Finkel
See Also:
Serialized Form

Field Summary
 String adaptFile
          NER adaptation (Gaussian prior) parameters.
 double adaptSigma
           
 String altAnswerFile
           
 double annealingRate
           
 String annealingType
           
 boolean announceObjectBankEntries
           
 String answerFile
           
 boolean augmentedDateChars
           
 String auxTrueCaseModels
           
 String backgroundSymbol
           
 boolean baseline
           
 String baseTestDir
           
 String baseTrainDir
           
 int beamSize
           
 String biasedTrainFile
           
 int[] binnedLengths
           
 boolean bioSubmitOutput
           
 boolean booleanFeatures
           
 boolean cacheNGrams
           
 boolean casedDistSim
          If true, tokens are not lowercased before being looked up in the distsim lexicon.
 int charHalfWindow
           
 boolean checkNameList
           
 String classBias
           
 String classifierType
           
 boolean cleanGazette
           
 boolean collapseNN
           
 boolean combo
           
 List<String> comboProps
           
 String confusionMatrix
           
 boolean conjoinShapeNGrams
           
 int CRForder
           
 String crfType
           
 int CRFwindow
           
static String DEFAULT_BACKGROUND_SYMBOL
           
 boolean dehyphenateNGrams
           
 boolean deleteBlankLines
           
 String devFile
           
 String dictionary
           
 String dictionary2
           
 int disjunctionWidth
           
 String distSimFileFormat
          The format of the distsim file.
 String distSimLexicon
           
 int distSimMaxBits
          If this number is greater than 0, the distSim class is assumed to be a bit string and is truncated to this many characters.
 boolean doAdaptation
           
 String documentReader
           
 boolean doFE
           
 boolean doGibbs
           
 String domain
           
 boolean dontExtendTaggy
           
 String dropGaz
           
 boolean dump
           
 int endFold
           
 String entitySubclassification
           
 double epsilon
           
 boolean estimateInitial
           
 String evalCmd
           
 int evaluateIters
           
 boolean evaluateTrain
           
 boolean expandMidDot
           
 String exportFeatures
           
 boolean fakeDataset
           
 String featThreshFile
           
 int featureCountThreshold
           
 double featureDiffThresh
           
 String featureFactory
           
 int featureThreshold
           
 double featureWeightThreshold
           
 String femaleNameList
           
 double gainSGD
           
 List<String> gazettes
           
 String gazFilesFile
           
 boolean greekifyNGrams
           
 int hybridCutoffIteration
           
 String inferenceType
           
 double initialGain
           
 String initialWeights
           
 boolean initViterbi
           
 boolean innaPPAttach
           
 String inputEncoding
           
 int interimOutputFreq
           
 boolean intern
           
 boolean intern2
           
 boolean iobTags
           
 boolean iobWrapper
           
 boolean justify
           
 int kBest
           
 boolean keepAllWhitespaces
          Keep all the whitespace words in testFile when printing out answers.
 boolean keepEnglishWhitespaces
          Keep the whitespace between English words in testFile when printing out answers.
 boolean keepOBInMemory
           
 double l1reg
           
 boolean largeChSegFile
           
 String lastNameList
           
 String loadAuxClassifier
           
 String loadClassifier
           
 String loadDatasetsDir
           
 String loadJarClassifier
           
 String loadProcessedData
           
 String loadTextClassifier
           
 boolean lowercaseNGrams
           
 boolean lowerNewgeneThreshold
           
 boolean makeConsistent
           
 String maleNameList
           
 String map
           
 boolean markMasdar
           
 boolean markProperNN
           
 int maxDocSize
           
 int maxIterations
           
 int maxLeft
           
 int maxNGramLeng
           
 int maxRight
           
 boolean memoryThrift
           
 boolean mergeTags
           
 String mixedCaseMapFile
           
 String morphFeatureFile
           
 double newgeneThreshold
           
 boolean noMidNGrams
           
 String normalizationTable
           
 boolean normalize
           
 boolean normalizeTerms
           
 boolean normalizeTimex
           
 String normTableEncoding
           
 boolean numberEquivalenceDistSim
          If this is set to true, all digit characters get mapped to '9' in a distsim lexicon and for lookup.
 int numDatasetsPerFile
           
 int numFolds
           
 int numRuns
           
 int numSamples
           
 int numStartLayers
           
 int numTags
           
 int numTimesPruneFeatures
           
 int numTimesRemoveTopN
           
 int ocrFold
           
 boolean ocrTrain
           
 String outDict2
           
 String outputEncoding
           
 String outputFile
           
 String outputFormat
           
 boolean outputIterationsToFile
           
 CoreLabel pad
           
 List<String> phraseGazettes
           
 String predProp
           
 String printClassifier
           
 int printClassifierParam
           
 String printFeatures
           
 int printFeaturesUpto
           
 boolean printFirstOrderProbs
           
 String printGazFeatures
           
 boolean printLabelValue
           
 boolean printNR
           
 boolean printProbs
           
 boolean printXML
           
 String priorType
           
 Properties props
           
 boolean purgeDatasets
           
 int purgeFeatures
           
 String pushDir
           
 int QNPasses
           
 int QNsize
           
 int QNsize2
           
 double randomizedRatio
           
 String readerAndWriter
           
 boolean removeBackgroundSingletonFeatures
           
 int removeTopN
           
 double removeTopNPercent
           
 boolean restrictLabels
           
 boolean restrictTransitionsTimit
           
 boolean retainEntitySubclassification
           
 boolean saveFeatureIndexToDisk
           
 int scaledSGDMethod
           
 String searchGraphPrefix
           
 double searchGraphPrune
           
 boolean selfTest
           
 double selfTrainConfidenceThreshold
           
 String selfTrainFile
           
 int selfTrainIterations
           
 int selfTrainWindowSize
           
 String serializeDatasetsDir
           
 String serializedDictionary
           
 String serializeTo
           
 String serializeToText
           
 int SGD2QNhessSamples
           
 int SGDPasses
           
 String sighanCorporaDict
          For the SIGHAN Bakeoff 2005, the path to the dictionary of bigrams that appeared in the corpus.
 boolean sighanPostProcessing
           
 double sigma
           
 boolean sloppyGazette
           
 boolean splitDocuments
           
 boolean splitOnHead
           
 int startFold
           
 int stochasticBatchSize
           
 StochasticCalculateMethods stochasticMethod
           
 boolean strictlyFirstOrder
           
 boolean strictlySecondOrder
           
 boolean strictlyThirdOrder
           
 boolean strictlyZeroethOrder
           
 boolean subCWGaz
           
 boolean suppressMidDotPostprocessing
           
 String svmModelFile
           
 String testDirs
           
 String testFile
           
 String testFiles
           
 boolean testHessSamples
           
 boolean testObjFunction
           
 boolean testVariance
           
 String textFile
           
 boolean timitDatum
           
 String tokenFactory
           
 String tokensAnnotationClassName
           
 double tolerance
           
 String trainDirs
           
 String trainFile
           
 String trainFileList
           
 String trainFiles
           
 String trainHierarchical
           
 String transferSigmas
           
 int tuneSampleSize
           
 boolean tuneSGD
           
 boolean twoStage
           
 String type
           
 boolean use2W
           
 boolean use4Clique
           
 boolean useAbbr
           
 boolean useAbbr1
           
 boolean useABGENE
           
 boolean useABSTR
           
 boolean useABSTRFreq
           
 boolean useABSTRFreqDict
           
 boolean useAccCase
           
 boolean useAcqPrior
           
 boolean useACR
           
 boolean useAgreement
           
 boolean useAltGazFeatures
           
 boolean useAnnexing
           
 boolean useANTE
           
 boolean useAs
           
 boolean useASBCChar2
           
 boolean useASBCPre1
           
 boolean useASBCSuf1
           
 boolean useAuxPairs
           
 boolean useBeginSent
           
 boolean useBig5
           
 boolean useBigramInTwoClique
           
 boolean useBoundarySequences
           
 boolean useChPos
          Use POS information (an "open" feature for Chinese segmentation).
 boolean useChunks
           
 boolean useChunkySequences
           
 boolean useClassFeature
           
 boolean useConcord
           
 boolean useConjBreak
           
 boolean useCorefFeatures
           
 boolean useCTBChar2
           
 boolean useCTBPre1
           
 boolean useCTBSuf1
           
 boolean useDict2
           
 boolean useDictASBC2
           
 boolean useDictCTB2
           
 boolean useDictHK2
           
 boolean useDictionaryConjunctions
           
 boolean useDictionaryConjunctions3
           
 boolean useDictleng
           
 boolean useDictPK2
           
 boolean useDisjShape
           
 boolean useDisjunctive
           
 boolean useDisjunctiveShapeInteraction
           
 boolean useDistSim
           
 boolean useEitherSideDisjunctive
           
 boolean useEitherSideWord
           
 boolean useEntityRule
           
 boolean useEntityTypes
           
 boolean useEntityTypeSequences
           
 boolean useExternal
           
 boolean useExtraTaggySequences
           
 boolean useFeaturesC4gram
           
 boolean useFeaturesC5gram
           
 boolean useFeaturesC6gram
           
 boolean useFeaturesCpC4gram
           
 boolean useFeaturesCpC5gram
           
 boolean useFeaturesCpC6gram
           
 boolean useFilter
           
 boolean useFirstNgram
           
 boolean useFirstWord
           
 boolean useFloat
           
 boolean useFREQ
           
 boolean useGazettePhrases
           
 boolean useGazettes
           
 boolean useGazFeatures
           
 boolean useGenericFeatures
           
 boolean useGENIA
           
 boolean useGoodForNamesCpC
           
 boolean useHeadGov
           
 boolean useHk
           
 boolean useHKChar2
           
 boolean useHKPre1
           
 boolean useHKSuf1
           
 boolean useHuber
           
 boolean useHybrid
           
 boolean useIfInteger
           
 boolean useInna
           
 boolean useInPlaceSGD
           
 boolean useInternal
           
 boolean useIsDateRange
           
 boolean useIsURL
           
 boolean useKBest
           
 boolean useLastNgram
           
 boolean useLastRealWord
           
 boolean useLC
           
 boolean useLemmaAsWord
           
 boolean useLemmas
           
 boolean useLongSequences
           
 boolean useMidDotShape
           
 boolean useMinimalAbbr
           
 boolean useMinimalAbbr1
           
 boolean useMoreAbbr
           
 boolean useMoreGazFeatures
           
 boolean useMoreTags
           
 boolean useMsr
           
 boolean useMSRChar2
           
 boolean useMUCFeatures
           
 boolean useNB
           
 boolean useNegASBCDict2
           
 boolean useNegASBCDict3
           
 boolean useNegASBCDict4
           
 boolean useNegCTBDict2
           
 boolean useNegCTBDict3
           
 boolean useNegCTBDict4
           
 boolean useNegDict2
           
 boolean useNegDict3
           
 boolean useNegDict4
           
 boolean useNegHKDict2
           
 boolean useNegHKDict3
           
 boolean useNegHKDict4
           
 boolean useNegPKDict2
           
 boolean useNegPKDict3
           
 boolean useNegPKDict4
           
 boolean useNERPrior
           
 boolean useNext
           
 boolean useNextRealWord
           
 boolean useNextSequences
           
 boolean useNextVB
           
 boolean useNGrams
           
 boolean useNPGovernor
           
 boolean useNPHead
           
 boolean useNumberFeature
           
 boolean useObservedFeaturesOnly
           
 boolean useObservedSequencesOnly
           
 boolean useOccurrencePatterns
           
 boolean useOnlySeenWeights
           
 boolean useOrdinal
           
 boolean useOutDict2
           
 boolean useParenMatching
           
 boolean usePath
           
 boolean usePhraseFeatures
           
 boolean usePhraseWords
           
 boolean usePhraseWordSpecialTags
           
 boolean usePhraseWordTags
           
 boolean usePk
           
 boolean usePKChar2
           
 boolean usePKPre1
           
 boolean usePKSuf1
           
 boolean usePos
           
 boolean usePosition
           
 boolean usePPVBPairs
           
 boolean usePre
           
 boolean usePrediction
           
 boolean usePrediction2
           
 boolean usePrev
           
 boolean usePrevNextLemmas
           
 boolean usePrevSequences
           
 boolean usePrevVB
           
 boolean useProtoFeatures
           
 boolean useQN
           
 boolean useQuartic
           
 boolean useRad1
           
 boolean useRad2
           
 boolean useRad2b
           
 boolean useRadical
           
 boolean useReverse
           
 boolean useReverseAffix
           
 boolean useRobustQN
           
 boolean useRule
           
 boolean useRule2
           
 boolean useScaledSGD
           
 boolean useSeenFeaturesOnly
           
 boolean useSegmentation
           
 boolean useSemPrior
           
 boolean useSequences
           
 boolean useSGD
           
 boolean useSGDtoQN
           
 boolean useShapeConjunctions
           
 boolean useShapeStrings
           
 boolean useShapeStrings1
           
 boolean useShapeStrings3
           
 boolean useShapeStrings4
           
 boolean useShapeStrings5
           
 boolean useSMD
           
 boolean useStochasticQN
           
 boolean useSuf
           
 boolean useSum
           
 boolean useSVO
           
 boolean useSymTags
           
 boolean useSymWordPairs
          Has a small negative effect.
 boolean useTaggySequences
           
 boolean useTaggySequencesShapeInteraction
           
 boolean useTags
           
 boolean useTagsCpC
           
 boolean useTagsCpCp2C
           
 boolean useTagsCpCp2Cp3C
           
 boolean useTagsCpCp2Cp3Cp4C
           
 boolean useTemporalNN
           
 boolean useTitle
           
 boolean useTOK
           
 boolean useTopics
           
 boolean useTypeSeqs
           
 boolean useTypeSeqs2
           
 boolean useTypeSeqs3
           
 boolean useTypeySequences
           
 boolean useUnicodeBlock
           
 boolean useUnicodeType
           
 boolean useUnicodeType4gram
           
 boolean useUnicodeType5gram
           
 boolean useUniformPrior
          If true and doGibbs is also true, performs generic Gibbs inference without any priors.
 boolean useUnknown
           
 boolean useURLSequences
           
 boolean useVB
           
 boolean useViterbi
           
 boolean useWEB
           
 boolean useWEBFreqDict
           
 boolean useWideDisjunctive
           
 boolean useWord
           
 boolean useWord1
           
 boolean useWord2
           
 boolean useWord3
           
 boolean useWord4
           
 boolean useWordLabelCounts
           
 boolean useWordn
           
 boolean useWordnetFeatures
           
 boolean useWordPairs
           
 boolean useWordShapeConjunctions2
           
 boolean useWordShapeConjunctions3
           
 boolean useWordShapeGaz
           
 boolean useWordTag
           
 boolean useWordUTypeConjunctions2
           
 boolean useWordUTypeConjunctions3
           
 boolean useYetMoreCpCShapes
           
 boolean verboseForTrueCasing
           
 boolean verboseMode
           
 int wideDisjunctionWidth
           
 String wikiFeatureDbFile
           
 int wordShape
           
 String wordShapeGaz
           
 
Constructor Summary
SeqClassifierFlags()
           
SeqClassifierFlags(Properties props)
          Create a new SeqClassifierFlags object and initialize it using values in the Properties object.
 
Method Summary
 void setProperties(Properties props)
          Initialize this object using values in the Properties object.
 void setProperties(Properties props, boolean printProps)
          Initialize this object using values in the Properties object.
 String toString()
          Returns a String representation of the properties specified by this object.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

DEFAULT_BACKGROUND_SYMBOL

public static final String DEFAULT_BACKGROUND_SYMBOL
See Also:
Constant Field Values

useNGrams

public boolean useNGrams

conjoinShapeNGrams

public boolean conjoinShapeNGrams

lowercaseNGrams

public boolean lowercaseNGrams

dehyphenateNGrams

public boolean dehyphenateNGrams

usePrev

public boolean usePrev

useNext

public boolean useNext

useTags

public boolean useTags

useWordPairs

public boolean useWordPairs

useGazettes

public boolean useGazettes

useSequences

public boolean useSequences

usePrevSequences

public boolean usePrevSequences

useNextSequences

public boolean useNextSequences

useLongSequences

public boolean useLongSequences

useBoundarySequences

public boolean useBoundarySequences

useTaggySequences

public boolean useTaggySequences

useExtraTaggySequences

public boolean useExtraTaggySequences

dontExtendTaggy

public boolean dontExtendTaggy

useTaggySequencesShapeInteraction

public boolean useTaggySequencesShapeInteraction

strictlyZeroethOrder

public boolean strictlyZeroethOrder

strictlyFirstOrder

public boolean strictlyFirstOrder

strictlySecondOrder

public boolean strictlySecondOrder

strictlyThirdOrder

public boolean strictlyThirdOrder

entitySubclassification

public String entitySubclassification

retainEntitySubclassification

public boolean retainEntitySubclassification

useGazettePhrases

public boolean useGazettePhrases

makeConsistent

public boolean makeConsistent

useWordLabelCounts

public boolean useWordLabelCounts

useViterbi

public boolean useViterbi

binnedLengths

public int[] binnedLengths

verboseMode

public boolean verboseMode

useSum

public boolean useSum

tolerance

public double tolerance

printFeatures

public String printFeatures

useSymTags

public boolean useSymTags

useSymWordPairs

public boolean useSymWordPairs
Has a small negative effect.


printClassifier

public String printClassifier

printClassifierParam

public int printClassifierParam

intern

public boolean intern

intern2

public boolean intern2

selfTest

public boolean selfTest

sloppyGazette

public boolean sloppyGazette

cleanGazette

public boolean cleanGazette

noMidNGrams

public boolean noMidNGrams

maxNGramLeng

public int maxNGramLeng
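The n-gram fields (useNGrams, maxNGramLeng, noMidNGrams, lowercaseNGrams and related flags) are listed without descriptions. As a rough illustration of what such flags typically control, here is a generic character n-gram extractor. Every detail is an assumption made for this sketch, not CoreNLP's implementation: the '<' and '>' word-boundary markers, the minimum n-gram length of 2, treating a negative maxNGramLeng as "no limit", and reading noMidNGrams as "keep only n-grams that touch the start or end of the word".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class CharNGramSketch {
  /**
   * Character n-grams (length >= 2) of the word wrapped in '<' and '>'.
   * maxNGramLeng < 0 means no upper bound; noMidNGrams keeps only n-grams
   * that start at the beginning or end at the end of the wrapped word.
   */
  static List<String> ngrams(String word, int maxNGramLeng, boolean noMidNGrams, boolean lowercase) {
    String w = "<" + (lowercase ? word.toLowerCase(Locale.ROOT) : word) + ">";
    int n = w.length();
    List<String> out = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      for (int j = i + 2; j <= n; j++) {
        if (maxNGramLeng >= 0 && j - i > maxNGramLeng) {
          break; // longer n-grams from this start position are excluded too
        }
        if (noMidNGrams && i != 0 && j != n) {
          continue; // skip n-grams strictly inside the word
        }
        out.add(w.substring(i, j));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(ngrams("ab", 2, false, false)); // [<a, ab, b>]
  }
}
```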

useReverse

public boolean useReverse

greekifyNGrams

public boolean greekifyNGrams

useParenMatching

public boolean useParenMatching

useLemmas

public boolean useLemmas

usePrevNextLemmas

public boolean usePrevNextLemmas

normalizeTerms

public boolean normalizeTerms

normalizeTimex

public boolean normalizeTimex

useNB

public boolean useNB

useQN

public boolean useQN

useFloat

public boolean useFloat

QNsize

public int QNsize

QNsize2

public int QNsize2

maxIterations

public int maxIterations

wordShape

public int wordShape

useShapeStrings

public boolean useShapeStrings

useTypeSeqs

public boolean useTypeSeqs

useTypeSeqs2

public boolean useTypeSeqs2

useTypeSeqs3

public boolean useTypeSeqs3

useDisjunctive

public boolean useDisjunctive

disjunctionWidth

public int disjunctionWidth

useDisjunctiveShapeInteraction

public boolean useDisjunctiveShapeInteraction

useDisjShape

public boolean useDisjShape

useWord

public boolean useWord

useClassFeature

public boolean useClassFeature

useShapeConjunctions

public boolean useShapeConjunctions

useWordTag

public boolean useWordTag

useNPHead

public boolean useNPHead

useNPGovernor

public boolean useNPGovernor

useHeadGov

public boolean useHeadGov

useLastRealWord

public boolean useLastRealWord

useNextRealWord

public boolean useNextRealWord

useOccurrencePatterns

public boolean useOccurrencePatterns

useTypeySequences

public boolean useTypeySequences

justify

public boolean justify

normalize

public boolean normalize

priorType

public String priorType

sigma

public double sigma
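priorType and sigma have no description on this page. Assuming the common setup in which sigma is the standard deviation of a zero-mean Gaussian (quadratic) prior over the feature weights, training would add the penalty Σ w_i² / (2σ²) to the negative log-likelihood, with gradient w_i / σ² for each weight, so a smaller sigma regularizes more strongly. The sketch below computes both under that assumption. It is not taken from CoreNLP, and the set of accepted priorType values is not listed here.

```java
class GaussianPriorSketch {
  /** Penalty sum_i w_i^2 / (2 sigma^2) for an assumed zero-mean Gaussian prior. */
  static double penalty(double[] w, double sigma) {
    double sumSq = 0.0;
    for (double wi : w) {
      sumSq += wi * wi;
    }
    return sumSq / (2.0 * sigma * sigma);
  }

  /** Gradient of the penalty with respect to each weight: w_i / sigma^2. */
  static double[] gradient(double[] w, double sigma) {
    double s2 = sigma * sigma;
    double[] g = new double[w.length];
    for (int i = 0; i < w.length; i++) {
      g[i] = w[i] / s2;
    }
    return g;
  }
}
```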

epsilon

public double epsilon

beamSize

public int beamSize

maxLeft

public int maxLeft

maxRight

public int maxRight

usePosition

public boolean usePosition

useBeginSent

public boolean useBeginSent

useGazFeatures

public boolean useGazFeatures

useMoreGazFeatures

public boolean useMoreGazFeatures

useAbbr

public boolean useAbbr

useMinimalAbbr

public boolean useMinimalAbbr

useAbbr1

public boolean useAbbr1

useMinimalAbbr1

public boolean useMinimalAbbr1

useMoreAbbr

public boolean useMoreAbbr

deleteBlankLines

public boolean deleteBlankLines

useGENIA

public boolean useGENIA

useTOK

public boolean useTOK

useABSTR

public boolean useABSTR

useABSTRFreqDict

public boolean useABSTRFreqDict

useABSTRFreq

public boolean useABSTRFreq

useFREQ

public boolean useFREQ

useABGENE

public boolean useABGENE

useWEB

public boolean useWEB

useWEBFreqDict

public boolean useWEBFreqDict

useIsURL

public boolean useIsURL

useURLSequences

public boolean useURLSequences

useIsDateRange

public boolean useIsDateRange

useEntityTypes

public boolean useEntityTypes

useEntityTypeSequences

public boolean useEntityTypeSequences

useEntityRule

public boolean useEntityRule

useOrdinal

public boolean useOrdinal

useACR

public boolean useACR

useANTE

public boolean useANTE

useMoreTags

public boolean useMoreTags

useChunks

public boolean useChunks

useChunkySequences

public boolean useChunkySequences

usePrevVB

public boolean usePrevVB

useNextVB

public boolean useNextVB

useVB

public boolean useVB

subCWGaz

public boolean subCWGaz

documentReader

public String documentReader

map

public String map

useWideDisjunctive

public boolean useWideDisjunctive

wideDisjunctionWidth

public int wideDisjunctionWidth

useRadical

public boolean useRadical

useBigramInTwoClique

public boolean useBigramInTwoClique

morphFeatureFile

public String morphFeatureFile

useReverseAffix

public boolean useReverseAffix

charHalfWindow

public int charHalfWindow

useWord1

public boolean useWord1

useWord2

public boolean useWord2

useWord3

public boolean useWord3

useWord4

public boolean useWord4

useRad1

public boolean useRad1

useRad2

public boolean useRad2

useWordn

public boolean useWordn

useCTBPre1

public boolean useCTBPre1

useCTBSuf1

public boolean useCTBSuf1

useASBCPre1

public boolean useASBCPre1

useASBCSuf1

public boolean useASBCSuf1

usePKPre1

public boolean usePKPre1

usePKSuf1

public boolean usePKSuf1

useHKPre1

public boolean useHKPre1

useHKSuf1

public boolean useHKSuf1

useCTBChar2

public boolean useCTBChar2

useASBCChar2

public boolean useASBCChar2

useHKChar2

public boolean useHKChar2

usePKChar2

public boolean usePKChar2

useRule2

public boolean useRule2

useDict2

public boolean useDict2

useOutDict2

public boolean useOutDict2

outDict2

public String outDict2

useDictleng

public boolean useDictleng

useDictCTB2

public boolean useDictCTB2

useDictASBC2

public boolean useDictASBC2

useDictPK2

public boolean useDictPK2

useDictHK2

public boolean useDictHK2

useBig5

public boolean useBig5

useNegDict2

public boolean useNegDict2

useNegDict3

public boolean useNegDict3

useNegDict4

public boolean useNegDict4

useNegCTBDict2

public boolean useNegCTBDict2

useNegCTBDict3

public boolean useNegCTBDict3

useNegCTBDict4

public boolean useNegCTBDict4

useNegASBCDict2

public boolean useNegASBCDict2

useNegASBCDict3

public boolean useNegASBCDict3

useNegASBCDict4

public boolean useNegASBCDict4

useNegHKDict2

public boolean useNegHKDict2

useNegHKDict3

public boolean useNegHKDict3

useNegHKDict4

public boolean useNegHKDict4

useNegPKDict2

public boolean useNegPKDict2

useNegPKDict3

public boolean useNegPKDict3

useNegPKDict4

public boolean useNegPKDict4

usePre

public boolean usePre

useSuf

public boolean useSuf

useRule

public boolean useRule

useHk

public boolean useHk

useMsr

public boolean useMsr

useMSRChar2

public boolean useMSRChar2

usePk

public boolean usePk

useAs

public boolean useAs

useFilter

public boolean useFilter

largeChSegFile

public boolean largeChSegFile

useRad2b

public boolean useRad2b

keepEnglishWhitespaces

public boolean keepEnglishWhitespaces
Keep the whitespace between English words in testFile when printing out answers. Doesn't really change the content of the CoreLabels. (For Chinese segmentation.)


keepAllWhitespaces

public boolean keepAllWhitespaces
Keep all the whitespace words in testFile when printing out answers. Doesn't really change the content of the CoreLabels. (For Chinese segmentation.)


sighanPostProcessing

public boolean sighanPostProcessing

useChPos

public boolean useChPos
Use POS information (an "open" feature for Chinese segmentation).


normalizationTable

public String normalizationTable

dictionary

public String dictionary

serializedDictionary

public String serializedDictionary

dictionary2

public String dictionary2

normTableEncoding

public String normTableEncoding

sighanCorporaDict

public String sighanCorporaDict
For the SIGHAN Bakeoff 2005, the path to the dictionary of bigrams that appeared in the corpus.


useWordShapeGaz

public boolean useWordShapeGaz

wordShapeGaz

public String wordShapeGaz

splitDocuments

public boolean splitDocuments

printXML

public boolean printXML

useSeenFeaturesOnly

public boolean useSeenFeaturesOnly

lastNameList

public String lastNameList

maleNameList

public String maleNameList

femaleNameList

public String femaleNameList

trainFile

public transient String trainFile

adaptFile

public transient String adaptFile
NER adaptation (Gaussian prior) parameters.


devFile

public transient String devFile

testFile

public transient String testFile

textFile

public transient String textFile

outputFile

public transient String outputFile

loadClassifier

public transient String loadClassifier

loadTextClassifier

public transient String loadTextClassifier

loadJarClassifier

public transient String loadJarClassifier

loadAuxClassifier

public transient String loadAuxClassifier

serializeTo

public transient String serializeTo

serializeToText

public transient String serializeToText

interimOutputFreq

public transient int interimOutputFreq

initialWeights

public transient String initialWeights

gazettes

public transient List<String> gazettes

selfTrainFile

public transient String selfTrainFile
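Fields such as trainFile, testFile, loadClassifier, serializeTo, gazettes and selfTrainFile are declared transient. Under standard Java serialization, transient fields are skipped when an object is written, so after deserialization they hold their JVM default (null for references) rather than the values used at training time. The self-contained sketch below demonstrates that JDK behavior with a hypothetical two-field stand-in class. It does not use CoreNLP.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientDemo {
  // Hypothetical stand-in for SeqClassifierFlags: one ordinary and one transient field.
  static class MiniFlags implements Serializable {
    private static final long serialVersionUID = 1L;
    boolean useWord = true;      // written with the object
    transient String trainFile;  // skipped by default serialization
  }

  /** Serializes to a byte array and reads the object back. */
  static MiniFlags roundTrip(MiniFlags in) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(in);
      }
      try (ObjectInputStream inStream =
               new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (MiniFlags) inStream.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    MiniFlags f = new MiniFlags();
    f.trainFile = "train.tsv";
    MiniFlags copy = roundTrip(f);
    System.out.println(copy.useWord);   // true
    System.out.println(copy.trainFile); // null
  }
}
```

Field initializers are not re-run during deserialization, so a non-transient field keeps whatever value was saved, while a transient one comes back empty.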

inputEncoding

public String inputEncoding

bioSubmitOutput

public boolean bioSubmitOutput

numRuns

public int numRuns

answerFile

public String answerFile

altAnswerFile

public String altAnswerFile

dropGaz

public String dropGaz

printGazFeatures

public String printGazFeatures

numStartLayers

public int numStartLayers

dump

public boolean dump

mergeTags

public boolean mergeTags

splitOnHead

public boolean splitOnHead

featureCountThreshold

public int featureCountThreshold

featureWeightThreshold

public double featureWeightThreshold

featureFactory

public String featureFactory

backgroundSymbol

public String backgroundSymbol

useObservedSequencesOnly

public boolean useObservedSequencesOnly

maxDocSize

public int maxDocSize

printProbs

public boolean printProbs

printFirstOrderProbs

public boolean printFirstOrderProbs

saveFeatureIndexToDisk

public boolean saveFeatureIndexToDisk

removeBackgroundSingletonFeatures

public boolean removeBackgroundSingletonFeatures

doGibbs

public boolean doGibbs

numSamples

public int numSamples

useNERPrior

public boolean useNERPrior

useAcqPrior

public boolean useAcqPrior

useUniformPrior

public boolean useUniformPrior
If true and doGibbs is also true, performs generic Gibbs inference without any priors.


useMUCFeatures

public boolean useMUCFeatures

annealingRate

public double annealingRate

annealingType

public String annealingType

loadProcessedData

public String loadProcessedData

initViterbi

public boolean initViterbi

useUnknown

public boolean useUnknown

checkNameList

public boolean checkNameList

useSemPrior

public boolean useSemPrior

useFirstWord

public boolean useFirstWord

useNumberFeature

public boolean useNumberFeature

ocrFold

public int ocrFold

ocrTrain

public transient boolean ocrTrain

classifierType

public String classifierType

svmModelFile

public String svmModelFile

inferenceType

public String inferenceType

useLemmaAsWord

public boolean useLemmaAsWord

type

public String type

readerAndWriter

public String readerAndWriter

comboProps

public List<String> comboProps

usePrediction

public boolean usePrediction

useAltGazFeatures

public boolean useAltGazFeatures

gazFilesFile

public String gazFilesFile

usePrediction2

public boolean usePrediction2

baseTrainDir

public String baseTrainDir

baseTestDir

public String baseTestDir

trainFiles

public String trainFiles

trainFileList

public String trainFileList

testFiles

public String testFiles

trainDirs

public String trainDirs

testDirs

public String testDirs

useOnlySeenWeights

public boolean useOnlySeenWeights

predProp

public String predProp

pad

public CoreLabel pad

useObservedFeaturesOnly

public boolean useObservedFeaturesOnly

distSimLexicon

public String distSimLexicon

useDistSim

public boolean useDistSim

removeTopN

public int removeTopN

numTimesRemoveTopN

public int numTimesRemoveTopN

randomizedRatio

public double randomizedRatio

removeTopNPercent

public double removeTopNPercent

purgeFeatures

public int purgeFeatures

booleanFeatures

public boolean booleanFeatures

iobWrapper

public boolean iobWrapper

iobTags

public boolean iobTags

useSegmentation

public boolean useSegmentation

memoryThrift

public boolean memoryThrift

timitDatum

public boolean timitDatum

serializeDatasetsDir

public String serializeDatasetsDir

loadDatasetsDir

public String loadDatasetsDir

pushDir

public String pushDir

purgeDatasets

public boolean purgeDatasets

keepOBInMemory

public boolean keepOBInMemory

fakeDataset

public boolean fakeDataset

restrictTransitionsTimit

public boolean restrictTransitionsTimit

numDatasetsPerFile

public int numDatasetsPerFile

useTitle

public boolean useTitle

lowerNewgeneThreshold

public boolean lowerNewgeneThreshold

useEitherSideWord

public boolean useEitherSideWord

useEitherSideDisjunctive

public boolean useEitherSideDisjunctive

twoStage

public boolean twoStage

crfType

public String crfType

featureThreshold

public int featureThreshold

featThreshFile

public String featThreshFile

featureDiffThresh

public double featureDiffThresh

numTimesPruneFeatures

public int numTimesPruneFeatures

newgeneThreshold

public double newgeneThreshold

doAdaptation

public boolean doAdaptation

useInternal

public boolean useInternal

useExternal

public boolean useExternal

selfTrainConfidenceThreshold

public double selfTrainConfidenceThreshold

selfTrainIterations

public int selfTrainIterations

selfTrainWindowSize

public int selfTrainWindowSize

useHuber

public boolean useHuber

useQuartic

public boolean useQuartic

adaptSigma

public double adaptSigma

numFolds

public int numFolds

startFold

public int startFold

endFold

public int endFold

cacheNGrams

public boolean cacheNGrams

outputFormat

public String outputFormat

useSMD

public boolean useSMD

useSGDtoQN

public boolean useSGDtoQN

useStochasticQN

public boolean useStochasticQN

useScaledSGD

public boolean useScaledSGD

scaledSGDMethod

public int scaledSGDMethod

SGDPasses

public int SGDPasses

QNPasses

public int QNPasses

tuneSGD

public boolean tuneSGD

stochasticMethod

public StochasticCalculateMethods stochasticMethod

initialGain

public double initialGain

stochasticBatchSize

public int stochasticBatchSize

useSGD

public boolean useSGD

gainSGD

public double gainSGD

useHybrid

public boolean useHybrid

hybridCutoffIteration

public int hybridCutoffIteration

outputIterationsToFile

public boolean outputIterationsToFile

testObjFunction

public boolean testObjFunction

testVariance

public boolean testVariance

SGD2QNhessSamples

public int SGD2QNhessSamples

testHessSamples

public boolean testHessSamples

CRForder

public int CRForder

CRFwindow

public int CRFwindow

estimateInitial

public boolean estimateInitial

biasedTrainFile

public transient String biasedTrainFile

confusionMatrix

public transient String confusionMatrix

outputEncoding

public String outputEncoding

useKBest

public boolean useKBest

searchGraphPrefix

public String searchGraphPrefix

searchGraphPrune

public double searchGraphPrune

kBest

public int kBest

useFeaturesC4gram

public boolean useFeaturesC4gram

useFeaturesC5gram

public boolean useFeaturesC5gram

useFeaturesC6gram

public boolean useFeaturesC6gram

useFeaturesCpC4gram

public boolean useFeaturesCpC4gram

useFeaturesCpC5gram

public boolean useFeaturesCpC5gram

useFeaturesCpC6gram

public boolean useFeaturesCpC6gram

useUnicodeType

public boolean useUnicodeType

useUnicodeType4gram

public boolean useUnicodeType4gram

useUnicodeType5gram

public boolean useUnicodeType5gram

use4Clique

public boolean use4Clique

useUnicodeBlock

public boolean useUnicodeBlock

useShapeStrings1

public boolean useShapeStrings1

useShapeStrings3

public boolean useShapeStrings3

useShapeStrings4

public boolean useShapeStrings4

useShapeStrings5

public boolean useShapeStrings5

useGoodForNamesCpC

public boolean useGoodForNamesCpC

useDictionaryConjunctions

public boolean useDictionaryConjunctions

expandMidDot

public boolean expandMidDot

printFeaturesUpto

public int printFeaturesUpto

useDictionaryConjunctions3

public boolean useDictionaryConjunctions3

useWordUTypeConjunctions2

public boolean useWordUTypeConjunctions2

useWordUTypeConjunctions3

public boolean useWordUTypeConjunctions3

useWordShapeConjunctions2

public boolean useWordShapeConjunctions2

useWordShapeConjunctions3

public boolean useWordShapeConjunctions3

useMidDotShape

public boolean useMidDotShape

augmentedDateChars

public boolean augmentedDateChars

suppressMidDotPostprocessing

public boolean suppressMidDotPostprocessing

printNR

public boolean printNR

classBias

public String classBias

printLabelValue

public boolean printLabelValue

useRobustQN

public boolean useRobustQN

combo

public boolean combo

useGenericFeatures

public boolean useGenericFeatures

verboseForTrueCasing

public boolean verboseForTrueCasing

trainHierarchical

public String trainHierarchical

domain

public String domain

baseline

public boolean baseline

transferSigmas

public String transferSigmas

doFE

public boolean doFE

restrictLabels

public boolean restrictLabels

announceObjectBankEntries

public boolean announceObjectBankEntries

usePos

public boolean usePos

useAgreement

public boolean useAgreement

useAccCase

public boolean useAccCase

useInna

public boolean useInna

useConcord

public boolean useConcord

useFirstNgram

public boolean useFirstNgram

useLastNgram

public boolean useLastNgram

collapseNN

public boolean collapseNN

useConjBreak

public boolean useConjBreak

useAuxPairs

public boolean useAuxPairs

usePPVBPairs

public boolean usePPVBPairs

useAnnexing

public boolean useAnnexing

useTemporalNN

public boolean useTemporalNN

usePath

public boolean usePath

innaPPAttach

public boolean innaPPAttach

markProperNN

public boolean markProperNN

markMasdar

public boolean markMasdar

useSVO

public boolean useSVO

numTags

public int numTags

useTagsCpC

public boolean useTagsCpC

useTagsCpCp2C

public boolean useTagsCpCp2C

useTagsCpCp2Cp3C

public boolean useTagsCpCp2Cp3C

useTagsCpCp2Cp3Cp4C

public boolean useTagsCpCp2Cp3Cp4C

l1reg

public double l1reg

mixedCaseMapFile

public String mixedCaseMapFile

auxTrueCaseModels

public String auxTrueCaseModels

use2W

public boolean use2W

useLC

public boolean useLC

useYetMoreCpCShapes

public boolean useYetMoreCpCShapes

useIfInteger

public boolean useIfInteger

exportFeatures

public String exportFeatures

useInPlaceSGD

public boolean useInPlaceSGD

useTopics

public boolean useTopics

evaluateIters

public int evaluateIters

evalCmd

public String evalCmd

evaluateTrain

public boolean evaluateTrain

tuneSampleSize

public int tuneSampleSize

usePhraseFeatures

public boolean usePhraseFeatures

usePhraseWords

public boolean usePhraseWords

usePhraseWordTags

public boolean usePhraseWordTags

usePhraseWordSpecialTags

public boolean usePhraseWordSpecialTags

useProtoFeatures

public boolean useProtoFeatures

useWordnetFeatures

public boolean useWordnetFeatures

tokenFactory

public String tokenFactory

tokensAnnotationClassName

public String tokensAnnotationClassName

useCorefFeatures

public boolean useCorefFeatures

wikiFeatureDbFile

public String wikiFeatureDbFile

casedDistSim

public boolean casedDistSim
Whether to skip lowercasing tokens before looking them up in the distsim lexicon. By default, tokens are lowercased before lookup. Setting this flag to true keeps their original case.
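The lookup-key choice described above can be sketched as follows. This is a hypothetical illustration, not the classifier's actual code. The class name `CasedDistSimSketch`, the `UNKNOWN` sentinel for missing words, and the use of `Locale.ROOT` lowercasing are all assumptions.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: how a casedDistSim-style switch could pick the distsim lookup key. */
class CasedDistSimSketch {

    /** Assumed sentinel for words missing from the lexicon (not taken from the library). */
    static final String UNKNOWN = "UNK";

    /** Lowercases the token unless cased lookup is requested. */
    static String lookupKey(String token, boolean casedDistSim) {
        return casedDistSim ? token : token.toLowerCase(Locale.ROOT);
    }

    /** Returns the token's cluster, or UNKNOWN if the normalized key is absent. */
    static String cluster(Map<String, String> lexicon, String token, boolean casedDistSim) {
        return lexicon.getOrDefault(lookupKey(token, casedDistSim), UNKNOWN);
    }

    public static void main(String[] args) {
        // Toy lexicon holding both forms so that both lookups succeed.
        Map<String, String> lexicon = new HashMap<>();
        lexicon.put("apple", "17");
        lexicon.put("Apple", "42");
        System.out.println(cluster(lexicon, "Apple", false)); // prints 17
        System.out.println(cluster(lexicon, "Apple", true));  // prints 42
    }
}
```

In practice a lexicon would be built with one normalization only, so the same flag has to govern both lexicon loading and lookup.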


distSimFileFormat

public String distSimFileFormat
The format of the distsim file. Known values are:
alexClark = TSV file: word TAB clusterNumber [optional other content]
terryKoo = TSV file: clusterBitString TAB word TAB frequency
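A minimal sketch of reading one line in each of the two documented formats, returning a (word, cluster) pair. The class and method names are hypothetical, and the choice to reject short lines with an `IllegalArgumentException` is an assumption, not documented library behavior.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

/** Hypothetical sketch: extract (word, cluster) from one distsim TSV line. */
class DistSimLineParser {

    static Map.Entry<String, String> parse(String line, String format) {
        String[] fields = line.split("\t");
        switch (format) {
            case "alexClark":
                // word TAB clusterNumber [optional other content]
                requireFields(fields, 2, line);
                return new SimpleEntry<>(fields[0], fields[1]);
            case "terryKoo":
                // clusterBitString TAB word TAB frequency
                requireFields(fields, 3, line);
                return new SimpleEntry<>(fields[1], fields[0]);
            default:
                throw new IllegalArgumentException("Unknown distSimFileFormat: " + format);
        }
    }

    /** Assumed error handling: reject lines with too few TAB-separated fields. */
    private static void requireFields(String[] fields, int n, String line) {
        if (fields.length < n) {
            throw new IllegalArgumentException("Expected at least " + n + " fields: " + line);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("bank\t37\textra", "alexClark")); // prints bank=37
        System.out.println(parse("0110\tbank\t1523", "terryKoo")); // prints bank=0110
    }
}
```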


distSimMaxBits

public int distSimMaxBits
If this number is greater than 0, the distsim class is assumed to be a bit string and is truncated to this many characters. Normal distsim features then use this level of resolution, while extra, special distsim features may work at a coarser resolution. Since the lexicon stores only the truncated bit string, finer-grained clusters are no longer available.
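The truncation rule above amounts to keeping a prefix of the bit string. With Brown-style hierarchical clusters, a shorter prefix names an ancestor cluster, which is why a coarser level stays available while a finer one does not. The class and method names below are hypothetical; this is a sketch of the rule as described, not the library's implementation.

```java
/** Hypothetical sketch of distSimMaxBits-style truncation of a bit-string cluster ID. */
class DistSimBitTruncation {

    /** Keeps at most maxBits leading characters; maxBits <= 0 means no truncation. */
    static String truncate(String bitString, int maxBits) {
        if (maxBits <= 0 || bitString.length() <= maxBits) {
            return bitString;
        }
        return bitString.substring(0, maxBits);
    }

    public static void main(String[] args) {
        String cluster = "0110100111";
        System.out.println(truncate(cluster, 4)); // prints 0110
        System.out.println(truncate(cluster, 0)); // prints 0110100111 (flag disabled)
        System.out.println(truncate(cluster, 2)); // prints 01 (coarser ancestor cluster)
    }
}
```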


numberEquivalenceDistSim

public boolean numberEquivalenceDistSim
If true, all digit characters are mapped to '9', both in the distsim lexicon and at lookup time. This simple word shaping can shrink distsim lexicons and improve their performance.
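A minimal sketch of the digit mapping described above, assuming "digit character" means any code point for which `Character.isDigit` is true. The library may use a narrower definition, such as ASCII digits only. The class name is hypothetical.

```java
/** Hypothetical sketch of numberEquivalenceDistSim-style digit shaping. */
class NumberEquivalence {

    /** Replaces every digit code point with '9' and leaves all other characters unchanged. */
    static String mapDigits(String word) {
        StringBuilder sb = new StringBuilder(word.length());
        // Iterate by code point so characters outside the BMP are handled correctly.
        word.codePoints().forEach(cp -> sb.appendCodePoint(Character.isDigit(cp) ? '9' : cp));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(mapDigits("1984"));  // prints 9999
        System.out.println(mapDigits("2023"));  // prints 9999 (shares the entry above)
        System.out.println(mapDigits("A4-2b")); // prints A9-9b
    }
}
```

Because every year or count of the same shape collapses to one key, the lexicon needs only one entry per shape, which explains the size reduction.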


phraseGazettes

public transient List<String> phraseGazettes

props

public transient Properties props

Constructor Detail

SeqClassifierFlags

public SeqClassifierFlags()

SeqClassifierFlags

public SeqClassifierFlags(Properties props)
Create a new SeqClassifierFlags object and initialize it using values in the Properties object. The properties are printed to stderr as they are processed.

Parameters:
props - The properties object used for initialization

Method Detail

setProperties

public final void setProperties(Properties props)
Initialize this object using values in the Properties object. The properties are printed to stderr as they are processed.

Parameters:
props - The properties object used for initialization

setProperties

public void setProperties(Properties props,
                          boolean printProps)
Initialize this object using values in the Properties object.

Parameters:
props - The properties object used for initialization
printProps - Whether to print the properties to stderr as it works.
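The constructor and setProperties both consume a java.util.Properties object. The sketch below builds one for in-place SGD training, using only the JDK. It assumes the property keys are the names listed in the flag table at the top of this page (useQN, useInPlaceSGD, SGDPasses, tuneSampleSize), and the values shown are illustrative rather than recommended settings. The actual SeqClassifierFlags call appears only in a comment, because running it requires Stanford CoreNLP on the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Sketch: building a Properties object for SeqClassifierFlags (keys assumed from the flag table). */
class FlagsPropertiesSketch {

    static Properties sgdTrainingProps() {
        String config = String.join("\n",
                "useQN=false",          // must be false when using a non-QN minimizer
                "useInPlaceSGD=true",
                "SGDPasses=50",
                "tuneSampleSize=1000");
        Properties props = new Properties();
        try {
            props.load(new StringReader(config));
        } catch (IOException e) {
            // StringReader does no real I/O, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = sgdTrainingProps();
        // With Stanford CoreNLP on the classpath, the next step would be e.g.:
        //   SeqClassifierFlags flags = new SeqClassifierFlags(props);
        // or, to suppress the stderr echo:
        //   flags.setProperties(props, false);
        props.stringPropertyNames().stream().sorted()
             .forEach(k -> System.out.println(k + "=" + props.getProperty(k)));
    }
}
```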

toString

public String toString()
Print the properties specified by this object.

Overrides:
toString in class Object
Returns:
A String describing the properties specified by this object.


Stanford NLP Group