public class Gale2007ChineseSegmenterFeatureFactory<IN extends CoreLabel> extends FeatureFactory<IN>
In the template names below, the trailing c stands for a Chinese character ("char"); the leading letter gives its position: c means current, n means next, and p means previous.
Feature | Templates |
---|---|
Current position clique | |
useWord1 | CONSTANT, cc, nc, pc, pc+cc, if (As|Msr|Pk|Hk) cc+nc, pc,nc |
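
The unconditional templates in the row above (CONSTANT, cc, nc, pc, pc+cc) can be illustrated with a minimal, self-contained sketch. This is not the library's implementation: the class `WordTemplateSketch`, the `<PAD>` symbol, and the `value|template` feature-string format are all assumptions made for illustration, and the dataset-dependent templates (cc+nc and pc,nc) are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WordTemplateSketch {
    // Hypothetical padding symbol for positions outside the sentence.
    static final String PAD = "<PAD>";

    // Returns the character at index i, or PAD when i is out of range.
    static String charAt(List<String> chars, int i) {
        return (i < 0 || i >= chars.size()) ? PAD : chars.get(i);
    }

    // Builds the unconditional templates from the table:
    // CONSTANT, cc, nc, pc, and the conjunction pc+cc.
    // The "value|template" naming is an assumption of this sketch.
    public static List<String> currentPositionFeatures(List<String> chars, int loc) {
        String cc = charAt(chars, loc);      // current char
        String nc = charAt(chars, loc + 1);  // next char
        String pc = charAt(chars, loc - 1);  // previous char

        List<String> features = new ArrayList<>();
        features.add("CONSTANT");
        features.add(cc + "|cc");
        features.add(nc + "|nc");
        features.add(pc + "|pc");
        features.add(pc + cc + "|pc+cc");
        return features;
    }

    public static void main(String[] args) {
        // Unicode escapes for a four-character sentence ("wo ai Beijing").
        List<String> chars = Arrays.asList("\u6211", "\u7231", "\u5317", "\u4eac");
        System.out.println(currentPositionFeatures(chars, 1));
    }
}
```

At the first position the previous character falls outside the sentence, so the sketch substitutes the padding symbol; the real class reads its input through a `PaddedList`, which serves a similar purpose.
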
Fields inherited from class FeatureFactory: cliqueC, cliqueCnC, cliqueCp2C, cliqueCp3C, cliqueCp4C, cliqueCp5C, cliqueCpC, cliqueCpCnC, cliqueCpCp2C, cliqueCpCp2Cp3C, cliqueCpCp2Cp3Cp4C, cliqueCpCp2Cp3Cp4Cp5C, flags, knownCliques
Constructor and Description |
---|
Gale2007ChineseSegmenterFeatureFactory() |
Modifier and Type | Method and Description |
---|---|
protected java.util.Collection<java.lang.String> | featuresC(PaddedList<? extends CoreLabel> cInfo, int loc) |
protected java.util.Collection<java.lang.String> | featuresCnC(PaddedList<? extends CoreLabel> cInfo, int loc): For a CRF, this shouldn't be necessary, since the features duplicate those from CpC, but Huihsin found some of them valuable, presumably because the duplication modified the regularization a bit. |
protected java.util.Collection<java.lang.String> | featuresCpC(PaddedList<? extends CoreLabel> cInfo, int loc) |
protected java.util.Collection<java.lang.String> | featuresCpCp2C(PaddedList<? extends CoreLabel> cInfo, int loc): Second-order clique features. |
protected java.util.Collection<java.lang.String> | featuresCpCp2Cp3C(PaddedList<? extends CoreLabel> cInfo, int loc) |
java.util.Collection<java.lang.String> | getCliqueFeatures(PaddedList<IN> cInfo, int loc, Clique clique): Extracts all the features from the input data at a certain index. |
void | init(SeqClassifierFlags flags) |
Methods inherited from class FeatureFactory: addAllInterningAndSuffixing, eachClique, getCliques, getCliques, getWord
public Gale2007ChineseSegmenterFeatureFactory()
public void init(SeqClassifierFlags flags)
Overrides: init in class FeatureFactory<IN extends CoreLabel>
public java.util.Collection<java.lang.String> getCliqueFeatures(PaddedList<IN> cInfo, int loc, Clique clique)
Specified by: getCliqueFeatures in class FeatureFactory<IN extends CoreLabel>
Parameters:
cInfo - The complete data set as a List of WordInfo.
loc - The index at which to extract features.
clique - The particular clique for which to extract features. It should be a member of the knownCliques list.
Returns: Collection of the features calculated for the word at the specified position in info.
protected java.util.Collection<java.lang.String> featuresC(PaddedList<? extends CoreLabel> cInfo, int loc)
protected java.util.Collection<java.lang.String> featuresCpC(PaddedList<? extends CoreLabel> cInfo, int loc)
protected java.util.Collection<java.lang.String> featuresCnC(PaddedList<? extends CoreLabel> cInfo, int loc)
Parameters:
cInfo - The list of characters.
loc - Position of c in list.
protected java.util.Collection<java.lang.String> featuresCpCp2C(PaddedList<? extends CoreLabel> cInfo, int loc)
Parameters:
cInfo - The list of characters.
loc - Position of c in list.
protected java.util.Collection<java.lang.String> featuresCpCp2Cp3C(PaddedList<? extends CoreLabel> cInfo, int loc)
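
As a rough illustration of how a method like getCliqueFeatures might route a requested clique to the per-clique methods documented above, here is a self-contained sketch. It does not use the library: the `SketchClique` enum, the plain `List<String>` input, the `<PAD>` symbol, and the single feature each helper emits are all hypothetical stand-ins, and the real methods produce many more features.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class CliqueDispatchSketch {
    // Hypothetical stand-ins for three of the cliques named in this class.
    public enum SketchClique { C, CpC, CnC }

    // Hypothetical padding symbol for positions outside the sentence.
    static final String PAD = "<PAD>";

    // Out-of-range positions yield PAD instead of throwing.
    static String get(List<String> chars, int i) {
        return (i < 0 || i >= chars.size()) ? PAD : chars.get(i);
    }

    // Current character only.
    static Collection<String> featuresC(List<String> chars, int loc) {
        return Collections.singletonList(get(chars, loc) + "|c");
    }

    // Previous and current character.
    static Collection<String> featuresCpC(List<String> chars, int loc) {
        return Collections.singletonList(get(chars, loc - 1) + get(chars, loc) + "|pc+cc");
    }

    // Current and next character. At position loc this covers the same
    // character pair that featuresCpC covers at position loc + 1.
    static Collection<String> featuresCnC(List<String> chars, int loc) {
        return Collections.singletonList(get(chars, loc) + get(chars, loc + 1) + "|cc+nc");
    }

    // Routes the requested clique to the matching helper.
    public static Collection<String> getCliqueFeatures(List<String> chars, int loc, SketchClique clique) {
        Collection<String> features = new ArrayList<>();
        switch (clique) {
            case C:
                features.addAll(featuresC(chars, loc));
                break;
            case CpC:
                features.addAll(featuresCpC(chars, loc));
                break;
            case CnC:
                features.addAll(featuresCnC(chars, loc));
                break;
            default:
                throw new IllegalArgumentException("Unknown clique: " + clique);
        }
        return features;
    }
}
```

In this sketch, the CnC helper at position loc looks at the same pair of characters as the CpC helper at position loc + 1, which is one way to read the note that the CnC features duplicate those from CpC.
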