Abstract: In Chinese word segmentation with Conditional Random Fields (CRF), the size of the feature window plays a crucial role in corpus training. To find a suitable window size, a group of feature templates covering different effective ranges of context was selected, and comparative tests were performed on the Bakeoff 2005 corpora with the CRF++ 0.53 toolkit. The results show that: (1) the following context (the text after the current position) contributes more than the preceding context; (2) the feature window size that influences segmentation performance is no larger than five, and a size of four or five is appropriate.
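To make the notion of a feature window concrete, the following minimal sketch generates unigram lines in the CRF++ template syntax (`U<id>:%x[row,col]`) for a window with a chosen number of preceding and following positions. The function names `crfpp_unigram_templates` and `window_size` are hypothetical, and counting the window size as the number of character positions covered, including the current one, is an assumption. The abstract does not list the templates actually used in the experiments, so this is only an illustration.

```python
def window_size(left, right):
    """Number of positions covered: `left` before, the current one, `right` after.

    Assumption: this is how "window size" is counted; the abstract does not say.
    """
    return left + right + 1


def crfpp_unigram_templates(left, right):
    """Return CRF++ unigram template lines for offsets -left .. +right.

    Each line has the form U<id>:%x[<row offset>,0], where column 0 is
    assumed to hold the character itself in the training file.
    """
    if left < 0 or right < 0:
        raise ValueError("left and right must be non-negative")
    offsets = range(-left, right + 1)
    return [f"U{i:02d}:%x[{off},0]" for i, off in enumerate(offsets)]


if __name__ == "__main__":
    # An asymmetric window of size 4 that favors the following context.
    for line in crfpp_unigram_templates(left=1, right=2):
        print(line)
```

Calling `crfpp_unigram_templates(1, 2)` yields a four-position window with more following than preceding context, while `crfpp_unigram_templates(2, 2)` yields a symmetric five-position window. These are the two sizes the abstract reports as appropriate, although the specific left/right split shown here is an illustrative choice.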
王希杰. 词位标注汉语分词中上下文有效范围定量分析[J]. 计算机应用, 2012, 32(05): 1340-1342.
WANG Xi-jie. Analysis on Effect Range of Context in Chinese Word Segmentation based Word-position Tagging. Journal of Computer Applications, 2012, 32(05): 1340-1342.