[1] SPROAT R, EMERSON T. The first international Chinese word segmentation bakeoff[C]//Proceedings of the 2nd SIGHAN Workshop on Chinese Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2003, 17: 133-143.
[2] ZOU G, LIU Y, LIU Q, et al. Internet-oriented Chinese new words detection[J]. Journal of Chinese Information Processing, 2004, 18(6): 1-9. (in Chinese)
[3] MA W Y, CHEN K J. A bottom-up merging algorithm for Chinese unknown word extraction[C]//Proceedings of the 2nd SIGHAN Workshop on Chinese Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2003, 17: 31-38.
[4] SASANO R, KUROHASHI S, OKUMURA M. A simple approach to unknown word processing in Japanese morphological analysis[J]. Journal of Natural Language Processing, 2014, 21(6): 1183-1205.
[5] WANG A, KAN M Y. Mining informal language from Chinese microtext: joint word recognition and segmentation[EB/OL]. [2016-01-06]. http://www.aclweb.org/old_anthology/P/P13/P13-1072.pdf.
[6] SUN X, WANG H, LI W. Fast online training with frequency-adaptive learning rates for Chinese word segmentation and new word detection[C]//Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers. Stroudsburg, PA: Association for Computational Linguistics, 2012, 1: 253-262.
[7] HUANG M, YE B, WANG Y, et al. New word detection for sentiment analysis[EB/OL]. [2016-01-03]. http://mirror.aclweb.org/acl2014/P14-1/pdf/P14-1050.pdf.
[8] XING E J, ZHAO F Q. A novel approach for Chinese new word identification based on contextual word frequency-contextual word count[J]. Computer Applications and Software, 2016, 33(6): 64-67. (in Chinese)
[9] NUO M, LIU H, LONG C, et al. Tibetan unknown word identification from news corpora for supporting lexicon-based Tibetan word segmentation[EB/OL]. [2016-01-03]. http://rsr.csdb.cn/serverfiles/csdb/paper/upload/20151021/201510210132497839.pdf.
[10] DU L P, LI X G, YU G, et al. New word detection based on an improved PMI algorithm for enhancing segmentation system[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2016, 52(1): 35-40. (in Chinese)
[11] LI C, XU Y. Based on support vector and word features new word discovery research[M]//Trustworthy Computing and Services. Berlin: Springer, 2013: 287-294.
[12] ATTIA M, SAMIH Y, SHAALAN K, et al. The floating Arabic dictionary: an automatic method for updating a lexical database through the detection and lemmatization of unknown words[EB/OL]. [2016-01-03]. http://www.aclweb.org/anthology/C12-1006.
[13] FRANTZI K, ANANIADOU S, MIMA H. Automatic recognition of multi-word terms: the C-value/NC-value method[J]. International Journal on Digital Libraries, 2000, 3(2): 115-130.
[14] HUANG J H, POWERS D. Chinese word segmentation based on contextual entropy[EB/OL]. [2016-01-06]. http://www.aclweb.org/website/old_anthology/Y/Y03/Y03-1017.pdf.
[15] YE Y, WU Q, LI Y, et al. Unknown Chinese word extraction based on variety of overlapping strings[J]. Information Processing and Management, 2013, 49(2): 497-512.
[16] LAFFERTY J D, MCCALLUM A, PEREIRA F C N. Conditional random fields: probabilistic models for segmenting and labeling sequence data[C]//Proceedings of the 18th International Conference on Machine Learning. San Francisco, CA: Morgan Kaufmann, 2001: 282-289.
[17] LI H, HUANG C, GAO J, et al. The use of SVM for Chinese new word identification[C]//Proceedings of the 1st International Joint Conference on Natural Language Processing. Berlin: Springer, 2004: 723-732.
[18] XIA F. The segmentation guidelines for the PENN Chinese treebank (3.0)[EB/OL]. [2016-01-07]. http://repository.upenn.edu/cgi/viewcontent.cgi?article=1038&context=ircs_reports.