Journal of Computer Applications ›› 2012, Vol. 32 ›› Issue (05): 1332-1334.

• Artificial Intelligence •

Chinese word sense induction based on improved k-means algorithm

ZHANG Yi-hao1,2, JIN Peng1,2, SUN Rui1,2

  1. School of Computer Science, Leshan Teachers' College, Leshan Sichuan 614004, China
  2. Laboratory of Intelligent Information Processing and Application, Leshan Teachers' College, Leshan Sichuan 614004, China
  • Received: 2011-11-16 Revised: 2012-01-09 Online: 2012-05-01 Published: 2012-05-01
  • Contact: ZHANG Yi-hao
  • About the authors: ZHANG Yi-hao (born 1982), male, from Xinyang, Henan; lecturer and Ph.D. candidate; research interest: natural language processing. JIN Peng (born 1977), male, from Kaifeng, Henan; associate professor, Ph.D.; research interest: natural language processing. SUN Rui (born 1977), male, from Meishan, Sichuan; lecturer, M.S.; research interest: natural language processing.
  • Supported by:

    National Natural Science Foundation of China (61003206); Scientific Research Project of the Sichuan Provincial Education Department (10ZB025)

Abstract: Polysemy is pervasive in Chinese. Word sense induction groups the occurrences of a word across different contexts so that occurrences with the same meaning fall together, which is essentially a clustering problem, and unsupervised clustering is now widely used to study it. This paper proposes an improved k-means algorithm that changes two steps, the selection of the initial cluster centers and the calculation of the cluster means, and thereby partly overcomes the sensitivity of k-means to noise and isolated points (outliers). For feature representation, the category codes of words in Tongyici Cilin are used to reduce the feature dimension. Experiments show that the improved k-means gives a considerable performance improvement, reaching an F-Score of 75.8%.

Key words: word sense induction, k-means algorithm, clustering, Tongyici Cilin
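
The abstract names the two steps of k-means that were changed (initial center selection and cluster mean calculation) but does not give the actual procedure. The following is only a minimal sketch under stated assumptions, not the authors' method: it swaps in two common stand-ins, a density-filtered farthest-first initialisation and a trimmed mean that ignores the farthest members of each cluster. The function names and the parameters `m` and `trim` are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def dense_farthest_first(points, k, m=2):
    """Assumed initialisation: keep the densest half of the points as
    candidates (so isolated points cannot become centers), then pick
    centers among them that are far from each other."""
    def sparsity(i):
        # Distance to the m-th nearest other point; small means dense.
        d = sorted(dist(points[i], q) for j, q in enumerate(points) if j != i)
        return d[m - 1]
    order = sorted(range(len(points)), key=sparsity)
    candidates = [points[i] for i in order[: max(k, len(points) // 2)]]
    centers = [candidates[0]]
    while len(centers) < k:
        centers.append(max(candidates,
                           key=lambda p: min(dist(p, c) for c in centers)))
    return centers

def trimmed_mean(members, trim=0.2):
    """Assumed robust mean: drop the `trim` fraction of members farthest
    from the plain mean, then average the rest."""
    c = mean(members)
    ranked = sorted(members, key=lambda p: dist(p, c))
    keep = max(1, len(ranked) - int(len(ranked) * trim))
    return mean(ranked[:keep])

def robust_kmeans(points, k, iters=50):
    """k-means loop using the two assumed modifications above."""
    centers = dense_farthest_first(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = trimmed_mean(members)
    return labels, centers

# Two tight groups plus one isolated point at (30, 30).
points = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
          [10, 10], [10, 11], [11, 10], [11, 11], [10.5, 10.5],
          [30, 30]]
labels, centers = robust_kmeans(points, k=2)
print(labels)
print(centers)
```

In this toy data the isolated point is still assigned to the nearer cluster, but the trimmed mean keeps it from dragging that cluster's center away from the five points that form the group.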

CLC number:
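
The feature-reduction idea in the abstract, replacing context words with their Tongyici Cilin category codes, can be illustrated with a toy example. This is a sketch under assumptions: the lexicon `TOY_CILIN` and its codes are invented for illustration and are not real Tongyici Cilin entries, and keeping out-of-lexicon words as their own features is a choice made here, not something the abstract specifies.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> category code. The codes are invented
# for illustration and are NOT real Tongyici Cilin entries.
TOY_CILIN = {
    "医生": "C01",
    "大夫": "C01",
    "医师": "C01",
    "医院": "C02",
    "诊所": "C02",
}

def code_features(context_words, lexicon=TOY_CILIN):
    """Bag of category codes for one context. Words missing from the
    lexicon keep their surface form (an assumption of this sketch)."""
    return Counter(lexicon.get(w, w) for w in context_words)

# Three toy contexts of a target word, already segmented into words.
contexts = [
    ["医生", "医院", "看病"],
    ["大夫", "诊所", "看病"],
    ["医师", "医院"],
]

# Feature dimension before and after mapping words to category codes.
word_vocab = {w for ctx in contexts for w in ctx}
code_vocab = {f for ctx in contexts for f in code_features(ctx)}
print(len(word_vocab), len(code_vocab))
```

Because synonyms share one code, the first two contexts end up with identical feature vectors even though they share only one surface word, which is the effect a clustering step would benefit from.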