Journal of Computer Applications, 2013, Vol. 33, Issue (03): 838-840. DOI: 10.3724/SP.J.1087.2013.00838

• Advanced computing •

Application of cooperative filtering in categories recommendation of Chinese Wikipedia

WANG Jing1,2, HE Tingting1,2*, Yimamu'aishan ABUDOULIKEMU3

  1. School of Computer Science, Central China Normal University, Wuhan Hubei 430079, China;
  2. Network Media Branch, National Language Resources Monitoring and Research Center, Wuhan Hubei 430079, China;
  3. National Engineering Research Center for E-Learning (Central China Normal University), Wuhan Hubei 430079, China
  • Received: 2012-09-18 Revised: 2012-11-12 Online: 2013-03-01 Published: 2013-03-01
  • Contact: WANG Jing
  • About the authors: WANG Jing (born 1989), female, from Jingzhou, Hubei, master's student; main research interest: natural language processing. HE Tingting (born 1964), female, from Huanggang, Hubei, professor, Ph.D.; main research interests: natural language processing and databases. Yimamu'aishan ABUDOULIKEMU (born 1973), male (Uyghur), from Yining, Xinjiang, associate professor, Ph.D.; main research interest: web information retrieval.
  • Supported by:

    Projects of the National Natural Science Foundation of China, including its Major Research Plan (90920005, 61003192); the Major Project of the State Language Commission in the Twelfth Five-Year Plan Period (ZDI125-1); the National Science & Technology Pillar Program in the Twelfth Five-Year Plan Period (2012BAK24B01); the Program of Introducing Talents of Discipline to Universities of the Ministry of Education and the State Administration of Foreign Experts Affairs (B07042); the Natural Science Foundation of Hubei Province (2011CDA034); the Fundamental Research Funds for the Central Universities, Central China Normal University (CCNU10A02009, CCNU10C01005).

Abstract: Manual editing leaves a large amount of category information in Chinese Wikipedia duplicated or inconsistent. To address this problem, collaborative filtering is applied to automatically recommend categories for Chinese Wikipedia articles. Each article is represented by four important semantic features: incoming links, outgoing links, the categories of incoming links, and the categories of outgoing links. The articles most similar to the target article are retrieved and all of their categories are collected; each category is then weighted using the similarity values returned by the query, and the top-weighted categories are returned as recommendations for the target article. The experimental results show that these four semantic features represent a Wikipedia article well, and they confirm that collaborative filtering is effective for automatic category recommendation in Chinese Wikipedia.

Key words: collaborative filtering, Chinese Wikipedia, category recommendation, semantic feature
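
The recommendation procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state which similarity measure or retrieval engine was used, so Jaccard overlap of feature sets is an assumption here, as are the function names, the `in:`/`out:`/`incat:`/`outcat:` feature prefixes, the parameters `k` and `n`, and the toy articles.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard overlap of two feature sets (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_categories(target_features, corpus, k=2, n=2):
    """Return up to n categories voted for by the k most similar articles.

    corpus maps an article title to (feature set, list of categories).
    """
    # Score every candidate article against the target article.
    scored = [
        (jaccard(target_features, features), title)
        for title, (features, _) in corpus.items()
    ]
    # Keep the k most similar articles that share at least one feature.
    neighbours = sorted(
        (pair for pair in scored if pair[0] > 0),
        key=lambda pair: (-pair[0], pair[1]),
    )[:k]

    # Each neighbour adds its similarity to the weight of each of its categories.
    weights = defaultdict(float)
    for similarity, title in neighbours:
        for category in corpus[title][1]:
            weights[category] += similarity

    # Highest weight first; ties are broken alphabetically for determinism.
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [category for category, _ in ranked[:n]]


# Hypothetical toy data: features are incoming/outgoing links and their categories.
corpus = {
    "Wuhan": (
        {"in:Hubei", "out:China", "out:Yangtze", "outcat:Rivers of China"},
        ["Cities in Hubei", "Provincial capitals in China"],
    ),
    "Changsha": (
        {"in:Hunan", "out:China", "outcat:Provinces of China"},
        ["Cities in Hunan", "Provincial capitals in China"],
    ),
    "Python": (
        {"out:Programming language", "incat:Software"},
        ["Programming languages"],
    ),
}
target = {"in:Hubei", "out:China", "out:Yangtze", "outcat:Provinces of China"}
recommended = recommend_categories(target, corpus, k=2, n=2)
```

In this toy run the two nearest articles both carry "Provincial capitals in China", so that category accumulates the largest weight and is ranked first. Summing similarities is only one plausible weighting; the paper's exact formula is not given in the abstract.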

CLC number: