Journal of Computer Applications, 2013, Vol. 33, Issue (01): 189-191. DOI: 10.3724/SP.J.1087.2013.00189

• Artificial Intelligence •

Blog community detection based on formal concept analysis

LIU Zhaoqing, FU Yuchen, LING Xinghong, XIONG Xiangyun

  1. School of Computer Science and Technology, Soochow University, Suzhou Jiangsu 215006, China
  • Received: 2012-07-30 Revised: 2012-08-27 Online: 2013-01-01 Published: 2013-01-09
  • Corresponding author: LIU Zhaoqing
  • About the authors: LIU Zhaoqing (1987-), male, born in Xuzhou, Jiangsu; M.S. candidate; research interests: Web data mining, community detection. FU Yuchen (1968-), male, born in Xuzhou, Jiangsu; associate professor, Ph.D.; research interests: machine learning, intelligent information processing. LING Xinghong (1968-), male, born in Baoying, Jiangsu; associate professor, Ph.D.; research interests: Semantic Web, machine learning. XIONG Xiangyun (1987-), female, born in Lianyungang, Jiangsu; M.S. candidate; research interests: community structure, Web data mining.
  • Funding:

    National Natural Science Foundation of China (61070122)

Abstract: The trawling algorithm has several problems: it discovers too many Web communities, the communities it finds overlap heavily in the pages they contain, and its strict definition of a community produces isolated communities. To address these problems, a blog community detection algorithm based on Formal Concept Analysis (FCA) is proposed. First, a concept lattice is constructed from the link relations between blogs. Next, the original lattice is partitioned into equivalence classes by algebraic reduction of the lattice. Finally, within each class, the structural similarity between the extents and intents of the concepts is measured, and similar community cores are merged into communities. Experimental results on the test data set show that 83.420% of the community cores have a network density greater than 40%, that the merged communities have a network diameter of 3, and that the content richness of the communities is improved. The proposed algorithm can be applied effectively to community detection in blogs, microblogs and other social networks.

Key words: blog community, community detection, Formal Concept Analysis (FCA), link analysis, social network
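The pipeline in the abstract (link relation, concept lattice, community cores, similarity-based merging) can be illustrated with a small Python sketch. It rests on a standard FCA fact: in a formal context where objects are blogs and g I m means "blog g links to m", every formal concept (extent, intent) with both sides non-empty is a maximal complete bipartite subgraph (fans × centers), the same structure trawling searches for. The abstract does not give the lattice-reduction partition or the exact similarity formula, so this sketch skips the partition step and substitutes assumptions of its own: the size filter in the hypothetical `community_cores`, the mean-Jaccard `similarity`, and the merge `threshold=0.5` are illustrative choices, not the authors' method.

```python
from itertools import combinations

def formal_concepts(links, attributes):
    """Return every formal concept (extent, intent) of the context in which
    objects are blogs, attributes are link targets, and g I m iff g links to m."""
    rows = {g: frozenset(targets) & attributes for g, targets in links.items()}
    intents = {frozenset(attributes)}  # intent of the empty extent
    for row in rows.values():
        # Concept intents are exactly the intersections of object rows,
        # so close the current set of intents under intersection with this row.
        intents |= {b & row for b in intents}
    # Each intent B determines its extent B' = blogs that link to all of B.
    return [(frozenset(g for g, row in rows.items() if b <= row), b)
            for b in intents]

def community_cores(concepts, min_fans=2, min_centers=2):
    """Hypothetical filter: keep concepts that are large enough on both sides."""
    return [(a, b) for a, b in concepts
            if len(a) >= min_fans and len(b) >= min_centers]

def jaccard(s, t):
    union = s | t
    return len(s & t) / len(union) if union else 1.0

def similarity(c1, c2):
    """Assumed measure: mean Jaccard similarity of the extents and of the intents."""
    return (jaccard(c1[0], c2[0]) + jaccard(c1[1], c2[1])) / 2

def merge_cores(cores, threshold=0.5):
    """Union-find: cores joined by a chain of similar pairs form one community."""
    parent = list(range(len(cores)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(cores)), 2):
        if similarity(cores[i], cores[j]) >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i, (extent, intent) in enumerate(cores):
        groups.setdefault(find(i), set()).update(extent | intent)
    return list(groups.values())

# Toy link data (hypothetical): blog -> set of blogs it links to.
links = {"a": {"x", "y", "z"}, "b": {"x", "y", "z"},
         "c": {"x", "y"}, "d": {"z", "w"}}
attributes = frozenset().union(*links.values())
concepts = formal_concepts(links, attributes)
cores = community_cores(concepts)
communities = merge_cores(cores)
print(len(concepts))  # 6
print(sorted(sorted(c) for c in communities))  # [['a', 'b', 'c', 'x', 'y', 'z']]
```

In the toy data the two surviving cores, {a, b, c} × {x, y} and {a, b} × {x, y, z}, share most of their fans and centers (similarity 2/3), so they merge into one community. The intersection-closure enumeration is exponential in the worst case and is meant only for small examples.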

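The abstract evaluates communities by network density (cores above 40%) and network diameter (3 for merged communities), but it does not say whether links are treated as directed. The sketch below assumes an undirected view: density is the number of edges inside the community divided by the n(n-1)/2 possible edges, and diameter is the longest shortest path found by breadth-first search from every node. The helper names and the toy link data are hypothetical.

```python
from collections import deque

def undirected_edges(links):
    """Treat each blog link as an undirected edge, ignoring direction and self-links."""
    return {frozenset((src, dst))
            for src, targets in links.items() for dst in targets if src != dst}

def density(nodes, edges):
    """Edges inside `nodes` divided by the n(n-1)/2 possible undirected edges."""
    n = len(nodes)
    if n < 2:
        return 0.0
    inside = sum(1 for e in edges if e <= nodes)
    return inside / (n * (n - 1) / 2)

def diameter(nodes, edges):
    """Longest shortest-path length (BFS from every node); inf if disconnected."""
    adj = {v: set() for v in nodes}
    for e in edges:
        if e <= nodes:
            u, v = tuple(e)
            adj[u].add(v)
            adj[v].add(u)
    longest = 0
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) < len(nodes):
            return float("inf")  # some node is unreachable
        longest = max(longest, max(dist.values()))
    return longest

# Toy community of six blogs (hypothetical data).
links = {"a": {"x", "y", "z"}, "b": {"x", "y", "z"}, "c": {"x", "y"}}
community = {"a", "b", "c", "x", "y", "z"}
edges = undirected_edges(links)
print(round(density(community, edges), 3))  # 0.533
print(diameter(community, edges))  # 3
```

Here 8 of the 15 possible edges are present, and the longest shortest path (from c to z via x and a) has length 3. A directed reading would divide by n(n-1) instead and could give different values.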