Journal of Computer Applications, 2017, Vol. 37, Issue (9): 2671-2677. DOI: 10.11772/j.issn.1001-9081.2017.09.2671

• Data Science and Technology •

E-government recommendation algorithm combining community and association sequence mining

HUANG Yakun1,2, WANG Yang1, WANG Mingxing1

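  1. School of Mathematics & Computer Science, Anhui Normal University, Wuhu Anhui 241000, China;
    2. Anhui IFLYTEK Intelligent Technology Corporation, Wuhu Anhui 241000, China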
  • Received: 2017-04-19 Revised: 2017-07-16 Online: 2017-09-10 Published: 2017-09-13
  • Corresponding author: GUO Zhigang, hyk_it@foxmail.com
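  • About the authors: HUANG Yakun (1992-), male, born in Hefei, Anhui, M.S. candidate, CCF member; research interests: personalized recommendation, data mining. WANG Yang (1971-), male, born in Lingbi, Anhui, professor, Ph.D.; research interests: data mining, machine learning, intelligent agents. WANG Mingxing (1992-), male, born in Hefei, Anhui, M.S. candidate, CCF member; research interests: data mining, social networks.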
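  • Supported by:
    National Natural Science Foundation of China (61572036); Key Grant Project of Humanities and Social Sciences of Anhui Province (SK2014ZD033).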

E-government recommendation algorithm combining community and association sequence mining

HUANG Yakun1,2, WANG Yang1, WANG Mingxing1   

  1. School of Mathematics & Computer Science, Anhui Normal University, Wuhu Anhui 241000, China;
    2. Anhui IFLYTEK Intelligent Technology Corporation, Wuhu Anhui 241000, China
  • Received:2017-04-19 Revised:2017-07-16 Online:2017-09-10 Published:2017-09-13
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61572036) and the Key Grant Project of Humanities and Social Sciences of Anhui Province (SK2014ZD033).

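Abstract: As an effective means of information acquisition, personalized recommendation has been successfully applied in fields such as e-commerce, music and movies. Most existing studies focus on recommendation accuracy, give little consideration to the diversity of the recommendation results, and ignore the process characteristics of the recommended items in the application domain (for example, the recommendation of service items in "Internet Plus government services"). To this end, an e-government recommendation algorithm Combining User Community and Associated Sequence mining (CAS-UC) is proposed, which gives priority to pushing to each user the service items with the greatest interest relevance. First, the static basic attributes and the dynamic behavior attributes of users and service items are modeled separately as features. Second, user communities are discovered based on users' historical service records and attribute similarity, and the set of users most similar to the target user is pre-filtered, which improves the diversity of the recommendation results and reduces the computation of the core recommendation process. Finally, the associated sequence mining of service items takes full account of the business characteristics of e-government and adds a time dimension to service-item sequence mining, further improving recommendation accuracy. Using ewoho.com of Wuhu as the platform, simulations were run on desensitized user information on the Spark computing platform. Experimental results show that CAS-UC is suitable for domains in which the recommended items have sequence or process characteristics; compared with traditional recommendation algorithms such as collaborative filtering, matrix factorization and semantic-similarity-based recommendation, it achieves higher recommendation accuracy, and users' membership in multiple communities increases the diversity of the recommendation results.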

Key words: user community, associated sequence mining, Spark platform, diversity, e-government recommendation
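The community-discovery and pre-filtering step summarized in the abstract could be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not give the similarity measure or the community-detection algorithm, so the Jaccard behavior similarity, the attribute-match score, the blend weight `alpha`, the `threshold`, the use of connected components as communities, the function names and the example data are all hypothetical. Unlike CAS-UC, which lets a user belong to several communities, this sketch assigns each user to exactly one.

```python
from itertools import combinations

# Hypothetical example data: historical service items and static attributes per user.
USERS = ["u1", "u2", "u3", "u4"]
HISTORY = {
    "u1": {"A", "B"},
    "u2": {"A", "B", "C"},
    "u3": {"X"},
    "u4": {"A", "C"},
}
ATTRS = {
    "u1": {"district": "D1", "age": "30s"},
    "u2": {"district": "D1", "age": "30s"},
    "u3": {"district": "D2", "age": "60s"},
    "u4": {"district": "D1", "age": "40s"},
}


def jaccard(a: set, b: set) -> float:
    """Behavior similarity: overlap of two users' historical service items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def attr_similarity(x: dict, y: dict) -> float:
    """Attribute similarity: share of common attribute keys with equal values."""
    keys = x.keys() & y.keys()
    return sum(x[k] == y[k] for k in keys) / len(keys) if keys else 0.0


def user_similarity(u, v, history, attrs, alpha=0.5):
    """Assumed linear blend of behavior similarity and attribute similarity."""
    return (alpha * jaccard(history[u], history[v])
            + (1 - alpha) * attr_similarity(attrs[u], attrs[v]))


def communities(users, history, attrs, threshold=0.5, alpha=0.5):
    """Connected components of the graph linking users with similarity >= threshold."""
    adj = {u: set() for u in users}
    for u, v in combinations(users, 2):
        if user_similarity(u, v, history, attrs, alpha) >= threshold:
            adj[u].add(v)
            adj[v].add(u)
    seen, result = set(), []
    for start in users:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # iterative depth-first search over the similarity graph
            w = stack.pop()
            if w not in component:
                component.add(w)
                stack.extend(adj[w] - component)
        seen |= component
        result.append(component)
    return result


def prefilter(target, users, history, attrs, k=2, threshold=0.5, alpha=0.5):
    """Top-k most similar users, drawn only from the target's own community."""
    community = next(c for c in communities(users, history, attrs, threshold, alpha)
                     if target in c)
    peers = community - {target}
    # Sort by descending similarity, breaking ties by user id for determinism.
    ranked = sorted(peers, key=lambda v: (-user_similarity(target, v, history, attrs, alpha), v))
    return ranked[:k]


if __name__ == "__main__":
    print(communities(USERS, HISTORY, ATTRS))
    print(prefilter("u1", USERS, HISTORY, ATTRS))
```

Restricting candidates to the target's community is what shrinks the input to the later, more expensive recommendation step; here `u4` is kept as a peer of `u1` through `u2` even though its direct similarity to `u1` falls below the threshold.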

Abstract: As an effective means of information acquisition, personalized recommendation has been successfully applied in fields such as e-commerce, music and film. Most existing studies focus on recommendation accuracy, give little consideration to the diversity of the recommendation results, and neglect the process characteristics of the recommended items in the application domain (e.g., service-item recommendation in "Internet Plus government services"). To address these problems, an e-government recommendation algorithm Combining User Community and Associated Sequence mining (CAS-UC) was proposed, which gives priority to recommending the service items most closely related to each user's interests. First, the static basic attributes and dynamic behavior attributes of users and items were modeled separately. Second, user communities were discovered based on users' historical service records and attribute similarity, and the set of users most similar to the target user was pre-filtered, which improves the diversity of the recommendation results and reduces the computation of the core recommendation process. Finally, the associated sequence mining of items took full account of the business characteristics of e-government and incorporated the time dimension into item sequence mining, further improving recommendation accuracy. Using ewoho.com in Wuhu as the platform, simulation experiments were carried out on desensitized user information on the Spark computing platform. The experimental results show that CAS-UC is suitable for recommending items with sequence or process characteristics and achieves higher recommendation accuracy than traditional recommendation algorithms such as collaborative filtering, matrix factorization and semantic-similarity-based recommendation, while users' membership in multiple communities increases the diversity of the recommendation results.

Key words: user community, associated sequence mining, Spark framework, diversity, e-government recommendation
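The time-aware associated sequence mining described in the abstract could be approximated by two-item sequential rules of the form "a user did service item `a`, then `b` within `max_gap_days`", scored by support (number of distinct users) and confidence. This is only a sketch under assumptions: the abstract does not specify the mining algorithm, the thresholds or how time enters, so `max_gap_days`, `min_support`, `min_conf`, the function names and the service-item names below are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical time-stamped service records: user -> [(service item, date), ...]
RECORDS = {
    "u1": [("marriage_registration", date(2017, 3, 1)),
           ("household_transfer", date(2017, 3, 10)),
           ("birth_permit", date(2017, 5, 20))],
    "u2": [("marriage_registration", date(2017, 4, 2)),
           ("household_transfer", date(2017, 4, 20))],
    "u3": [("marriage_registration", date(2017, 1, 5)),
           ("birth_permit", date(2017, 1, 25))],
    "u4": [("business_license", date(2017, 2, 1)),
           ("tax_registration", date(2017, 2, 8))],
}


def mine_sequence_rules(records, max_gap_days=30, min_support=2, min_conf=0.5):
    """Mine rules a -> b where a user did b after a within max_gap_days.

    Returns {(a, b): (support, confidence)}; support counts distinct users,
    confidence = support / number of users who did a.
    """
    item_users = defaultdict(set)   # item -> users who did it
    pair_users = defaultdict(set)   # (a, b) -> users who did a, then b in the window
    for user, events in records.items():
        events = sorted(events, key=lambda e: e[1])  # order each user's events by date
        for i, (a, ta) in enumerate(events):
            item_users[a].add(user)
            for b, tb in events[i + 1:]:
                if (tb - ta).days > max_gap_days:
                    break  # events are sorted, so later ones are even farther from a
                if b != a:
                    pair_users[(a, b)].add(user)
    rules = {}
    for (a, b), users in pair_users.items():
        support = len(users)
        confidence = support / len(item_users[a])
        if support >= min_support and confidence >= min_conf:
            rules[(a, b)] = (support, confidence)
    return rules


def recommend_next(last_item, rules, done=frozenset(), top_n=3):
    """Rank items that tend to follow last_item, skipping items already done."""
    candidates = [(b, conf, sup) for (a, b), (sup, conf) in rules.items()
                  if a == last_item and b not in done]
    candidates.sort(key=lambda c: (-c[1], -c[2], c[0]))  # confidence, support, name
    return [b for b, _, _ in candidates[:top_n]]


if __name__ == "__main__":
    rules = mine_sequence_rules(RECORDS)
    print(rules)
    print(recommend_next("marriage_registration", rules))
```

With the default 30-day window, the 80-day gap between `u1`'s `marriage_registration` and `birth_permit` is ignored, so only `marriage_registration -> household_transfer` survives the thresholds; widening the window to 100 days also admits `marriage_registration -> birth_permit`. The window is one simple way a time dimension can change which item sequences count as associated.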
