Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (6): 1697-1701. DOI: 10.11772/j.issn.1001-9081.2017.06.1697

• Artificial Intelligence •

Multi-label classification algorithm based on user identity

ZHENG Xiaoxue1,2, ZHANG Dafang1,2, DIAO Zulong1,2

  1. College of Computer Science and Electronic Engineering, Hunan University, Changsha Hunan 410082, China;
  2. Laboratory of Dependable Systems and Network, Hunan University, Changsha Hunan 410082, China
  • Received: 2016-11-14 Revised: 2017-01-16 Online: 2017-06-10 Published: 2017-06-14
  • Corresponding author: ZHENG Xiaoxue
  • About the authors: ZHENG Xiaoxue (born 1990), female, from Songyuan, Jilin, M.S. candidate; main research interests: data mining, machine learning. ZHANG Dafang (born 1959), male, from Shanghai, professor, Ph.D., CCF member; main research interests: dependable systems and networks, software testing, next-generation Internet. DIAO Zulong (born 1988), male, from Zhuzhou, Hunan, Ph.D. candidate; main research interests: big data, data mining, machine learning.

Abstract: There is currently no method for measuring or benchmarking home-school communication in a smart campus. Because chat in a smart campus is marked by clear user identity characteristics, a multi-label classification algorithm based on user identity, named Adaboost.ML (Multiclass, multi-label version of Adaboost based on user identity), was proposed. Firstly, new heuristic rules were added. Then, the Adaboost.MH (Multiclass, multi-label version of Adaboost based on Hamming loss) algorithm was introduced, and the practice of splitting the dataset into time slices was discarded. Finally, each individual chat record was used directly as the unit of analysis, which reduced both the inference time and the error introduced at time-slice boundaries, and the relationships between chat users were determined by combining these decisions. The experimental results show that, compared with the rule-based heuristic method, the proposed algorithm reduces the false positive rate by 53% and the false negative rate by 66% on the smart campus dataset, and it also classifies well on the WeChat dataset. The proposed algorithm has been applied in the smart campus project, where it gives a quick and accurate picture of home-school communication.

Key words: social network, smart campus, heuristic rule, multi-label judgment, ensemble learning
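
For readers unfamiliar with the Adaboost.MH step named in the abstract, the following is a minimal sketch of plain AdaBoost.MH with decision stumps. It is not the authors' Adaboost.ML and does not include their heuristic rules. The function names `train_adaboost_mh` and `predict`, the two binary features, and the two labels in the toy data are all assumptions made for illustration. The idea it shows is that every (record, label) pair carries its own weight, so each single record is classified on all labels without grouping records into time slices.

```python
import math


def train_adaboost_mh(X, Y, rounds=10):
    """Plain AdaBoost.MH with threshold stumps (illustrative sketch only).

    X: list of numeric feature vectors, one per record.
    Y: list of label vectors with entries in {-1, +1}, one per record.
    """
    m, k, d = len(X), len(Y[0]), len(X[0])
    # One weight per (record, label) pair, initially uniform.
    D = [[1.0 / (m * k)] * k for _ in range(m)]
    model = []
    for _ in range(rounds):
        best = None
        for j in range(d):
            for theta in sorted({x[j] for x in X}):
                s = [1 if x[j] > theta else -1 for x in X]
                # Weighted correlation between the stump output and each label.
                corr = [sum(D[i][l] * Y[i][l] * s[i] for i in range(m))
                        for l in range(k)]
                r = sum(abs(c) for c in corr)
                if best is None or r > best[0]:
                    # Per-label vote: flip the stump where it anti-correlates.
                    votes = [1 if c >= 0 else -1 for c in corr]
                    best = (r, j, theta, votes)
        r, j, theta, votes = best
        r = min(r, 1 - 1e-10)  # keep alpha finite if a stump is perfect
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        model.append((alpha, j, theta, votes))
        # Increase the weight of misclassified pairs, then renormalize.
        z = 0.0
        for i in range(m):
            s_i = 1 if X[i][j] > theta else -1
            for l in range(k):
                D[i][l] *= math.exp(-alpha * Y[i][l] * votes[l] * s_i)
                z += D[i][l]
        D = [[w / z for w in row] for row in D]
    return model


def predict(model, x):
    """Return a {-1, +1} decision for every label of a single record x."""
    k = len(model[0][3])
    score = [0.0] * k
    for alpha, j, theta, votes in model:
        s = 1 if x[j] > theta else -1
        for l in range(k):
            score[l] += alpha * votes[l] * s
    return [1 if sc > 0 else -1 for sc in score]


# Hypothetical toy data: two binary features and two labels per record.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [[-1, 1], [-1, -1], [1, 1], [1, -1]]
model = train_adaboost_mh(X, Y, rounds=20)
predictions = [predict(model, x) for x in X]
```

Each stump reads a single feature, so labels that depend on different features are fitted over several boosting rounds; the re-weighting step is what shifts attention to the (record, label) pairs that earlier rounds got wrong.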
