Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (10): 2777-2783. DOI: 10.11772/j.issn.1001-9081.2016.10.2777

• Artificial Intelligence •

Task assignment algorithm for crowd computing on big data based on accurate awareness of user themes

WANG Qing1, TAN Liang1,2

  1. College of Computer Science, Sichuan Normal University, Chengdu 610101;
  2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
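  • Received: 2016-03-15 Revised: 2016-06-21 Published: 2016-10-10 Released: 2016-10-10
  • Corresponding author: WANG Qing, E-mail: wq920121@126.com
  • About the authors: WANG Qing (born 1992), female, from Hengyang, Hunan, M.S. candidate; research interests: big data processing, data mining, machine learning. TAN Liang (born 1972), male, from Chengdu, Sichuan, professor, Ph.D., CCF senior member; research interests: trusted computing, network security, cloud computing, big data processing.
  • Funding:
    National Natural Science Foundation of China (61373162); Science and Technology Support Program of Sichuan Province (2014GZ007).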

Optimization algorithm for accurately theme-aware task assignment in crowd computing on big data

WANG Qing1, TAN Liang1,2   

  1. College of Computer Science, Sichuan Normal University, Chengdu, Sichuan 610101, China;
  2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2016-03-15 Revised: 2016-06-21 Online: 2016-10-10 Published: 2016-10-10
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61373162) and the Science and Technology Support Project of Sichuan Province (2014GZ007).

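Abstract: To address the massive data analysis requirements and complex cognitive reasoning challenges of big data tasks, the inefficiency of random task assignment algorithms in traditional computing, and the virtual nature and uncertainty of Internet users, an iterative task assignment algorithm based on accurate awareness of user themes was proposed. Firstly, the themes of published crowd tasks were extracted by combining adaptive fuzzy clustering with a theme extraction model; task-specific models and user models were then built to compute the correlations, and historical tasks with submitted high-quality answers were used to iteratively detect each new user's real themes and compute an initial accuracy. Secondly, Logistic Regression (LR) was used to predict the probability that a user can participate in a given type of task and to obtain a candidate sequence of participating users; tasks were then assigned precisely, with full knowledge of each user's real themes, accuracy on those themes, and credibility. In simulation experiments comparing accuracy against a random algorithm, the proposed algorithm was more than 20 percentage points more accurate, its accuracy increased with the amount of training data, and its accuracy on similar tasks approached 100%. The experiments verify that the proposed algorithm is more precise, is particularly suited to big data environments, and to some extent saves the cost that the random algorithm incurs from the repeated assignments it needs to ensure accuracy.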

Key words: crowd computing, theme matching, big data, Logistic Regression (LR), human-computer collaboration

Abstract: To address the massive data analysis requirements and complex cognitive inference of big data tasks, the low efficiency of random assignment algorithms, and the virtual nature and uncertainty of Internet users, an optimization algorithm for accurately theme-aware task assignment in crowd computing on big data was proposed. Firstly, the themes of crowd computing tasks were extracted by a method combining a theme extraction model with adaptive fuzzy k-means clustering, and the correlations were then computed through a task model and a user model. Secondly, new users' real themes and initial accuracy were tested using historical tasks with high-quality answers. Lastly, the probability that a user can participate in a certain kind of task was predicted by Logistic Regression (LR) to obtain a sequence of candidate users, and the appropriate workers were then accurately assigned to the tasks. Compared with the random algorithm, the accuracy of the proposed algorithm was more than 20 percentage points higher and increased with the amount of training data; with full training, the accuracy on correlated tasks approached 100%. The simulation results show that the proposed algorithm achieves higher accuracy and is more cost-effective in big data environments.

Key words: crowd computing, theme match, big data, Logistic Regression (LR), human-computer cooperation
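
The candidate-ranking step mentioned in the abstract (predicting with LR how likely a user is to participate in a type of task, then assigning the task to the most suitable workers) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the three features (theme relevance, accuracy on the theme, credibility), the toy training data, the user names, and the functions `train_lr`, `predict`, and `rank_candidates` are all assumptions made for illustration only.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_lr(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Predicted probability that a user takes part in this type of task."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def rank_candidates(w, b, users, k):
    """Return the k users with the highest predicted probability."""
    scored = [(name, predict(w, b, feats)) for name, feats in users.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Hypothetical history: [theme relevance, accuracy on theme, credibility]
# with label 1 if the user took part in this task type and 0 otherwise.
X = [[0.9, 0.8, 0.9], [0.8, 0.9, 0.7], [0.7, 0.7, 0.8],
     [0.2, 0.3, 0.4], [0.1, 0.2, 0.3], [0.3, 0.1, 0.2]]
y = [1, 1, 1, 0, 0, 0]

# Hypothetical candidate users for a newly published task.
candidates = {
    "alice": [0.85, 0.85, 0.8],
    "bob": [0.15, 0.25, 0.35],
    "carol": [0.7, 0.7, 0.8],
}

w, b = train_lr(X, y)
selected = rank_candidates(w, b, candidates, k=2)
print(selected)
```

In this sketch the task would be assigned to the two highest-scoring candidates. A production system would more likely rely on an established LR implementation and on features derived from the task and user models the abstract describes.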

CLC Number: