Optimization algorithm for accurately theme-aware task assignment in crowd computing on big data

doi:10.11772/j.issn.1001-9081.2016.10.2777

Journal of Computer Applications, 2016, Vol. 36, Issue (10): 2777-2783. DOI: 10.11772/j.issn.1001-9081.2016.10.2777

WANG Qing (王青)¹, TAN Liang (谭良)¹,²

1. College of Computer Science, Sichuan Normal University, Chengdu, Sichuan 610101, China;
2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

Received: 2016-03-15 Revised: 2016-06-21 Online: 2016-10-10 Published: 2016-10-10
Corresponding author: WANG Qing, E-mail: wq920121@126.com
About the authors: WANG Qing (born 1992), female, from Hengyang, Hunan; M.S. candidate; research interests: big data processing, data mining, machine learning. TAN Liang (born 1972), male, from Chengdu, Sichuan; professor, Ph.D., CCF senior member; research interests: trusted computing, network security, cloud computing, big data processing.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61373162) and the Science and Technology Support Project of Sichuan Province (2014GZ007).

Abstract

To address the demand for massive data analysis and the challenge of complex cognitive reasoning in big data tasks, the inefficiency of random task assignment in traditional computing, and the virtual nature and uncertainty of Internet users, an iterative task assignment algorithm based on accurate awareness of user themes is proposed. First, the themes of published crowd tasks are extracted by combining adaptive fuzzy clustering (fuzzy k-means) with a theme extraction model; a task model and a user model are then built to compute the correlations, and historical tasks with submitted high-quality answers are used to iteratively detect each new user's real themes and compute an initial accuracy. Next, Logistic Regression (LR) is used to predict the probability that a user will participate in a given type of task, which yields a candidate sequence of participating users; tasks are then assigned precisely, using full knowledge of each user's real themes, accuracy on the corresponding themes, and honesty. In simulation experiments comparing accuracy against a random assignment algorithm, the proposed algorithm is more than 20 percentage points more accurate, its accuracy rises as the amount of training data grows, and after full training its accuracy on similar tasks approaches 100%. The experiments indicate that the proposed algorithm is more accurate, is particularly suited to big data environments, and to some extent saves the cost that the random algorithm incurs by repeating assignments to ensure accuracy.

Key words: crowd computing, theme match, big data, Logistic Regression (LR), human-computer cooperation
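
The candidate-ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three features (theme relevance, per-theme accuracy, honesty), the hand-set weights and bias, and all names below are assumptions made for illustration. In the paper the LR model would be fitted to historical task data, whereas here only the logistic scoring and ranking are shown.

```python
import math
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    theme_relevance: float  # assumed feature: match between worker themes and task theme, in [0, 1]
    theme_accuracy: float   # assumed feature: historical accuracy on this theme, in [0, 1]
    honesty: float          # assumed feature: honesty score, in [0, 1]


def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def participation_probability(worker, weights, bias):
    """Logistic-regression style estimate that the worker takes part in the task.

    The weights and bias are hypothetical; a real model would learn them.
    """
    z = (bias
         + weights[0] * worker.theme_relevance
         + weights[1] * worker.theme_accuracy
         + weights[2] * worker.honesty)
    return sigmoid(z)


def rank_candidates(workers, weights, bias, k):
    """Return the names of the k workers with the highest estimated probability."""
    scored = [(participation_probability(w, weights, bias), w.name) for w in workers]
    scored.sort(reverse=True)  # highest probability first
    return [name for _, name in scored[:k]]


if __name__ == "__main__":
    pool = [
        Worker("alice", 0.9, 0.95, 0.9),
        Worker("bob", 0.2, 0.5, 0.6),
        Worker("carol", 0.7, 0.8, 0.3),
    ]
    print(rank_candidates(pool, (2.0, 3.0, 1.5), -3.0, 2))  # ['alice', 'carol']
```

Ranking by a single probability keeps the sketch short; a fuller version might apply separate thresholds on accuracy and honesty before ranking.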

CLC number: TP391.1

WANG Qing, TAN Liang. Optimization algorithm for accurately theme-aware task assignment in crowd computing on big data[J]. Journal of Computer Applications, 2016, 36(10): 2777-2783.


