Journal of Computer Applications, 2019, Vol. 39, Issue (5): 1351-1356. DOI: 10.11772/j.issn.1001-9081.2018112496
Received: 2018-12-04
Revised: 2018-12-18
Online: 2019-05-10
Published: 2019-05-14
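Corresponding author: LIN Yuming
About the authors: ZHAO Wei (born 1995), male, from Shangqiu, Henan, is a master's student whose research focuses on knowledge extraction and fusion; LIN Yuming (born 1978), male, from Hepu, Guangxi, is an associate research fellow with a Ph.D. and a CCF member whose research focuses on massive data management and knowledge graphs; HUANG Taoyi (born 1994), male, from Wuxi, Jiangsu, is a master's student whose research focuses on knowledge extraction and fusion; LI You (born 1978), female, from Guoyang, Anhui, is an associate professor with a master's degree whose research focuses on text mining.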
ZHAO Wei, LIN Yuming, HUANG Taoyi, LI You
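Abstract: User reviews contain a wealth of opinion information that is a valuable reference for potential customers and merchants. Opinion targets and opinion words are the core objects in user reviews, so extracting them automatically is a key task for intelligent applications of user reviews. This problem is currently addressed mainly with supervised extraction methods, which depend on high-quality labeled samples for model training; labeling samples manually in the traditional way is time-consuming, laborious, and costly. Crowdsourcing offers an effective way to build high-quality training sets, but the quality of the labels varies because crowd workers differ in factors such as background knowledge. To obtain high-quality labeled samples at limited cost, an adaptive crowdsourced labeling method based on the evaluation of workers' professional level is proposed to construct a reliable opinion target-opinion word dataset. First, workers with a high professional level are identified at small cost. Then, a task distribution mechanism based on worker reliability is designed. Finally, an effective label fusion algorithm is designed that exploits the dependency between opinion targets and opinion words and integrates the labels of different workers to produce the final reliable result. A series of experiments on real datasets shows that, compared with the GLAD model and the Majority Voting (MV) algorithm, the proposed method improves the reliability of the constructed opinion target-opinion word dataset by about 10% under a small cost budget.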
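The abstract names majority voting (MV) as a baseline and describes label fusion that accounts for worker reliability, but it does not give the algorithms themselves. The Python sketch below is therefore only an illustration of the general idea, not the authors' method: it assumes worker reliability is estimated as accuracy on a small set of known-answer ("gold") items, and it fuses labels by summing the reliabilities behind each candidate label. All function names, the `0.5` default for unseen workers, and the `"target"`/`"other"` labels are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_reliability(gold, answers):
    """Accuracy of each worker on gold (known-answer) items.

    gold:    {item_id: true_label}
    answers: {worker_id: {item_id: label}}
    """
    reliability = {}
    for worker, labels in answers.items():
        scored = [item for item in labels if item in gold]
        if not scored:
            reliability[worker] = 0.5  # assumption: no evidence, treat as chance level
            continue
        correct = sum(labels[item] == gold[item] for item in scored)
        reliability[worker] = correct / len(scored)
    return reliability


def majority_vote(votes):
    """votes: list of (worker_id, label). Returns the most frequent label."""
    return Counter(label for _, label in votes).most_common(1)[0][0]


def weighted_vote(votes, reliability):
    """Sum each worker's reliability behind the label that worker chose."""
    scores = defaultdict(float)
    for worker, label in votes:
        scores[label] += reliability.get(worker, 0.5)  # 0.5 default is an assumption
    return max(scores, key=scores.get)


# Hypothetical data: three workers answer two gold questions, then label one word.
gold = {"q1": "target", "q2": "other"}
answers = {
    "w1": {"q1": "target", "q2": "other"},   # both correct
    "w2": {"q1": "other", "q2": "target"},   # both wrong
    "w3": {"q1": "other", "q2": "other"},    # one correct
}
reliability = estimate_reliability(gold, answers)
votes = [("w1", "target"), ("w2", "other"), ("w3", "other")]

print(majority_vote(votes))               # other
print(weighted_vote(votes, reliability))  # target
```

In this toy case the two unreliable workers outvote the reliable one under plain majority voting, while the weighted variant follows the worker who did well on the gold items. A method like the one the abstract describes would additionally decide which workers receive each task and would use the dependency between opinion targets and opinion words, neither of which this sketch attempts.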
ZHAO Wei, LIN Yuming, HUANG Taoyi, LI You. User opinion extraction based on adaptive crowd labeling with cost constrain[J]. Journal of Computer Applications, 2019, 39(5): 1351-1356.