Journal of Computer Applications (计算机应用), 2014, Vol. 34, Issue (9): 2639-2644. DOI: 10.11772/j.issn.1001-9081.2014.09.2639
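Received:
2014-04-09
Revised:
2014-05-28
Online:
2014-09-01
Published:
2014-09-30
Contact:
SU Yingbin (苏赢彬)
Supported by:
National High Technology Research and Development Program of China (863 Program)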
SU Yingbin¹,², DU Xuehui¹,², XIA Chuntao¹,², LI Haihua³
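Abstract:
Office terminals are at risk of leaking sensitive information, so detecting sensitive information in the documents stored on them is important. Existing detection methods, however, build indexes that ignore context, which makes document modeling inaccurate, and they expand query semantics insufficiently. To address these problems, a context-based document index smoothing algorithm is first proposed to build an index that retains as much of the document's information as possible. The query semantic expansion algorithm is then improved: it uses the sensitivity of concepts in a domain ontology to widen the scope of sensitive information detection appropriately. Finally, document smoothing and query expansion are integrated into a language model, and a sensitive information detection method for documents is proposed on that basis. Four methods that use different indexing mechanisms, query keyword expansion algorithms, and detection models were compared. In document sensitive information detection, the proposed algorithm achieved a recall of 0.798, a precision of 0.786, and an F-measure of 0.792, clearly outperforming the comparison algorithms on every metric. The results indicate that the proposed algorithm detects sensitive information more effectively.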
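The three reported figures can be checked against one another. The sketch below assumes the F value is the balanced F-measure (F1), the harmonic mean of precision and recall; the abstract does not state the weighting, and the function name `f_measure` and its `beta` parameter are illustrative, not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    # Guard against division by zero when either score is zero.
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Values reported in the abstract.
reported_recall = 0.798
reported_precision = 0.786

# Assumption: balanced weighting (beta = 1).
f1 = f_measure(reported_precision, reported_recall)
print(round(f1, 3))  # 0.792
```

Under that assumption the computed value rounds to the reported 0.792, so the three figures are mutually consistent.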
SU Yingbin, DU Xuehui, XIA Chuntao, LI Haihua. Sensitive information detection approach for documents based on document smoothing and query expansion[J]. Journal of Computer Applications, 2014, 34(9): 2639-2644. (苏赢彬, 杜学绘, 夏春涛, 李海华. 基于文档平滑和查询扩展的文档敏感信息检测方法[J]. 计算机应用, 2014, 34(9): 2639-2644.)