Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (4): 1170-1177. DOI: 10.11772/j.issn.1001-9081.2021071248
Special Issue: The 36th CCF National Conference of Computer Applications (CCF NCCA 2021)
Xingshuo DING1, Xiang LI1, Qian XIE2,3
Received: 2021-07-16
Revised: 2021-09-01
Accepted: 2021-09-14
Online: 2021-09-18
Published: 2022-04-10
Contact: Xiang LI
About author: DING Xingshuo, born in 1996 in Zaozhuang, Shandong, M. S. candidate. His research interests include data mining and enterprise portraits.
Supported by:
CLC Number:
Xingshuo DING, Xiang LI, Qian XIE. Enterprise portrait construction method based on label layering and deepening modeling[J]. Journal of Computer Applications, 2022, 42(4): 1170-1177.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021071248
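Label type | First-level indicator | Second-level indicator | Label description | Fuzzy marker |
---|---|---|---|---|
Statistical labels | Basic information | Company profile, geographic information, business registration information | Current enterprise registration information | Industry type appears as trade, wholesale, retail, etc. |
Statistical labels | Research information | Software copyright information, patent information, copyrights | Research output currently filed | Field, type, research hotspots, etc. |
Statistical labels | Other information | Enterprise honors, qualification certificates | Honors and qualifications currently held | Rarely occurs |
Rule-based labels | Enterprise scale | Number of employees, enterprise capital | Current number of employees and capital | Rarely occurs |
Rule-based labels | Enterprise value | x1, x2, …, xn | Comprehensive evaluation using defined evaluation indicators |  |
Rule-based labels | Risk level | Penalty information | Statistics of penalty information |  |
Mining labels | Enterprise demand | Recruitment information | Mined from current recruitment information | Enterprise characteristics, unclear demands, etc. |
Mining labels | Enterprise characteristics | y1, y2, …, yn | Keyword extraction using TF-IDF |  |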
Tab. 1 Fuzzy label index system
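Tab. 1 lists TF-IDF keyword extraction as the source of the enterprise-characteristics labels. The following is a minimal sketch of that idea, not the authors' implementation: it assumes documents are already segmented into tokens, uses a smoothed IDF variant chosen here for illustration, and the function name `tfidf_keywords` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        counts = Counter(doc)
        scores = {
            # Term frequency times a smoothed IDF (an assumption of this sketch),
            # so terms present in every document still get a positive score.
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


# Hypothetical, already-segmented enterprise descriptions (not the paper's data).
docs = [
    ["rubber", "products", "wholesale", "rubber", "trade"],
    ["steel", "wholesale", "trade", "steel", "construction"],
    ["software", "development", "trade", "software", "patent"],
]
keywords = tfidf_keywords(docs, top_k=1)
```

Terms that occur in every document, such as "trade" in this toy corpus, receive the lowest IDF weight, which is why generic industry words tend to drop out of the extracted labels.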
Dataset | Number of enterprise documents | Number of sentences |
---|---|---|
Training set | 146 934 | 151 488 |
Validation set | 48 978 | 50 485 |
Test set | 48 978 | 50 606 |
Tab. 2 Details of experimental datasets
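The document counts in Tab. 2 are consistent with a 6:2:2 split: 146 934 of 244 890 documents is exactly 60%, and each of the other two sets is exactly 20%. How the documents were assigned to each set is not stated here, so the sketch below only reproduces the counts; the helper name `split_counts` is hypothetical.

```python
def split_counts(n_total, ratios=(6, 2, 2)):
    """Split n_total items into parts proportional to integer ratios."""
    total_ratio = sum(ratios)
    # Integer arithmetic avoids floating-point rounding surprises.
    counts = [n_total * r // total_ratio for r in ratios]
    # Give any rounding remainder to the first (training) part.
    counts[0] += n_total - sum(counts)
    return counts


n_documents = 146_934 + 48_978 + 48_978  # total number of documents in Tab. 2
train, valid, test = split_counts(n_documents)
```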
Model | Precision (%) | Recall (%) | F1 score (%) |
---|---|---|---|
KNN | 79.70 | 80.23 | 79.83 |
NBC | 81.30 | 82.05 | 81.43 |
CNN | 88.31 | 88.47 | 88.38 |
Deep CNN | 89.18 | 89.37 | 89.26 |
BiGRU | 89.41 | 89.59 | 89.48 |
BiLSTM+Attention | 89.45 | 89.45 | 89.37 |
RNN+CNN | 89.68 | 89.69 | 89.68 |
BERT+Deep CNN | 90.47 | 90.47 | 90.45 |
EPLLD | 91.11 | 91.12 | 91.08 |
Tab. 3 Comparative experimental results
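The F1 values in Tab. 3 are not the harmonic mean of the reported precision and recall. For KNN, 2 × 79.70 × 80.23 / (79.70 + 80.23) ≈ 79.96, while the table reports 79.83. That pattern is what one would expect if each metric is averaged over classes, although the averaging scheme is not stated in this section. The sketch below assumes macro averaging purely for illustration; `macro_prf` and the sample labels are hypothetical.

```python
def macro_prf(y_true, y_pred):
    """Macro-averaged precision, recall and F1 over all classes seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # Per-class F1 is the harmonic mean of that class's precision and recall.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical labels for illustration only (not the paper's data).
y_true = ["trade", "trade", "trade", "retail", "retail", "service"]
y_pred = ["trade", "trade", "retail", "retail", "service", "service"]
precision, recall, f1 = macro_prf(y_true, y_pred)
harmonic_of_averages = 2 * precision * recall / (precision + recall)
```

In this toy case the macro F1 is lower than the harmonic mean of the macro precision and macro recall, the same kind of gap visible in several rows of Tab. 3.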
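Modeling method | First-level label | Second-level label | Third-level label | Word-segmentation label |
---|---|---|---|---|
EPLLD model | Wholesale | Commerce and trade | Chemicals, steel, engineering | Rubber products, building, construction, etc. |
Traditional analysis model |  |  | Chemicals, engineering | Rubber products, building, construction, etc. |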
Tab. 4 Label modeling comparison results