Journal of Computer Applications, 2022, Vol. 42, Issue (4): 1170-1177. DOI: 10.11772/j.issn.1001-9081.2021071248

• The 36th CCF China Computer Applications Conference (CCF NCCA 2021)

Enterprise portrait construction method based on label layering and deepening modeling

Xingshuo DING¹, Xiang LI¹, Qian XIE²,³

  1. Faculty of Computer and Software Engineering, Huaiyin Institute of Technology, Huaian Jiangsu 223003, China
  2. Jiangsu Eazytec Company Limited, Yixing Jiangsu 214200, China
  3. Nanjing Byosoft Company Limited, Nanjing Jiangsu 210032, China
  • Received: 2021-07-16; Revised: 2021-09-01; Accepted: 2021-09-14; Online: 2021-09-18; Published: 2022-04-10
  • Corresponding author: Xiang LI
  • About the authors: DING Xingshuo, born in 1996 in Zaozhuang, Shandong, M.S. candidate. His research interests include data mining and enterprise portraits.
    XIE Qian, born in 1976 in Wuxi, Jiangsu, M.S., professor. His research interests include cloud computing and big data.
  • Supported by:
    National Natural Science Foundation of China (71874067); Jiangsu Industry-University-Research Cooperation Project (BY2020067); Huaiyin Institute of Technology Graduate Science and Technology Innovation Program (HGYK202120)

Abstract:

Label modeling is a fundamental task in building label systems and constructing portraits. Traditional label modeling methods, however, struggle to handle fuzzy labels, extract labels improperly, and cannot effectively integrate multi-modal entities and multi-dimensional relationships. To address these problems, an enterprise portrait construction method based on label layering and deepening modeling, named EPLLD (Enterprise Portrait of Label Layering and Deepening), was proposed. First, multi-feature information was obtained through multi-source information fusion, and fuzzy enterprise labels (for example, industry labels such as wholesale or retail that cannot fully capture an enterprise's characteristics) were counted and screened. Second, a domain-specific lexicon was built for feature expansion and combined with the BERT (Bidirectional Encoder Representations from Transformers) language model for multi-feature extraction. Third, a Bi-directional Long Short-Term Memory (BiLSTM) network was used to obtain deepening results for the fuzzy labels. Finally, keywords were extracted with TF-IDF (Term Frequency-Inverse Document Frequency), TextRank, and the Latent Dirichlet Allocation (LDA) model, completing the layered and deepened modeling of labels. Experiments on the same enterprise dataset show that EPLLD achieves a precision of 91.11% on the fuzzy label deepening task, higher than that of eight label processing methods including BiLSTM+Attention and BERT+Deep CNN.

Key words: enterprise portrait, label modeling, multi-source information fusion, fuzzy label, feature extraction
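The abstract names TF-IDF as one of three keyword extractors in the final step. The Python below is a minimal sketch of that one component only, under stated assumptions: input descriptions are already segmented into words (Chinese text would first need a word segmenter), IDF uses the smoothed form log((1 + N) / (1 + df)) + 1 that is common in practice rather than any formula given in the paper, and the function name `tfidf_keywords` and the sample enterprises are hypothetical. It does not reproduce EPLLD's BERT and BiLSTM stages or how the method combines TF-IDF with TextRank and LDA.

```python
import math
from collections import Counter


def tfidf_keywords(docs: list[list[str]], top_k: int = 3) -> list[list[str]]:
    """Return the top_k TF-IDF keywords for each pre-tokenized document.

    Hypothetical helper for illustration; the smoothed IDF below is an
    assumption, not the formula used by EPLLD.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            # Length-normalized term frequency times smoothed inverse document frequency.
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically so the output is stable.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


# Hypothetical, already-tokenized descriptions of three enterprises that all
# carry the fuzzy industry label "wholesale".
docs = [
    ["wholesale", "chips", "chips", "sensors"],
    ["wholesale", "seafood", "seafood", "cold-chain"],
    ["wholesale", "chips", "packaging"],
]
print(tfidf_keywords(docs, top_k=2))
# [['chips', 'sensors'], ['seafood', 'cold-chain'], ['packaging', 'chips']]
```

Because "wholesale" appears in every description, its IDF is at its minimum, so more specific terms such as "seafood" or "packaging" outrank it; this illustrates why a generic industry label carries little distinguishing information under TF-IDF weighting.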
