Journal of Computer Applications, 2024, Vol. 44, Issue 6: 1699-1705. DOI: 10.11772/j.issn.1001-9081.2023060850

The 38th CCF National Conference of Computer Applications (CCF NCCA 2023)

Cognitive graph based on business process

Yao LIU¹, Yumeng LI², Miaomiao SONG²

  1. Information Technology Support Center, Institute of Scientific and Technical Information of China, Beijing 100038, China
  2. School of Software & Microelectronics, Peking University, Beijing 102600, China
  • Received: 2023-06-29 Revised: 2023-08-30 Accepted: 2023-08-31 Online: 2023-10-09 Published: 2024-06-10
  • Contact: Yao LIU
  • About authors: LI Yumeng, born in 1994, from Xinxiang, Henan, M. S. Her research interests include natural language processing.
    SONG Miaomiao, born in 1999, from Hanzhong, Shaanxi, M. S. candidate, CCF member. Her research interests include natural language processing.
  • Supported by:
    National Social Science Foundation of China (21BTQ011)

Abstract:

Current software project development cannot make full use of existing business resources, which leads to low development efficiency and weak development capability. To address this problem, a cognitive graph based on business process was proposed by studying the associations among business resources. First, business knowledge was extracted from formal documents, and a method for building and then correcting a knowledge hierarchy was proposed. Second, a code network representation model was constructed through code feature mining and code entity similarity judgment. Finally, experiments were conducted on real business data, and the proposed method was compared with Vector Space Model (VSM), diversified ranking, and deep learning methods. Experimental results show that the constructed cognitive graph based on business process outperforms current text-matching methods and deep learning algorithms in code retrieval: compared with the diversified-ranking code retrieval method, it improves precision@5, mean Average Precision (mAP) and α-Normalized Discounted Cumulative Gain (α-NDCG) by 4.30, 0.38 and 2.74 percentage points respectively. The method effectively addresses problems such as identifying latent business vocabulary and representing business cognitive reasoning, and improves both code retrieval performance and business resource utilization.

Key words: cognitive graph, business knowledge, network representation model, natural language processing, software development process
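VSM serves in the abstract as the text-matching baseline for code retrieval. As a rough illustration of that kind of baseline (not the configuration used in the paper), the following Python sketch ranks code snippets against a natural-language query by TF-IDF cosine similarity. The identifier-splitting tokenizer, the smoothed IDF formula and the helper names `tokenize`, `build_index`, `cosine` and `vsm_search` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def tokenize(text: str) -> List[str]:
    """Lower-case word tokens; camelCase and snake_case identifiers are split into words."""
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)  # getUserName -> get User Name
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


def build_index(docs: Dict[str, str]) -> Tuple[Dict[str, Vector], Dict[str, float]]:
    """Return a TF-IDF vector for every document plus the IDF table (assumed weighting)."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    df: Counter = Counter()
    for c in counts.values():
        df.update(c.keys())  # document frequency: each term counted once per document
    n = len(docs)
    # Smoothed IDF: a term that occurs in every document still gets weight 1.
    idf = {term: math.log((1 + n) / (1 + freq)) + 1 for term, freq in df.items()}
    vectors = {doc_id: {t: tf * idf[t] for t, tf in c.items()} for doc_id, c in counts.items()}
    return vectors, idf


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def vsm_search(query: str, docs: Dict[str, str], k: int = 5) -> List[str]:
    """Return the ids of the k documents most similar to the query."""
    vectors, idf = build_index(docs)
    # Query terms never seen in the corpus carry no weight and are dropped.
    q_vec = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    ranked = sorted(vectors, key=lambda doc_id: cosine(q_vec, vectors[doc_id]), reverse=True)
    return ranked[:k]
```

For instance, given snippets such as `def getUserName(user): return user.name` and `def saveOrder(order): db.save(order)`, the query "get user name" ranks the `getUserName` snippet first because it shares all three query terms. A purely lexical ranker like this cannot relate a query to code that uses different vocabulary for the same business concept, which is the kind of gap the abstract attributes to text-matching methods.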

CLC number:
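The abstract reports its results in precision@5, mAP and α-NDCG. The following Python sketch implements the textbook definitions of these metrics so the reported differences can be read concretely. It is not the paper's evaluation code: binary relevance judgments for precision@k and mAP, per-result subtopic labels for α-NDCG, α = 0.5, and a greedily constructed ideal ranking for α-NDCG normalisation are all assumptions, and the function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Share of the top-k retrieved items that are relevant (binary relevance)."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@rank taken at each rank where a relevant item appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """mAP: average precision averaged over all queries (one (ranking, relevant) pair each)."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def alpha_dcg(ranked: Sequence[str], subtopics: Dict[str, Set[str]], alpha: float, k: int) -> float:
    """alpha-DCG@k: a subtopic's gain shrinks by a factor (1 - alpha) each time it repeats."""
    seen: Dict[str, int] = {}
    score = 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        gain = 0.0
        for topic in subtopics.get(item, set()):
            gain += (1 - alpha) ** seen.get(topic, 0)
            seen[topic] = seen.get(topic, 0) + 1
        score += gain / math.log2(rank + 1)
    return score


def alpha_ndcg(ranked: Sequence[str], subtopics: Dict[str, Set[str]],
               alpha: float = 0.5, k: int = 5) -> float:
    """alpha-NDCG@k, normalised by a greedy approximation of the ideal ranking."""
    pool, ideal, seen = list(subtopics), [], {}
    while pool and len(ideal) < k:
        # Pick the item with the largest marginal gain given subtopics already covered.
        best = max(pool, key=lambda item: sum((1 - alpha) ** seen.get(t, 0) for t in subtopics[item]))
        ideal.append(best)
        pool.remove(best)
        for t in subtopics[best]:
            seen[t] = seen.get(t, 0) + 1
    ideal_score = alpha_dcg(ideal, subtopics, alpha, k)
    return alpha_dcg(ranked, subtopics, alpha, k) / ideal_score if ideal_score > 0 else 0.0
```

Precision@k and mAP lie between 0 and 1, so a gain of 4.30 percentage points in precision@5 corresponds to a difference of 0.043 between two systems' scores. α-NDCG additionally rewards rankings that cover different subtopics early, which is why it is the usual yardstick when comparing against a diversified-ranking method. Because computing the exact ideal ranking is expensive, the greedy normalisation used here is a common approximation.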