Journal of Computer Applications, 2023, Vol. 43, Issue (6): 1768-1778. DOI: 10.11772/j.issn.1001-9081.2022060944

• The 37th CCF National Conference on Computer Applications (CCF NCCA 2022) •

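Algorithm path self-assembling model for business requirements

Yao LIU1, Xin TONG2, Yifeng CHEN2

  1. Information Technology Support Center, Institute of Scientific and Technical Information of China, Beijing 100038, China
  2. School of Software and Microelectronics, Peking University, Beijing 102600, China
  • Received: 2022-06-29 Revised: 2022-08-27 Accepted: 2022-08-31 Online: 2022-09-22 Published: 2023-06-10
  • Contact: Yao LIU
  • About author: LIU Yao, born in 1972 in Heze, Shandong, Ph. D., research fellow, CCF distinguished member. His research interests include natural language processing, knowledge engineering. Email: liuy@istic.ac.cn
    TONG Xin, born in 1993 in Jingzhou, Hubei, M. S. Her research interests include natural language processing, knowledge engineering.
    CHEN Yifeng, born in 1995 in Lüliang, Shanxi, M. S. candidate. Her research interests include natural language processing, knowledge engineering.
  • Supported by:
    National Social Science Foundation of China (21BTQ011)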

Abstract:

Algorithm platforms, as a way of implementing automated machine learning, have attracted wide attention in recent years. However, the business processes on these platforms must be built manually, model calling is inflexible, and the platforms cannot automatically construct algorithms customized for specific business requirements. To address these problems, an algorithm path self-assembling model for business requirements was proposed. Firstly, the sequence features and structural features of code were modeled simultaneously based on Graph Convolutional Network (GCN) and word2vec representations. Secondly, the functions in the algorithm set were discovered through a clustering model, and the resulting function subsets were used to prepare for path discovery among algorithm components across subsets. Finally, a relationship discovery model and a ranking model were trained with prior knowledge and used to mine the self-assembled paths of candidate code components, thus realizing self-assembling of algorithm code. In a comparative analysis using the proposed evaluation indicators, the best result of the proposed model is 0.8, while that of the baseline model Okapi BM25+word2vec is 0.21. To a certain extent, the proposed model solves the problem of missing code structure and semantic information in traditional code representation methods, and lays a foundation for research on fine-grained self-organization of algorithm processes and automatic construction of algorithm pipelines.

Key words: Natural Language Processing (NLP), learning to rank, code parsing, code resource structuring, code representation

CLC number:
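
The first step named in the abstract, modeling sequence and structural features of code together with a GCN over word2vec-style token vectors, can be illustrated with a minimal sketch. It uses the standard GCN propagation rule, ReLU(D^(-1/2) (A + I) D^(-1/2) H W), on a toy syntax graph. Everything specific here is an assumption made for illustration and is not taken from the paper: the four-node graph, the random vectors standing in for trained word2vec embeddings, the untrained weight matrix, the mean pooling, and the helper names `normalized_adjacency`, `gcn_layer`, and `encode_snippet`.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^(-1/2) (A + I) D^(-1/2) for an undirected graph."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    deg = a_hat.sum(axis=1)             # every degree is >= 1 after self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm: np.ndarray, features: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One graph-convolution layer followed by ReLU."""
    return np.maximum(a_norm @ features @ weight, 0.0)


def encode_snippet(adj: np.ndarray, token_vectors: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Mix each node's token vector with its neighbours', then mean-pool."""
    hidden = gcn_layer(normalized_adjacency(adj), token_vectors, weight)
    return hidden.mean(axis=0)


# Hypothetical syntax graph with 4 nodes: 0-1, 0-2, 2-3.
edges = [(0, 1), (0, 2), (2, 3)]
adj = np.zeros((4, 4))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

rng = np.random.default_rng(0)
token_vectors = rng.normal(size=(4, 8))  # stand-in for word2vec vectors (assumption)
weight = rng.normal(size=(8, 5))         # untrained layer weights (assumption)

code_vector = encode_snippet(adj, token_vectors, weight)
print(code_vector.shape)
```

A vector of this kind, one per code component, is the sort of input that a clustering or ranking stage could consume; how the paper's model actually pools, trains, and combines the two feature types is not stated in the abstract.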