Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (4): 1072-1079. DOI: 10.11772/j.issn.1001-9081.2023040532

• Artificial intelligence •

Technology term recognition with comprehensive constituency parsing

Junjie ZHU1, Li YU2, Shengwen LI1, Changzheng ZHOU3

  1. School of Computer Science, China University of Geosciences, Wuhan Hubei 430078, China
    2. Center for Strategic Research on Frontier and Interdisciplinary Engineering Science and Technology (Beijing Institute of Technology), Beijing 100081, China
    3. Shiyan Juneng Electric Power Design Company Limited, Shiyan Hubei 442012, China
  • Received:2023-05-04 Revised:2023-10-12 Accepted:2023-10-12 Online:2024-04-22 Published:2024-04-10
  • Contact: Shengwen LI
  • About the authors: ZHU Junjie, born in 1999, M.S. candidate. His research interests include named entity recognition.
    YU Li, born in 1986, Ph.D., associate research fellow. Her research interests include technology forecasting and knowledge graphs.
    LI Shengwen, born in 1978, Ph.D., associate professor. His research interests include natural language processing and knowledge graphs.
    ZHOU Changzheng, born in 1978, M.S., senior engineer. His research interests include big data processing.
  • Supported by:
    National Natural Science Foundation of China(42071382)

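Technology term recognition with comprehensive constituency parsing (综合成分句法分析的技术名称识别)

ZHU Junjie (朱俊杰)1, YU Li (余丽)2, LI Shengwen (李圣文)1, ZHOU Changzheng (周长征)3

  1. School of Computer Science, China University of Geosciences (Wuhan), Wuhan 430078
    2. Center for Strategic Research on Frontier and Interdisciplinary Engineering Science and Technology (Beijing Institute of Technology), Beijing 100081
    3. Shiyan Juneng Electric Power Design Company Limited, Shiyan, Hubei 442012
  • Corresponding author: LI Shengwen
  • About the authors: ZHU Junjie (born 1999), male, from Wuhan, Hubei, M.S. candidate. Main research interest: named entity recognition.
    YU Li (born 1986), female, from Baokang, Hubei, associate research fellow, Ph.D. Main research interests: technology foresight and knowledge graphs.
    LI Shengwen (born 1978), male, from Jining, Shandong, associate professor, Ph.D., CCF member. Main research interests: natural language processing and knowledge graphs. swli@cug.edu.cn
    ZHOU Changzheng (born 1978), male, from Jining, Shandong, senior engineer, M.S. Main research interest: big data processing.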

Abstract:

Technology terms are used to communicate information accurately in science and technology. Automatically recognizing technology terms in text helps experts and the public discover, understand, and apply new technologies, and is therefore of great value. However, existing unsupervised technology term recognition methods still have limitations, such as complex rules and poor adaptability. To improve the recognition of technology terms in text, an unsupervised technology term recognition method based on comprehensive constituency parsing was proposed. Firstly, a syntactic structure tree was constructed through constituency parsing. Then, candidate technology terms were extracted from both top-down and bottom-up perspectives. Finally, statistical frequency and semantic information were combined to select the most appropriate technology terms. In addition, a technology term dataset was constructed to validate the effectiveness of the proposed method. Experimental results on this dataset show that the proposed method with top-down extraction improves the F1 score by 4.55 percentage points over the dependency-based method. Meanwhile, a case study in the field of 3D printing shows that the technology terms recognized by the proposed method are consistent with the development of the field and can be used to trace the development history of technologies and depict their evolution paths, providing references for understanding, discovering, and exploring future technologies in the field.
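As a rough illustration of what extracting candidates from a constituency tree "top-down" and "bottom-up" could look like, the sketch below parses a Penn-style bracketed tree and collects noun phrases in two ways. It is not the authors' implementation. The abstract does not give the extraction rules, so the reading used here (top-down yields maximal NPs, bottom-up yields minimal NPs), the choice of NP as the target label, the example sentence, and all function names are assumptions made for illustration. The semantic scoring step is omitted, and only frequency counting is shown.

```python
from collections import Counter


def parse_tree(s):
    """Parse a Penn-style bracketed string into (label, children) tuples.

    Leaves (words) are plain strings.
    """
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read()


def words(node):
    """Return the words covered by a node, in sentence order."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1] for w in words(child)]


def top_down(node, target="NP"):
    """Assumed top-down reading: keep the first (largest) target phrase on each path."""
    if isinstance(node, str):
        return []
    if node[0] == target:
        return [" ".join(words(node))]
    return [p for child in node[1] for p in top_down(child, target)]


def bottom_up(node, target="NP"):
    """Assumed bottom-up reading: keep target phrases that contain no smaller target phrase."""
    if isinstance(node, str):
        return []
    found = [p for child in node[1] for p in bottom_up(child, target)]
    if node[0] == target and not found:
        return [" ".join(words(node))]
    return found


def rank_by_frequency(trees, extractor):
    """Count candidates over several parsed sentences, most frequent first."""
    counts = Counter(c for t in trees for c in extractor(t))
    return counts.most_common()


# Hypothetical example sentence, already parsed into bracketed form.
TREE = parse_tree(
    "(S (NP (NP (JJ selective) (NN laser) (NN sintering))"
    " (PP (IN of) (NP (NN metal) (NNS powders))))"
    " (VP (VBZ improves) (NP (NN part) (NN density))))"
)

print(top_down(TREE))
print(bottom_up(TREE))
print(rank_by_frequency([TREE, TREE], bottom_up))
```

In this toy sentence the top-down pass returns the long phrase "selective laser sintering of metal powders", while the bottom-up pass splits it into "selective laser sintering" and "metal powders". This is one way to see why combining both views with a later scoring step could help select the most appropriate term boundary.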

Key words: technology term recognition, constituency parsing, unsupervised method, constituency parsing tree, term extraction

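Abstract (Chinese version):

Technology terms are the terminology used to communicate information accurately in science and technology. Automatically recognizing technology terms can help experts and the public discover, understand, and apply new technologies, and is of significant value. However, unsupervised methods for recognizing technology terms suffer from problems such as complex rules and poor adaptability. To improve the ability to recognize technology terms in text, a technology term recognition method with comprehensive constituency parsing is proposed. First, a syntactic structure tree is constructed through constituency parsing. Second, candidate technology terms are extracted from both top-down and bottom-up perspectives. Finally, statistical frequency and semantic information are fused to select the optimal technology terms. In addition, a technology term dataset is constructed to verify the effectiveness of the proposed method. Experimental results on this dataset show that, compared with the dependency-based method, the proposed bottom-up method improves the F1 score by 4.55 percentage points. A case study in the field of 3D printing further shows that the technology terms recognized by the proposed method are consistent with the development of the corresponding field, and can be used to trace the development history of technologies and depict their evolution paths, providing references for understanding, discovering, and exploring future technologies in the field.

Key words (Chinese version): technology term recognition, constituency parsing, unsupervised method, constituency parsing tree, term extraction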

CLC Number: