Journal of Computer Applications, 2022, Vol. 42, Issue 8: 2386-2393. DOI: 10.11772/j.issn.1001-9081.2021060924

• Artificial Intelligence •

Semantic extraction of domain-dependent mathematical text

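Xiaoyu CHEN1,2,3, Wei WANG1,2

  1. School of Mathematical Sciences, Beihang University, Beijing 100191, China
  2. Key Laboratory of Mathematics, Informatics and Behavioral Semantics, Ministry of Education (Beihang University), Beijing 100191, China
  3. Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing 100191, China
  • Received: 2021-06-02 Revised: 2021-07-27 Accepted: 2021-08-06 Online: 2022-08-09 Published: 2022-08-10
  • Contact: Wei WANG
  • About the authors: CHEN Xiaoyu, born in 1982 in Shenyang, Liaoning, Ph.D., lecturer, CCF member. His research interests include knowledge representation and machine learning.
    WANG Wei, born in 1997 in Suzhou, Anhui, M.S. candidate. His research interests include mathematical knowledge management and natural language processing.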

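Abstract:

To address the insufficient acquisition of semantic information from scientific and technical documents, a set of rule-based methods for extracting semantics from domain-dependent mathematical text is proposed. First, domain concepts are extracted from the text and a semantic mapping between mathematical entities and domain concepts is established. Next, the context of each mathematical symbol is analyzed to obtain its entity mention or textual description, from which the symbol's semantics are extracted. Finally, expressions are analyzed semantically on the basis of the extracted symbol semantics. Taking linear algebra texts as a case study, a semantically annotated dataset was constructed and used for experiments. The results show that the proposed methods achieve a precision above 93% and a recall above 91% for semantic extraction of identifiers, linear algebra entities, and expressions.

Key words: semantic extraction, entity mention, context analysis, mathematical language processing, mathematical text understanding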
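
The three stages summarized in the abstract (concept mapping, context analysis of symbols, expression analysis) can be illustrated with a minimal Python sketch. This is not the authors' rule set: the concept lexicon `CONCEPTS`, the two context patterns, the product rule table `PRODUCT_RULES`, and both function names are hypothetical, chosen only to show the general shape of a rule-based approach on a linear algebra sentence.

```python
import re

# Hypothetical concept lexicon (assumption, not from the paper):
# surface phrase in the text -> domain concept label.
CONCEPTS = {"matrix": "Matrix", "vector": "Vector", "scalar": "Scalar"}
_PHRASES = "|".join(re.escape(p) for p in CONCEPTS)

# Hypothetical context rule 1: "<concept> <identifier>", e.g. "matrix A".
_BEFORE = re.compile(r"\b(" + _PHRASES + r")\s+([A-Za-z])\b")
# Hypothetical context rule 2: "<identifier> is a/an <concept>".
_AFTER = re.compile(r"\b([A-Za-z])\s+is\s+an?\s+(" + _PHRASES + r")\b")

# Hypothetical typing rules for a product "left*right".
PRODUCT_RULES = {
    ("Matrix", "Vector"): "Vector",
    ("Matrix", "Matrix"): "Matrix",
    ("Scalar", "Vector"): "Vector",
    ("Scalar", "Matrix"): "Matrix",
}


def extract_symbol_semantics(text):
    """Map single-letter identifiers to the concept found in their context."""
    symbols = {}
    for match in _BEFORE.finditer(text):
        symbols[match.group(2)] = CONCEPTS[match.group(1)]
    for match in _AFTER.finditer(text):
        # Keep the first concept found if both rules fire for one identifier.
        symbols.setdefault(match.group(1), CONCEPTS[match.group(2)])
    return symbols


def product_semantics(expr, symbols):
    """Return the concept of a product 'left*right', or None if no rule applies."""
    parts = [p.strip() for p in expr.split("*")]
    if len(parts) != 2:
        return None
    key = (symbols.get(parts[0]), symbols.get(parts[1]))
    return PRODUCT_RULES.get(key)
```

For the sentence "Let matrix A be invertible and let vector x satisfy A*x = b, where b is a vector.", `extract_symbol_semantics` maps `A` to `Matrix` and both `x` and `b` to `Vector`, and `product_semantics("A*x", ...)` then yields `Vector`. A real system would need a far richer lexicon and rule set than this toy table.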

CLC number: