Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (6): 1888-1894. DOI: 10.11772/j.issn.1001-9081.2024060898

• Data Science and Technology •

Comparability assessment and comparative citation generation method for scientific papers

Xiangyu LI 1, Jingqiang CHEN 1,2

  1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023, China
  2. Jiangsu Key Laboratory of Big Data Security and Intelligent Processing (Nanjing University of Posts and Telecommunications), Nanjing, Jiangsu 210023, China
  • Received: 2024-06-28  Revised: 2024-09-10  Accepted: 2024-09-12  Online: 2024-09-25  Published: 2025-06-10
  • Corresponding author: Jingqiang CHEN (cjq@njupt.edu.cn)
  • About the authors: LI Xiangyu, born in 2001 in Weifang, Shandong, M.S. candidate. His research interests include natural language processing and text generation.
    CHEN Jingqiang, born in 1983 in Wenzhou, Zhejiang, Ph.D., associate professor. His research interests include text summarization, natural language processing, and artificial intelligence.
  • Supported by:
    Young Scientists Fund of the National Natural Science Foundation of China (62102192)

Abstract:

To address the two major challenges in comparative citation generation, namely accurately determining the comparability between papers and generating comparative sentences, a Comparability Assessment (CA) and comparative citation generation method for scientific papers, named SciCACG (Scientific Comparability Assessment and Citation Generation), was proposed. Three core modules were constructed in the proposed method: a CA module, which was used to determine whether two papers were comparable; a Comparison object Extraction (CE) module, which was employed to extract specific comparison objects from the papers and references; and a comparative citation generation module, which was responsible for generating the corresponding comparative citation sentences. Firstly, the SciBERT (Scientific BERT) model was used to process the two input papers, and their comparability was assessed through the CA module. Then, for papers determined to be comparable, the CE module was used to identify and extract key comparison objects. Finally, the comparative citation generation module was utilized to generate comparative citations containing these objects. Experimental results show that in the CA stage, the proposed method achieves 0.532 in Mean Reciprocal Rank (MRR) and 0.731 in Recall@10 (R@10), outperforming the previous SciBERT-FNN (Scientific Bidirectional Encoder Representations from Transformers-Feedforward Neural Network) method on all the datasets; in comparative citation generation, compared with the second-best BART-Large (Bidirectional and Auto-Regressive Transformers-Large) method, the F1 scores of ROUGE (Recall-Oriented Understudy for Gisting Evaluation)-1, ROUGE-2 and ROUGE-L of the proposed method are improved by 1.90, 1.29 and 2.55 percentage points, respectively. Additionally, the results confirm that automated comparison and analysis techniques for scientific literature are of great significance to citation sentence generation tasks, and demonstrate substantial practical value in enhancing the traceability of comparative information and ensuring the comprehensiveness of citation sentences.
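The abstract describes a three-stage pipeline (comparability assessment, comparison object extraction, citation generation). Below is a minimal illustrative sketch of such a pipeline, assuming generic pretrained Hugging Face checkpoints (allenai/scibert_scivocab_uncased for the CA classifier and facebook/bart-large for the generator) that would still need fine-tuning on the paper's data; the input format, the pre-supplied comparison objects standing in for the CE module, and all hyperparameters are assumptions for illustration, not the authors' released implementation.

```python
# Illustrative sketch only: generic checkpoints stand in for the fine-tuned
# CA and generation models of SciCACG; input formatting, the keyword list that
# replaces the CE module, and all hyperparameters are assumptions.
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          BartForConditionalGeneration, BartTokenizer)

# --- CA module: score whether two papers are comparable (head needs fine-tuning) ---
ca_tok = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
ca_model = AutoModelForSequenceClassification.from_pretrained(
    "allenai/scibert_scivocab_uncased", num_labels=2)  # classification head is randomly initialized

def comparability_score(citing_abstract: str, reference_abstract: str) -> float:
    """Return an (untrained) estimate of P(comparable) for a citing/reference pair."""
    enc = ca_tok(citing_abstract, reference_abstract,
                 truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = ca_model(**enc).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

# --- Generation module: produce a comparative citation sentence ---
gen_tok = BartTokenizer.from_pretrained("facebook/bart-large")
gen_model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

def generate_comparative_citation(citing_abstract: str, reference_abstract: str,
                                  comparison_objects: list[str]) -> str:
    """comparison_objects would come from the CE module; here they are supplied by hand."""
    source = (f"citing: {citing_abstract} reference: {reference_abstract} "
              f"objects: {'; '.join(comparison_objects)}")
    enc = gen_tok(source, truncation=True, max_length=1024, return_tensors="pt")
    with torch.no_grad():
        out = gen_model.generate(**enc, num_beams=4, max_length=64)
    return gen_tok.decode(out[0], skip_special_tokens=True)

if __name__ == "__main__":
    citing = "We propose a graph-based model for citation recommendation ..."
    candidates = {"ref1": "A transformer-based citation recommender ...",
                  "ref2": "A survey of protein folding methods ..."}
    # Rank candidate references by comparability score; this ranking setting is
    # also how MRR and R@10 would be computed for the CA stage.
    ranked = sorted(candidates,
                    key=lambda r: comparability_score(citing, candidates[r]),
                    reverse=True)
    best = ranked[0]
    print(generate_comparative_citation(citing, candidates[best],
                                        ["accuracy", "training cost"]))
```

Without fine-tuning, both models produce uninformative output; the sketch is meant only to show how the CA, CE and generation stages could be chained.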

Key words: comparative citation, Comparability Assessment (CA), citation generation, text generation, text classification, Comparison object Extraction (CE)
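As a reading aid, the ranking metrics reported for the CA stage are assumed here to follow their standard definitions: with Q the set of citing-paper queries, rank_i the position of the first truly comparable reference in the ranked candidate list for query i, R_i the set of comparable references for query i, and Top10_i the ten highest-ranked candidates,

```latex
\mathrm{MRR} = \frac{1}{|Q|}\sum_{i=1}^{|Q|}\frac{1}{\mathrm{rank}_i},
\qquad
\mathrm{R@10} = \frac{1}{|Q|}\sum_{i=1}^{|Q|}\frac{|R_i \cap \mathrm{Top10}_i|}{|R_i|}
```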
