Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (6): 1888-1894.DOI: 10.11772/j.issn.1001-9081.2024060898
• Data science and technology •
Xiangyu LI, Jingqiang CHEN
Received: 2024-06-28
Revised: 2024-09-10
Accepted: 2024-09-12
Online: 2024-09-25
Published: 2025-06-10
Contact: Jingqiang CHEN
About author: LI Xiangyu, born in 2001 in Weifang, Shandong, M. S. candidate. His research interests include natural language processing and text generation.
Xiangyu LI, Jingqiang CHEN. Comparability assessment and comparative citation generation method for scientific papers[J]. Journal of Computer Applications, 2025, 45(6): 1888-1894.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2024060898
Experiment | Accuracy | Recall | F1 score |
---|---|---|---|
10-fold cross-validation | 93.24 | 93.24 | 94.32 |
External test set | 90.87 | 91.02 | 90.87 |
Tab. 1 Experimental results of 10-fold cross-validation and external test
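To make the 10-fold cross-validation setting in Tab. 1 concrete, the sketch below shows one common way to partition sample indices into k folds so that each fold serves once as the validation set. The function name `k_fold_indices` and the choice of contiguous, unshuffled folds are assumptions for illustration; the paper's actual splitting procedure is not given here.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, val_idx) pairs; each fold is used once for validation."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

# Hypothetical usage with 25 samples (not the paper's data).
for fold, (train_idx, val_idx) in enumerate(k_fold_indices(25, k=10)):
    print(fold, len(train_idx), len(val_idx))
```

In practice the indices would usually be shuffled (and often stratified by label) before splitting; that step is omitted to keep the sketch short.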
Label | Training samples | Validation samples | Test samples |
---|---|---|---|
Comparable | 27 416 | 1 403 | 1 523 |
Not comparable | 82 306 | 4 396 | 4 536 |
Tab. 2 Sample distribution of training, validation and test set for CA dataset
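As a quick check on the class balance in Tab. 2, the following sketch recomputes the share of comparable samples in each split from the tabulated counts. The dictionary layout and the helper name `comparable_share` are illustrative assumptions, not part of the paper.

```python
# Sample counts copied from Tab. 2 (comparable vs. not comparable).
counts = {
    "train": {"comparable": 27416, "not_comparable": 82306},
    "validation": {"comparable": 1403, "not_comparable": 4396},
    "test": {"comparable": 1523, "not_comparable": 4536},
}

def comparable_share(split):
    """Return the fraction of samples in a split that are labeled comparable."""
    c = counts[split]
    return c["comparable"] / (c["comparable"] + c["not_comparable"])

for name in counts:
    print(f"{name}: {comparable_share(name):.3f}")
```

Each split holds roughly one comparable sample for every three non-comparable ones, so the label distribution is similar across training, validation and test sets.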
Method | ACL-200 MRR | ACL-200 R@10 | FullTextPeerRead MRR | FullTextPeerRead R@10 | CA MRR | CA R@10 |
---|---|---|---|---|---|---|
DualCon | 0.335 | 0.647 | - | - | - | - |
DualEnh | 0.366 | 0.703 | - | - | - | - |
BERT-FNN | 0.482 | 0.736 | 0.458 | 0.706 | 0.508 | 0.716 |
SciBERT-FNN | 0.531 | 0.779 | 0.536 | 0.773 | 0.521 | 0.724 |
SciBERT-CNN | 0.541 | 0.781 | 0.539 | 0.780 | 0.525 | 0.726 |
SciBERT-LSTM | 0.545 | 0.785 | 0.542 | 0.781 | 0.525 | 0.728 |
SciCACG | 0.552 | 0.787 | 0.545 | 0.783 | 0.532 | 0.731 |
Tab. 3 Results of CA experiments
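Tab. 3 reports mean reciprocal rank (MRR) and recall at 10 (R@10). The sketch below shows how these two ranking metrics are commonly computed, assuming each query has exactly one relevant item and that `ranks` holds its 1-based position in the ranked list. The function names and toy ranks are hypothetical and are not taken from the paper's evaluation code.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over queries; ranks are 1-based positions of the relevant item."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def recall_at_k(ranks, k=10):
    """Fraction of queries whose relevant item appears within the top k results."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Toy ranks for four hypothetical queries (not data from the paper).
toy_ranks = [1, 2, 4, 20]
print(mean_reciprocal_rank(toy_ranks))  # (1 + 1/2 + 1/4 + 1/20) / 4
print(recall_at_k(toy_ranks, k=10))     # 3 of the 4 queries hit the top 10
```

MRR rewards placing the relevant item near the top of the list, while R@10 only checks whether it appears among the first ten results, which is why the two columns can move by different amounts across methods.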
Method | R-1 F1 | R-2 F1 | R-L F1 |
---|---|---|---|
EXT-Oracle | 22.21 | 4.96 | 16.84 |
PTGEN | 24.60 | 6.16 | 19.19 |
PTGEN-Cross | 27.08 | 7.14 | 20.61 |
BART-Large | 29.62 | 9.86 | 24.51 |
SciCACG | 31.52 | 11.15 | 27.06 |
Tab. 4 Results of comparative citation generation experiments
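The R-1 and R-2 columns in Tab. 4 are ROUGE n-gram F1 scores. As a rough illustration of what such a score measures, the sketch below computes a simplified ROUGE-N F1 from clipped n-gram overlap. It assumes lowercased whitespace tokenization with no stemming, so it will not reproduce the official ROUGE package exactly; the function name and example sentences are hypothetical.

```python
from collections import Counter

def rouge_n_f1(candidate, reference, n=1):
    """Simplified ROUGE-N F1: clipped n-gram overlap of two whitespace-tokenized strings."""
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())  # shared n-grams, counts clipped to the minimum
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Hypothetical sentences for illustration only.
reference = "our method outperforms the baseline on both datasets"
candidate = "our method outperforms the previous baseline"
print(round(rouge_n_f1(candidate, reference, n=1), 3))
print(round(rouge_n_f1(candidate, reference, n=2), 3))
```

R-L, the third column, is based on the longest common subsequence rather than fixed-length n-grams and is not covered by this sketch.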
Model | R-1 F1 | R-2 F1 | R-L F1 |
---|---|---|---|
SciCACG | 31.52 | 11.15 | 27.06 |
-w/o CA | 30.92 | 10.70 | 26.71 |
-w/o CE | 30.79 | 11.09 | 26.68 |
Tab. 5 Ablation experiment results
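To read the ablation in Tab. 5 more easily, the sketch below computes how much each ROUGE F1 score falls when a component is removed from the full SciCACG model. The scores are copied from the table; the dictionary layout and the helper name `score_drop` are illustrative assumptions.

```python
# ROUGE F1 scores copied from Tab. 5.
full = {"R-1": 31.52, "R-2": 11.15, "R-L": 27.06}
ablations = {
    "w/o CA": {"R-1": 30.92, "R-2": 10.70, "R-L": 26.71},
    "w/o CE": {"R-1": 30.79, "R-2": 11.09, "R-L": 26.68},
}

def score_drop(variant):
    """Return the decrease in each metric relative to the full model."""
    return {m: round(full[m] - ablations[variant][m], 2) for m in full}

for name in ablations:
    print(name, score_drop(name))
```

Removing either component lowers all three scores. The R-2 drop is larger without CA (0.45) than without CE (0.06), while the R-1 drop is slightly larger without CE (0.73 versus 0.60).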
Method | Fluency | Relevance | Coherence | Overall quality |
---|---|---|---|---|
Original citation sentence | 4.76 | 4.55 | 4.82 | 4.69 |
BART-Large | 3.70 | 3.37 | 2.80 | 3.09 |
SciCACG | 3.68 | 3.46 | 2.94 | 3.14 |
Tab. 6 Human evaluation results