Journal of Computer Applications (《计算机应用》), 2025, Vol. 45, Issue (6): 1879-1887. DOI: 10.11772/j.issn.1001-9081.2024050708

• Data Science and Technology •

Paper recommendation method with mixed information enhancement

Panpan GUO, Gang ZHOU, Jicang LU, Zhufeng LI, Taojie ZHU

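  1. School of Data and Target Engineering, Information Engineering University, Zhengzhou Henan 450001, China
  • Received: 2024-05-28 Revised: 2024-08-01 Accepted: 2024-08-08 Online: 2024-08-26 Published: 2025-06-10
  • Corresponding author: Gang ZHOU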
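  • About authors: GUO Panpan (born 1992), female, from Zhoukou, Henan, Ph. D. candidate. Her research interests include recommendation systems.
    ZHOU Gang (born 1974), male, from Changzhou, Jiangsu, Ph. D., professor. His research interests include big data analysis, big data intelligence and knowledge graphs. Email: zhougang_ieu@126.com
    LU Jicang (born 1985), male, from Daming, Hebei, Ph. D., associate professor. His research interests include data intelligence and social media mining.
    LI Zhufeng (born 1983), male, from Zhengzhou, Henan, Ph. D., lecturer. His research interests include big data analysis and natural language processing.
    ZHU Taojie (born 1995), male, from Zhoukou, Henan, M. S., teaching assistant. His research interests include knowledge graphs and data mining.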
  • Supported by:
    Natural Science Foundation of Henan Province (222300420590)

Abstract:

To address the data sparsity and cold-start problems of traditional Collaborative Filtering (CF), as well as the errors introduced by the various transformations that matrix factorization methods apply when generating the result matrix, a Low-rank and Sparse Matrix Factorization (LSMF) paper recommendation method with mixed information enhancement was proposed. Firstly, the pre-trained document-level representation model with citation-informed Transformers, SPECTER (Scientific Paper Embeddings using Citation-informed TransformERs), was used to learn paper representations, from which a paper-paper similarity matrix was computed; this similarity matrix was added to the citation matrix to form a mixed information matrix. Secondly, the content similarity information and citation information were integrated into the paper-author matrix through matrix multiplication. Finally, the LSMF model was used to decompose the paper-author matrix and obtain the recommendation list. Experimental results on the ACL Anthology Network (AAN) and DBLP datasets show that the proposed method achieves better recommendation performance, and that its way of introducing content and citation information is equally applicable to other matrix factorization models. For Non-negative Matrix Factorization (NMF), Singular Value Decomposition (SVD), Low-rank and Sparse Matrix Completion (LSMC) and Go Decomposition (GoDec), the versions using the mixed information improve the recall of the top 30 recommended results (R@30) over the original models by 18.72, 7.43, 11.53 and 14.62 percentage points respectively on AAN, and by 20.58, 2.11, 7.91 and 5.01 percentage points respectively on DBLP.
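The fusion step and the R@K metric described in the abstract can be illustrated with a small pure-Python sketch. This is not the authors' implementation, and several details are assumptions: the similarity is taken to be cosine similarity over precomputed SPECTER-style embeddings (the model itself is not invoked), the citation matrix C has C[i][j] = 1 when paper i cites paper j, the paper-author matrix A is papers × authors, and the fused matrix is taken to be (S + C)A. All helper names are hypothetical. The LSMF factorization that the method applies before ranking is omitted, so the ranking here runs directly on the fused scores purely to show how R@K is computed.

```python
import math
from typing import List, Sequence, Set

Matrix = List[List[float]]


def cosine_similarity_matrix(embeddings: Sequence[Sequence[float]]) -> Matrix:
    """Pairwise cosine similarity S between paper embeddings (assumed SPECTER-like vectors)."""
    norms = [math.sqrt(sum(x * x for x in v)) or 1.0 for v in embeddings]  # avoid divide-by-zero
    return [
        [sum(a * b for a, b in zip(u, v)) / (nu * nv) for v, nv in zip(embeddings, norms)]
        for u, nu in zip(embeddings, norms)
    ]


def mat_add(x: Matrix, y: Matrix) -> Matrix:
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def mat_mul(x: Matrix, y: Matrix) -> Matrix:
    """Plain matrix product x @ y."""
    cols = list(zip(*y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]


def fuse(embeddings: Sequence[Sequence[float]], citation: Matrix, paper_author: Matrix) -> Matrix:
    """Hypothetical fusion: mixed matrix M = S + C, then M @ A (papers x authors)."""
    mixed = mat_add(cosine_similarity_matrix(embeddings), citation)
    return mat_mul(mixed, paper_author)


def recall_at_k(scores: Matrix, author: int, relevant: Set[int], seen: Set[int], k: int) -> float:
    """Share of an author's held-out papers found in the top-k unseen papers by score."""
    candidates = [p for p in range(len(scores)) if p not in seen]
    top_k = sorted(candidates, key=lambda p: scores[p][author], reverse=True)[:k]
    return len(relevant.intersection(top_k)) / len(relevant) if relevant else 0.0


if __name__ == "__main__":
    # Toy data: 3 papers, 2 authors. Paper 1 cites paper 2; author 1 wrote paper 2.
    embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    citation = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]
    paper_author = [[1, 0], [0, 0], [0, 1]]
    fused = fuse(embeddings, citation, paper_author)
    print(recall_at_k(fused, author=1, relevant={1}, seen={2}, k=1))
```

Because cosine self-similarity is 1 on the diagonal of S, the product (S + C)A keeps each author's original paper entries while spreading extra score to papers that are similar to, or cited by, the ones already in an author's row. In the toy data, paper 1 gains score for author 1 both from its weak content similarity to paper 2 and from citing it.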

Key words: paper recommendation, Collaborative Filtering (CF), data sparsity, cold start, Low-rank and Sparse Matrix Factorization (LSMF)

CLC number: