Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (2): 595-601. DOI: 10.11772/j.issn.1001-9081.2019071222
Xi CHEN, Guang MEI, Jinjin ZHANG, Weisheng XU
Received: 2019-07-15
Revised: 2019-09-06
Accepted: 2019-09-06
Online: 2019-10-25
Published: 2020-02-10
Contact: Weisheng XU
About author: CHEN Xi, born in 1995 in Wuhu, Anhui, M. S. candidate. Her research interests include data mining and natural language processing.
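Abstract:
To address student grade prediction in undergraduate teaching in higher education, a prediction algorithm based on a course Knowledge Graph (KG) was proposed. First, a course knowledge graph representing course information was constructed. Then, the knowledge-level similarity between courses was computed from the graph with two approaches, a neighbor-based method and a knowledge graph representation learning method, and this knowledge similarity was integrated into Collaborative Filtering (CF), the conventional grade prediction framework. Finally, experiments compared the knowledge-graph-enhanced algorithms with common grade prediction algorithms under different levels of data sparsity. In the sparse-data scenario, the neighbor-based algorithm reduced the Root Mean Square Error (RMSE) by about 11% and the Mean Absolute Error (MAE) by about 9% compared with the traditional collaborative filtering algorithm, while the representation-learning-based algorithm reduced RMSE by 17.55% and MAE by 11.40%. These results show that collaborative filtering combined with a knowledge graph significantly lowers prediction error, and confirm that the knowledge graph can supplement missing information when historical data are scarce, helping collaborative filtering predict more accurately.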
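The abstract does not spell out how the knowledge similarity enters the CF predictor. As one possible reading, the sketch below blends a rating-based cosine similarity with a knowledge-graph similarity and uses the blend as the weight in an item-based weighted average. The names `predict_grade`, `kg_sim`, and the mixing weight `alpha` are hypothetical and not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse grade vectors (dicts keyed by student)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[s] * v[s] for s in common)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def predict_grade(student, target, grades, kg_sim, alpha=0.5):
    """Predict a student's grade in `target` from the courses they already took.

    grades: {course: {student: grade}}
    kg_sim: {(course_a, course_b): knowledge similarity in [0, 1]}
    alpha:  assumed weight on the rating-based similarity (illustrative only)
    """
    num = den = 0.0
    for course, by_student in grades.items():
        if course == target or student not in by_student:
            continue
        cf = cosine(grades.get(target, {}), by_student)
        kg = kg_sim.get((target, course), kg_sim.get((course, target), 0.0))
        w = alpha * cf + (1 - alpha) * kg
        num += w * by_student[student]
        den += abs(w)
    return num / den if den else None

# Hypothetical data: nobody has a grade in the target course yet.
grades = {"Statistics": {"s1": 4.0}, "History": {"s1": 2.0}}
kg_sim = {("Machine Learning", "Statistics"): 0.9,
          ("Machine Learning", "History"): 0.1}
pred = predict_grade("s1", "Machine Learning", grades, kg_sim)
cold = predict_grade("s1", "Machine Learning", grades, kg_sim, alpha=1)
```

With no grade history for the target course, the rating-only variant (`alpha=1`) has no usable weights and returns `None`, whereas the blended variant still produces a prediction driven by the knowledge similarity. This mirrors the abstract's point about sparse historical data.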
Xi CHEN, Guang MEI, Jinjin ZHANG, Weisheng XU. Student grade prediction method based on knowledge graph and collaborative filtering[J]. Journal of Computer Applications, 2020, 40(2): 595-601.
Entity type | Count | Entity type | Count |
---|---|---|---|
Course | 5 378 | Reference book | 2 063 |
Department | 601 | Knowledge point | 7 779 |
Textbook | 2 187 | Teaching mode | 3 |
Tab. 1 Types and numbers of entities
Relationship type | Count | Relationship type | Count |
---|---|---|---|
Department-OFFER-Course | 5 378 | Course-TAKE-Textbook | 2 581 |
Course-COVER-Knowledge point | 58 939 | Course-REFER-Reference book | 2 063 |
Course-UTILIZE-Teaching mode | 336 | | |
Tab. 2 Types and numbers of relationships
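A graph with the entity and relationship types listed above can be held as (head, relation, tail) triples. The following is a minimal sketch under that assumption; the sample triples are invented for illustration, and treating every relationship as undirected when collecting neighbors is also an assumption rather than something stated in the source.

```python
from collections import defaultdict

# Hypothetical sample triples that follow the relationship types of Tab. 2.
triples = [
    ("Dept. of Mathematics", "OFFER", "Calculus"),
    ("Calculus", "COVER", "derivative"),
    ("Calculus", "COVER", "integral"),
    ("Physics", "COVER", "derivative"),
    ("Calculus", "TAKE", "Textbook A"),
]

def build_neighbors(triples):
    """Return an undirected adjacency map: entity -> set of directly linked entities."""
    neighbors = defaultdict(set)
    for head, _relation, tail in triples:
        neighbors[head].add(tail)
        neighbors[tail].add(head)
    return neighbors

neighbors = build_neighbors(triples)
# Entities linked to both courses, the raw material for neighbor-based similarity.
shared = neighbors["Calculus"] & neighbors["Physics"]
```

Here `shared` holds the single knowledge point both courses cover.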
Scenario | Algorithm | RMSE | RMSE decline/% | MAE | MAE decline/% |
---|---|---|---|---|---|
1 | Normal Prediction | 1.175 1 | | 0.931 7 | |
1 | MF | 0.889 8 | | 0.678 8 | |
1 | Item-Based CF | 0.821 5 | | 0.415 9 | |
1 | Same Community | 0.779 5 | 5.11 | 0.397 9 | 4.33 |
1 | Adamic Adar | 0.729 3 | 11.22 | 0.377 3 | 9.28 |
1 | Common Neighbor | 0.729 0 | 11.26 | 0.377 3 | 9.28 |
1 | Prefer Attachment | 0.857 6 | -4.39 | 0.477 1 | -14.72 |
1 | Resource Allocation | 0.729 8 | 11.16 | 0.378 1 | 9.09 |
1 | Total Neighbors | 0.845 9 | -2.97 | 0.470 2 | -13.06 |
2 | Normal Prediction | 0.978 2 | | 0.821 8 | |
2 | MF | 0.737 8 | | 0.400 2 | |
2 | Item-Based CF | 0.688 4 | | 0.351 5 | |
2 | Same Community | 0.651 9 | 5.30 | 0.333 1 | 5.23 |
2 | Adamic Adar | 0.626 6 | 8.98 | 0.318 6 | 9.36 |
2 | Common Neighbor | 0.625 9 | 9.08 | 0.318 3 | 9.45 |
2 | Prefer Attachment | 0.730 0 | -6.04 | 0.397 7 | -13.14 |
2 | Resource Allocation | 0.629 9 | 8.50 | 0.321 4 | 8.56 |
2 | Total Neighbors | 0.720 5 | -4.66 | 0.392 6 | -11.69 |
3 | Normal Prediction | 0.887 3 | | 0.790 6 | |
3 | MF | 0.681 8 | | 0.417 6 | |
3 | Item-Based CF | 0.549 7 | | 0.341 2 | |
3 | Same Community | 0.584 2 | -6.28 | 0.384 3 | -12.63 |
3 | Adamic Adar | 0.529 6 | 3.66 | 0.331 4 | 2.87 |
3 | Common Neighbor | 0.531 6 | 3.29 | 0.332 1 | 2.67 |
3 | Prefer Attachment | 0.601 8 | -9.48 | 0.367 6 | -7.74 |
3 | Resource Allocation | 0.593 5 | -7.97 | 0.360 6 | -5.69 |
3 | Total Neighbors | 0.551 8 | -0.38 | 0.339 6 | 0.47 |
Tab. 3 Performance of neighbor-based algorithms in multiple scenarios
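Several algorithm names in the table are standard neighbor-based link prediction indices. The sketch below gives their usual textbook definitions over an adjacency map; whether the paper uses exactly these forms is not stated here, and the toy graph `nb` is hypothetical. Same Community is omitted because it depends on a community detection step not described in this section.

```python
import math

def common_neighbors(nb, a, b):
    """Number of entities linked to both a and b."""
    return len(nb[a] & nb[b])

def adamic_adar(nb, a, b):
    """Shared neighbors weighted by 1 / log(degree), so rare neighbors count more."""
    return sum(1.0 / math.log(len(nb[z])) for z in nb[a] & nb[b])

def resource_allocation(nb, a, b):
    """Shared neighbors weighted by 1 / degree."""
    return sum(1.0 / len(nb[z]) for z in nb[a] & nb[b])

def preferential_attachment(nb, a, b):
    """Product of the two degrees; ignores whether any neighbor is shared."""
    return len(nb[a]) * len(nb[b])

def total_neighbors(nb, a, b):
    """Size of the union of the two neighbor sets."""
    return len(nb[a] | nb[b])

# Hypothetical toy graph: two courses and the knowledge points they cover.
nb = {
    "Calculus": {"derivative", "integral"},
    "Physics": {"derivative", "force"},
    "derivative": {"Calculus", "Physics"},
    "integral": {"Calculus"},
    "force": {"Physics"},
}
scores = {
    "common_neighbors": common_neighbors(nb, "Calculus", "Physics"),
    "adamic_adar": adamic_adar(nb, "Calculus", "Physics"),
    "resource_allocation": resource_allocation(nb, "Calculus", "Physics"),
    "preferential_attachment": preferential_attachment(nb, "Calculus", "Physics"),
    "total_neighbors": total_neighbors(nb, "Calculus", "Physics"),
}
```

Preferential attachment and total neighbors grow with degree even when two courses share nothing, which may be one reason those two rows show negative decline rates in the table.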
Method | MRR (training set) | MRR (test set) | Hit@10/% (training set) | Hit@10/% (test set) |
---|---|---|---|---|
TransE | 0.196 2 | 0.146 2 | 90.00 | 69.80 |
DistMult | 0.754 1 | 0.499 2 | 98.00 | 84.65 |
Tab. 4 Evaluation of TransE and DistMult
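MRR and Hit@10 are the usual link prediction metrics for knowledge graph embeddings. Assuming each test triple yields the 1-based rank of its correct entity among all candidates, they can be computed as in this sketch; the `ranks` list is hypothetical sample data, not the paper's.

```python
def mrr(ranks):
    """Mean reciprocal rank of the correct entity (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hit_at_k(ranks, k=10):
    """Percentage of triples whose correct entity is ranked within the top k."""
    return 100.0 * sum(r <= k for r in ranks) / len(ranks)

ranks = [1, 2, 4, 20]  # hypothetical ranks for four test triples
mrr_value = mrr(ranks)
hit10_value = hit_at_k(ranks, k=10)
```

MRR rewards placing the correct entity near the very top, while Hit@10 only checks membership in the top ten, so a model can score well on one and modestly on the other.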
Scenario | Algorithm | RMSE | RMSE decline/% | MAE | MAE decline/% |
---|---|---|---|---|---|
1 | Normal Prediction | 1.175 1 | | 0.931 7 | |
1 | MF | 0.889 8 | | 0.678 8 | |
1 | Item-Based CF | 0.821 5 | | 0.415 9 | |
1 | TransE | 0.677 3 | 17.55 | 0.368 5 | 11.40 |
1 | DistMult | 0.771 3 | 6.11 | 0.401 3 | 3.51 |
2 | Normal Prediction | 0.978 2 | | 0.821 8 | |
2 | MF | 0.737 8 | | 0.400 2 | |
2 | Item-Based CF | 0.688 4 | | 0.351 5 | |
2 | TransE | 0.592 0 | 14.00 | 0.312 6 | 11.07 |
2 | DistMult | 0.655 9 | 4.72 | 0.331 9 | 5.58 |
3 | Normal Prediction | 0.887 3 | | 0.790 6 | |
3 | MF | 0.681 8 | | 0.417 6 | |
3 | Item-Based CF | 0.549 7 | | 0.341 2 | |
3 | TransE | 0.521 8 | 5.08 | 0.319 6 | 6.33 |
3 | DistMult | 0.523 7 | 4.73 | 0.300 5 | 5.98 |
Tab. 5 Performance of KG representation-based algorithms in multiple scenarios
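The decline-rate columns appear to be relative error reductions against the Item-Based CF row of the same scenario, which matches the comparison described in the abstract. The sketch below states the standard RMSE and MAE definitions and re-derives the Scenario 1 TransE figures under that assumption; the helper name `decline_rate` is illustrative.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between true and predicted grades."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error between true and predicted grades."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def decline_rate(baseline, value):
    """Relative error reduction versus the baseline, in percent."""
    return 100.0 * (baseline - value) / baseline

# Scenario 1 values from Tab. 5: Item-Based CF as baseline, TransE as the KG variant.
rmse_decline = decline_rate(0.8215, 0.6773)
mae_decline = decline_rate(0.4159, 0.3685)
```

Rounded to two decimals these give 17.55 and 11.40, the figures quoted in the abstract. A negative decline rate means the variant performed worse than Item-Based CF.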