Classification method for traditional Chinese medicine electronic medical records based on heterogeneous graph representation

Journal of Computer Applications, 2024, Vol. 44, Issue 2: 411-417. DOI: 10.11772/j.issn.1001-9081.2023030260

Special Issue: Artificial intelligence

Kaitian WANG¹, Qing YE¹,², Chunlei CHENG¹,²

1. College of Computer Science, Jiangxi University of Chinese Medicine, Nanchang, Jiangxi 330004, China
2. Jiangxi Province Key Research Laboratory of Artificial Intelligence in Traditional Chinese Medicine (Jiangxi University of Chinese Medicine), Nanchang, Jiangxi 330004, China

Received: 2023-03-16  Revised: 2023-05-25  Accepted: 2023-05-26  Online: 2023-07-05  Published: 2024-02-10
Contact: Qing YE
About the authors: WANG Kaitian, born in 1999, a native of Mudanjiang, Heilongjiang, M.S. candidate. His research interests include natural language processing and data mining.
CHENG Chunlei, born in 1976, a native of Nanchang, Jiangxi, Ph.D., associate professor. His research interests include machine learning, knowledge representation and learning, and knowledge graphs.
Supported by: National Natural Science Foundation of China (82260988); Jiangxi Provincial Natural Science Foundation (20224BAB206102); Science and Technology Research Key Project of Jiangxi Provincial Department of Education (GJJ201204)

Abstract

Traditional Chinese Medicine (TCM) electronic medical records are difficult to mine, are underutilized, and yield little effective information, because their structures are complex and diverse and their diagnosis and treatment terminology is not standardized. To address these problems, a TCM electronic medical record classification model named TCM-GCN was proposed. The model is based on the Linguistically-motivated bidirectional Encoder Representation from Transformer (LERT) pre-training model and a Graph Convolutional Network (GCN), represents the records as a heterogeneous graph, and is intended to improve the extraction and classification of effective features in TCM electronic medical records. Firstly, the medical records were converted into sentence vectors by the word embedding of the LERT layer and integrated into the heterogeneous graph, supplying the overall semantic features of each record that the graph structure lacks. Secondly, to mitigate the negative impact of the structural characteristics of TCM electronic medical records on feature extraction, keywords were added to the heterogeneous graph as nodes, and the BM25 and Pointwise Mutual Information (PMI) algorithms were used to construct the "medical record - keyword" and "keyword - keyword" edges that express the features of the records. Finally, TCM-GCN completed the classification task by aggregating and extracting the feature relationships between medical records over the heterogeneous graph built with LERT-BM25-PMI. Experimental results on the TCM electronic medical record dataset show that, compared with the second-best model LERT, TCM-GCN improves the weighted-average accuracy, recall, and F1 value by 2.24%, 2.38%, and 2.32%, respectively, which verifies the effectiveness of the algorithm in capturing hidden features between medical records and in classifying TCM electronic medical records.

Key words: heterogeneous graph, Graph Convolutional Network (GCN), pre-training model, text classification, Natural Language Processing (NLP), Traditional Chinese Medicine (TCM) electronic medical record
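
The graph construction described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the toy records, the BM25 constants `k1=1.5` and `b=0.75`, and the choice of treating each whole record as one PMI co-occurrence window are all assumptions made for the example. The LERT sentence vectors that serve as record-node features are not covered here.

```python
import math
from collections import Counter
from itertools import combinations


def bm25_weights(docs, k1=1.5, b=0.75):
    """Score every (record index, keyword) pair with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = {}
    for i, doc in enumerate(docs):
        for term, tf in Counter(doc).items():
            # Non-negative idf variant: log(1 + (N - df + 0.5) / (df + 0.5)).
            df = doc_freq[term]
            idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
            length_norm = k1 * (1.0 - b + b * len(doc) / avg_len)
            weights[(i, term)] = idf * tf * (k1 + 1.0) / (tf + length_norm)
    return weights


def pmi_edges(docs, threshold=0.0):
    """Keep keyword pairs whose PMI exceeds the threshold.

    Assumption: each record counts as one co-occurrence window.
    """
    n_docs = len(docs)
    single = Counter()
    pair = Counter()
    for doc in docs:
        terms = sorted(set(doc))
        single.update(terms)
        pair.update(combinations(terms, 2))
    edges = {}
    for (left, right), count in pair.items():
        pmi = math.log(count * n_docs / (single[left] * single[right]))
        if pmi > threshold:
            edges[(left, right)] = pmi
    return edges


def build_graph(docs):
    """Return a weighted edge map over record nodes and keyword nodes."""
    graph = {}
    for (i, term), weight in bm25_weights(docs).items():
        graph[(("record", i), ("keyword", term))] = weight
    for (left, right), weight in pmi_edges(docs).items():
        graph[(("keyword", left), ("keyword", right))] = weight
    return graph


# Hypothetical, already-tokenized toy records (not from the paper's dataset).
toy_records = [
    ["tinnitus", "slippery_pulse", "yellow_coating"],
    ["cough", "slippery_pulse", "yellow_coating"],
    ["cough", "fever"],
    ["tinnitus", "fever"],
]
graph = build_graph(toy_records)
```

In this toy input only one keyword pair co-occurs more often than chance, so a PMI threshold of 0 keeps a single "keyword - keyword" edge, while every keyword occurrence yields one "medical record - keyword" edge.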

CLC Number: TP391.1

Kaitian WANG, Qing YE, Chunlei CHENG. Classification method for traditional Chinese medicine electronic medical records based on heterogeneous graph representation[J]. Journal of Computer Applications, 2024, 44(2): 411-417.

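Field	Content
Sex	Female
Inspection	Complexion somewhat dull; build slightly overweight; calm expression; speech rather fast and talkative
Pulse diagnosis	Pulse slightly slippery; right pulse boundary indistinct; left guan position slightly strong; left cun position slightly floating
Tongue diagnosis	Tongue body somewhat dark and bluish; coating pale yellow and slightly thick
Physical examination	Follicles on the pharyngeal wall; copious secretions
Chief complaint	Tinnitus in the right ear for more than half a month
TCM diagnosis	Common cold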

Ground truth \ Prediction	Predicted as the disease	Predicted as not the disease
Prediction correct	TP	TN
Prediction incorrect	FP	FN
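
As a rough illustration of how the counts in the table above turn into the reported metrics, the sketch below computes per-class precision, recall, and F1 and then averages them across classes. The function names and the example counts are hypothetical, and weighting each class by its support (TP + FN) is an assumption: the abstract says only that a weighted average was applied.

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class metrics from confusion counts; 0.0 when a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def weighted_average(per_class_counts):
    """Average per-class metrics, weighting each class by its support (TP + FN)."""
    total_support = sum(tp + fn for tp, _, fn in per_class_counts)
    averaged = [0.0, 0.0, 0.0]
    for tp, fp, fn in per_class_counts:
        share = (tp + fn) / total_support
        for k, value in enumerate(precision_recall_f1(tp, fp, fn)):
            averaged[k] += share * value
    return tuple(averaged)


# Hypothetical (TP, FP, FN) counts for two disease classes.
example_counts = [(8, 2, 2), (3, 2, 2)]
avg_precision, avg_recall, avg_f1 = weighted_average(example_counts)
```

TN does not enter any of the three metrics, which is why the helper takes only TP, FP, and FN.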

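Parameter	Symbol	Value
GCN learning rate	gcn_lr	0.05
GCN weight decay	gcn_weight_decay	10^-5
GCN hidden-layer feature dimension	n_hidden	32
PMI threshold	p	0
LERT learning rate	lert_lr	10^-5
LERT weight decay	lert_weight_decay	10^-4
Batch size	batch_size	128
Weight	λ	0.5
Number of epochs	epoch	200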
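
For anyone reproducing the setup, the values in the table above could be gathered in a single configuration object, for example as below. The class name and the field names `pmi_threshold`, `lam`, and `epochs` are hypothetical; the remaining field names follow the symbols in the table. The role of the weight λ is not described in this excerpt, so it is stored without interpretation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TcmGcnConfig:
    """Hyperparameters copied from the table; names are partly assumed."""
    gcn_lr: float = 0.05
    gcn_weight_decay: float = 1e-5
    n_hidden: int = 32
    pmi_threshold: float = 0.0  # symbol p in the table
    lert_lr: float = 1e-5
    lert_weight_decay: float = 1e-4
    batch_size: int = 128
    lam: float = 0.5  # symbol λ in the table; its role is not stated here
    epochs: int = 200


config = TcmGcnConfig()
```

The GCN learning rate is several thousand times larger than the LERT learning rate, so a reproduction would most likely need separate optimizer settings for the two parts of the model.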

Model	Precision	Recall	F1	AUC
LSTM	0.6665	0.6648	0.6631	0.8829
Text-GCN	0.7495	0.7509	0.7483	0.9307
LERT	0.7674	0.7636	0.7630	0.9152
Text_CNN	0.7240	0.7236	0.7212	0.9246
Text_RNN	0.6908	0.6855	0.6849	0.9159
FastText	0.7250	0.7145	0.7134	0.9373
TCM-GCN	0.7846	0.7818	0.7807	0.9272
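
The improvements quoted in the abstract can be checked against the LERT and TCM-GCN rows of the table above. The short calculation below assumes the percentages are relative gains over LERT, not absolute differences; under that assumption the table values reproduce 2.24%, 2.38%, and 2.32%. Note that the first figure is obtained from the column labelled precision in the table, whereas the abstract refers to it as accuracy.

```python
# Values copied from the LERT and TCM-GCN rows of the results table.
lert = {"precision": 0.7674, "recall": 0.7636, "f1": 0.7630}
tcm_gcn = {"precision": 0.7846, "recall": 0.7818, "f1": 0.7807}

# Relative gain in percent: (new - baseline) / baseline * 100.
relative_gain = {
    metric: round((tcm_gcn[metric] - lert[metric]) / lert[metric] * 100, 2)
    for metric in lert
}

for metric, gain in relative_gain.items():
    print(f"{metric}: +{gain}%")
```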

Experiment	Graph construction method	Precision	Recall	F1	AUC
1	LERT+one-hot+PMI	0.7715	0.7709	0.7700	0.9162
2	LERT+BM25	0.7634	0.7618	0.7605	0.9062
3	BM25+PMI	0.7671	0.7655	0.7643	0.9133
4	LERT+TF-IDF+PMI	0.7741	0.7727	0.7714	0.9162
5	LERT+BM25+PMI	0.7846	0.7818	0.7807	0.9272