Efficient similar exercise retrieval model based on unsupervised semantic hashing

doi:10.11772/j.issn.1001-9081.2023091260

Journal of Computer Applications, 2024, Vol. 44, Issue (1): 206-216. DOI: 10.11772/j.issn.1001-9081.2023091260

• Data Science and Technology •

Wei TONG¹, Liyang HE²,³, Rui LI²,³, Wei HUANG¹, Zhenya HUANG²,³, Qi LIU²,³

1. National Education Examinations Authority, Beijing 100084, China
2. School of Computer Science and Technology, University of Science and Technology of China, Hefei Anhui 230027, China
3. State Key Laboratory of Cognitive Intelligence, Hefei Anhui 230088, China

Received: 2023-09-14 Revised: 2023-10-14 Accepted: 2023-10-24 Online: 2023-12-08 Published: 2024-01-10
Corresponding author: Qi LIU
About author: TONG Wei, born in 1984 in Cangzhou, Hebei, Ph. D. His research interests include education data mining, natural language processing.
HE Liyang, born in 1998 in Chenzhou, Hunan, Ph. D. candidate, CCF member. His research interests include information retrieval.
LI Rui, born in 2000 in Lu'an, Anhui, M. S. candidate, CCF member. His research interests include information retrieval.
HUANG Wei, born in 1995 in Wenzhou, Zhejiang, Ph. D., CCF member. His research interests include education data mining, hierarchical multi-label categorization.
HUANG Zhenya, born in 1992 in Hefei, Anhui, Ph. D., associate professor, CCF member. His research interests include data mining, text mining, knowledge reasoning, education big data analysis.
LIU Qi (first contact), born in 1986 in Linyi, Shandong, Ph. D., professor, CCF member. His research interests include data mining, smart education, recommender systems, social network analysis.
Supported by:
National Education Examinations Authority (GJK2021009); National Key Research and Development Program of China (2021YFF0901003); National Natural Science Foundation of China (62106244); University Synergy Innovation Program of Anhui Province (GXXT-2022-042)

Abstract

Similar exercise retrieval aims to find exercises in a database whose testing goals are similar to those of a given query exercise. As online education develops, exercise databases keep growing, and the specialized nature of exercise data makes relevance annotation very difficult. Online education systems therefore need a similar exercise retrieval model that is efficient and requires no annotation. Unsupervised semantic hashing can map high-dimensional data to compact, efficient binary representations without supervision signals. However, a semantic hashing model cannot simply be applied to similar exercise retrieval, because exercise data carries rich semantic information while the representation space of binary vectors is limited. To address this issue, a similar exercise retrieval model that acquires and retains crucial information was proposed. Firstly, a crucial information acquisition module was designed to acquire the crucial information in exercise data, and a de-redundancy objective loss was introduced to remove redundant information. Secondly, a time-aware activation function, which varies over time, was introduced into the encoding process to reduce coding information loss. Thirdly, to maximize the utilization of the Hamming space, a bit balance objective and a bit independence objective were introduced in the optimization process to optimize the distribution of the binary representations. Experimental results on the MATH and HISTORY datasets show that, compared with the best-performing text semantic hashing model Deep Hash InfoMax (DHIM), the proposed model improves by approximately 54% and 23% on average across three recall settings on the two datasets, respectively. In terms of retrieval efficiency, the proposed model has a clear advantage over the best-performing similar exercise retrieval model QuesCo.

Key words: similar exercise retrieval, unsupervised semantic hashing, representation learning, contrastive learning
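To make the efficiency argument concrete, the following minimal sketch shows how retrieval over binary codes in Hamming space can work in general. It is an illustration under stated assumptions, not the authors' implementation: the helper names `pack_bits`, `hamming_distance` and `top_k`, and the toy 8-bit codes, are all hypothetical.

```python
from typing import List, Tuple


def pack_bits(bits: List[int]) -> int:
    """Pack a list of 0/1 values into a single Python integer (hypothetical helper)."""
    code = 0
    for b in bits:
        code = (code << 1) | (1 if b else 0)
    return code


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two packed codes differ."""
    return bin(a ^ b).count("1")


def top_k(query: int, database: List[int], k: int) -> List[Tuple[int, int]]:
    """Return (index, distance) pairs for the k codes closest to the query."""
    scored = [(i, hamming_distance(query, code)) for i, code in enumerate(database)]
    # Sort by distance first, then by index so that ties are deterministic.
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:k]


# Toy 8-bit hash codes standing in for encoded exercises (made-up values).
database = [
    pack_bits([0, 0, 0, 0, 1, 1, 1, 1]),
    pack_bits([0, 0, 0, 0, 1, 1, 1, 0]),
    pack_bits([1, 1, 1, 1, 0, 0, 0, 0]),
    pack_bits([0, 1, 0, 0, 1, 1, 1, 1]),
]
query = pack_bits([0, 0, 0, 0, 1, 1, 1, 1])
nearest = top_k(query, database, k=2)
print(nearest)
```

Each comparison is one XOR followed by a bit count, which is why binary codes are attractive for large exercise databases compared with floating-point similarity over dense vectors.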

CLC Number: TP391.3

Wei TONG, Liyang HE, Rui LI, Wei HUANG, Zhenya HUANG, Qi LIU. Efficient similar exercise retrieval model based on unsupervised semantic hashing[J]. Journal of Computer Applications, 2024, 44(1): 206-216.

Figures/Tables (11)

Fig. 1 Examples of similar exercises and process of similar exercise retrieval

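Tab. 1 Symbols and their meanings

Symbol	Meaning	Symbol	Meaning
$\tau$	Temperature hyperparameter	$B$	A training mini-batch set
$s_i$	Word embedding of the $i$-th word	$H_{ij}$	Value of the $j$-th dimension of hash code $h_i$
$s_i^{1}$	The $i$-th local representation	$K$	Number of relevant exercises returned by a search
$s_g$	Global representation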

Fig. 2 Structure of the USH-SER model

Fig. 3 Time-aware activation function
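Only the caption of the time-aware activation function survives here, so its exact form is not given in this section. As a rough illustration of the general idea of an activation that changes over training, the sketch below uses a scaled tanh whose slope grows with the training step, so that it moves from a smooth curve toward a sign-like step. The function name `time_aware_tanh`, the exponential schedule, and the `beta_start`/`beta_end` parameters are assumptions made for this example and are not taken from the paper.

```python
import math


def time_aware_tanh(x: float, step: int, total_steps: int,
                    beta_start: float = 1.0, beta_end: float = 50.0) -> float:
    """Scaled tanh whose slope increases as training progresses.

    Assumed schedule: the slope is interpolated exponentially from
    beta_start (smooth, informative gradients) to beta_end (close to sign).
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    beta = beta_start * (beta_end / beta_start) ** progress
    return math.tanh(beta * x)


early = time_aware_tanh(0.1, step=0, total_steps=100)
late = time_aware_tanh(0.1, step=100, total_steps=100)
print(f"early={early:.4f} late={late:.4f}")
```

Early in training the output stays close to the continuous input, which limits information loss; late in training the output is nearly binary, which narrows the gap to the final hash code.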

Tab. 2 Result comparison of R@K and MRR among different models on MATH dataset

Model	R@100	R@200	R@400	MRR
SPH	0.1839	0.2781	0.3632	0.0213
STH	0.1834	0.2894	0.3823	0.0203
VDSH	0.1928	0.3030	0.4218	0.0234
NASH	0.1922	0.3140	0.4322	0.0278
NbrReg	0.1945	0.3149	0.4496	0.0258
PairRec	0.2041	0.3210	0.4531	0.0309
WISH	0.2340	0.3754	0.4985	0.0354
DHIM	0.2645	0.4434	0.5345	0.0380
BM25	0.2992	0.3611	0.4244	0.0457
Doc2Vec	0.2311	0.3325	0.4488	0.0299
VSM	0.2805	0.3829	0.4531	0.0391
BERT	0.4624	0.5869	0.7372	0.0971
QuesNet	0.3270	0.4649	0.6334	0.0731
QuesCo	0.5639	0.6963	0.8222	0.1747
USH-SER	0.4875	0.5921	0.7694	0.1395
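The tables report R@K, MRR and Precision@K without restating their definitions in this section. The sketch below uses the standard textbook definitions of these retrieval metrics for a single query; the paper's exact evaluation protocol may differ, and the exercise identifiers are made up for illustration.

```python
from typing import List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top k results that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical ranking returned for one query exercise.
ranked = ["e7", "e2", "e9", "e4", "e1"]
relevant = {"e2", "e1", "e5"}

print(recall_at_k(ranked, relevant, 3))
print(precision_at_k(ranked, relevant, 5))
print(reciprocal_rank(ranked, relevant))
```

MRR is then the mean of `reciprocal_rank` over all query exercises, and the reported R@K and Precision@K values are likewise averages over queries.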

Tab. 3 Result comparison of R@K among different models on HISTORY dataset

Model	R@100	R@200	R@400
SPH	0.0108	0.0185	0.0294
STH	0.0118	0.0198	0.0318
VDSH	0.0183	0.0298	0.0463
NASH	0.0179	0.0290	0.0451
NbrReg	0.0192	0.0311	0.0493
PairRec	0.0206	0.0332	0.0516
WISH	0.0214	0.0344	0.0545
DHIM	0.0271	0.0431	0.0690
USH-SER	0.0332	0.0535	0.0850
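The average improvements over DHIM quoted in the abstract can be checked against the R@100, R@200 and R@400 values in Tab. 2 and Tab. 3. The sketch below assumes the average is taken over the three per-setting relative improvements, which this section does not spell out; the function name `mean_relative_gain` is hypothetical.

```python
from statistics import mean
from typing import Dict, List

# R@100, R@200, R@400 copied from Tab. 2 (MATH) and Tab. 3 (HISTORY).
RECALL: Dict[str, Dict[str, List[float]]] = {
    "MATH": {
        "DHIM": [0.2645, 0.4434, 0.5345],
        "USH-SER": [0.4875, 0.5921, 0.7694],
    },
    "HISTORY": {
        "DHIM": [0.0271, 0.0431, 0.0690],
        "USH-SER": [0.0332, 0.0535, 0.0850],
    },
}


def mean_relative_gain(baseline: List[float], proposed: List[float]) -> float:
    """Average of the per-setting relative improvements, in percent."""
    return mean([(p - b) / b * 100.0 for b, p in zip(baseline, proposed)])


gains = {
    name: mean_relative_gain(rows["DHIM"], rows["USH-SER"])
    for name, rows in RECALL.items()
}
for name, gain in gains.items():
    print(f"{name}: {gain:.1f}%")
```

Under this assumption the gains round to about 54% on MATH and 23% on HISTORY, consistent with the figures stated in the abstract.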

Tab. 4 Comparison of Precision@100 among different models on HISTORY dataset (columns are hash code lengths)

Model	8 bits	16 bits	32 bits	64 bits
SPH	0.1322	0.1611	0.1731	0.1899
STH	0.1432	0.1680	0.1893	0.2041
VDSH	0.2588	0.2741	0.2943	0.3284
NASH	0.2571	0.2643	0.2864	0.3019
NbrReg	0.2785	0.2891	0.3080	0.3421
PairRec	0.3013	0.3183	0.3303	0.3434
WISH	0.3133	0.3222	0.3430	0.3621
DHIM	0.3634	0.3945	0.4351	0.4632
USH-SER	0.4523	0.4918	0.5323	0.5512

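Tab. 5 Five components of ablation study

Component	Meaning
$C_1$	Add the objective $L_m$ of maximizing local and global representations
$C_2$	Add the self-attention extraction module for text
$C_3$	Add exercise images
$C_4$	Add the time-aware activation function
$C_5$	Add the objectives $L_b$ and $L_I$ that maximize Hamming space utilization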
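The definitions of $L_b$ and $L_I$ are not reproduced in this section. As an illustration of what bit balance and bit independence objectives commonly look like in the hashing literature, the sketch below penalizes non-zero per-bit means and non-zero correlation between different bits for codes in {-1, +1}. Both formulas and both function names are assumptions made for this example, not the paper's exact losses.

```python
from typing import List


def bit_balance_penalty(codes: List[List[float]]) -> float:
    """Mean squared per-bit average; zero when every bit is +1 half the time."""
    n, bits = len(codes), len(codes[0])
    col_means = [sum(row[j] for row in codes) / n for j in range(bits)]
    return sum(m * m for m in col_means) / bits


def bit_independence_penalty(codes: List[List[float]]) -> float:
    """Mean squared correlation between distinct bits; zero when uncorrelated."""
    n, bits = len(codes), len(codes[0])
    total, pairs = 0.0, 0
    for j in range(bits):
        for k in range(j + 1, bits):
            corr = sum(row[j] * row[k] for row in codes) / n
            total += corr * corr
            pairs += 1
    return total / pairs if pairs else 0.0


# A batch that uses all four 2-bit codes evenly, and one that mostly collapses.
spread = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
collapsed = [[1, 1], [1, 1], [-1, -1], [1, 1]]

print(bit_balance_penalty(spread), bit_independence_penalty(spread))
print(bit_balance_penalty(collapsed), bit_independence_penalty(collapsed))
```

Driving both penalties toward zero spreads the codes over more of the Hamming space, which is the stated purpose of component $C_5$.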

Tab. 6 Result comparison of ablation study on different datasets

Model	MATH	HISTORY
USH-SER w/o $C_1$	0.7253	0.0801
USH-SER w/o $C_2$	0.7299	0.0814
USH-SER w/o $C_3$	0.7472	0.0841
USH-SER w/o $C_4$	0.7534	0.0832
USH-SER w/o $C_5$	0.7421	0.0824
USH-SER	0.7694	0.0850

Fig. 4 Examples of images in HISTORY and MATH datasets

Fig. 5 Average time cost of retrieving 100 similar exercises on HISTORY dataset
