Journal of Computer Applications

Ancient Chinese character image retrieval algorithm based on NetVLAD feature encoding

CHEN Huihui1, SUN Hongtao1, GUAN Boliang1, HENG Zhongqin2   

  1. School of Computer Science and Artificial Intelligence, Foshan University; 2. School of Marxism, Foshan University
  • Received: 2025-03-26; Revised: 2025-06-07; Online: 2025-07-01; Published: 2025-07-01
  • About the authors: CHEN Huihui, born in 1978, Ph.D., professor. Her research interests include multi-modal data fusion and intelligent perception. SUN Hongtao, born in 2002, M.S. candidate. His research interests include multi-modal data fusion and image recognition. GUAN Boliang, born in 1991, Ph.D., lecturer. His research interests include computer vision. HENG Zhongqin, born in 1969, Ph.D., research librarian. His research interests include digital humanities for ancient books.
  • Supported by:
    National Natural Science Foundation of China (61972092)

基于NetVLAD特征编码的古籍汉字图像检索算法

陈荟慧1,孙洪韬1,关柏良1,衡中青2   

  1. 佛山大学 计算机与人工智能学院 2. 佛山大学 马克思主义学院
  • 通讯作者: 陈荟慧
  • 作者简介:陈荟慧(1978—),女,河南洛阳人,教授,博士,CCF高级会员,主要研究方向:多模态数据融合、智能感知;孙洪韬(2002—),男,广东南雄人,硕士研究生,主要研究方向:多模态数据融合、图像识别;关柏良(1991—),男,广东佛山人,讲师,博士,主要研究方向:计算机视觉;衡中青(1969—),男,江苏宝应人,研究馆员,博士,主要研究方向:古籍数字人文。
  • 基金资助:
    国家自然科学基金资助项目(61972092)

Abstract: Retrieving ancient Chinese characters is part of the digitization of ancient books. Ancient Chinese books often exhibit inconsistent printing glyphs and a wide variety of fonts, so retrieving characters by their visual features is an effective solution. To this end, a Chinese Character Feature Extraction and Encoding Network (CFEENet) was proposed. Firstly, a convolutional neural network was used to extract the visual features of Chinese character images. Secondly, a trainable generalized vector aggregation layer, namely NetVLAD, was employed to aggregate and encode these visual features. Finally, cosine similarity between encodings was calculated to retrieve Chinese characters from ancient book pages. The CFEENet encodings were reduced in dimension and visualized with t-distributed Stochastic Neighbor Embedding (t-SNE), which showed that the clusters they form are dense and overlap little, and that the encodings are highly discriminative. CFEENet was evaluated on multiple ancient book datasets. Experimental results show that CFEENet outperforms comparison methods such as Ancient Chinese Character Image Network (ACCINet) in mean Average Precision (mAP) and F1 score in most scenarios, while achieving a good balance between retrieval quality and efficiency, which verifies the applicability and effectiveness of CFEENet for ancient Chinese character retrieval.

Key words: ancient Chinese character retrieval, convolutional neural network, NetVLAD, visual feature, feature encoding
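
The aggregation and retrieval steps described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the CFEENet implementation: the convolutional feature extractor is replaced by random stand-in descriptors, the cluster centers and soft-assignment parameters (`centers`, `assign_w`, `assign_b`) are random rather than trained, and all function names and dimensions are hypothetical. The sketch only shows the standard NetVLAD formulation (soft assignment, weighted residual aggregation, intra-normalization, final L2 normalization) followed by ranking with cosine similarity.

```python
import math
import random


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to (approximately) unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def netvlad_encode(descriptors, centers, assign_w, assign_b):
    """Aggregate N local descriptors (length D) into one K*D code.

    centers and assign_w hold K vectors of length D; assign_b holds K biases.
    In a trained network these would be learned; here they are assumptions.
    """
    num_clusters, dim = len(centers), len(centers[0])
    vlad = [[0.0] * dim for _ in range(num_clusters)]
    for x in descriptors:
        # Soft assignment of descriptor x to each cluster.
        logits = [sum(wj * xj for wj, xj in zip(w, x)) + b
                  for w, b in zip(assign_w, assign_b)]
        weights = softmax(logits)
        # Accumulate weighted residuals to every cluster center.
        for k in range(num_clusters):
            for j in range(dim):
                vlad[k][j] += weights[k] * (x[j] - centers[k][j])
    # Intra-normalize each cluster's residual, flatten, then normalize again.
    flat = [v for row in vlad for v in l2_normalize(row)]
    return l2_normalize(flat)


def cosine_similarity(a, b, eps=1e-12):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def retrieve(query_code, gallery_codes, top_k=3):
    """Return (gallery index, similarity) pairs, most similar first."""
    scored = [(cosine_similarity(query_code, code), idx)
              for idx, code in enumerate(gallery_codes)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:top_k]]


# Toy demonstration with random stand-ins for CNN features and parameters.
rng = random.Random(0)
D, K = 4, 3  # hypothetical descriptor dimension and number of clusters


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


centers = rand_matrix(K, D)
assign_w = rand_matrix(K, D)
assign_b = [0.0] * K
gallery = [netvlad_encode(rand_matrix(6, D), centers, assign_w, assign_b)
           for _ in range(5)]
results = retrieve(gallery[2], gallery, top_k=3)
print(results)
```

Because every code is L2-normalized, cosine similarity reduces to a dot product, so in practice a whole gallery could be ranked with a single matrix-vector product.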

摘要: 古籍汉字检索是当前古籍数字化工作的一部分。古籍版面中通常存在汉字印刷字模不一致和字体种类多的现象,使用视觉特征实现汉字检索是有效的解决方案。为此,提出汉字特征提取编码网络(CFEENet)。首先,使用卷积神经网络提取古籍汉字图像的视觉特征;然后,使用可训练的通用向量聚合描述层NetVLAD聚合视觉特征并编码;最后,使用余弦相似度计算编码相似性以实现古籍版面汉字检索。使用t分布邻域嵌入对CFEENet编码降维后可视化分析,发现编码形成的簇密度高、簇间重叠小、编码分辨率高。在多个古籍数据集上测试CFEENet,实验结果表明,CFEENet在大多数场景下的平均精度均值(mAP)和F1分数优于古籍汉字图像特征提取网络(ACCINet)等对比方法,且在检索质量与效率之间实现良好平衡,验证了CFEENet在古籍汉字检索任务中的适用性与有效性。

关键词: 古籍汉字检索, 卷积神经网络, NetVLAD, 视觉特征, 特征编码
