《计算机应用》 (Journal of Computer Applications) ›› 2026, Vol. 46 ›› Issue (3): 750-757. DOI: 10.11772/j.issn.1001-9081.2025030320

• Artificial Intelligence •

基于NetVLAD特征编码的古籍汉字图像检索算法

陈荟慧1(), 孙洪韬1, 关柏良1, 衡中青2   

    1.佛山大学 计算机与人工智能学院,广东 佛山 528225
    2.佛山大学 马克思主义学院,广东 佛山 528225
  • 收稿日期:2025-03-28 修回日期:2025-06-07 接受日期:2025-06-10 发布日期:2025-07-01 出版日期:2026-03-10
  • 通讯作者: 陈荟慧
  • 作者简介:孙洪韬(2002—),男,广东南雄人,硕士研究生,主要研究方向:多模态数据融合、图像识别
    关柏良(1991—),男,广东佛山人,讲师,博士,主要研究方向:计算机视觉
    衡中青(1969—),男,江苏宝应人,研究馆员,博士,主要研究方向:古籍里的数字人文。
  • 基金资助:
    国家自然科学基金资助项目(61972092)

Chinese character image retrieval algorithm in ancient books based on NetVLAD feature encoding

Huihui CHEN1(), Hongtao SUN1, Boliang GUAN1, Zhongqing HENG2   

    1.School of Computer Science and Artificial Intelligence, Foshan University, Foshan Guangdong 528225, China
    2.School of Marxism, Foshan University, Foshan Guangdong 528225, China
  • Received:2025-03-28 Revised:2025-06-07 Accepted:2025-06-10 Online:2025-07-01 Published:2026-03-10
  • Contact: Huihui CHEN
  • About author:SUN Hongtao, born in 2002, M. S. candidate. His research interests include multi-modal data fusion, image recognition.
    GUAN Boliang, born in 1991, Ph. D., lecturer. His research interests include computer vision.
    HENG Zhongqing, born in 1969, Ph. D., research librarian. His research interests include digital humanities in ancient books.
  • Supported by:
    National Natural Science Foundation of China(61972092)

摘要:

古籍汉字检索是当前古籍数字化工作的一部分。古籍中通常存在汉字印刷字模不一致和字体种类多的现象,使用视觉特征实现汉字检索是有效的解决方案。因此,提出汉字特征提取编码网络(CFEENet)。首先,使用卷积神经网络(CNN)提取古籍汉字图像的视觉特征;其次,使用可训练的通用向量聚合层NetVLAD聚合并编码视觉特征;最后,使用余弦相似度计算编码相似性以实现古籍汉字检索。此外,使用t-分布随机邻域嵌入(t-SNE)对CFEENet编码降维后进行可视化分析,发现CFEENet编码形成的簇密度高、簇间重叠小且编码分辨率高。在多个古籍数据集上测试CFEENet的实验结果表明,CFEENet在大多数场景下的平均精度均值(mAP)和F1分数优于古籍汉字图像特征提取网络(ACCINet)等对比方法,且在检索质量与效率之间实现了良好平衡,验证了CFEENet在古籍汉字检索任务中的适用性与有效性。

关键词: 古籍汉字检索, 卷积神经网络, NetVLAD, 视觉特征, 特征编码

Abstract:

Retrieval of Chinese characters in ancient books is part of current ancient book digitization work. Ancient Chinese books often exhibit inconsistent printing glyphs and a wide variety of font types, so retrieving characters by their visual features is an effective solution. Therefore, a Chinese Character Feature Extraction and Encoding Network (CFEENet) was proposed. Firstly, a Convolutional Neural Network (CNN) was used to extract the visual features of Chinese character images in ancient books. Secondly, a trainable generalized vector aggregation layer, namely NetVLAD, was employed to aggregate and encode the visual features. Finally, cosine similarity was used to measure the similarity between encodings and thereby realize Chinese character retrieval in ancient books. In addition, CFEENet encodings were reduced in dimension with t-distributed Stochastic Neighbor Embedding (t-SNE) and analyzed visually, and it was found that the clusters formed by CFEENet encodings had high density and small inter-cluster overlap, and that the encodings were highly discriminative. CFEENet was tested on multiple ancient book datasets. Experimental results show that CFEENet outperforms comparison methods such as Ancient Chinese Character Image Network (ACCINet) in terms of mean Average Precision (mAP) and F1 score in most scenarios, while achieving a good balance between retrieval quality and efficiency, verifying the applicability and effectiveness of CFEENet in Chinese character retrieval tasks for ancient books.
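The encode-then-rank pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' CFEENet implementation: it assumes the CNN has already produced a set of local descriptors for each character image, uses random values in place of trained parameters, and follows the standard NetVLAD formulation (soft assignment, residual aggregation, intra-normalization, then global L2 normalization). All names (`netvlad_encode`, `rank_by_cosine`, `centers`, `w`, `b`) and all sizes are hypothetical.

```python
import numpy as np


def netvlad_encode(x, centers, w, b):
    """Aggregate local descriptors into one NetVLAD-style vector.

    x:       (N, D) local descriptors, e.g. flattened CNN feature-map positions
    centers: (K, D) cluster centres (trainable in a real network)
    w, b:    (K, D) and (K,) soft-assignment parameters (trainable in a real network)
    """
    # Soft assignment of each descriptor to each cluster (row-wise softmax).
    logits = x @ w.T + b
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    a = np.exp(logits)
    a /= a.sum(axis=1, keepdims=True)            # (N, K)

    # Weighted sum of residuals between descriptors and cluster centres.
    resid = x[:, None, :] - centers[None, :, :]  # (N, K, D)
    v = (a[:, :, None] * resid).sum(axis=0)      # (K, D)

    # Intra-normalization per cluster, then flatten and L2-normalize.
    v /= np.linalg.norm(v, axis=1, keepdims=True) + 1e-12
    v = v.ravel()
    return v / (np.linalg.norm(v) + 1e-12)


def rank_by_cosine(query, gallery):
    """Rank gallery encodings by cosine similarity to the query.

    Encodings are unit-length, so the dot product equals cosine similarity.
    """
    sims = gallery @ query
    order = np.argsort(-sims)
    return order, sims[order]


# Toy data standing in for CNN features of five character images (assumption).
rng = np.random.default_rng(0)
D, K, N = 8, 4, 16
centers = rng.normal(size=(K, D))
w = rng.normal(size=(K, D))
b = np.zeros(K)

feature_maps = [rng.normal(size=(N, D)) for _ in range(5)]
gallery = np.stack([netvlad_encode(f, centers, w, b) for f in feature_maps])

# Query: a slightly perturbed copy of image 2, mimicking a glyph variant.
noisy = feature_maps[2] + 0.01 * rng.normal(size=(N, D))
query = netvlad_encode(noisy, centers, w, b)

order, sims = rank_by_cosine(query, gallery)
print(order, sims)
```

Because every encoding is L2-normalized, ranking a query against a gallery reduces to a single matrix-vector product, which is one reason this kind of fixed-length encoding suits retrieval over large character collections.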

Key words: Chinese character retrieval in ancient books, Convolutional Neural Network (CNN), NetVLAD, visual feature, feature encoding

中图分类号: