Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (12): 3813-3819. DOI: 10.11772/j.issn.1001-9081.2024121815

• Artificial Intelligence •

基于特征交互与表示增强的语音手机来源开集识别方法

岳峰1,2,3, 彭洋1,4, 苏兆品1,2,4, 张国富1,3,4, 廉晨思3,5, 杨波3,5, 方振1,4   

    1.合肥工业大学 计算机与信息学院,合肥 230601
    2.智能互联系统安徽省实验室(合肥工业大学),合肥 230009
    3.音视频智能防识联合实验室(合肥工业大学),合肥 230009
    4.工业安全与应急技术安徽省重点实验室(合肥工业大学),合肥 230009
    5.安徽省公安厅 物证鉴定管理处,合肥 230009
  • 收稿日期:2024-12-26 修回日期:2025-01-07 接受日期:2025-01-10 发布日期:2025-01-15 出版日期:2025-12-10
  • 通讯作者: 苏兆品
  • 作者简介:岳峰(1981—),男,安徽合肥人,副研究员,博士,主要研究方向:群智能、软件工程
    彭洋(1999—),男,河北邢台人,硕士研究生,主要研究方向:声纹识别
    苏兆品(1983—),女,山东菏泽人,副教授,博士,CCF会员,主要研究方向:复杂智能系统、多媒体安全
    张国富(1979—),男,安徽合肥人,教授,博士,CCF会员,主要研究方向:语音安全
    廉晨思(1984—),女,安徽合肥人,高级工程师,硕士,主要研究方向:声纹和合成语音鉴定
    杨波(1985—),男,安徽合肥人,高级工程师,硕士,主要研究方向:声纹和合成语音鉴定
    方振(2000—),男,安徽亳州人,硕士研究生,主要研究方向:声纹识别。
  • 基金资助:
    教育部人文社会科学研究规划基金资助项目(24YJA870011);安徽省重点研究与开发计划项目(202104d07020001);安徽省自然科学基金资助项目(2208085MF166)

Open-set source cell-phone identification method based on feature interaction and representation enhancement

Feng YUE1,2,3, Yang PENG1,4, Zhaopin SU1,2,4, Guofu ZHANG1,3,4, Chensi LIAN3,5, Bo YANG3,5, Zhen FANG1,4   

    1.School of Computer and Information Technology,Hefei University of Technology,Hefei Anhui 230601,China
    2.Intelligent Interconnected Systems Provincial Laboratory of Anhui (Hefei University of Technology),Hefei Anhui 230009,China
    3.Joint Laboratory of Intelligent Prevention and Recognition of Audio and Video (Hefei University of Technology),Hefei Anhui 230009,China
    4.Anhui Provincial Key Laboratory of Industrial Safety and Emergency Technology (Hefei University of Technology),Hefei Anhui 230009,China
    5.Department of Physical Evidence Identification,Anhui Public Security Department,Hefei Anhui 230009,China
  • Received:2024-12-26 Revised:2025-01-07 Accepted:2025-01-10 Online:2025-01-15 Published:2025-12-10
  • Contact: Zhaopin SU
  • About author:YUE Feng, born in 1981, Ph. D., associate research fellow. His research interests include swarm intelligence, software engineering.
    PENG Yang, born in 1999, M. S. candidate. His research interests include voiceprint recognition.
    SU Zhaopin, born in 1983, Ph. D., associate professor. Her research interests include complex intelligent systems, multimedia security.
    ZHANG Guofu, born in 1979, Ph. D., professor. His research interests include speech security.
    LIAN Chensi, born in 1984, M. S., senior engineer. Her research interests include voiceprint and synthesized speech identification.
    YANG Bo, born in 1985, M. S., senior engineer. His research interests include voiceprint and synthesized speech identification.
    FANG Zhen, born in 2000, M. S. candidate. His research interests include voiceprint recognition.
  • Supported by:
    Humanities and Social Sciences Research Planning Fund of the Ministry of Education(24YJA870011);Key Research and Development Program of Anhui Province(202104d07020001);Natural Science Foundation of Anhui Province(2208085MF166)

摘要:

基于手机语音的多媒体取证任务一直都是研究热点,然而已有语音手机识别任务均局限于闭集模式,即训练集与测试集共享相同的类别集合,无法保证未知类别手机的识别精度,所以现有方法无法直接应用于未知手机。为此,提出一种基于特征交互与表示增强的语音手机来源开集识别方法(FireOSCI)。首先,设计基于多头注意力模块Fastformer的全局特征提取模块GlobalBlock,以更好地捕捉整个语音样本的全局信息,获得丰富的设备特征信息;其次,设计基于SE-Res2Block(Squeeze-Excitation Res2Block)的局部特征提取模块LocalBlocks,专注于增强跟手机信息相关的特征,抑制与手机来源识别无关的特征;随后,设计基于注意力机制的特征融合机制,将全局特征和多层局部特征深度融合;最后,设计基于注意力池化的手机来源确认网络,以提高开集模式下的识别准确率。在13个不同手机品牌、86种不同型号的手机语音数据集上的对比实验结果表明,所提方法可以实现未知类别手机的识别,为语音手机来源的开集识别提供可参考的技术方案。

关键词: 语音手机来源, 开集识别, 特征交互, 表示增强, 深度融合

Abstract:

Multimedia forensics based on cell-phone speech has long been a research hotspot. However, existing speech-based source cell-phone identification methods are confined to the closed-set mode, in which the training set and the test set share the same set of categories. They therefore cannot guarantee recognition accuracy for cell-phones of unknown categories and cannot be applied directly to unknown cell-phones. To address this, an Open-set Source Cell-phone Identification method based on Feature interaction and representation enhancement (FireOSCI) was proposed. Firstly, a global feature extraction block named GlobalBlock was designed on the basis of the multi-head attention block Fastformer to better capture the global information of the whole speech sample and obtain rich device feature information. Secondly, a local feature extraction block named LocalBlocks was designed on the basis of SE-Res2Block (Squeeze-Excitation Res2Block) to enhance features related to cell-phone information and suppress features irrelevant to source cell-phone identification. Thirdly, an attention-based feature fusion mechanism was designed to deeply fuse the global features with the multi-layer local features. Finally, a source cell-phone confirmation network was designed on the basis of attention pooling to improve the recognition accuracy in open-set mode. Comparison experiments on a cell-phone speech dataset covering 13 cell-phone brands and 86 cell-phone models show that the proposed method can identify cell-phones of unknown categories and provides a reference technical solution for open-set recognition of speech-based source cell-phones.

Key words: source cell-phone, open-set recognition, feature interaction, representation enhancement, deep fusion
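
To make the closed-set versus open-set distinction in the abstract concrete, the following is a minimal illustrative sketch, not the FireOSCI network itself. It assumes that frame-level features and their attention scores are already available (in the paper these would come from learned modules), pools them into one embedding with softmax weights, and then compares that embedding with enrolled device embeddings by cosine similarity. The function names, the `threshold` value of 0.8, and the toy vectors are all hypothetical choices made for illustration.

```python
import math

def attention_pool(frames, scores):
    """Softmax-weighted mean of frame-level feature vectors.

    `scores` stands in for learned attention scores (an assumption here).
    """
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames)) / total
            for d in range(dim)]

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify_open_set(embedding, enrolled, threshold=0.8):
    """Return (label, score); the label is "unknown" below the threshold.

    A closed-set classifier would always return the best enrolled label.
    The rejection step is what makes the decision open-set.
    """
    best_label, best_score = "unknown", -1.0
    for label, reference in enrolled.items():
        score = cosine(embedding, reference)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return "unknown", best_score
    return best_label, best_score

# Toy enrolled devices (hypothetical 2-D embeddings).
enrolled = {"phone_a": [1.0, 0.0], "phone_b": [0.0, 1.0]}

# A recording whose frames resemble phone_a.
pooled = attention_pool([[1.0, 0.0], [0.8, 0.2]], [0.0, 0.0])
known_label, _ = identify_open_set(pooled, enrolled)

# A recording that matches neither enrolled device well.
unknown_label, _ = identify_open_set([1.0, 1.0], enrolled)
```

In this toy setup the first recording is assigned to `phone_a`, while the second falls below the threshold for both devices and is rejected as `unknown`. In practice the threshold would be tuned on held-out data rather than fixed in advance.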

CLC number: