Journal of Computer Applications, 2012, Vol. 32, Issue (10): 2895-2898. DOI: 10.3724/SP.J.1087.2012.02895

• Artificial Intelligence •

Multi-semantic audio classification method based on tensor neural network

XING Ling, HE Mei, MA Qiang, ZHU Min

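  1. School of Information Engineering, Southwest University of Science and Technology, Mianyang Sichuan 621010, China
  • Received: 2012-04-13 Revised: 2012-05-23 Online: 2012-10-23 Published: 2012-10-01
  • Corresponding author: XING Ling
  • About the authors: XING Ling (1978-), female, born in Chengdu, Sichuan, associate professor, Ph. D.; research interests: semantic understanding of network information. HE Mei (1986-), female, born in Chongqing, M. S. candidate; research interests: network audio content management. MA Qiang (1982-), male, born in Mianyang, Sichuan, lecturer, Ph. D. candidate; research interests: proactive information services. ZHU Min (1986-), female, born in Enshi, Hubei, M. S. candidate; research interests: intelligent processing of network information.
  • Foundation items:
    National Natural Science Foundation of China; National Natural Science Foundation of China; National Natural Science Foundation of China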

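Abstract: Audio feature vectors have been widely used in audio classification research. Although this representation effectively captures the intrinsic properties of audio, it cannot express the multiple semantics of audio information or the correlations among them. An audio semantic representation based on the Tensor Uniform Content Locator (TUCL) is proposed: the semantic description of an audio resource is expressed as a third-order tensor, and a multi-semantic Tensor Semantic Space (TSS) is constructed. In this space, the Tensor Semantic Dispersion (TSD) effectively clusters audio resources that share the same semantics, so audio resources can be classified by computing their TSD. A Radial Basis Function Tensor Neural Network (RBFTNN) is further built to learn the classification model adaptively. Experimental results show that, for multi-semantic classification, the TSD algorithm clearly outperforms the typical Gaussian Mixture Model (GMM) approach, and that the TSD-based RBFTNN model achieves clearly higher classification accuracy than a TSD-based Support Vector Machine (SVM) model.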
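The abstract does not give the TSD formula or the TUCL tensor layout, so the following is only a minimal Python sketch under stated assumptions: every audio resource's semantic description is a third-order tensor of one shared shape, the "dispersion" of a tensor with respect to a semantic class is taken to be its Frobenius distance from that class's mean tensor, and classification picks the class with the smallest dispersion. The helper names (`fit`, `dispersion`, `classify`, `classify_multi`), the `radius` threshold used for multi-semantic labelling, and the toy labels are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical layout: a third-order semantic tensor stored as nested lists,
# indexed as tensor[i][j][k]. All tensors are assumed to share one shape.
Tensor3 = List[List[List[float]]]


def flatten(t: Tensor3) -> List[float]:
    """Unfold a third-order tensor into a flat vector (row-major order)."""
    return [x for plane in t for row in plane for x in row]


def class_centroid(samples: Sequence[Tensor3]) -> List[float]:
    """Element-wise mean of the flattened training tensors of one class."""
    vecs = [flatten(s) for s in samples]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def dispersion(t: Tensor3, centroid: List[float]) -> float:
    """Assumed stand-in for TSD: Frobenius distance from a class centroid."""
    v = flatten(t)
    if len(v) != len(centroid):
        raise ValueError("tensor and centroid sizes differ")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, centroid)))


def fit(train: Dict[str, Sequence[Tensor3]]) -> Dict[str, List[float]]:
    """Compute one centroid per semantic label from labelled tensors."""
    return {label: class_centroid(samples) for label, samples in train.items()}


def classify(t: Tensor3, centroids: Dict[str, List[float]]) -> str:
    """Single-label decision: the label with the smallest dispersion."""
    return min(centroids, key=lambda label: dispersion(t, centroids[label]))


def classify_multi(t: Tensor3, centroids: Dict[str, List[float]],
                   radius: float) -> List[str]:
    """Multi-semantic decision (hypothetical rule): every label whose
    dispersion does not exceed `radius`, sorted alphabetically."""
    return sorted(label for label, c in centroids.items()
                  if dispersion(t, c) <= radius)


if __name__ == "__main__":
    # Toy 1x1x2 tensors; real TUCL tensors would be larger.
    centroids = fit({
        "speech": [[[[0.0, 0.0]]], [[[0.2, 0.0]]]],
        "music": [[[[1.0, 1.0]]], [[[1.2, 1.0]]]],
    })
    print(classify([[[0.1, 0.1]]], centroids))  # speech
    print(classify_multi([[[0.6, 0.5]]], centroids, radius=1.0))  # ['music', 'speech']
```

Fitting one centroid per label keeps the sketch short; the paper instead trains an RBF tensor neural network on top of TSD, which this code does not attempt to reproduce.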

Key words: semantic dispersion, multi-semantic classification, semantic representation, Tensor Semantic Space (TSS), neural network

CLC number: