Journal of Computer Applications, 2025, Vol. 45, Issue (9): 2838-2847. DOI: 10.11772/j.issn.1001-9081.2024081178

• Data science and technology •

Multi-source heterogeneous data analysis method combining deep learning and tensor decomposition

Hongjun ZHANG1,2, Gaojun PAN3, Hao YE4,5, Yubin LU5, Yiheng MIAO6

    1. China Communications Services Corporation Limited, Beijing 100073, China
    2. Institute of High Performance Computing and Big Data Processing, Nanjing University of Posts and Telecommunications, Nanjing Jiangsu 210003, China
    3. Zhejiang Communication Industry Service Company Limited, Hangzhou Zhejiang 310052, China
    4. Zhongbo Information Technology Research Institute Company Limited, Nanjing Jiangsu 210006, China
    5. Postal Industry Technology Research and Development Center (Internet of Things Technology), State Post Bureau, Nanjing University of Posts and Telecommunications, Nanjing Jiangsu 210003, China
    6. School of Communications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing Jiangsu 210003, China
  • Received: 2024-08-20 Revised: 2024-11-16 Accepted: 2024-12-04 Online: 2024-12-17 Published: 2025-09-10
  • Contact: Gaojun PAN (潘高军)
  • About author: ZHANG Hongjun (张宏俊), born in 1985, native of Ma'anshan, Anhui, Ph.D., senior engineer. His research interests include high-performance computing, key technologies of big data, and new quality productivity.
    YE Hao (叶昊), born in 1999, native of Wuxi, Jiangsu, M.S., assistant engineer, CCF member. His research interests include smart logistics, intelligent manufacturing, penetration testing, and data mining.
    LU Yubin (陆玉彬), born in 2001, native of Yangzhou, Jiangsu, M.S. candidate. Her research interests include logistics management.
    MIAO Yiheng (缪宜恒), born in 2000, native of Huai'an, Jiangsu, M.S. candidate. His research interests include machine learning, spectrum sensing, and e-commerce.
  • Supported by:
    National Natural Science Foundation of China (61972208, 62102194, 62102196)

Abstract:

In the fast-moving field of consumer electronics, understanding user behavior is crucial for product innovation and for improving user satisfaction. To address the resulting challenges of data analysis and mining, a multi-clustering method combining deep learning with tensor decomposition was proposed. Firstly, deep neural networks were used to extract high-level features from complex heterogeneous datasets, such as data from the various sensors and user interactions of modern devices, so as to capture the diverse characteristics of each data source. Secondly, tensor decomposition was applied to feature extraction and clustering analysis, treating each data source as a separate modality within a data tensor in order to reveal the latent structure and patterns of the data. Finally, experiments were carried out on a multimodal shopping dataset covering tens of thousands of consumers, obtained in collaboration with an e-commerce platform. The results show that the proposed tensor decomposition algorithm integrated with a Convolutional Neural Network (CNN) performs well on consumer electronics-related datasets, with accuracy above 0.7 in all cases and strong scores on key metrics such as purity, Adjusted Rand Index (ARI), and Normalized Mutual Information (NMI), confirming its effectiveness in capturing the intrinsic structure and similarity of the data. Compared with existing methods such as the Dynamic Multi-Clustering Routine (DMCR) method, the Deep Multi-Modal Clustering (DMMC) method, and FAST-CNN, the proposed method shows significant advantages on multiple evaluation metrics, indicating better accuracy and stability than the comparative methods as well as an advantage in uncovering the underlying principles of the data and the interrelationships among heterogeneous data.

Key words: Convolutional Neural Network (CNN), tensor decomposition, multi-clustering, deep learning, consumer electronics
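
The idea of stacking per-source features into a tensor, decomposing it, and clustering the resulting factors can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify the decomposition or the clustering algorithm, so the sketch assumes a rank-3 CP decomposition fitted by alternating least squares, followed by k-means on the user factor. The tensor is synthetic (users × features × data sources) and merely stands in for CNN-extracted features; all names, sizes, and noise levels are hypothetical. Purity is computed against the synthetic ground truth.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring `mode` to the front, flatten the other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def khatri_rao(a, b):
    """Column-wise Kronecker product: (I x R), (J x R) -> (I*J x R)."""
    return (a[:, None, :] * b[None, :, :]).reshape(-1, a.shape[1])

def cp_als(tensor, rank, n_iter=300, seed=0):
    """Rank-`rank` CP decomposition of a 3-way tensor by alternating least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in tensor.shape]
    for _ in range(n_iter):
        for mode in range(3):
            first, second = [factors[m] for m in range(3) if m != mode]
            # Least-squares update of one factor with the other two held fixed.
            gram = (first.T @ first) * (second.T @ second)
            factors[mode] = (unfold(tensor, mode) @ khatri_rao(first, second)
                             @ np.linalg.pinv(gram))
    return factors

def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(k - 1):
        dists = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist_to_centers = np.linalg.norm(points[:, None] - centers[None], axis=2)
        labels = np.argmin(dist_to_centers, axis=1)
        centers = np.array([points[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels

def purity(true_labels, pred_labels):
    """Fraction of samples that share the majority true class of their cluster."""
    hits = sum(np.bincount(true_labels[pred_labels == c]).max()
               for c in np.unique(pred_labels))
    return hits / len(true_labels)

# Synthetic stand-in for deep features (assumption): 60 users in 3 groups,
# 8 feature dimensions, 4 data sources (modalities).
rng = np.random.default_rng(42)
n_users, n_features, n_sources, n_clusters = 60, 8, 4, 3
true_labels = np.repeat(np.arange(n_clusters), n_users // n_clusters)
user_f = np.eye(n_clusters)[true_labels] + 0.05 * rng.standard_normal((n_users, n_clusters))
feat_f = rng.standard_normal((n_features, n_clusters))
src_f = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]])
tensor = np.einsum("ir,jr,kr->ijk", user_f, feat_f, src_f)
tensor += 0.01 * rng.standard_normal(tensor.shape)

users, feats, srcs = cp_als(tensor, rank=n_clusters)
approx = np.einsum("ir,jr,kr->ijk", users, feats, srcs)
rel_err = np.linalg.norm(tensor - approx) / np.linalg.norm(tensor)

# CP factors are only defined up to column scaling, so normalise before clustering.
users_normed = users / np.linalg.norm(users, axis=0)
pred_labels = kmeans(users_normed, n_clusters)
score = purity(true_labels, pred_labels)
print(f"relative reconstruction error: {rel_err:.4f}, purity: {score:.2f}")
```

Clustering the user-mode factor rather than the raw tensor is the design choice worth noting: each user is reduced to a handful of latent loadings shared across all data sources, which is one common way to let heterogeneous sources inform a single grouping. ARI and NMI, the other metrics named in the abstract, could be computed on the same label vectors.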

CLC Number: