
Table of Contents

    10 April 2025, Volume 45 Issue 4
    Artificial intelligence
    Review on bimodal emotion recognition based on speech and text
    Lingmin HAN, Xianhong CHEN, Wenmeng XIONG
    2025, 45(4):  1025-1034.  DOI: 10.11772/j.issn.1001-9081.2024030319

    Emotion recognition is a technology that enables computers to recognize and understand human emotions. It plays an important role in many fields and is a key development direction of artificial intelligence. Therefore, the research status of bimodal emotion recognition based on speech and text was summarized. Firstly, the representation spaces of emotion were classified and elaborated. Secondly, emotion databases were classified according to their emotion representation spaces, and common multi-modal emotion databases were summarized. Thirdly, methods of bimodal emotion recognition based on speech and text were introduced, including feature extraction, modal fusion, and decision classification. Specifically, the modal fusion methods were highlighted and divided into four categories: feature-level fusion, decision-level fusion, model-level fusion, and multi-level fusion. In addition, the results of a series of bimodal emotion recognition methods based on speech and text were compared and analyzed. Finally, the application scenarios, challenges, and future development directions of emotion recognition were introduced. This review analyzes and summarizes work on multi-modal emotion recognition, especially bimodal emotion recognition based on speech and text, providing a valuable reference for emotion recognition research.
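The fusion categories above differ mainly in where speech and text information are combined. A minimal sketch of the two most common ones, with toy feature vectors and class probabilities standing in for real speech/text encoders (not any surveyed system):

```python
# Minimal illustration of feature-level vs. decision-level fusion for
# speech + text emotion recognition. All vectors here are toy stand-ins.

def feature_level_fusion(speech_feat, text_feat):
    """Concatenate modality features before a single shared classifier."""
    return speech_feat + text_feat  # list concatenation

def decision_level_fusion(speech_probs, text_probs, w=0.5):
    """Average per-modality class probabilities after separate classifiers."""
    return [w * s + (1 - w) * t for s, t in zip(speech_probs, text_probs)]

speech = [0.1, 0.9]            # toy speech embedding (2-dim)
text = [0.4, 0.6, 0.2]         # toy text embedding (3-dim)
fused = feature_level_fusion(speech, text)      # 5-dim joint feature

p_speech = [0.7, 0.2, 0.1]     # toy per-class probabilities (3 emotions)
p_text = [0.5, 0.3, 0.2]
p_fused = decision_level_fusion(p_speech, p_text)
```

Model-level and multi-level fusion sit between these two extremes, exchanging intermediate representations rather than raw features or final decisions.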

    Self-supervised learning method using minimal prior knowledge
    Junyi ZHU, Leilei CHANG, Xiaobin XU, Zhiyong HAO, Haiyue YU, Jiang JIANG
    2025, 45(4):  1035-1041.  DOI: 10.11772/j.issn.1001-9081.2024030366

    To reduce the heavy demand for supervised information in supervised learning, a self-supervised learning method based on minimal prior knowledge was proposed. Firstly, the unlabeled data were clustered on the basis of prior knowledge of the data, or initial labels were generated for the unlabeled data based on the center distances of the labeled data. Secondly, the labeled data were sampled randomly, and a machine learning method was selected to build sub-models. Thirdly, the weight and error of each data extraction were calculated to obtain the average error of the data as the data label degree of each dataset, and an iteration threshold was set based on the initial data label degree. Finally, the termination condition was determined by comparing the data label degree with the threshold during the iteration process. Experimental results on 10 UCI public datasets show that compared with unsupervised learning algorithms such as K-means, supervised learning methods such as Support Vector Machine (SVM), and mainstream self-supervised learning methods such as TabNet (Tabular Network), the proposed method achieves high classification accuracy on unbalanced datasets without using labels and on balanced datasets using limited labels.
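The iterate-until-threshold procedure can be sketched schematically; the mean-error label degree, the 0.5 threshold factor, and the `eval_submodels` callback are illustrative stand-ins for the paper's weighted-error definitions:

```python
def label_degree(errors):
    """Toy label degree: the mean error over sampled sub-models."""
    return sum(errors) / len(errors)

def self_supervised_loop(initial_errors, eval_submodels, max_iter=50):
    """Iterate sub-model building until the data label degree falls
    below a threshold derived from the initial label degree.
    The 0.5 factor and the error model are hypothetical choices."""
    degree = label_degree(initial_errors)
    threshold = 0.5 * degree            # toy threshold rule
    iteration = 0
    while degree > threshold and iteration < max_iter:
        # eval_submodels(i) stands in for: relabel, sample, train
        # sub-models, and collect their errors at iteration i.
        degree = label_degree(eval_submodels(iteration))
        iteration += 1
    return degree, iteration
```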

    Multi-branch multi-view based contextual contrastive representation learning method for time series
    Guangju YANG, Tianjian LUO, Kaijun WANG, Siqi YANG
    2025, 45(4):  1042-1052.  DOI: 10.11772/j.issn.1001-9081.2024040448

    Time series data are used in various industries. However, the lack of label information and the complex temporal-spectral variations pose challenges for learning representations of time series. Therefore, a Multi-Branch Multi-View contextual Contrastive Representation Learning (MBMVCRL) method for time series was proposed. Firstly, time series samples were augmented from both time and frequency perspectives and then input into a multi-branch multi-view model to extract multi-perspective feature representations of the time series. Secondly, for contrastive representation learning, the contextual contrastive loss and cross-prediction loss were calculated on the basis of the feature representations from the two perspectives, and joint training was conducted to obtain the optimal feature representation. Finally, to validate the representation capability of the proposed method for time series, an Affine Nonnegative Collaborative Representation (ANCR) classifier was used for the downstream classification tasks. Experimental results show that the proposed method improves the recognition accuracy by 5.15, 0.90, and 1.89 percentage points, respectively, compared to the mainstream Time-Series Temporal and Contextual Contrasting (TS-TCC) method in human action, epilepsy, and sleep state recognition tasks. Ablation experimental results demonstrate the importance of the multi-branch multi-view model, highlighting the proposed model's low parameter sensitivity, fast convergence, and good generalization across different time series applications.
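Contextual contrastive losses of this family are typically built on a normalized temperature-scaled cross-entropy; a minimal single-anchor sketch (a generic NT-Xent term, not the exact MBMVCRL loss) is:

```python
import math

def nt_xent(sim_pos, sims_all, tau=0.5):
    """Single-anchor NT-Xent term:
    -log( exp(s_pos / tau) / sum_j exp(s_j / tau) ),
    where sims_all contains the positive similarity plus all negatives."""
    num = math.exp(sim_pos / tau)
    den = sum(math.exp(s / tau) for s in sims_all)
    return -math.log(num / den)
```

The loss shrinks as the positive pair (here, two augmented views of the same series) becomes more similar than the negatives, which is what drives the joint training described above.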

    Recommendation algorithm of graph contrastive learning based on hybrid negative sampling
    Renjie TIAN, Mingli JING, Long JIAO, Fei WANG
    2025, 45(4):  1053-1060.  DOI: 10.11772/j.issn.1001-9081.2024040419

    Contrastive Learning (CL) can extract self-supervised signals from raw data, providing strong support for addressing data sparsity in recommender systems. However, most existing CL-based recommendation algorithms focus on improving model structures and data augmentation methods, while ignoring the importance of enhancing negative sample quality and uncovering potential implicit relationships between users and items in recommendation tasks. To address this issue, a Hybrid negative Sampling-based Graph Contrastive Learning recommendation algorithm (HSGCL) was proposed. Firstly, unlike uniform sampling from real data, a positive-sample mixing method was used to inject positive sample information into negative samples. Secondly, informative hard negative samples were created through a skip-mix method. Meanwhile, multiple views were generated by altering the graph structure using Node Dropout (ND), and controlled uniform noise smoothing was introduced in the embedding space to adjust the uniformity of the learned representations. Finally, the main recommendation task and the CL task were trained jointly. Numerical experiments were conducted on three public datasets: Douban-Book, Yelp2018, and Amazon-Kindle. The results show that compared to the baseline model Light Graph Convolution Network (LightGCN), the proposed algorithm improves Recall@20 by 23%, 13%, and 7%, respectively, and Normalized Discounted Cumulative Gain (NDCG@20) by 32%, 14%, and 5%, respectively, and performs excellently in enhancing the diversity of negative sample embedding information. It can be seen that by improving the negative sampling method and data augmentation, the proposed algorithm improves negative sample quality, the uniformity of the representation distribution, and the accuracy of recommendation.
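Positive-sample mixing and hard-negative selection of this kind can be sketched as follows; the mixing coefficient, the toy embeddings, and the inner-product selection rule are illustrative assumptions, not the exact HSGCL formulation:

```python
import random

def positive_mix(pos_emb, neg_embs, seed=0):
    """Inject positive-sample information into each candidate negative:
    n' = a * pos + (1 - a) * n, with a drawn uniformly per negative."""
    rng = random.Random(seed)
    mixed = []
    for neg in neg_embs:
        a = rng.uniform(0.0, 1.0)
        mixed.append([a * p + (1 - a) * n for p, n in zip(pos_emb, neg)])
    return mixed

def hardest_negative(user_emb, candidates):
    """Toy stand-in for hard-negative selection: pick the candidate with
    the largest inner product with the user embedding."""
    return max(candidates,
               key=lambda c: sum(u * x for u, x in zip(user_emb, c)))
```

Mixing keeps the synthetic negatives close to the decision boundary, which is what makes them informative for the CL task.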

    Multi-view and multi-scale contrastive learning for graph collaborative filtering
    Weichao DANG, Xinyu WEN, Gaimei GAO, Chunxia LIU
    2025, 45(4):  1061-1068.  DOI: 10.11772/j.issn.1001-9081.2024030393

    A Multi-View and Multi-Scale Contrastive Learning for graph collaborative filtering (MVMSCL) model was proposed to address the limitations of single views and data sparsity in graph collaborative filtering recommendation methods. Firstly, an initial interaction graph was constructed on the basis of user-item interactions, and multiple potential intentions in user-item interactions were considered to build a multi-intention decomposition view. Secondly, the adjacency matrix was improved using high-order relationships to construct a collaborative neighbor view. Thirdly, irrelevant noise interactions were removed to construct the adaptively enhanced initial interaction graph and multi-intention decomposition view. Finally, contrastive learning paradigms at local, cross-layer, and global scales were introduced to generate self-supervised signals, thereby improving the recommendation performance. Experimental results on three public datasets, Gowalla, Amazon-book and Tmall, demonstrate that the recommendation performance of MVMSCL surpasses that of the comparison models. Compared with the optimal baseline model DCCF (Disentangled Contrastive Collaborative Filtering framework), MVMSCL improves Recall@20 by 5.7%, 14.5% and 10.0%, respectively, and Normalized Discounted Cumulative Gain (NDCG@20) by 4.6%, 17.9% and 11.5%, respectively.

    Channel shuffle attention mechanism based on group convolution
    Liwei ZHANG, Quan LIANG, Yutao HU, Qiaole ZHU
    2025, 45(4):  1069-1076.  DOI: 10.11772/j.issn.1001-9081.2024040525

    The introduction of attention mechanisms allows a backbone network to learn more discriminative feature representations. However, traditional attention mechanisms control the complexity of attention by channel dimension reduction or by decreasing the channel number while increasing the batch size, which leads to excessive reduction of the number of channels and loss of important feature information. To address this issue, a Channel Shuffle Attention (CSA) module was proposed. Firstly, group convolutions were used to learn attention weights to control the complexity of CSA. Secondly, the traditional channel shuffle and Deep Channel Shuffle (DCS) methods were used to enhance the exchange of channel feature information between different groups. Thirdly, inverse channel shuffle was used to restore the order of the attention weights. Finally, the restored attention weights were multiplied with the original feature map to obtain a more expressive feature map. Experimental results show that on CIFAR-100 dataset, ResNet50 with CSA reduces the number of parameters by 2.3% and increases the Top-1 accuracy by 0.57 percentage points compared to ResNet50 with CA (Coordinate Attention), and reduces the amount of computation by 18.4% and increases the Top-1 accuracy by 0.27 percentage points compared with ResNet50 with EMA (Efficient Multi-scale Attention). On COCO2017 dataset, YOLOv5s with CSA improves the mean Average Precision (mAP@50) by 0.5 and 0.2 percentage points, respectively, compared to YOLOv5s with CA and with EMA. It can be seen that CSA achieves a balance between the number of parameters and the computational complexity, while improving both the accuracy of image classification tasks and the localization capability of object detection tasks.
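The channel shuffle and inverse channel shuffle operations themselves are standard (as in ShuffleNet) and can be sketched over a 1-D list of channel indices; real implementations operate on (N, C, H, W) tensors:

```python
def channel_shuffle(x, groups):
    """Reorder channels [g0c0, g0c1, ..., g1c0, ...] into an interleaved
    order so that information mixes across groups."""
    per = len(x) // groups
    # view as (groups, per), transpose to (per, groups), flatten
    return [x[g * per + c] for c in range(per) for g in range(groups)]

def inverse_channel_shuffle(x, groups):
    """Undo channel_shuffle, restoring the original channel order
    (used in CSA to re-align attention weights with the feature map)."""
    per = len(x) // groups
    return [x[c * groups + g] for g in range(groups) for c in range(per)]
```

Applying the inverse after the grouped attention computation is what lets the restored weights be multiplied element-wise with the original feature map.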

    Edge federation dynamic analysis for hierarchical federated learning based on evolutionary game
    Yufei XIANG, Zhengwei NI
    2025, 45(4):  1077-1085.  DOI: 10.11772/j.issn.1001-9081.2024040428

    To address the issue that the limited edge resources of existing Edge Server Providers (ESPs) reduce the Quality of Service (QoS) of hierarchical federated learning edge nodes, a dynamic Edge Federated Framework (EFF) was proposed by considering the potential edge federation probability among edge servers. In the proposed framework, different ESPs cooperated to provide additional edge resources for hierarchical federated learning, which suffers from reduced model training efficiency due to client heterogeneity and Non-Independent and Identically Distributed (Non-IID) data. Firstly, offloading decisions were made by quantifying the communication model, and offloading tasks were assigned to the edge servers of other ESPs within the framework, so as to meet the elastic demand for edge resources. Secondly, the Multi-round Iterative EFF Participation Strategy (MIEPS) algorithm was used to solve the evolutionary game equilibrium among ESPs, thereby finding an appropriate resource allocation strategy. Finally, the existence, uniqueness, and stability of the equilibrium point were validated through theoretical and simulation experiments. Experimental results show that compared to non-federation and pairwise federation strategies, the tripartite EFF constructed using the MIEPS algorithm improves the prediction accuracy of the global model trained on Independent and Identically Distributed (IID) datasets by 1.5 and 1.0 percentage points, respectively, and the prediction accuracy on Non-IID datasets by 2.1 and 0.7 percentage points, respectively. Additionally, by changing the resource allocation method of ESPs, it is validated that EFF can distribute ESP rewards fairly, encouraging more ESPs to participate and forming a positive cooperation environment.
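Evolutionary games of this kind are commonly analyzed with replicator dynamics, where a strategy's population share grows when its payoff exceeds the average. A one-step sketch (a generic replicator update, not the MIEPS algorithm itself, whose payoffs and update rule are defined in the paper):

```python
def replicator_step(shares, payoffs, dt=0.1):
    """One Euler step of replicator dynamics:
    x_i <- x_i + dt * x_i * (u_i - u_bar), renormalized to sum to 1.
    shares: current population fractions of each strategy.
    payoffs: current payoff of each strategy."""
    avg = sum(x * u for x, u in zip(shares, payoffs))
    new = [x + dt * x * (u - avg) for x, u in zip(shares, payoffs)]
    total = sum(new)
    return [x / total for x in new]
```

Iterating such updates until the shares stop moving is one standard way to locate the evolutionary equilibrium whose existence and stability the paper verifies.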

    Clustering federated learning algorithm for heterogeneous data
    Qingli CHEN, Yuanbo GUO, Chen FANG
    2025, 45(4):  1086-1094.  DOI: 10.11772/j.issn.1001-9081.2024010132

    Federated Learning (FL) is a new machine learning model construction paradigm with great potential in privacy preservation and communication efficiency, but in real Internet of Things (IoT) scenarios, data heterogeneity exists between client nodes, and learning a unified global model leads to a decrease in model accuracy. To solve this problem, a Clustering Federated Learning based on Feature Distribution (CFLFD) algorithm was proposed. In this algorithm, the results obtained through Principal Component Analysis (PCA) of the features extracted from the model by each client node were clustered, so that client nodes with similar data distributions collaborate with each other, thereby achieving higher model accuracy. To demonstrate the effectiveness of the algorithm, extensive experiments were conducted on three datasets and four benchmark algorithms. The results show that the algorithm improves model accuracy by 1.12 and 3.76 percentage points, respectively, compared to FedProx on the CIFAR10 and Office-Caltech10 datasets.

    Cross-domain few-shot classification model based on relation network and Vision Transformer
    Yiqin YAN, Chuan LUO, Tianrui LI, Hongmei CHEN
    2025, 45(4):  1095-1103.  DOI: 10.11772/j.issn.1001-9081.2023121852

    Aiming at the problem of poor classification accuracy of few-shot learning models under domain shift, a cross-domain few-shot model based on the relation network and ViT (Vision Transformer), named ReViT (Relation ViT), was proposed. Firstly, ViT was introduced as a feature extractor, and a pre-trained deep neural network was employed to solve the problem of insufficient feature expression ability of shallow neural networks. Secondly, a shallow convolutional network was used as a task adapter to enhance the knowledge transfer ability of the model, and a non-linear classifier was constructed on the basis of the relation network and the channel attention mechanism. Thirdly, the feature extractor and the task adapter were integrated to enhance the generalization ability of the model. Finally, a four-stage learning strategy of pre-training, meta-training, fine-tuning, and meta-testing was adopted to train the model, which further improved the cross-domain classification performance of ReViT through the effective integration of transfer learning and meta learning. Experimental results using average classification accuracy as the evaluation metric show that ReViT performs well on cross-domain few-shot classification problems. Specifically, the classification accuracies of ReViT under in-domain and out-of-domain scenarios are improved by 5.82 and 1.71 percentage points, respectively, compared to the sub-optimal model on Meta-Dataset. The classification accuracies of ReViT are improved by 1.00, 1.54 and 2.43 percentage points, respectively, compared to the sub-optimal model on 5-way 5-shot for the three sub-problems EuroSAT (European SATellite data), CropDisease, and ISIC (International Skin Imaging Collaboration) of the BCDFSL (Broader study of Cross-Domain Few-Shot Learning) dataset, and by 0.13, 0.97, and 3.40 percentage points, respectively, on 5-way 20-shot for EuroSAT, CropDisease, and ISIC. The classification accuracy of ReViT is improved by 0.36 percentage points compared to the sub-optimal model on 5-way 50-shot for CropDisease. It can be seen that ReViT has good classification accuracy in image classification tasks with sparse samples.

    Multi-label classification model based on multi-label relational graph and local dynamic reconstruction learning
    Jie HU, Qiyang ZHENG, Jun SUN, Yan ZHANG
    2025, 45(4):  1104-1112.  DOI: 10.11772/j.issn.1001-9081.2024030386

    In multi-label classification tasks, the existing models mainly consider the co-occurrences of labels in the training set when constructing label dependencies, ignoring the various types of relationships among labels and their dynamic interactions in different samples. Therefore, a multi-label relational graph and a local dynamic reconstruction graph were combined to learn more complete label dependencies. Firstly, based on the global co-occurrence relationships of labels, a multi-label relational graph was constructed using a data-driven approach to learn different types of dependencies among labels. Secondly, the relevance of text information and label semantics was explored through a label attention mechanism. Finally, the label graph was reconstructed dynamically to capture local-specific relationships among labels. Experiments were conducted on three public datasets: BibTeX, Delicious, and Reuters-21578. The results show that compared with Multi-relation Message Passing (MrMP), the proposed model increases the macro-averaged F1 (maF1) by 1.6, 1.0 and 2.2 percentage points, respectively, and achieves improvements in comprehensive performance.

    Data augmentation technique incorporating label confusion for Chinese text classification
    Haitao SUN, Jiayu LIN, Zuhong LIANG, Jie GUO
    2025, 45(4):  1113-1119.  DOI: 10.11772/j.issn.1001-9081.2024040550

    Traditional data augmentation techniques, such as synonym substitution, random insertion, and random deletion, may change the original semantics of text and even result in the loss of critical information. Moreover, data in text classification tasks typically have both a textual part and a label part, yet traditional data augmentation methods only focus on the textual part. To address these issues, a Label Confusion incorporated Data Augmentation (LCDA) technique was proposed to enhance data comprehensively from both the textual and label aspects. On the textual side, the text was augmented by randomly inserting and replacing punctuation marks and by completing end-of-sentence punctuation, increasing textual diversity while preserving all textual information and its order. On the label side, a simulated label distribution was generated using a label confusion approach and used to replace the traditional one-hot label distribution, so as to better reflect the relationships among instances and labels as well as between labels. In experiments conducted on few-shot datasets constructed from the THUCNews (TsingHua University Chinese News) and Toutiao Chinese news datasets, the proposed technique was combined with the TextCNN, TextRNN, BERT (Bidirectional Encoder Representations from Transformers), and RoBERTa-CNN (Robustly optimized BERT approach Convolutional Neural Network) text classification models. The experimental results indicate that all models demonstrate significant performance improvements over their unaugmented counterparts. Specifically, on 50-THU, a dataset constructed from the THUCNews dataset, the accuracies of the four models combined with the LCDA technique are improved by 1.19, 6.87, 3.21, and 2.89 percentage points, respectively, compared to those before augmentation, and by 0.78, 7.62, 1.75, and 1.28 percentage points, respectively, compared to the four models combined with the softEDA (Easy Data Augmentation with soft labels) method. These results show that by processing both text and labels, the LCDA technique enhances model accuracy significantly, particularly in application scenarios characterized by limited data availability.
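The two sides of LCDA can be sketched as follows; the punctuation set, the insertion count, and the 0.9 true-class mass are hypothetical parameters, not the paper's settings:

```python
import random

PUNCTS = ["，", "、", "；"]  # toy punctuation pool

def punct_augment(text, n_insert=2, seed=0):
    """Insert punctuation marks at random interior positions and complete
    the end-of-sentence mark; every original character is kept in order."""
    rng = random.Random(seed)
    chars = list(text)
    for _ in range(n_insert):
        chars.insert(rng.randrange(1, len(chars)), rng.choice(PUNCTS))
    if chars[-1] not in "。！？":
        chars.append("。")
    return "".join(chars)

def soft_label(true_idx, confusion, true_mass=0.9):
    """Replace a one-hot label with a simulated label distribution:
    `true_mass` on the true class, the rest spread by confusion scores."""
    total = sum(confusion)
    dist = [(1.0 - true_mass) * c / total for c in confusion]
    dist[true_idx] += true_mass
    return dist
```

Because insertion never deletes or reorders characters, the augmented text keeps the original as a subsequence, which is the property the technique relies on.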

    Boundary-cross supervised semantic segmentation network with decoupled residual self-attention
    Kunyuan JIANG, Xiaoxia LI, Li WANG, Yaodan CAO, Xiaoqiang ZHANG, Nan DING, Yingyue ZHOU
    2025, 45(4):  1120-1129.  DOI: 10.11772/j.issn.1001-9081.2024040415

    To address the challenges of edge information loss and incomplete segmentation of large lesions in endoscopic semantic segmentation networks, a Boundary-Cross Supervised semantic Segmentation Network (BCS-SegNet) with Decoupled Residual Self-Attention (DRA) was proposed. Firstly, DRA was introduced to enhance the network's ability to learn distantly related lesions. Secondly, a Cross Level Fusion (CLF) module was constructed to combine multi-level feature maps within the encoding structure in a pairwise way, so as to fuse image details and semantic information at low computational cost. Finally, multi-directional and multi-scale 2D Gabor transforms were utilized to extract edge information, and spatial attention was used to weight edge features in the feature maps to supervise the decoding process of the segmentation network, thereby providing more accurate intra-class segmentation consistency at the pixel level. Experimental results demonstrate that on the ISIC2018 dermoscopy and Kvasir-SEG/CVC-ClinicDB colonoscopy datasets, BCS-SegNet achieves mIoU (mean Intersection over Union) and Dice coefficients of 84.27%, 90.68% and 79.24%, 87.91%, respectively; on the self-built esophageal endoscopy dataset, BCS-SegNet achieves an mIoU of 82.73% and a Dice coefficient of 90.84%, with the mIoU increased by 3.30% over that of U-Net and by 4.97% over that of UCTransNet. It can be seen that the proposed network achieves more complete segmentation regions and clearer edge details.

    Domain adaptation integrating environment label smoothing and nuclear norm discrepancy
    Meirong DING, Jinxin ZHUO, Yuwu LU, Qinglong LIU, Jicong LANG
    2025, 45(4):  1130-1138.  DOI: 10.11772/j.issn.1001-9081.2024040417

    Existing domain adaptation methods overly focus on fine-grained feature learning in the source domain, which hinders their ability to extend to the target domain, makes them prone to overfitting in specific environments, and leaves them lacking robustness to complex environments. To address the above-mentioned issues, a domain adaptation model that integrates Environment Label Smoothing and Nuclear norm Discrepancy (ELSND) was proposed. In the proposed model, through the environment label smoothing module, the probability of true labels was reduced and the probability of non-true labels was increased to enhance the model's adaptability to different scenarios. At the same time, the nuclear norm discrepancy module was employed to measure the distribution difference between the source and target domains, thereby improving the classification certainty at decision boundaries. Extensive experiments were conducted on three domain adaptation benchmark datasets: Office-31, Office-Home and MiniDomainNet. Compared with the state-of-the-art baseline model DomainAdaptor-Aug (DomainAdaptor with generalized entropy minimization-Augmentation) on the MiniDomainNet dataset, the ELSND model achieves a 1.23 percentage point increase in the accuracy of image classification domain adaptation tasks. Therefore, the proposed model has higher precision and generalization in image classification.

    Unsupervised text style transfer based on semantic perception of proximity
    Junxiu AN, Linwang YANG, Yuan LIU
    2025, 45(4):  1139-1147.  DOI: 10.11772/j.issn.1001-9081.2024040536

    Aiming at the problem that the distance boundaries between word vectors in the latent space are not fully considered in discrete word perturbation and embedding perturbation methods, a Semantic Proximity-aware Adversarial Auto-Encoders (SPAAE) method was proposed. Firstly, adversarial auto-encoders were used as the underlying model. Secondly, the standard deviation of the probability distribution of noise vectors was obtained on the basis of the proximity distance of the word vectors. Finally, by randomly sampling from this probability distribution, the perturbation parameters were adjusted dynamically to maximally blur a word vector's own semantics without affecting the semantics of other word vectors. Experimental results show that compared with the DAAE (Denoising Adversarial Auto-Encoders) and EPAAE (Embedding Perturbed Adversarial Auto-Encoders) methods, the proposed method improves natural fluency by 14.88% and 15.65%, respectively, on Yelp dataset, improves Text Style Transfer (TST) accuracy by 11.68% and 6.45%, respectively, on Scitail dataset, and increases BLEU (BiLingual Evaluation Understudy) by 28.16% and 26.17%, respectively, on Tenses dataset. It can be seen that the SPAAE method provides a theoretically more accurate way of perturbing word vectors and demonstrates significant advantages in different style transfer tasks on 7 public datasets; for example, it can be applied to style transfer of emotional text in the guidance of online public opinion.
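Tying the noise scale to the proximity of neighboring word vectors can be sketched as below; the Euclidean nearest-neighbor distance and the 0.5 scale factor are illustrative assumptions, not the exact SPAAE formulation:

```python
import math
import random

def nearest_distance(vec, others):
    """Euclidean distance from one word vector to its closest neighbor."""
    return min(math.dist(vec, o) for o in others)

def proximity_perturb(vec, others, scale=0.5, seed=0):
    """Add Gaussian noise whose standard deviation is tied to the
    proximity distance, so the perturbation blurs this word's semantics
    without reaching the region of neighboring word vectors."""
    rng = random.Random(seed)
    sigma = scale * nearest_distance(vec, others)
    return [v + rng.gauss(0.0, sigma) for v in vec]
```

A word with close neighbors thus receives small noise while an isolated word can be perturbed more aggressively, which is the proximity-aware behavior the abstract describes.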

    Knowledge graph completion using hierarchical attention fusing directed relationships and relational paths
    Sheping ZHAI, Qing YANG, Yan HUANG, Rui YANG
    2025, 45(4):  1148-1156.  DOI: 10.11772/j.issn.1001-9081.2024030321

    Most existing Knowledge Graph Completion (KGC) methods do not fully exploit the relational paths in the triple structure and only consider the graph structure information; meanwhile, the existing models focus on neighborhood information in the process of entity aggregation, and their learning of relations is relatively simple. To address these problems, a graph attention model that integrates directed relations and relational paths, named DRPGAT, was proposed. Firstly, the regular triples were converted into directed relationship-based triples, and the attention mechanism was introduced to give different weights to different directed relationships, so as to realize entity information aggregation. At the same time, the relational path model was established, the relational positions were embedded into the path information to distinguish relationships at different positions, and irrelevant paths were filtered out to retain useful path information. Secondly, the attention mechanism was used to carry out deep path information learning to realize the aggregation of relations. Finally, the entities and relations were fed into the decoder and trained to obtain the final completion results. Link prediction experiments were conducted on two real datasets to verify the effectiveness of the proposed model. Experimental results show that compared to the optimal results of the baseline models, on FB15k-237 dataset, DRPGAT has the Mean Rank (MR) reduced by 13, and the Mean Reciprocal Rank (MRR), Hits@1, Hits@3, and Hits@10 improved by 1.9, 1.2, 2.3, and 1.6 percentage points, respectively; on WN18RR dataset, DRPGAT has the MR reduced by 125, and the MRR, Hits@1, Hits@3, and Hits@10 improved by 1.1, 0.4, 1.2, and 0.6 percentage points, respectively, indicating the effectiveness of the proposed model.

    Consultation recommendation method based on knowledge graph and dialogue structure
    Chun XU, Shuangyan JI, Huan MA, Enwei SUN, Mengmeng WANG, Mingyu SU
    2025, 45(4):  1157-1168.  DOI: 10.11772/j.issn.1001-9081.2024050573

    Aiming at the problems that existing consultation recommendation methods do not fully utilize the rich dialogue information between doctors and patients and do not capture patients' real-time health needs and preferences, a consultation recommendation method based on Knowledge Graph and Dialogue Structure (KGDS) was proposed. Firstly, a medical Knowledge Graph (KG) incorporating comment sentiment analysis and professional medical knowledge was constructed to improve the fine-grained feature representations of doctors and patients. Secondly, in the patient representation learning part, a patient query encoder was designed to extract key features of the query text at both the word and sentence levels, and an attention mechanism was used to improve higher-level feature interactions between doctor and patient vectors. Thirdly, the diagnosis dialogue was modeled to make full use of the rich dialogue information between doctors and patients to enhance the doctor-patient feature representations. Finally, a dialogue simulator based on contrastive learning was designed to capture the dynamic needs and real-time preferences of patients, and the simulated dialogue representation was used to support recommendation score prediction. Experimental results on a real dataset show that compared with the optimal baseline method, KGDS increases AUC (Area Under the Curve), MRR@15 (Mean Reciprocal Rank), Diversity@15, F1@15, HR@15 (Hit Ratio) and NDCG@15 (Normalized Discounted Cumulative Gain) by 1.82, 1.78, 3.85, 3.06, 10.02 and 4.51 percentage points, respectively, which verifies the effectiveness of the proposed consultation recommendation method and shows that adding sentiment analysis and the KG improves the interpretability of the recommendation results.

    Tender information extraction method based on prompt tuning of knowledge
    Yiheng SUN, Maofu LIU
    2025, 45(4):  1169-1176.  DOI: 10.11772/j.issn.1001-9081.2024030336

    Current information extraction tasks mainly rely on Large Language Models (LLMs). However, the frequent occurrence of domain terms in tender information and the models' lack of relevant prior knowledge result in low fine-tuning efficiency and poor extraction performance. Additionally, the extraction and generalization performance of the models depend to a great extent on the quality of the prompt information and the way prompt templates are constructed. To address these issues, a Tender Information Extraction method based on Prompt Learning (TIEPL) was proposed. Firstly, a prompt learning method for generative information extraction was utilized to inject domain knowledge into the LLM, thereby achieving unified optimization of the pre-training and fine-tuning stages. Secondly, with the LoRA (Low-Rank Adaptation) fine-tuning method as the framework, a separate prompt training bypass was designed, and a keyword-based prompt template for tender scenarios was constructed, thereby enhancing the bidirectional association between model information extraction and prompts. Experimental results on a self-built tender-inviting and bid-winning dataset indicate that TIEPL improves Recall-Oriented Understudy for Gisting Evaluation (ROUGE-L) and BLEU-4 (BiLingual Evaluation Understudy) by 1.05 and 4.71 percentage points, respectively, compared to the sub-optimal method UIE (Universal Information Extraction), and that TIEPL can generate extraction results more accurately and completely. This demonstrates the effectiveness of the proposed method in improving the accuracy and generalization of tender information extraction.
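The LoRA bypass that TIEPL builds on adds a trainable low-rank update to a frozen weight matrix, y = x(W + s·AB); a minimal sketch with toy list-based matrices (illustrative shapes, not TIEPL's actual configuration):

```python
def matmul(X, Y):
    """Plain nested-list matrix multiply, for illustration only."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_forward(x, W, A, B, scale=1.0):
    """LoRA forward pass: y = x @ W + scale * (x @ A) @ B.
    W is the frozen pre-trained weight; A (d x r) and B (r x k) form
    the trainable low-rank bypass, with rank r much smaller than d, k."""
    base = matmul(x, W)
    delta = matmul(matmul(x, A), B)
    return [[b + scale * d for b, d in zip(br, dr)]
            for br, dr in zip(base, delta)]
```

Only A and B are updated during fine-tuning, which is why such a bypass can host a separate prompt-training path cheaply alongside the frozen LLM weights.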

    Open-world knowledge reasoning model based on path and enhanced triplet text
    Liqin WANG, Zhilei GENG, Yingshuang LI, Yongfeng DONG, Meng BIAN
    2025, 45(4):  1177-1183.  DOI: 10.11772/j.issn.1001-9081.2024030265

    Traditional knowledge reasoning methods based on representation learning can only be used for closed-world knowledge reasoning, and how to conduct open-world knowledge reasoning effectively is currently a hot issue. Therefore, a knowledge reasoning model based on path and enhanced triplet text, named PEOR (Path and Enhanced triplet text for Open world knowledge Reasoning), was proposed. First, multiple paths generated by the structures between entity pairs and enhanced triplets generated by the neighborhood structures of individual entities were utilized. Among them, the path text was obtained by concatenating the texts of the triplets in a path, and the enhanced triplet text was obtained by concatenating the texts of the head entity neighborhood, the relation, and the tail entity neighborhood. Then, BERT (Bidirectional Encoder Representations from Transformers) was employed to encode the path text and the enhanced triplet text separately. Finally, semantic matching attention was computed between the path vectors and the triplet vectors, and this attention was used to aggregate the semantic information of the multiple paths. Comparison experimental results on three open-world knowledge graph datasets, WN18RR, FB15k-237, and NELL-995, show that compared with the suboptimal model BERTRL (BERT-based Relational Learning), the proposed model improves the Hits@10 (Hit ratio) metric by 2.6, 2.3 and 8.5 percentage points, respectively, validating the effectiveness of the proposed model.
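    The final aggregation step can be sketched as dot-product attention between the path vectors and the enhanced triplet vector. A simplified numpy illustration; the real model scores BERT-encoded texts, so the toy 2-dimensional vectors below are assumptions:

```python
import numpy as np

def softmax(z):
    z = z - z.max()               # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum()

def aggregate_paths(path_vecs, triplet_vec):
    """Weight each path vector by its semantic match with the enhanced
    triplet vector (dot-product attention), then sum the weighted paths.

    A simplified stand-in for semantic matching attention over multiple paths.
    """
    scores = path_vecs @ triplet_vec          # one matching score per path
    weights = softmax(scores)
    return weights @ path_vecs                # attention-weighted aggregation

paths = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
triplet = np.array([1.0, 0.0])
agg = aggregate_paths(paths, triplet)
# Paths aligned with the triplet receive larger weights, so the aggregate
# leans toward the first coordinate.
assert agg[0] > agg[1]
```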

    Fact verification of semantic fusion collaborative reasoning based on graph embedding
    Malei SHEN, Zhicai SHI, Yongbin GAO, Jianyang HU
    2025, 45(4):  1184-1189.  DOI: 10.11772/j.issn.1001-9081.2024040436

    As a critical task in the field of natural language processing, fact verification requires the ability to retrieve relevant evidence from large amounts of plain text based on a given claim and to use this evidence to reason about and verify the claim. Previous studies usually represent the relationships among pieces of evidence by concatenating the evidence sentences or by a graph structure, but cannot represent the internal relevance among the pieces of evidence clearly. Therefore, a collaborative reasoning network model based on graph and text fusion, CNGT (Co-attention Network with Graph and Text fusion), was designed, in which the semantic fusion of evidence sentences was achieved by constructing an evidence knowledge graph. Firstly, the evidence knowledge graph was constructed from the evidence sentences, and its graph representation was learned by a graph transformation encoder. Then, the BERT (Bidirectional Encoder Representations from Transformers) model was used to encode the claim and evidence sentences. Finally, the reasoning graph information and the text features were fused effectively through a double-layer collaborative reasoning network. Experimental results show that the proposed model outperforms the advanced model KGAT (Knowledge Graph Attention neTwork) on FEVER (Fact Extraction and VERification) dataset, with Label Accuracy (LA) increased by 0.84 percentage points and FEVER score increased by 1.51 percentage points. It can be seen that the model pays more attention to the relationships among evidence sentences, and the evidence graph makes these relationships interpretable.

    Novel speaker identification framework based on narrative unit and reliable label
    Tianyu LIU, Ye TAO, Chaofeng LU, Jiawang LIU
    2025, 45(4):  1190-1198.  DOI: 10.11772/j.issn.1001-9081.2024030331

    Speaker Identification (SI) in novels aims to determine the speaker of a quotation by its context. This task is of great help in assigning appropriate voices to different characters in the production of audiobooks. However, the existing methods mainly use fixed window values when selecting the context of a quotation, which is not flexible enough and may produce redundant segments, making it difficult for the model to capture useful information. Besides, due to the significant differences in the number of quotations and the writing styles of different novels, a small number of labeled samples cannot enable the model to generalize fully, and the labeling of datasets is expensive. To solve the above problems, a novel speaker identification framework that integrates narrative units and reliable labels was proposed. Firstly, a Narrative Unit-based Context Selection (NUCS) method was used to select a context of suitable length so that the model focuses on the segment closest to the quotation attribution. Secondly, a Speaker Scoring Network (SSN) was constructed with the generated context as input. In addition, self-training was introduced, and a Reliable Pseudo Label Selection (RPLS) algorithm was designed to compensate for the lack of labeled samples to some extent and to screen out more reliable, higher-quality pseudo-label samples. Finally, a Chinese Novel Speaker Identification corpus (CNSI) containing 11 Chinese novels was built and labeled. To evaluate the proposed framework, experiments were conducted on two public datasets and the self-built dataset. The results show that the proposed framework is superior to methods such as CSN (Candidate Scoring Network), E2E_SI and ChatGPT-3.5.
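    The pseudo-label screening idea in self-training can be illustrated with a generic confidence-and-margin filter. This is a hedged sketch in the spirit of RPLS, not the paper's exact algorithm; the thresholds and sample names are illustrative:

```python
def select_reliable_pseudo_labels(candidates, tau=0.9, margin=0.2):
    """Keep unlabeled samples whose top prediction is both confident
    (>= tau) and well separated from the runner-up (>= margin).

    Each candidate is (sample_id, probs) with probs sorted descending.
    """
    reliable = []
    for sample_id, probs in candidates:
        top = probs[0]
        second = probs[1] if len(probs) > 1 else 0.0
        if top >= tau and top - second >= margin:
            reliable.append(sample_id)
    return reliable

candidates = [
    ("q1", [0.95, 0.03, 0.02]),   # confident and separated -> kept
    ("q2", [0.60, 0.35, 0.05]),   # low confidence -> dropped
    ("q3", [0.92, 0.05, 0.03]),   # confident and separated -> kept
]
assert select_reliable_pseudo_labels(candidates) == ["q1", "q3"]
```

    Filtered samples are then added to the labeled pool for the next self-training round, which is how the framework compensates for scarce annotations.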

    Tibetan word segmentation system based on pre-trained model tokenization reconstruction
    Jie YANG, Tashi NYIMA, Dongrub RINCHEN, Jindong QI, Dondrub TSHERING
    2025, 45(4):  1199-1204.  DOI: 10.11772/j.issn.1001-9081.2024040442

    To address the poor performance of existing pre-trained models in Tibetan word segmentation tasks, a method was proposed that establishes a tokenization reconstruction standard to regularize and constrain the text, and then reconstructs the tokenization of the Tibetan pre-trained model to perform Tibetan word segmentation. Firstly, a standardization operation was performed on the original text to resolve incorrect segmentations caused by language mixing and similar issues. Secondly, the tokenization of the pre-trained model was reconstructed at syllable granularity so that the segmentation units are aligned with the labeled units. Finally, after resolving adhesive segmentations using the improved sliding window restoration method, the Re-TiBERT-BiLSTM-CRF model was established using the “Begin, Middle, End and Single” (BMES) four-element annotation method, so as to obtain the Tibetan word segmentation system. Experimental results show that the pre-trained model with reconstructed tokenization is significantly better than the original pre-trained model in segmentation tasks. The obtained system has high Tibetan word segmentation precision, and its F1 value reaches up to 97.15%, so it can complete Tibetan word segmentation tasks well.
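    The BMES annotation scheme mentioned above assigns one tag per syllable: S for a single-syllable word and B/M/E for the begin, middle, and end syllables of a multi-syllable word. A minimal sketch (plain ASCII strings stand in for Tibetan syllables):

```python
def bmes_tags(words):
    """Convert syllable-segmented words into per-syllable BMES tags:
    S for a one-syllable word, B/M/E for the begin, middle, and end
    syllables of a longer word.
    """
    tags = []
    for word in words:              # each word is a list of syllables
        n = len(word)
        if n == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (n - 2) + ["E"])
    return tags

# Three words of 1, 2, and 3 syllables respectively.
assert bmes_tags([["a"], ["b", "c"], ["d", "e", "f"]]) == ["S", "B", "E", "B", "M", "E"]
```

    A sequence labeler such as the BiLSTM-CRF head then predicts these tags per syllable, and words are recovered by cutting at every S tag and every B…E span.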

    Data science and technology
    Group recommendation model by graph neural network based on multi-perspective learning
    Cong WANG, Yancui SHI
    2025, 45(4):  1205-1212.  DOI: 10.11772/j.issn.1001-9081.2024030337

    Focusing on the problem that the existing group recommendation models based on Graph Neural Networks (GNNs) find it difficult to fully utilize explicit and implicit interaction information, a Group Recommendation by GNN based on Multi-perspective learning (GRGM) model was proposed. Firstly, hypergraphs, bipartite graphs, and hypergraph projections were constructed from the group interaction data, and a GNN suited to the characteristics of each graph was adopted to extract its node features, thereby fully expressing the explicit and implicit relationships among users, groups, and items. Then, a multi-perspective information fusion strategy was proposed to obtain the final group and item representations. Experimental results on Mafengwo, CAMRa2011, and Weeplaces datasets show that compared to the baseline model ConsRec, GRGM model improves the Hit Ratio (HR@5, HR@1) and Normalized Discounted Cumulative Gain (NDCG@5, NDCG@10) by 3.38%, 1.96% and 3.67%, 3.84%, respectively, on Mafengwo dataset, by 2.87%, 1.18% and 0.96%, 1.62%, respectively, on CAMRa2011 dataset, and by 2.41%, 1.69% and 4.35%, 2.60%, respectively, on Weeplaces dataset. It can be seen that GRGM model has better recommendation performance than the baseline models.

    Developer recommendation for open-source projects based on collaborative contribution network
    Lan YOU, Yuang ZHANG, Yuan LIU, Zhijun CHEN, Wei WANG, Xing ZENG, Zhangwei HE
    2025, 45(4):  1213-1222.  DOI: 10.11772/j.issn.1001-9081.2024040454

    Recommending developers for open-source projects is of great significance to the construction of the open-source ecosystem. Different from traditional software development, the developers, projects, organizations and their correlations in the open-source field reflect the characteristics of open collaborative projects, and the semantics embedded in them help to recommend developers for open-source projects accurately. Therefore, a Developer Recommendation method based on Collaborative Contribution Network (DRCCN) was proposed. Firstly, a CCN was constructed by utilizing the contribution relationships among Open-Source Software (OSS) developers, OSS projects and OSS organizations. Then, based on the CCN, a three-layer deep heterogeneous GraphSAGE (Graph SAmple and aggreGatE) Graph Neural Network (GNN) model was constructed to predict the links between developer nodes and open-source project nodes and generate the corresponding embedding pairs. Finally, according to the prediction results, the K-Nearest Neighbors (KNN) algorithm was adopted to complete the developer recommendation. The proposed model was trained and tested on a GitHub dataset, and the experimental results show that compared to the contrastive learning model for sequential recommendation CL4SRec (Contrastive Learning for Sequential Recommendation), DRCCN improves the precision, recall, and F1 score by approximately 10.7%, 2.6%, and 4.2%, respectively. It can be seen that the proposed model can provide an important reference for developer recommendation in open-source community projects.
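    The final KNN step can be sketched as a cosine-similarity nearest-neighbor search over the learned embeddings. The developer names and toy 2-dimensional vectors below are illustrative; in DRCCN the embeddings would come from the trained GraphSAGE model:

```python
import numpy as np

def recommend_developers(project_vec, dev_vecs, dev_ids, k=2):
    """Rank developers for a project by cosine similarity between the
    project embedding and each developer embedding; return the top k.
    """
    dev_vecs = np.asarray(dev_vecs, dtype=float)
    p = project_vec / np.linalg.norm(project_vec)
    d = dev_vecs / np.linalg.norm(dev_vecs, axis=1, keepdims=True)
    sims = d @ p                          # cosine similarity per developer
    order = np.argsort(-sims)[:k]         # indices of the k most similar
    return [dev_ids[i] for i in order]

devs = ["alice", "bob", "carol"]
vecs = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.2]]
assert recommend_developers(np.array([1.0, 0.1]), vecs, devs, k=2) == ["alice", "carol"]
```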

    Multi-behavior recommendation based on cascading residual graph convolutional network
    Weichao DANG, Chujun SONG, Gaimei GAO, Chunxia LIU
    2025, 45(4):  1223-1231.  DOI: 10.11772/j.issn.1001-9081.2024040461

    A Multi-Behavior Recommendation based on Cascading Residual graph convolutional network (CRMBR) model was proposed to address the problems of data sparsity and the neglect of complex connections among multiple behaviors in multi-behavior recommendation research. Firstly, the global embeddings of users and items were learned from a unified homogeneous graph constructed from the interactions of all behaviors and used as initialization embeddings. Secondly, the embeddings of different types of behaviors were refined continuously by capturing the connections among different behaviors through cascading residual blocks, thereby improving the modeling of user preferences. Finally, the user and item embeddings were aggregated through two different aggregation strategies, respectively, and optimized using Multi-Task Learning (MTL). Experimental results on several real datasets show that the recommendation performance of CRMBR model is better than that of the current mainstream models. Compared with the advanced benchmark model Multi-Behavior Hierarchical Graph Convolutional Network (MB-HGCN), the proposed model has the Hit Rate (HR@20) and Normalized Discounted Cumulative Gain (NDCG@20) improved by 3.1% and 3.9% on Tmall dataset, by 15.8% and 16.9% on Beibei dataset, and by 1.0% and 3.3% on Jdata dataset, respectively, which validates the effectiveness of the proposed model.

    Cyber security
    Framework and implementation of network data security protection based on zero trust
    Zuoguang WANG, Chao LI, Li ZHAO
    2025, 45(4):  1232-1240.  DOI: 10.11772/j.issn.1001-9081.2024040526

    In order to address the failure of boundary protection measures caused by the increasingly complex, dynamic and fragmented evolution of network architecture, and to cope with the challenges to network data security posed by the continuous emergence of vulnerabilities in non-autonomously controllable systems, software, hardware and cryptographic algorithms, the following work was performed. Firstly, a zero trust network architecture implementation model was designed on the basis of the zero trust concept. Secondly, a zero trust network security protection framework was proposed that integrates the concept of zero trust security, the Chinese cryptographic algorithm system, and trusted computing technology in links such as identity management and authentication, authorization and access, and data processing and transmission; framework processes such as Chinese cryptographic certificate application and issuance as well as secure processing and transmission of business data were designed, and functional components such as an identity and access management module and a terminal trusted network access proxy device were designed and implemented. Finally, a network platform based on the security protection framework was built, providing new frameworks, technologies and tools for network data security protection and zero trust security practices. Security analysis and performance test results show that with the proposed platform, the signing and signature verification performance of SM2 reaches 1 118.72 and 441.43 operations per second respectively, the encryption and decryption performance of SM4 reaches 10.05 MB/s and 9.96 MB/s respectively, and the secure data access/response performance reaches 7.23 MB/s, demonstrating that the proposed framework can provide stable support for data security.

    Secure cluster control of UAVs under DoS attacks based on APF and DDPG algorithm
    Bingquan LIN, Lei LIU, Huafeng LI, Chen LIU
    2025, 45(4):  1241-1248.  DOI: 10.11772/j.issn.1001-9081.2024040464

    To address the issues of communication obstruction and unpredictable motion trajectories of Unmanned Aerial Vehicles (UAVs) under Denial of Service (DoS) attacks, a secure cluster control strategy for multiple UAVs during DoS attacks was studied within a framework that integrates the Artificial Potential Field (APF) method and the Deep Deterministic Policy Gradient (DDPG) algorithm. Firstly, Hping3 was utilized to detect DoS attacks on all UAVs, thereby determining the network environment of the UAV cluster in real time. Secondly, when no attack was detected, the traditional APF method was employed for cluster flight; after attacks were detected, the targeted UAVs were marked as dynamic obstacles while the other UAVs switched to control strategies generated by the DDPG algorithm. Finally, with the proposed framework, the cooperation and complementary advantages of APF and DDPG were realized, and the effectiveness of the DDPG algorithm was validated through simulation in Gazebo. Simulation results indicate that Hping3 can detect the UAVs under attack in real time, and the other normal UAVs can avoid obstacles stably after switching to the DDPG algorithm, so as to ensure cluster security; the success rate of the switching obstacle avoidance strategy during DoS attacks is 72.50%, significantly higher than that of the traditional APF method (31.25%), and the switching strategy converges gradually, demonstrating good stability; the trained DDPG obstacle avoidance strategy exhibits a degree of generalization and can complete tasks stably when 1 to 2 unknown obstacles appear in the environment.
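    The APF control used during normal flight combines a linear attractive force toward the goal with repulsive forces from obstacles inside an influence radius. A minimal 2D sketch; the gains and radius are illustrative values, not the paper's parameters:

```python
import numpy as np

def apf_force(pos, goal, obstacles, k_att=1.0, k_rep=1.0, rho0=2.0):
    """Resultant artificial-potential-field force on one vehicle:
    attraction grows with distance to the goal, repulsion grows as an
    obstacle closer than rho0 is approached.
    """
    force = k_att * (goal - pos)                       # attractive term
    for obs in obstacles:
        diff = pos - obs
        rho = np.linalg.norm(diff)                     # distance to obstacle
        if 0 < rho < rho0:                             # inside influence radius
            force += k_rep * (1.0 / rho - 1.0 / rho0) / rho**2 * (diff / rho)
    return force

pos, goal = np.array([0.0, 0.0]), np.array([10.0, 0.0])
# An obstacle near the straight-line path pushes the vehicle off that line.
f = apf_force(pos, goal, [np.array([1.0, 0.5])])
assert f[0] < 10.0 and f[1] < 0.0
```

    In the hybrid scheme described above, this controller handles the attack-free case, while attacked UAVs are simply fed into the obstacle list of their healthy neighbors.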

    Post-quantum certificateless public audit scheme based on lattice
    Haifeng MA, Jiewei CAI, Qingshui XUE, Jiahai YANG, Jing HAN, Zixuan LU
    2025, 45(4):  1249-1255.  DOI: 10.11772/j.issn.1001-9081.2024050605

    Periodic audit of data stored on cloud servers is a core strategy to ensure the security and integrity of cloud-stored data, as it can identify and address the risks of data tampering or loss effectively. However, traditional public audit schemes suffer from issues such as certificate management or key escrow, leading to privacy leakage during data querying and dynamic modification. Furthermore, with the continuous development of quantum computing technology, public audit schemes based on traditional public key systems face the serious threat of being cracked by quantum computers. To address the above issues, a post-quantum certificateless public audit scheme based on lattice was proposed. Firstly, a certificateless public key cryptosystem was used to solve the certificate management and key escrow problems of traditional public audit schemes. Secondly, during data querying and dynamic modification, Data Owners (DO) were not required to provide specific data block information, thereby ensuring the privacy of the DO. Finally, lattice cryptography technology was employed to resist attacks from quantum computers. Theoretical analysis and experimental comparison results demonstrate that the proposed scheme can resist malicious attacks while ensuring the privacy of DO operations, and it achieves higher efficiency in label generation.

    Network and communications
    Joint beamforming and power allocation in RIS-assisted multi-cluster NOMA-DFRC system
    Yuchen LI, Junyi WU, Mengjia GE, Lili PAN, Xiaorong JING
    2025, 45(4):  1256-1262.  DOI: 10.11772/j.issn.1001-9081.2024040530

    Facing the higher demands for communication and sensing in upcoming Dual-Function Radar Communication (DFRC) systems, a DFRC system model was proposed that combines multi-cluster Non-Orthogonal Multiple Access (NOMA) technology and Reconfigurable Intelligent Surface (RIS). In the proposed model, the superimposed multi-cluster NOMA signals were utilized by the DFRC base station to achieve target sensing, and the virtual line-of-sight links established by RIS reflection were used to enhance the communication performance of the multi-cluster NOMA users. Based on this model, with the goal of maximizing the weighted sum of the system sum rate and the sensing power, a non-convex objective function with multiple constraints and coupled variables was constructed, and an optimization scheme for joint beamforming and power allocation was proposed to solve it. In the proposed scheme, firstly, the original optimization problem was decomposed into three subproblems. Subsequently, methods such as Successive Convex Approximation (SCA) and SemiDefinite Relaxation (SDR) were employed to transform the non-convex subproblems into convex ones. Finally, the Alternating Optimization (AO) method was applied to solve the subproblems, thereby achieving joint beamforming (including active and passive beamforming) and intra-cluster power allocation coefficient optimization. Simulation results indicate that the proposed scheme achieves good communication and sensing performance; compared with the Orthogonal Multiple Access (OMA) scheme, it improves the system sum rate by about 1 bit/(s·Hz) while maintaining high target sensing performance, achieving a good compromise between communication performance and sensing performance.

    Protocol conversion method based on semantic similarity
    Dingmu YANG, Longqiang NI, Jing LIANG, Zhaoyuan QIU, Yongzhen ZHANG, Zhiqiang QI
    2025, 45(4):  1263-1270.  DOI: 10.11772/j.issn.1001-9081.2024040534

    Protocol conversion is usually used to solve the problem of data interaction between different protocols, and its essence is to find the mapping relationships between the fields of different protocols. Traditional protocol conversion methods have several drawbacks: they are mainly designed on the basis of specific protocols, so they are static, lack flexibility, and are not suitable for environments with multi-protocol conversion; moreover, whenever a protocol changes, its structure and semantic fields must be re-analyzed to reconstruct the mapping relationships between fields, leading to an exponential increase in workload and a decrease in protocol conversion efficiency. Therefore, a general protocol conversion method based on semantic similarity was proposed to enhance protocol conversion efficiency by exploring the relationships between fields intelligently. Firstly, the BERT (Bidirectional Encoder Representations from Transformers) model was employed to classify the protocol fields and eliminate the fields that “should not” have mapping relationships. Secondly, the semantic similarities between fields were computed to infer the mapping relationships between them, resulting in a field mapping table. Finally, a general framework for protocol conversion based on semantic similarity was introduced, and related protocols were defined for validation. Simulation results show that the precision of field classification of the proposed method reaches 94.44%, and the precision of mapping relationship identification reaches 90.70%, which is 13.93% higher than that of the method based on knowledge extraction. The above results verify that the proposed method is feasible, can identify the mapping relationships between fields of different protocols quickly, and is suitable for multi-protocol conversion scenarios in unmanned collaboration.
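    The field mapping step can be sketched as a nearest-neighbor match over field embeddings with a similarity threshold. The toy hand-written "encoder" and the 0.8 threshold below are assumptions standing in for the BERT encoder:

```python
import numpy as np

def build_field_mapping(src_fields, dst_fields, embed, threshold=0.8):
    """Map each source-protocol field to its most similar destination
    field by cosine similarity of field embeddings, keeping only pairs
    whose similarity exceeds the threshold.
    """
    mapping = {}
    for s in src_fields:
        sv = embed(s)
        best, best_sim = None, threshold
        for d in dst_fields:
            dv = embed(d)
            sim = sv @ dv / (np.linalg.norm(sv) * np.linalg.norm(dv))
            if sim > best_sim:
                best, best_sim = d, sim
        if best is not None:              # unmatched fields stay unmapped
            mapping[s] = best
    return mapping

# Toy 'encoder': hand-written vectors keyed by field name.
toy = {"src_addr": [1.0, 0.0, 0.0], "dest_addr": [0.9, 0.1, 0.0], "payload": [0.0, 0.0, 1.0]}
embed = lambda name: np.array(toy[name])
assert build_field_mapping(["src_addr"], ["dest_addr", "payload"], embed) == {"src_addr": "dest_addr"}
```

    The resulting table only needs to be rebuilt for the fields of a changed protocol, which is the source of the efficiency gain over hand-crafted per-protocol converters.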

    Multimedia computing and computer simulation
    Moving pedestrian detection neural network with invariant global sparse contour point representation
    Qingqing ZHAO, Bin HU
    2025, 45(4):  1271-1284.  DOI: 10.11772/j.issn.1001-9081.2024040561

    As pedestrians are non-rigid objects, effective invariant representation of their visual features is the key to improving recognition performance. In natural visual scenes, moving pedestrians often undergo changes in scale, background, and pose, which hinders existing techniques for extracting these irregular features. To address this issue, the problem of invariant recognition of moving pedestrians was explored based on the neural structural characteristics of mammalian retinas, and a Moving Pedestrian Detection Neural Network (MPDNN) for visual scenes was proposed. MPDNN was composed of two neural modules: a presynaptic network and a postsynaptic network. The presynaptic network was used to perceive low-level visual motion cues representing the moving object and extract the object’s binarized visual information. The postsynaptic network exploited the sparse invariant response properties of the biological visual system and the invariant relationship between large concave and convex regions of the object’s contour under continuous shape changes, encoding stably changing visual features from the low-level motion cues to build invariant representations of pedestrians. Experimental results show that MPDNN achieves a 96.96% cross-domain detection accuracy on the public datasets CUHK Avenue and EPFL, which is 4.52 percentage points higher than the SOTA (State Of The Art) model; MPDNN also demonstrates good robustness on the scale and motion posture variation datasets, with accuracies of 89.48% and 91.45%, respectively. These experimental results validate the effectiveness of the biological invariant object recognition mechanism in moving pedestrian detection.

    Gait recognition method based on dilated reparameterization and atrous convolution architecture
    Lina HUO, Leren XUE, Yujun DAI, Xinyu ZHAO, Shihang WANG, Wei WANG
    2025, 45(4):  1285-1292.  DOI: 10.11772/j.issn.1001-9081.2024050566

    Gait recognition aims to identify people by their walking postures. To solve the problem of poor matching between the Effective Receptive Field (ERF) and the human silhouette region, a gait recognition method based on atrous convolution, named DilatedGait, was proposed. Firstly, atrous convolution was employed to expand the neurons’ receptive fields, thereby alleviating the resolution degradation caused by downsampling and model deepening and enhancing the recognizability of the silhouette structure. Secondly, a Dilated Reparameterization Module (DRM) was proposed to optimize the ERF focus range by fusing multi-scale convolution kernel parameters through reparameterization, enabling the model to capture more global contextual information. Finally, the discriminative gait features were extracted via feature mapping. Experiments were conducted on the outdoor datasets Gait3D and GREW, and the results show that compared with the existing state-of-the-art method GaitBase, DilatedGait improves Rank-1 and mean Inverse Negative Penalty (mINP) by 9.0 and 14.2 percentage points respectively on Gait3D, and improves Rank-1 and Rank-5 by 11.6 and 8.8 percentage points respectively on GREW. It can be seen that DilatedGait overcomes the adverse effects of complex covariates and further enhances the accuracy of gait recognition in outdoor scenes.
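    Reparameterizing dilated branches rests on the fact that a dilated kernel is equivalent to a larger dense kernel with zeros inserted between its taps, so multi-scale branches can be merged into one kernel at inference. A minimal numpy sketch of that equivalence (a simplified view; DRM additionally fuses the branch weights learned during training):

```python
import numpy as np

def dilate_kernel(kernel, dilation):
    """Expand a k x k convolution kernel with the given dilation rate into
    an equivalent dense kernel by inserting zeros between taps.

    The dense kernel has size dilation*(k-1)+1 and produces the same
    convolution output as the dilated one.
    """
    k = kernel.shape[0]
    size = dilation * (k - 1) + 1
    dense = np.zeros((size, size), dtype=kernel.dtype)
    dense[::dilation, ::dilation] = kernel     # original taps, spaced out
    return dense

k3 = np.arange(9.0).reshape(3, 3)
k5 = dilate_kernel(k3, 2)                      # 3x3 kernel, dilation 2 -> 5x5
assert k5.shape == (5, 5)
assert k5[0, 2] == 1.0 and k5[4, 4] == 8.0     # taps land on the even grid
assert k5[1, 1] == 0.0                         # inserted zeros between taps
```

    Once every branch is expressed at a common dense size this way, their kernels can simply be summed, which is why the fused model runs as a single convolution.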

    3D hand pose estimation combining attention mechanism and multi-scale feature fusion
    Shiyue GUO, Jianwu DANG, Yangping WANG, Jiu YONG
    2025, 45(4):  1293-1299.  DOI: 10.11772/j.issn.1001-9081.2024040507

    To address the problem of inaccurate 3D hand pose estimation from a single RGB image due to occlusion and self-similarity, a 3D hand pose estimation network combining an attention mechanism and multi-scale feature fusion was proposed. Firstly, a Sensory Enhancement Module (SEM) combining dilated convolution and the CBAM (Convolutional Block Attention Module) attention mechanism was proposed to replace the Basicblock of HourGlass Network (HGNet), expanding the receptive field and enhancing the sensitivity to spatial information, so as to improve the ability to extract hand features. Secondly, a multi-scale information fusion module, SS-MIFM (SPCNet and Soft-attention-Multi-scale Information Fusion Module), combining SPCNet (Spatial Preserve and Content-aware Network) and Soft-Attention enhancement, was designed to aggregate multi-level features effectively and, with full consideration of the spatial content awareness mechanism, improve the accuracy of 2D hand keypoint detection significantly. Finally, a 2.5D pose conversion module was proposed to convert the 2D pose into a 3D pose, thereby avoiding the spatial information loss caused by directly regressing 3D pose information from 2D keypoint coordinates. Experimental results show that on InterHand2.6M dataset, the two-hand Mean Per Joint Position Error (MPJPE), the single-hand MPJPE, and the Mean Relative-Root Position Error (MRRPE) of the proposed algorithm reach 12.32, 9.96 and 29.57 mm, respectively; on RHD (Rendered Hand pose Dataset), compared with the InterNet and QMCG-Net algorithms, the proposed algorithm reduces the End-Point Error (EPE) by 2.68 and 0.38 mm, respectively. The above results demonstrate that the proposed algorithm can estimate hand pose more accurately and is more robust in two-hand interaction and occlusion scenarios.
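    A common way to realize a 2.5D-to-3D conversion is pinhole back-projection of pixel coordinates plus root-relative depth. This is a generic illustration of that idea, not necessarily the module's exact formulation; the intrinsics and depth values are made up:

```python
import numpy as np

def lift_25d_to_3d(uvd, root_depth, fx, fy, cx, cy):
    """Back-project 2.5D keypoints (pixel u, v and root-relative depth d)
    to camera-space 3D with the pinhole model:
        Z = d + root_depth,  X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy.
    """
    uvd = np.asarray(uvd, dtype=float)
    z = uvd[:, 2] + root_depth            # absolute depth per keypoint
    x = (uvd[:, 0] - cx) * z / fx
    y = (uvd[:, 1] - cy) * z / fy
    return np.stack([x, y, z], axis=1)

kps = [[320.0, 240.0, 0.0],               # keypoint at the principal point
       [420.0, 240.0, 50.0]]              # 100 px right, 50 units deeper
xyz = lift_25d_to_3d(kps, root_depth=500.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
assert np.allclose(xyz[0], [0.0, 0.0, 500.0])
assert np.allclose(xyz[1], [110.0, 0.0, 550.0])
```

    Keeping the network's output in 2.5D and lifting it geometrically like this is what preserves the spatial relationship between image evidence and the recovered 3D pose.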

    Video anomaly detection for moving foreground regions
    Lihu PAN, Shouxin PENG, Rui ZHANG, Zhiyang XUE, Xuzhen MAO
    2025, 45(4):  1300-1309.  DOI: 10.11772/j.issn.1001-9081.2024040519

    The imbalance in data distribution between static background information and moving foreground objects often leads to insufficient learning of information in abnormal foreground regions, thereby affecting the accuracy of Video Anomaly Detection (VAD). To address this issue, a Nested U-shaped Frame Predictive Generative Adversarial Network (NUFP-GAN) was proposed for VAD. In the proposed method, a nested U-shaped frame prediction network architecture capable of highlighting salient targets in video frames was utilized as the frame prediction module. In the discrimination phase, a self-attention patch discriminator was designed to extract more important appearance and motion features from video frames using receptive fields of different sizes, thereby enhancing the accuracy of anomaly detection. Additionally, to keep the multi-scale features of predicted frames and real frames consistent in high-level semantic information, a multi-scale consistency loss was introduced to further improve anomaly detection performance. Experimental results show that the proposed method achieves Area Under Curve (AUC) values of 87.6%, 85.2%, 96.0%, and 73.3% on CUHK Avenue, UCSD Ped1, UCSD Ped2, and ShanghaiTech datasets, respectively; on ShanghaiTech dataset, the AUC value of the proposed method is 1.8 percentage points higher than that of the MAMC (Memory-enhanced Appearance-Motion Consistency) method. It can be seen that the proposed method can effectively meet the challenges brought by data distribution imbalance in VAD.

    Remote sensing image building extraction network based on dual promotion of semantic and detailed features
    Yang ZHOU, Hui LI
    2025, 45(4):  1310-1316.  DOI: 10.11772/j.issn.1001-9081.2024030387

    Accurate edge information extraction is crucial for building segmentation. Current approaches often simply fuse multi-scale detailed features with semantic features or design complex loss functions to guide the network’s focus on edge information, ignoring the mutual promotion between semantic and detailed features. To address these issues, a remote sensing image building extraction network based on the dual promotion of semantic and detailed features was developed. The structure of the proposed network is similar to the framework of U-Net: shallow high-resolution detailed feature maps were extracted in the encoder, and the deep Semantic and Detail Feature dual Facilitation module (SDFF) was embedded in the backbone network of the decoder, so that the network has both good semantic and detailed feature extraction capabilities. After that, channel fusion was performed on the semantic and detailed features and combined with edge loss supervision at multiple resolutions, enhancing the network’s ability to extract building details and its generalization. Experimental results demonstrate that compared to various mainstream methods such as U-Net and Dual-Stream Detail-Concerned Network (DSDCNet), the proposed network achieves superior semantic segmentation results on the WHU and Massachusetts buildings (Massachusetts) datasets, preserving building edge features better and effectively improving building segmentation accuracy in remote sensing images.

    YOLOv5s-MRD: efficient fire and smoke detection algorithm for complex scenarios based on YOLOv5s
    Yang HOU, Qiong ZHANG, Zixuan ZHAO, Zhengyu ZHU, Xiaobo ZHANG
    2025, 45(4):  1317-1324.  DOI: 10.11772/j.issn.1001-9081.2024040527

    Current fire and smoke detection methods mainly rely on on-site inspection by staff, which results in low efficiency and poor real-time performance. Therefore, an efficient fire and smoke detection algorithm for complex scenarios based on YOLOv5s, called YOLOv5s-MRD (YOLOv5s-MPDIoU-RevCol-Dyhead), was proposed. Firstly, the MPDIoU (Minimum Point Distance based Intersection over Union) method was employed to modify the bounding box loss function, thereby enhancing the accuracy and efficiency of Bounding Box Regression (BBR) by adapting to BBR in both overlapping and non-overlapping scenarios. Secondly, the RevCol (Reversible Column) network concept was applied to reconstruct the backbone of YOLOv5s into a multi-column network architecture; at the same time, reversible links were incorporated across the layers of the model to maximize the retention of feature information, thereby improving the network’s feature extraction capability. Finally, with the integration of Dynamic head detection heads, scale awareness, spatial awareness, and task awareness were unified, thereby improving the detection heads’ accuracy and effectiveness significantly without additional computational cost. Experimental results demonstrate that on the DFS (Data of Fire and Smoke) dataset, compared to the original YOLOv5s algorithm, the proposed algorithm achieves a 9.3% increase in mAP@0.5 (mean Average Precision), a 6.6% improvement in prediction accuracy, and a 13.8% increase in recall. It can be seen that the proposed algorithm can meet the requirements of current fire and smoke detection application scenarios.
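For reference, the published MPDIoU formulation subtracts from the IoU the squared distances between the matching top-left and bottom-right corners of the predicted and ground-truth boxes, each normalized by the squared image diagonal; the loss is 1 minus this quantity. A minimal single-box sketch under that definition (not the paper’s exact implementation):

```python
def mpdiou_loss(pred, gt, img_w, img_h, eps=1e-8):
    """pred, gt: boxes as (x1, y1, x2, y2); returns 1 - MPDIoU."""
    # Plain IoU term
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter + eps)
    # Squared corner-point distances, normalized by the squared image diagonal
    d1 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2  # top-left corners
    d2 = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2  # bottom-right corners
    diag = img_w ** 2 + img_h ** 2
    return 1.0 - (iou - d1 / diag - d2 / diag)
```

Because the corner-distance terms stay informative when the boxes do not overlap (where plain IoU is flat at zero), the gradient remains useful in the non-overlapping case mentioned above.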

    Multi-scale 2D-Adaboost microscopic image recognition algorithm of Chinese medicinal materials powder
    Yiding WANG, Zehao WANG, Yaoli LI, Shaoqing CAI, Yuan YUAN
    2025, 45(4):  1325-1332.  DOI: 10.11772/j.issn.1001-9081.2024040438

    A multi-scale 2D-Adaboost algorithm was proposed to address the problem that microscopic images of Chinese medicinal material powders contain a large number of fine features and background interference factors, leading to large intra-class differences (excessive variation within the same medicinal material) and small inter-class differences (overly similar features among different medicinal materials). Firstly, a global-local feature fusion backbone network architecture was constructed to better extract multi-scale features. By combining the advantages of Transformer and Convolutional Neural Network (CNN), this architecture was able to extract and fuse global and local features at various scales effectively, thereby improving the feature capture capability of the backbone network significantly. Secondly, the single-scale output of Adaboost was extended to multi-scale output, and a background suppression module based on the 2D-Adaboost structure was constructed. With this module, the output feature maps at each scale of the backbone network were divided into foreground and background, thereby suppressing the feature values of background regions effectively and enhancing the strength of discriminative features. Finally, an extra classifier was added at each scale of the 2D-Adaboost structure to build a feature refinement module, which coordinated collaborative learning among the classifiers by controlling temperature parameters, thereby refining the feature maps of different scales gradually, helping the network learn more appropriate feature scales, and enriching the detailed feature representation. Experimental results show that the recognition accuracy of the proposed algorithm reaches 96.85%, which is 7.56, 5.26, 3.79, and 2.60 percentage points higher than those of the ConvNeXt-L, ViT-L, Swin-L, and Conformer-L models, respectively. The high accuracy and stability of the classification validate the effectiveness of the proposed algorithm in classification tasks of Chinese medicinal material powder microscopic images.
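The temperature parameter mentioned above controls how sharp a classifier’s output distribution is: a high temperature softens it so classifiers can learn from each other’s full distributions, a standard device in collaborative and distillation-style training. An illustrative sketch of temperature-scaled softmax:

```python
import numpy as np

def softmax_t(logits, t=1.0):
    """Temperature-scaled softmax: t > 1 flattens the distribution, t < 1 sharpens it."""
    z = np.asarray(logits, dtype=float) / t
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()
```

Raising `t` preserves the predicted class while spreading probability mass over the other classes, which carries more inter-class information between collaborating classifiers.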

    Cervical cell nucleus image segmentation based on multi-scale guided filtering
    Xinyao LINGHU, Yan CHEN, Pengcheng ZHANG, Yi LIU, Zhiguo GUI, Wei ZHAO, Zhanhao DONG
    2025, 45(4):  1333-1339.  DOI: 10.11772/j.issn.1001-9081.2024040546

    Aiming at problems such as the lack of contextual information connection and inaccurate, low-precision segmentation of cervical cell nucleus images, a cervical cell nucleus segmentation network named DGU-Net (Dense-Guided-UNet) was proposed on the basis of an improved U-Net combined with dense blocks and a U-shaped convolutional multi-scale guided filtering module, which could segment cervical cell nucleus images more completely and accurately. Firstly, the U-Net model with encoder-decoder structure was used as the backbone of the network to extract image features. Secondly, the dense block module was introduced to connect features between different layers, so as to realize the transmission of contextual information and thereby enhance the feature extraction ability of the model. Meanwhile, the multi-scale guided filtering module was introduced after each downsampling and before each upsampling to bring in the obvious edge detail information of the grayscale guidance image, enhancing the image details and edge information. Finally, a side output layer was added to each decoder path, so as to fuse and average all the output feature information, thereby fusing feature information of different scales and levels to increase the accuracy and completeness of the results. Experiments were conducted on the Herlev dataset and the proposed network was compared with three deep learning models: U-Net, Progressive Growing of U-net+ (PGU-net+), and Lightweight Feature Attention Network (LFANet). Results show that compared with PGU-net+, DGU-Net increases the accuracy by 70.06%; compared with LFANet, DGU-Net increases the Intersection-over-Union (IoU) by 6.75%. It can be seen that DGU-Net is more accurate in processing edge detail information, and generally outperforms the comparison models in segmentation indicators.
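The classic single-scale guided filter (He et al.), which the multi-scale module above builds on, fits a locally linear model of the output in terms of the guidance image, preserving the guide’s edges while smoothing. A minimal single-channel numpy sketch; `r` and `eps` are illustrative defaults, and the naive box filter is kept simple rather than fast:

```python
import numpy as np

def box_filter(img, r):
    """Mean filter over a (2r+1) x (2r+1) window with edge padding (naive loop)."""
    pad = np.pad(img, r, mode="edge")
    out = np.empty(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = pad[i:i + 2 * r + 1, j:j + 2 * r + 1].mean()
    return out

def guided_filter(I, p, r=2, eps=1e-3):
    """Filter p using guidance image I: q = a*I + b with locally linear a, b."""
    I, p = np.asarray(I, float), np.asarray(p, float)
    mean_I, mean_p = box_filter(I, r), box_filter(p, r)
    var_I = box_filter(I * I, r) - mean_I ** 2
    cov_Ip = box_filter(I * p, r) - mean_I * mean_p
    a = cov_Ip / (var_I + eps)          # eps regularizes flat (low-variance) regions
    b = mean_p - a * mean_I
    return box_filter(a, r) * I + box_filter(b, r)
```

Running the filter at several radii `r` yields the multi-scale edge information that the module injects around each down/upsampling step.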

    Frontier and comprehensive applications
    Construction and application of knowledge graph for epidemiological investigation
    Zixin XU, Xiuwen YI, Jie BAO, Tianrui LI, Junbo ZHANG, Yu ZHENG
    2025, 45(4):  1340-1348.  DOI: 10.11772/j.issn.1001-9081.2024040479

    Major sudden infectious diseases are often characterized by high infectivity, rapid mutation, and significant risk, posing substantial threats to human life and economic development. Epidemiological investigation is a crucial step in curbing the spread of infectious diseases and a prerequisite for implementing precise full-chain infection prevention and control measures. Existing epidemiological investigation systems have many shortcomings, such as inefficient manual work, poor data quality, and lack of specialized knowledge. To address these defects, a set of technological application schemes was proposed to assist epidemiological investigation by combining the existing digitization with knowledge graph. Firstly, a knowledge graph was constructed on the basis of five categories of entities: people, locations, events, items, and organizations, as well as their relationships and attributes. Secondly, following the idea of identifying risk points and tracing close contacts from cases, cases were used as the starting point, with risk points as the focus, to aid in determining at-risk populations and point risks. Finally, through visual analysis of epidemiological investigation data, several applications were implemented, including information placement in epidemiological investigation, tracing of spread and propagation, and awareness of epidemic situations, so as to assist in the successful implementation of prevention and control work for major sudden infectious diseases. Within the same error range, the accuracy of the graph enhancement-based trajectory placement method is significantly higher than that of the traditional manual inquiry-based method, with the determination accuracy within one kilometer reaching 85.15%; the graph enhancement-based method for determining risk points and populations improves efficiency significantly, reducing the average time to generate reports to within 1 h. Experimental results demonstrate that the proposed scheme effectively integrates the technical advantages of knowledge graph, improves the scientific soundness and effectiveness of precise epidemic prevention and control strategy formulation, and provides an important reference for practical exploration in the field of infectious disease prevention.
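Storing the five entity categories and their relations as (head, relation, tail) triples makes tracing queries straightforward. A toy sketch; the entity names, relation names, and the one-hop tracing rule are hypothetical illustrations, not the paper’s schema:

```python
from collections import defaultdict

class SimpleKG:
    """Minimal triple store over (head, relation, tail) facts."""
    def __init__(self):
        self.out_edges = defaultdict(list)

    def add(self, head, relation, tail):
        self.out_edges[head].append((relation, tail))

    def neighbors(self, head, relation=None):
        return [t for r, t in self.out_edges[head] if relation is None or r == relation]

def co_visitors(kg, case):
    """People who visited any risk point the case visited (a simple tracing rule)."""
    points = set(kg.neighbors(case, "visited"))
    return {h for h, edges in kg.out_edges.items()
            for r, t in edges
            if h != case and r == "visited" and t in points}

kg = SimpleKG()
kg.add("case_001", "visited", "market_A")      # person -> location
kg.add("case_001", "contacted", "person_042")  # person -> person
kg.add("person_042", "visited", "market_A")
```

Starting from a case and expanding through `visited` and `contacted` edges reproduces, in miniature, the case-to-risk-point-to-population determination described above.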

    Stability analysis of nonlinear time-delay system based on memory-based saturation controller
    Chao GE, Shuiqing YE, Hong WANG, Zheng YAO
    2025, 45(4):  1349-1355.  DOI: 10.11772/j.issn.1001-9081.2024030406

    The exponential stability of nonlinear time-delay systems under the action of a memory-based saturation controller was studied. Firstly, system parameter uncertainty was taken into account. Secondly, the polytopic method with a distributed time-delay auxiliary feedback term was used to handle the saturation nonlinearity. At the same time, an augmented Lyapunov-Krasovskii functional was established, and the integral terms were bounded by an improved integral inequality, so that stability criteria based on Linear Matrix Inequalities (LMI) were derived. In addition, a less conservative attraction domain optimization scheme was developed to increase the upper bound of the attraction domain. Finally, a simulation example was given to demonstrate the effectiveness and practicability of the proposed scheme. Experimental results show that compared with the existing attraction domain optimization scheme without a memory-based controller, the proposed scheme with a memory-based controller is less conservative.

    Power work order classification in substation area based on MiniRBT-LSTM-GAT and label smoothing
    Jiaxin LI, Site MO
    2025, 45(4):  1356-1362.  DOI: 10.11772/j.issn.1001-9081.2024040533

    Records of power work orders in a substation area reflect the substation’s operational conditions and user requirements, and are an important basis for establishing the substation’s electricity safety management system and meeting users’ electricity demands. To address the difficulties in classifying such work orders caused by their high complexity and strong professionalism, a power work order classification model for substation areas, Mini RoBERTa-Long Short-Term Memory-Graph Attention neTwork (MiniRBT-LSTM-GAT), was proposed, which integrates Label Smoothing (LS) with a pre-trained language model. Firstly, a pre-trained model was utilized to compute character-level feature vector representations of the power work order text. Secondly, a Bidirectional Long Short-Term Memory (BiLSTM) network was employed to capture dependencies within the power text sequence. Thirdly, a Graph Attention neTwork (GAT) was applied to emphasize the feature information that contributes significantly to text classification. Finally, LS was used to modify the loss function, so as to improve the classification accuracy of the model. The proposed model was compared with mainstream text classification algorithms on the Power Work Order dataset in Rural power Station areas (RSPWO), the 95598 Power Work Order dataset in ZheJiang province (ZJPWO), and the THUCNews (TsingHua University Chinese News) dataset. Experimental results show that compared with the Bidirectional Encoder Representations from Transformers (BERT) model for Electric Power Audit Text classification (EPAT-BERT), the proposed model improves precision by 2.76 percentage points and F1 value by 2.02 percentage points on RSPWO, and improves precision by 1.77 percentage points and F1 value by 1.40 percentage points on ZJPWO. In comparison with the capsule network based on BERT and dependency syntax (BRsyn-caps), the proposed model improves precision by 0.76 percentage points and accuracy by 0.71 percentage points on the THUCNews dataset. These results confirm the effectiveness of the proposed model in classifying power work orders in substation areas, and its good performance on THUCNews verifies the generality of the model.
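Label smoothing replaces one-hot targets with a softened distribution before computing cross-entropy, which discourages over-confident predictions. A minimal sketch of one common variant (the true class gets 1 - eps and the remaining classes share eps; the paper’s exact formulation may differ):

```python
import numpy as np

def smooth_labels(y, num_classes, eps=0.1):
    """One-hot targets softened: true class gets 1 - eps, others eps / (K - 1)."""
    t = np.full((len(y), num_classes), eps / (num_classes - 1))
    t[np.arange(len(y)), y] = 1.0 - eps
    return t

def cross_entropy(probs, targets):
    """Mean cross-entropy between predicted probabilities and (soft) targets."""
    return -np.mean(np.sum(targets * np.log(probs + 1e-12), axis=1))
```

Training against the softened targets penalizes probability-1 predictions, which is particularly useful for noisy, highly professional texts such as power work orders.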

2025 Vol.45 No.4

Honorary Editor-in-Chief: ZHANG Jingzhong
Editor-in-Chief: XU Zongben
Associate Editor: SHEN Hengtao XIA Zhaohui
Domestic Post Distribution Code: 62-110
Foreign Distribution Code: M4616
Address:
No. 9, 4th Section of South Renmin Road, Chengdu 610041, China
Tel: 028-85224283-803
  028-85222239-803
Website: www.joca.cn
E-mail: bjb@joca.cn