Journal of Computer Applications

Review of mean field theory for deep neural network

Mengmei YAN, Dongping YANG

2024, 44(2): 331-343. DOI: 10.11772/j.issn.1001-9081.2023020166

Mean Field Theory (MFT) provides profound insight into the operating mechanisms of Deep Neural Networks (DNNs) and can theoretically guide the engineering design of deep learning. In recent years, a growing number of researchers have devoted themselves to the theoretical study of DNNs, and in particular, a series of works based on MFT has attracted considerable attention. To this end, research on MFT for DNNs was reviewed to introduce the latest theoretical findings in three basic aspects: initialization, training process, and generalization performance of DNNs. Specifically, the concepts, properties and applications of the edge of chaos and dynamical isometry for initialization were introduced, the training properties of overparameterized networks and their equivalent networks were analyzed, and the generalization performance of various network architectures was theoretically analyzed, showing that MFT is an important basic theoretical approach to understanding the mechanisms of DNNs. Finally, the main challenges and future research directions were summarized for the investigation of MFT in the initialization, training and generalization phases of DNNs.
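
For orientation, the edge-of-chaos criterion mentioned in the abstract is usually stated through the following mean-field recursion. This is a sketch in the standard notation of the signal-propagation literature, not notation taken from the abstract itself: the symbols $\sigma_w^2$, $\sigma_b^2$ (weight and bias variances), $\phi$ (activation function), $q^{l}$ (pre-activation variance at layer $l$) and $\chi_1$ are assumptions introduced here.

```latex
% Variance map for pre-activations in a randomly initialized fully connected network
q^{l} = \sigma_w^2 \int \mathcal{D}z \, \phi\!\left(\sqrt{q^{l-1}}\, z\right)^2 + \sigma_b^2,
\qquad \mathcal{D}z = \frac{e^{-z^2/2}}{\sqrt{2\pi}}\, dz

% Stability of the fixed point q^*: slope of the correlation map at correlation 1
\chi_1 = \sigma_w^2 \int \mathcal{D}z \, \left[\phi'\!\left(\sqrt{q^{*}}\, z\right)\right]^2

% Ordered phase: \chi_1 < 1;  chaotic phase: \chi_1 > 1;  edge of chaos: \chi_1 = 1
```

In this formulation $\chi_1$ is also the mean squared singular value of the layer-to-layer Jacobian, and dynamical isometry is the stronger requirement that the whole singular value distribution concentrates near 1.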

Incentive mechanism for federated learning based on generative adversarial network

Sunjie YU, Hui ZENG, Shiyu XIONG, Hongzhou SHI

2024, 44(2): 344-352. DOI: 10.11772/j.issn.1001-9081.2023020244

To address the current lack of a fair and reasonable incentive mechanism for federated learning, and the difficulty of measuring the contributions of participant nodes with different data volumes, data qualities and data distributions, a new incentive mechanism for federated learning based on Generative Adversarial Network (GAN) was proposed. Firstly, a GAN with Trained model (GANT) was proposed to achieve high-precision sample generation. Then, the contribution evaluation algorithm of the incentive mechanism was implemented based on GANT. The algorithm filtered samples and generated data labels through the joint model, and introduced the local data labels of the participant nodes to balance the impact of non-independent and identically distributed data labels on the contribution evaluation. Finally, a two-stage Stackelberg game was used to realize the federated learning incentive process. The security analysis results show that the proposed incentive mechanism ensures data security and system stability in the process of federated learning. The experimental results show that the proposed incentive mechanism is correct, and that the contribution evaluation algorithm performs well under different data volumes, data qualities and data distributions.

Deep subspace clustering based on multiscale self-representation learning with consistency and diversity

Zhuo ZHANG, Huazhu CHEN

2024, 44(2): 353-359. DOI: 10.11772/j.issn.1001-9081.2023030275

Deep Subspace Clustering (DSC) is based on the assumption that the original data lies in a collection of low-dimensional nonlinear subspaces. Existing multi-scale representation learning methods for DSC, built on a deep auto-encoder, add fully connected layers between each encoder layer and the corresponding decoder layer to capture multi-scale features, but they neither analyze the nature of multi-scale features in depth nor consider the multi-scale reconstruction loss between input data and output data. To solve these problems, firstly, a reconstruction loss function was established for each network layer to supervise the learning of encoder parameters at different levels; then, a more effective multi-scale self-representation module was proposed based on the block diagonality of the sum of the common self-representation matrix and the unique self-representation matrices for multi-scale features; finally, the diversity of the unique self-representation matrices for features at different scales was analyzed in depth and the multi-scale feature matrices were used effectively. On this basis, an MSCD-DSC (Multiscale Self-representation learning with Consistency and Diversity for Deep Subspace Clustering) method was proposed. Experimental results on the datasets Extended Yale B, ORL, COIL20 and Umist show that, compared to the second-best method MLRDSC (Multi-Level Representation learning for Deep Subspace Clustering), the clustering error rate of MSCD-DSC is reduced by 15.44%, 2.22%, 3.37%, and 13.17%, respectively, indicating that MSCD-DSC clusters better than the existing methods.

Fake review detection algorithm combining Gaussian mixture model and text graph convolutional network

Xing WANG, Guijuan LIU, Zhihao CHEN

2024, 44(2): 360-368. DOI: 10.11772/j.issn.1001-9081.2023020219

To address the inadequate design of the edge weight window threshold in Text Graph Convolutional Network (Text GCN), mine the word association structure more accurately and improve prediction accuracy, a fake review detection algorithm combining Gaussian Mixture Model (GMM) and Text GCN, named F-Text GCN, was proposed. Fake reviews are relatively few compared with normal reviews in the training data, so their edge signal is weak; this signal was strengthened by using the GMM to separate out the noise edge weight distributions. Additionally, considering the diversity of information sources, the adjacency matrix was constructed by combining documents, words, reviews and non-text features. Finally, the fake review association structure of the adjacency matrix was extracted through the spectral decomposition of Text GCN. Validation experiments were performed on 126 086 actual Chinese reviews collected by a large domestic e-commerce platform. Experimental results show that, for detecting fake reviews, the F1 value of F-Text GCN is 82.92%, outperforming BERT (Bidirectional Encoder Representation from Transformers) and Text CNN by 10.46% and 11.60%, respectively, and exceeding Text GCN by 2.94%. For highly imitated fake reviews, which are challenging to detect, F-Text GCN achieves an overall prediction accuracy of 94.71% through secondary detection on the samples that Support Vector Machine (SVM) had difficulty detecting, which is 2.91% and 14.54% higher than those of Text GCN and SVM, respectively. The study also finds that lexical interference in consumer decision-making is evident in the second-order graph neighbor structure of fake reviews. This result indicates that the proposed algorithm is especially suitable for extracting long-range word collocation structures and global sentence feature pattern variations for fake review detection.

Multimodal emotion recognition method based on multiscale convolution and self-attention feature fusion

Tian CHEN, Conghu CAI, Xiaohui YUAN, Beibei LUO

2024, 44(2): 369-376. DOI: 10.11772/j.issn.1001-9081.2023020185

Emotion recognition based on physiological signals is affected by noise and other factors, resulting in low accuracy and weak cross-individual generalization ability. To address this issue, a multimodal emotion recognition method based on ElectroEncephaloGram (EEG), ElectroCardioGram (ECG), and eye movement signals was proposed. Firstly, multi-scale convolution was applied to the physiological signals to obtain higher-dimensional signal features and reduce parameter size. Secondly, self-attention was employed in the fusion of multimodal signal features to enhance the weights of key features and reduce feature interference between modalities. Finally, a Bi-directional Long Short-Term Memory (Bi-LSTM) network was used to extract temporal information from the fused features and perform classification. Experimental results show that the proposed method achieves recognition accuracies of 90.29%, 91.38%, and 83.53% for valence, arousal, and valence/arousal four-class recognition tasks, respectively, with improvements of 3.46-7.11 and 0.92-3.15 percentage points compared to the EEG single-modality and EEG+ECG bimodal methods. The proposed method can accurately recognize emotion with better recognition stability between individuals.

Entity category enhanced nested named entity recognition in automotive domain

Ziqi HUANG, Jianpeng HU

2024, 44(2): 377-384. DOI: 10.11772/j.issn.1001-9081.2023020239

To address the poor recognition of nested entities and long entities in the Chinese automotive domain entity extraction task, an Entity Category Enhanced nested Named Entity Recognition (ECE-NER) model was proposed. Firstly, the model's perception of domain entity boundaries was improved based on feature fusion encoding. Then, the tail word recognition module was used to obtain the entity tail word set by multi-layer perceptron. Finally, the forward boundary recognition module was used to obtain entity category-enhanced entity representation of candidate tail words, based on the sememe-constructed entity category features and self-attention mechanism. By fusing domain entity category features, a biaffine encoder was used to calculate the entity span probabilities of the specific tail words in order to determine the named entities. The experimental evaluation was carried out on the failure dataset of the automobile production line, the failure extraction and evaluation dataset of the automobile industry CCL2022, and the Chinese medical text dataset CHIP2020. The experimental results on the first two datasets show that the ECE-NER model increases the F1 value by 4.1, 1.8, 1.6 percentage points and 9.0, 5.4, 7.3 percentage points respectively compared with the baseline models, namely the sequence labeling model (BERT+BiLSTM+CRF) and the span-based entity extraction models PURE (Princeton University Relation Extraction) and SpERT (Span-based Entity and Relation Transformer). In particular, the ECE-NER model increases the F1 value of nested entity recognition by 13.3, 8.3 and 21.7, 9.3 percentage points on the first and third datasets compared to the PURE and SpERT models. The experimental results verify the effectiveness of the proposed model in recognizing nested entities.

Chinese medical named entity recognition based on self-attention mechanism and lexicon enhancement

Xinran LUO, Tianrui LI, Zhen JIA

2024, 44(2): 385-392. DOI: 10.11772/j.issn.1001-9081.2023020179

To address the difficulty of word boundary recognition stemming from nested entities in Chinese medical texts, as well as significant semantic information loss in existing Lattice-LSTM structures with integrated lexical features, an adaptive lexical information enhancement model for Chinese Medical Named Entity Recognition (MNER) was proposed. First, the BiLSTM (Bi-directional Long-Short Term Memory) network was utilized to encode the contextual information of the character sequence and capture the long-distance dependencies. Next, potential word information of each character was modeled as character-word pairs, and the self-attention mechanism was utilized to realize internal interactions between different words. Finally, a lexicon adapter based on bilinear-attention mechanism was used to integrate lexical information into each character in the text sequence, enhancing semantic information effectively while fully utilizing the rich boundary information of words and suppressing words with low correlation. Experimental results demonstrate that the average F1 value of the proposed model increases by 1.37 to 2.38 percentage points compared to the character-based baseline model, and its performance is further optimized when combined with BERT.

High-precision entity and relation extraction in medical domain based on pseudo-entity data augmentation

Andi GUO, Zhen JIA, Tianrui LI

2024, 44(2): 393-402. DOI: 10.11772/j.issn.1001-9081.2023020143

To address the problems of dense knowledge and error propagation during entity extraction and relation classification in the medical domain, a high-precision entity and relation extraction framework based on pseudo-entity data augmentation was proposed. First, a Transformer-based feature reading unit was added to the entity extraction module to capture category information for accurately identifying long medical entities among dense entities. Second, a relation negative example generation module was inserted into the pipeline extraction framework: pseudo-entities intended to confuse the relation classification model were generated by an under-sampling-based pseudo-entity generation model, and three data augmentation generation strategies were proposed to improve the model's ability to identify subject-object reversal, subject-object boundary errors, and relation classification errors. Finally, the sharp increase in training time caused by data augmentation was alleviated by the levitated-marker-based relation classification model. On the CMeIE dataset, four mainstream models were compared with the proposed model. For entity extraction tasks, the proposed model improved the F1 value by 2.26% compared with the second-best model PL-Marker (Packed Levitated Marker), while for entity relation extraction tasks, the proposed model improved the F1 value by 5.45% and the precision by 15.62% compared with the second-best pipeline extraction model proposed by CBLUE (Chinese Biomedical Language Understanding Evaluation). The experimental results show that using both the feature reading unit and the pseudo-entity data augmentation module can effectively improve the precision of extraction.

Violent crime hierarchy algorithm by joint modeling of improved hierarchical attention network and TextCNN

Jiawei ZHANG, Guandong GAO, Ke XIAO, Shengzun SONG

2024, 44(2): 403-410. DOI: 10.11772/j.issn.1001-9081.2023030270

A text classification method in Natural Language Processing (NLP) was introduced into the field of criminal psychology to scientifically and intelligently grade the violent tendencies of prisoners. A Criminal semantic Convolutional Hierarchical Attention Network (CCHA-Net) based on the joint modeling of two channels, an improved HAN (Hierarchical Attention Network) and TextCNN (Text Convolutional Neural Network), was proposed to grade violent criminal temperament by separately mining the semantic information of crime facts and the basic information of prisoners. Firstly, Focal Loss was used to replace the Cross-Entropy function in both channels to mitigate the sample size imbalance problem. Secondly, positional encoding was introduced in the input layers of both channels to improve the perception of positional information, and the HAN channel was improved by using max-pooling to construct salient vectors. Finally, global average pooling was used to replace the fully connected method in all output layers to avoid overfitting. Experimental results show that, compared with 17 related baseline models such as AC-BiLSTM (Attention-based Bidirectional Long Short-Term Memory with Convolution layer) and Support Vector Machine (SVM), CCHA-Net achieves the best results on all indicators: the micro-average F1 (Micro_F1) is 99.57%, and the Area Under the Curve (AUC) under the macro-average and the micro-average are 99.45% and 99.89%, respectively, which are 4.08, 5.59 and 0.74 percentage points higher than those of the second-best AC-BiLSTM. This verifies that CCHA-Net can effectively perform the task of grading violent criminal temperament.
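
The replacement of Cross-Entropy by Focal Loss described above can be illustrated with a minimal sketch. It uses the commonly cited form of focal loss, FL = -alpha * (1 - p_t)^gamma * log(p_t); the function names and the default values of `gamma` and `alpha` are assumptions for illustration, since the abstract does not state the settings used in CCHA-Net.

```python
import math


def cross_entropy(probs, target):
    """Standard cross-entropy for one sample: -log of the true-class probability."""
    return -math.log(probs[target])


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one sample (gamma and alpha are illustrative defaults).

    The factor (1 - p_t) ** gamma shrinks the loss of well-classified samples,
    so rare or hard samples contribute relatively more to the total loss.
    """
    p_t = probs[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


easy = [0.9, 0.1]   # confident, correct prediction for class 0
hard = [0.3, 0.7]   # poor prediction for class 0

print(cross_entropy(easy, 0), focal_loss(easy, 0))
print(cross_entropy(hard, 0), focal_loss(hard, 0))
```

With `gamma=0` and `alpha=1` the function reduces to ordinary cross-entropy, which makes it easy to compare the two losses in the same training loop.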

Classification method for traditional Chinese medicine electronic medical records based on heterogeneous graph representation

Kaitian WANG, Qing YE, Chunlei CHENG

2024, 44(2): 411-417. DOI: 10.11772/j.issn.1001-9081.2023030260

Traditional Chinese Medicine (TCM) electronic medical records are difficult to mine, have low utilization rates, and yield little meaningful information because of their complex and diverse structures and non-standard diagnosis and treatment terminology. To address these issues, a TCM electronic medical record classification model called TCM-GCN was proposed, based on the Linguistically-motivated bidirectional Encoder Representation from Transformer (LERT) pre-training model and Graph Convolutional Network (GCN), with the records represented by a heterogeneous graph. The model was used to improve the extraction and classification of effective features in TCM electronic medical records. Firstly, the medical records were converted into sentence vectors using the word embedding method of the LERT layer and integrated into the heterogeneous graph to supply the overall semantic features that were missing in the graph structure. Next, to mitigate the negative impact of the structural characteristics on feature extraction, keywords were added to the nodes of the heterogeneous graph. The BM25 and Pointwise Mutual Information (PMI) algorithms were employed to construct edges representing the features of medical records, such as "medical record - keyword" and "keyword - keyword". Finally, the task of medical record classification was completed by TCM-GCN, relying on the heterogeneous graph constructed by using LERT-BM25-PMI to aggregate and extract the feature relationships between medical records. Experimental results on the TCM electronic medical record dataset show that, compared to the second-best model LERT, TCM-GCN achieves improvements of 2.24%, 2.38%, and 2.32% in weighted-average accuracy, recall, and F1 value, respectively, which confirms the effectiveness of the algorithm in capturing hidden features in medical records and classifying TCM electronic medical records.
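
The "keyword - keyword" edges built with PMI can be sketched as follows. This is a minimal illustration, not the TCM-GCN implementation: it assumes co-occurrence is counted over fixed windows of keywords and that only pairs with positive PMI become edges (a common convention in text graph models); the function name `pmi_edges` and the sample keywords are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_edges(windows):
    """Return {(word_a, word_b): pmi} for keyword pairs with positive PMI.

    PMI(a, b) = log( p(a, b) / (p(a) * p(b)) ), where probabilities are the
    fraction of windows containing the word or the pair.
    """
    n = len(windows)
    word_count = Counter()
    pair_count = Counter()
    for window in windows:
        unique = sorted(set(window))              # count each word once per window
        word_count.update(unique)
        pair_count.update(combinations(unique, 2))  # pairs in alphabetical order

    edges = {}
    for (a, b), count in pair_count.items():
        pmi = math.log((count / n) / ((word_count[a] / n) * (word_count[b] / n)))
        if pmi > 0:                               # keep only positively associated pairs
            edges[(a, b)] = pmi
    return edges


windows = [
    ["cough", "fever"],
    ["cough", "fever"],
    ["cold", "pain"],
    ["cough", "pain"],
]
for pair, weight in sorted(pmi_edges(windows).items()):
    print(pair, round(weight, 3))
```

Pairs that co-occur less often than chance would predict get a negative PMI and are dropped, so the resulting graph connects only keywords that tend to appear together.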

Text punctuation restoration for Vietnamese speech recognition with multimodal features

Hua LAI, Tong SUN, Wenjun WANG, Zhengtao YU, Shengxiang GAO, Ling DONG

2024, 44(2): 418-423. DOI: 10.11772/j.issn.1001-9081.2023020231

The text sequence output by a Vietnamese speech recognition system lacks punctuation, and punctuating the recognized text helps eliminate ambiguity and makes it easier to understand. However, punctuation restoration models based on the text modality alone predict punctuation inaccurately on noisy text, because phoneme errors often occur in Vietnamese speech recognition systems and can destroy the semantics of the text. A punctuation restoration method for Vietnamese speech recognition text that utilizes multi-modal features was proposed, guided by intonation pauses and tone changes in Vietnamese speech to correctly predict punctuation for noisy text. Specifically, Mel-Frequency Cepstral Coefficients (MFCC) were used to extract speech features, pre-trained language models were used to extract text context features, and the speech and text features were fused with a label attention mechanism, thereby enhancing the model's ability to learn contextual information from noisy Vietnamese text. Experimental results show that, compared to punctuation restoration models that extract only text features based on Transformer and BERT (Bidirectional Encoder Representations from Transformers), the proposed method improves the precision, recall, and F1 score on the Vietnamese dataset by at least 10 percentage points, demonstrating the effectiveness of fusing speech and text features in improving punctuation prediction accuracy for noisy Vietnamese speech recognition text.

Decoupling-fusing algorithm for multiple tasks with autonomous driving environment perception

Cunyi LIAO, Yi ZHENG, Weijin LIU, Huan YU, Shouyin LIU

2024, 44(2): 424-431. DOI: 10.11772/j.issn.1001-9081.2023020155

While driving, autonomous vehicles need to perform target detection, instance segmentation and target tracking for pedestrians and vehicles at the same time. An environment perception model based on deep learning was proposed to learn these three tasks simultaneously through multi-task learning. Firstly, spatio-temporal features were extracted from continuous frame images by Convolutional Neural Network (CNN). Then, the spatio-temporal features were decoupled and re-fused by an attention mechanism, and differential selection of spatio-temporal features was achieved by making full use of the correlation between tasks. Finally, in order to balance the learning rates between different tasks, the model was trained by the dynamic weighted average method. The proposed model was validated on the KITTI dataset, and the experimental results show that the F1 score is increased by 0.6 percentage points in target detection compared with the CenterTrack model, the Multiple Object Tracking Accuracy (MOTA) is increased by 0.7 percentage points in target tracking compared with the TraDeS (Track to Detect and Segment) model, and the $AP_{50}$ and $AP_{75}$ are increased by 7.4 and 3.9 percentage points respectively in instance segmentation compared with the SOLOv2 (Segmenting Objects by LOcations version 2) model.
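
The balancing of tasks by a dynamic weighted average can be sketched as below. The abstract does not give the formula, so this assumes the widely used Dynamic Weight Average formulation, in which a task whose loss is decreasing more slowly receives a larger weight; the function name `dwa_weights` and the `temperature` value are hypothetical.

```python
import math


def dwa_weights(prev_losses, prev_prev_losses, temperature=2.0):
    """Compute per-task loss weights from the two most recent epoch losses.

    ratio_k = L_k(t-1) / L_k(t-2) measures how fast task k is improving;
    weights are a softmax over ratio_k / temperature, scaled to sum to the
    number of tasks.
    """
    ratios = [a / b for a, b in zip(prev_losses, prev_prev_losses)]
    exps = [math.exp(r / temperature) for r in ratios]
    total = sum(exps)
    num_tasks = len(ratios)
    return [num_tasks * e / total for e in exps]


# Three tasks (for example detection, segmentation, tracking) with illustrative losses.
weights = dwa_weights(prev_losses=[1.0, 0.5, 0.8], prev_prev_losses=[1.0, 1.0, 1.0])
print([round(w, 3) for w in weights])
```

The total training loss would then be the weighted sum of the per-task losses, recomputed each epoch; a larger `temperature` makes the weights more uniform.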

Multi-robot reinforcement learning path planning method based on request-response communication mechanism and local attention mechanism

Fuqin DENG, Huifeng GUAN, Chaoen TAN, Lanhui FU, Hongmin WANG, Tinlun LAM, Jianmin ZHANG

2024, 44(2): 432-438. DOI: 10.11772/j.issn.1001-9081.2023020193

To reduce the blocking rate of multi-robot path planning in dynamic environments, a Distributed Communication and local Attention based Multi-Agent Path Finding (DCAMAPF) method was proposed based on the Actor-Critic deep reinforcement learning framework, using a request-response communication mechanism and a local attention mechanism. In the Actor network, each robot requested local observation and action information from the other robots in its field of view based on the request-response communication mechanism, and planned a coordinated action strategy accordingly. In the Critic network, each robot dynamically allocated attention weights to the local observation and action information of the other robots that had successfully responded within its field of view based on the local attention mechanism. The experimental results showed that, in a discrete initialization environment, the blocking rate was reduced by approximately 6.91, 4.97, and 3.56 percentage points, respectively, compared with the traditional dynamic path planning method D* Lite, the latest distributed reinforcement learning method MAPPER, and the latest centralized reinforcement learning method AB-MAPPER (Attention and BicNet based MAPPER); in a centralized initialization environment, the mean blocking rate was reduced by approximately 15.86, 11.71 and 5.54 percentage points; the occupied computing cache was also reduced. Therefore, the proposed method ensures the efficiency of path planning and is applicable to multi-robot path planning tasks in different dynamic environments.

Path planning algorithm of manipulator based on path imitation and SAC reinforcement learning

Ziyang SONG, Junhuai LI, Huaijun WANG, Xin SU, Lei YU

2024, 44(2): 439-444. DOI: 10.11772/j.issn.1001-9081.2023020132

When training a manipulator path planning algorithm, the huge action space and state space lead to sparse rewards and therefore low training efficiency, and the immense number of states and actions makes it challenging to evaluate the value of both. To address these problems, a path planning algorithm for robotic manipulators based on SAC (Soft Actor-Critic) reinforcement learning was proposed. The learning efficiency was improved by incorporating the demonstrated path into the reward function so that the manipulator imitated the demonstrated path during reinforcement learning, and the SAC algorithm was used to make the training of the manipulator path planning algorithm faster and more stable. The proposed algorithm and the Deep Deterministic Policy Gradient (DDPG) algorithm were each used to plan 10 paths, and the average distances from the reference paths were 0.8 cm for the proposed algorithm and 1.9 cm for the DDPG algorithm. The experimental results show that the path imitation mechanism can improve the training efficiency, and that the proposed algorithm explores the environment better and plans more reasonable paths than the DDPG algorithm.

Gait control method based on maximum entropy deep reinforcement learning for biped robot

Yuanchao LI, Chongben TAO, Chen WANG

2024, 44(2): 445-451. DOI: 10.11772/j.issn.1001-9081.2023020153

To address the problem of gait stability control for continuous linear walking of a biped robot, a Soft Actor-Critic (SAC) gait control algorithm based on maximum entropy Deep Reinforcement Learning (DRL) was proposed. Firstly, no accurate robot dynamic model was built in advance, and all parameters were derived from joint angles without additional sensors. Secondly, the cosine similarity method was used to classify experience samples and optimize the experience replay mechanism. Finally, reward functions were designed based on knowledge and experience to enable the biped robot to continuously adjust its attitude during the linear walking training process, and the reward functions ensured the robustness of straight walking. The proposed method was compared with other DRL methods such as PPO (Proximal Policy Optimization) and TRPO (Trust Region Policy Optimization) in the Roboschool simulation environment. The results show that the proposed method not only achieves fast and stable linear walking of the biped robot, but also has better algorithmic robustness.
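
The use of cosine similarity to classify experience samples can be sketched as follows. The abstract does not specify what the samples are compared against or how the classes are used in replay, so the reference vector, the threshold and the function names here are assumptions made purely for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def split_by_similarity(samples, reference, threshold=0.9):
    """Partition samples into those similar to the reference and the rest."""
    similar, dissimilar = [], []
    for sample in samples:
        if cosine_similarity(sample, reference) >= threshold:
            similar.append(sample)
        else:
            dissimilar.append(sample)
    return similar, dissimilar


# Illustrative joint-angle vectors (hypothetical values).
reference = [1.0, 2.0]
samples = [[2.0, 4.0], [2.0, -1.0], [1.0, 1.9]]
similar, dissimilar = split_by_similarity(samples, reference)
print(similar, dissimilar)
```

A replay buffer could then draw from the two groups at different rates, for instance to keep the sampled batch diverse; that sampling policy is a design choice outside what the abstract describes.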

Visual analysis of multivariate spatio-temporal data for origin-destination flow

Siyi ZHOU, Tianrui LI

2024, 44(2): 452-459. DOI: 10.11772/j.issn.1001-9081.2023020178

An Integrated Circuit (IC) card can record a resident's trips, reflecting the resident's Origin-Destination (OD) information. However, because OD flow data is large in scale, directly visualizing its spatial distribution easily causes visual clutter. Moreover, multivariate data is difficult to combine with flow data because it contains a variety of different data types. To solve the problem that directly visualizing the spatial distribution of large-scale OD data easily causes visual occlusion, a flow clustering method based on Orthogonal Nonnegative Matrix Factorization (ONMF) was proposed. The OD data was clustered before being visualized, so that unnecessary occlusion was reduced. To address the difficulty of combining and analyzing multivariate spatio-temporal data of multiple types, a multivariate time series data view for bus stops was designed. Bus stop flow and four types of multivariate data (air quality, air temperature, relative humidity, and rainfall) were encoded on the same time series, which improved the space utilization of the view and allowed the variables to be compared and analyzed. To assist user exploration and analysis, an interactive visual analysis system was developed based on origin-destination flow and multivariate data, and a variety of interactive operations were designed to improve the efficiency of user exploration. Finally, based on the Singapore IC card dataset, the proposed clustering method was evaluated in terms of clustering effect and running time. With the silhouette coefficient used to evaluate the clustering effect, the proposed method improves on the original method by 0.028 and on the K-means clustering method by 0.253. The running time comparison shows that the proposed method runs 254 seconds faster than the ONMFS (Orthogonal NMF through Subspace exploration) method while achieving a better clustering effect. The effectiveness of the system was verified by case analysis and system function comparison.

Group recommendation method based on implicit trust and group consensus

Tingting LI, Junfeng CHU, Yanyan WANG

2024, 44(2): 460-468. DOI: 10.11772/j.issn.1001-9081.2023030267

To address the issue that existing group recommendation methods pay little attention to the implicit estimation of social relationships among group members and to the use of group consensus to reduce the influence of preference conflicts, a Group Recommendation method based on implicit Trust and group Consensus (GR-TC) was proposed. The method was divided into a recommendation phase and a consensus phase. In the recommendation phase, implicit trust values were mined based on preference information and social relationships among members, and the members' individual preferences and weights, as well as the initial group preferences, were estimated. In the consensus phase, inconsistent members were identified by consensus measurement and identification rules, a maximum harmony optimization consensus model was built, and the group recommendation list was obtained by adjusting and updating the group preferences. Experimental results show that social relationships among members affect group recommendation results, and that a reasonable selection of implicit trust weights improves the harmony of inconsistent members. Compared with the traditional consensus feedback mechanism, the implicit trust-induced maximum harmony consensus feedback mechanism has a lower adjustment cost and less impact on inconsistent members.

Two-stage recommendation algorithm of Siamese graph convolutional neural network

Zhiwen JING, Yujia ZHANG, Boting SUN, Hao GUO

2024, 44(2): 469-476. DOI: 10.11772/j.issn.1001-9081.2023020180

To solve the problem that the two-tower neural network in recommendation systems has difficulty learning the interaction information between the user side and the item side as well as the graph connection information, a new algorithm TSN (Two-stage Siamese graph convolutional Neural network recommendation algorithm) was proposed. First, a heterogeneous graph based on user behavior was built. Then, a graph convolutional Siamese network was designed between the two towers, so as to achieve information interaction while learning the connection information of the heterogeneous graph. Finally, by designing a specially structured two-stage information sharing mechanism, the neural networks on the user side and the item side could transmit information dynamically and bidirectionally during the training process, and neural network cascading was effectively avoided. In comparative experiments on the MovieLens and Douban movie datasets, the NDCG@10, NDCG@50 and NDCG@100 of the proposed algorithm are 11.39% to 23.98% higher than those of the best benchmark algorithm DAT (Dual Augmented Two-tower model for online large-scale recommendation). The results show that the proposed algorithm can alleviate the lack of information interaction in the two-tower neural network and significantly improves the recommendation performance compared with previous algorithms.

Top-k high average utility sequential pattern mining algorithm under one-off condition

Keshuai YANG, Youxi WU, Meng GENG, Jingyu LIU, Yan LI

2024, 44(2): 477-484. DOI: 10.11772/j.issn.1001-9081.2023030268

To address the issue that traditional Sequential Pattern Mining （SPM） does not consider pattern repetition and ignores the effects of utility （unit price or profit） and pattern length on user interest， a Top-k One-off high average Utility sequential Pattern mining （TOUP） algorithm was proposed. The TOUP algorithm mainly includes two core steps： average utility calculation and candidate pattern generation. Firstly， a CSP （Calculation Support of Pattern） algorithm based on the occurrence position of each item and the item repetition relation array was proposed to calculate pattern support， thereby achieving rapid calculation of the average utility of patterns. Secondly， candidate patterns were generated by itemset extension and sequence extension， and a maximum average utility upper bound was proposed. Based on this upper bound， effective pruning of candidate patterns was achieved. Experimental results on five real datasets and one synthetic dataset show that compared to the TOUP-dfs and HAOP-ms algorithms， the TOUP algorithm reduces the number of candidate patterns by 38.5% to 99.8% and 0.9% to 77.6%， respectively， and decreases the running time by 33.6% to 97.1% and 57.9% to 97.2%， respectively. Therefore， the TOUP algorithm performs better and can mine patterns of interest to users more efficiently.

Attribute-based encryption scheme for blockchain privacy protection

Haifeng MA, Yuxia LI, Qingshui XUE, Jiahai YANG, Yongfu GAO

2024, 44(2): 485-489. DOI: 10.11772/j.issn.1001-9081.2023020173

The key to solving the security problems caused by the disclosure of blockchain ledgers lies in hiding private information. An attribute-based encryption scheme with multiple authorities was proposed for privacy protection of blockchain data. Compared to a single authority， multiple authorities are decentralized and avoid any single point of failure. First， the key component generation algorithm was modified， where each authority used the user identity as a parameter to generate private key components， preventing collusion between nodes to access unauthorized data. Then， identity-based signature technology was modified to establish a connection between user identities and wallet addresses， making the blockchain supervisable and illegal users traceable. Finally， based on the DBDH （Decisional Bilinear Diffie-Hellman） assumption， the security of the proposed scheme was proved in the random oracle model. The experimental results show that， compared with the blockchain privacy protection scheme based on elliptic-curve ring signatures and the blockchain privacy protection scheme supporting keyword forgetting search， the proposed scheme takes the least amount of time when generating the same number of blocks and is more feasible.

Adversarial example detection algorithm based on quantum local intrinsic dimensionality

Yu ZHANG, Yan CHANG, Shibin ZHANG

2024, 44(2): 490-495. DOI: 10.11772/j.issn.1001-9081.2023020172

In order to solve the high time complexity problem of the adversarial example detection algorithm based on Local Intrinsic Dimensionality （LID）， combined with the advantages of quantum computing， an adversarial example detection algorithm based on quantum LID was proposed. First， the SWAP-Test quantum algorithm was used to calculate the similarity between the measured example and all examples at once， avoiding the redundant calculation in the classical algorithm. Then， the Quantum Phase Estimation （QPE） algorithm and the quantum Grover search algorithm were combined to calculate the local intrinsic dimensionality of the measured example. Finally， LID was used as the evaluation basis of the binary detector to detect and distinguish the adversarial examples. The detection algorithm was tested and verified on IRIS， MNIST， and stock time series datasets. The simulation experimental results show that the calculated LID values can highlight the difference between adversarial examples and normal examples， and can be used as a detection basis to differentiate example attributes. Theoretical research proves that the time complexity of the proposed detection algorithm is the same order of magnitude as the product of the number of iterations of Grover operator and the square root of the number of adjacent examples and the number of training examples， which is obviously better than that of the adversarial example detection algorithm based on LID and achieves exponential acceleration.
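As background for the LID quantity discussed above， the sketch below shows the classical maximum-likelihood LID estimator computed from nearest-neighbor distances. It is the classical form that is widely used in LID-based detection， not the quantum procedure of the paper， and whether the paper's classical baseline uses exactly this estimator is an assumption. The distance values are hypothetical.

```python
import math


def lid_mle(neighbor_distances):
    """Maximum-likelihood LID estimate from distances to the k nearest neighbors.

    LID = -1 / mean(log(r_i / r_k)), where r_k is the largest of the k distances.
    """
    r = sorted(d for d in neighbor_distances if d > 0)
    r_max = r[-1]
    mean_log = sum(math.log(d / r_max) for d in r) / len(r)
    if mean_log == 0.0:          # all neighbors equidistant: estimate is unbounded
        return float("inf")
    return -1.0 / mean_log


# Hypothetical distances from one example to its three nearest neighbors.
estimate = lid_mle([1.0, 2.0, 4.0])
```

A detector in this style would compute such estimates for each example and feed them to a binary classifier； the sorting and per-neighbor distance work is the part the abstract proposes to accelerate with quantum subroutines.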

Interference trajectory publication based on improved glowworm swarm algorithm and differential privacy

Peng PENG, Zhiwei NI, Xuhui ZHU, Qian CHEN

2024, 44(2): 496-503. DOI: 10.11772/j.issn.1001-9081.2023030259

In view of the dataset redundancy and the risk of privacy leakage caused by the similarity of track shapes when the interference track is noised and published based on the historical track， an IGSO-SDTP （Trajectory Protection of Simplification and Differential privacy of the track data based on Improved Glowworm Swarm Optimization） method was proposed. Firstly， the historical trajectory dataset was reduced based on the position salient points. Secondly， the simplified trajectory dataset was generalized and noised by combining k-anonymity and differential privacy. Finally， a weighted distance was designed to take into account the distance error and track similarity， and the weighted distance was used as the evaluation index to solve the interference track with a small weighted distance based on IGSO （Improved Glowworm Swarm Optimization） algorithm. Experimental results on multiple datasets show that compared with the RD （Differential privacy for Raw trajectory data）， SDTP （Trajectory Protection of Simplification and Differential privacy）， LIC （Linear Index Clustering algorithm）， and DPKTS （Differential Privacy based on K-means Trajectory shape Similarity）， the weighted distances obtained by IGSO-SDTP are reduced by 21.94%， 9.15%， 14.25% and 10.55%， respectively. It can be seen that the interference trajectory published by IGSO-SDTP has better usability and stability.

Searchable electronic health record sharing scheme with user revocation

Zheng WANG, Jingwei WANG, Xinchun YIN

2024, 44(2): 504-511. DOI: 10.11772/j.issn.1001-9081.2023030272

With the rapid development and wide application of the Internet of Things （IoT） and cloud storage technology， an increasing number of sensor devices are deployed to the Internet of Medical Things （IoMT） system every year， which promotes the popularization of Electronic Health Record （EHR）. However， the secure storage and retrieval of EHRs have not been properly resolved. To address this problem， a searchable attribute-based encryption scheme with a fixed-length trapdoor was constructed for the search and verification of ciphertext， which reduced the communication overhead required by users. By adopting the online/offline encryption technology， the computing overhead on the user side was reduced. Meanwhile， with the help of chameleon hash function， a private key with the characteristics of anti-collision and semantic security was constructed， which avoided the problem of frequent updating of private keys of unrevoked users and greatly reduced the computing overhead of users. Theoretical analysis and experimental results show that the proposed scheme can resist chosen-plaintext attack under the Decisional Bilinear Diffie-Hellman （DBDH） assumption， and that compared with other similar attribute-based encryption schemes， the proposed scheme is more efficient： it supports online encryption and efficient user revocation， and has lower computational and storage overheads.

Electronic voting scheme based on SM2 threshold blind signature

Jintao RAO, Zhe CUI

2024, 44(2): 512-518. DOI: 10.11772/j.issn.1001-9081.2022121876

An electronic voting scheme based on SM2 threshold blind signature was proposed to address the security and efficiency issues in the algorithm protocol layer of domestic electronic election system. Firstly， the SM2 threshold blind signature algorithm was constructed based on the SM2 signature algorithm. The methods of Shamir secret sharing， Random Secret Sharing （RSS）， secret sum， difference and product sharing， and Inversion Secret Sharing （ISS） were used to share the secret private key and random number in SM2 signature algorithm without changing the original signature process. At the same time， a blinding factor was introduced to blind the message to be signed， achieving the privacy protection of the message sender and effective sharing of sensitive information. Secondly， the algorithm security analysis results show that the constructed blind signature algorithm has blindness， robustness， and non-forgeability under the random oracle model. Compared with the existing RSA （Rivest-Shamir-Adleman） and Elliptic Curve Digital Signature Algorithm （ECDSA） threshold blind signature algorithms， the constructed SM2 threshold blind signature algorithm has the advantages of low computational complexity and small computational cost， making it suitable for large-scale elections. Finally， a secure electronic voting protocol was designed based on the SM2 threshold blind signature algorithm. The analysis results show that the proposed protocol has properties such as non-forgeability， confidentiality， legality， and robustness. Furthermore， a single voting process can be completed in just 15.706 1 ms.
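The Shamir secret sharing building block named above can be sketched in a few lines of Python. This is a generic textbook illustration over a prime field， not the SM2 construction of the paper； the field modulus， function names and the example secret are all assumptions chosen for readability.

```python
import secrets

# Illustrative field modulus (the Mersenne prime 2^127 - 1); a real SM2-based
# scheme would work modulo the order of the SM2 curve group instead.
PRIME = 2**127 - 1


def split_secret(secret, threshold, num_shares):
    """Return num_shares points on a random polynomial of degree threshold - 1
    whose constant term is the secret."""
    coeffs = [secret % PRIME] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def poly(x):
        acc = 0
        for c in reversed(coeffs):      # Horner evaluation mod PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, num_shares + 1)]


def recover_secret(shares):
    """Lagrange interpolation at x = 0 over GF(PRIME)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % PRIME
                den = (den * (xi - xj)) % PRIME
        inv_den = pow(den, PRIME - 2, PRIME)   # modular inverse via Fermat
        secret = (secret + yi * num * inv_den) % PRIME
    return secret


shares = split_secret(123456789, threshold=3, num_shares=5)
recovered = recover_secret(shares[:3])   # any 3 of the 5 shares suffice
```

Fewer than `threshold` shares reveal nothing about the secret， which is the property a threshold signature relies on so that no single signer holds the full private key.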

High-efficiency dual-LAN Terahertz WLAN MAC protocol based on spontaneous data transmission

Zhi REN, Jindong GU, Yang LIU, Chunyu CHEN

2024, 44(2): 519-525. DOI: 10.11772/j.issn.1001-9081.2023020250

In the existing Dual LAN （Local Area Network） Terahertz Wireless LAN （Dual-LAN THz WLAN） related MAC （Medium Access Control） protocol， some nodes may repeatedly send the same Channel Time Request （CTRq） frame within multiple superframes to apply for time slot resources， and idle time slots exist in some periods of network operation. To address these problems， an efficient MAC protocol based on spontaneous data transmission， SDTE-MAC （high-Efficiency MAC Protocol based on Spontaneous Data Transmission）， was proposed. SDTE-MAC protocol enabled each node to maintain one or more time unit linked lists to synchronize with the rest of the nodes in the network running time， so as to know where each node started sending data frames in the idle channel time slots. The protocol optimized the traditional channel slot allocation and channel remaining slot reallocation processes， improved network throughput and channel slot utilization， reduced data delay， and could further improve the performance of Dual-LAN THz WLAN. The simulation results show that when the network is saturated， compared with the new N-CTAP （Normal Channel Time Allocation Period） slot resource allocation mechanism and adaptive shortening superframe period mechanism in the AHT-MAC （Adaptive High Throughput multi-pan MAC protocol）， the MAC layer throughput of the SDTE-MAC protocol is increased by 9.2%， the channel slot utilization is increased by 10.9%， and the data delay is reduced by 22.2%.

Design and implementation of component-based development framework for deep learning applications

Xiang LIU, Bei HUA, Fei LIN, Hongyuan WEI

2024, 44(2): 526-535. DOI: 10.11772/j.issn.1001-9081.2023020213

Concerning the current lack of effective development and deployment tools for deep learning applications， a component-based development framework for deep learning applications was proposed. The framework splits functions according to the type of resource consumption， uses a review-guided resource allocation scheme for bottleneck elimination， and uses a step-by-step boxing scheme for function placement that takes into account high CPU utilization and low memory overhead. The real-time license plate number detection application developed based on this framework achieved 82% GPU utilization in throughput-first mode， 0.73 s average application latency in latency-first mode， and 68.8% average CPU utilization in three modes （throughput-first mode， latency-first mode， and balanced throughput/latency mode）. The experimental results show that based on this framework， a balanced configuration of hardware throughput and application latency can be performed to efficiently utilize the computing resources of the platform in the throughput-first mode and meet the low latency requirements of the applications in the latency-first mode. Compared with MediaPipe， the use of this framework enabled ultra-real-time multi-person pose estimation application development， and the detection frame rate of the application was improved by up to 1 077%. The experimental results show that the framework is an effective solution for deep learning application development and deployment on CPU-GPU heterogeneous servers.

Survey on tile-based viewport adaptive streaming scheme of panoramic video

Junjie LI, Yumei WANG, Zhijun LI, Yu LIU

2024, 44(2): 536-547. DOI: 10.11772/j.issn.1001-9081.2023020209

Panoramic videos have attracted wide attention due to their unique immersive and interactive experience. The high bandwidth and low delay required for wireless streaming of panoramic videos have brought challenges to existing network streaming systems. Tile-based viewport adaptive streaming can effectively alleviate the streaming pressure brought by panoramic video， and has become the current mainstream scheme and hot research topic. By analyzing the research status and development trend of tile-based viewport adaptive streaming， the two important modules of this streaming scheme， namely viewport prediction and bit rate allocation， were discussed， and the methods in relevant fields were summarized from different perspectives. Firstly， based on the panoramic video streaming framework， the relevant technologies were clarified. Secondly， the user experience quality indicators to evaluate the performance of the streaming system were introduced from the subjective and objective dimensions. Then， the classic research methods were summarized from the aspects of viewport prediction and bit rate allocation. Finally， the future development trend of panoramic video streaming was discussed based on the current research status.

Weakly supervised action localization method with snippet contrastive learning

Weichao DANG, Lei ZHANG, Gaimei GAO, Chunxia LIU

2024, 44(2): 548-555. DOI: 10.11772/j.issn.1001-9081.2023020246

A weakly supervised action localization method， which integrated snippet contrastive learning， was proposed to address the issue of misclassification of snippets at action boundaries in existing attention-based methods. First， an attention mechanism with three branches was introduced to measure the possibility of each video frame being an action instance， context， or background. Second， the Class Activation Sequences （CAS） corresponding to each branch were constructed based on the obtained attention values. Then， positive and negative sample pairs were generated using a snippet mining algorithm. Finally， the network was guided through snippet contrastive learning to correctly classify hard snippets. Experimental results show that at an Intersection over Union （IoU） of 0.5， the mean Average Precisions （mAP） of the proposed method on THUMOS14 and ActivityNet1.3 datasets are 33.9% and 40.1% respectively， with improvements of 1.1 and 2.9 percentage points compared to the DGCNN （Dynamic Graph modeling for weakly-supervised temporal action localization Convolutional Neural Network） weakly supervised action localization model， validating the effectiveness of the proposed method.

Channel compensation algorithm for speaker recognition based on probabilistic spherical discriminant analysis

Weipeng JING, Qingxin XIAO, Hui LUO

2024, 44(2): 556-562. DOI: 10.11772/j.issn.1001-9081.2023020157

In speaker recognition tasks， the Probabilistic Linear Discriminant Analysis （PLDA） model is a commonly used classification backend. However， due to the inaccurate fitting of the real speaker feature distribution by the distribution assumption of Gaussian PLDA model， length normalization-based channel compensation methods based on the Gaussian distribution assumption may destroy the independence of the within-class distribution of speaker features， making the Gaussian PLDA unable to fully utilize the speaker information contained in the upstream task feature extraction， thereby affecting the recognition results. To address this issue， a Channel Compensation algorithm for speaker recognition based on Probabilistic Spherical Discriminant Analysis （CC-PSDA） was proposed， which introduced a Probabilistic Spherical Discriminant Analysis （PSDA） model with Von Mises-Fisher （VMF） distribution assumption and a feature transformation method to replace the PLDA method based on the Gaussian distribution assumption， for avoiding the impact of channel compensation on the independence of the within-class distribution of speaker features. Firstly， in order to make the speaker features conform to the VMF distribution prior assumption and fit the backend classification model， a nonlinear transformation was used to transform the distribution of the speaker features at the feature level. Then， by utilizing the characteristic of the PSDA model based on the VMF distribution assumption that does not destroy the within-class distribution structure of speaker features， the transformed speaker features were defined on a hypersphere of a specific dimension， maximizing the inter-class distance of features. The proposed model was solved by the EM （Expectation-Maximization） algorithm， and the classification task was ultimately completed. Experimental results show that the improved algorithm has the lowest recognition equal error rates on three test sets compared with the PSDA and Gaussian PLDA models. Therefore， the proposed algorithm can effectively distinguish speaker features and improve recognition performance.

Infrared dim small target tracking method based on Siamese network and Transformer

Chenhui CUI, Suzhen LIN, Dawei LI, Xiaofei LU, Jie WU

2024, 44(2): 563-571. DOI: 10.11772/j.issn.1001-9081.2023020167

A method based on Siamese network and Transformer was proposed to address the low accuracy problem of infrared dim small target tracking. First， a multi-feature extraction cascading module was constructed to separately extract the deep features of the infrared dim small target template frame and the search frame， and concatenate them with their corresponding HOG features at the dimension level. Second， a multi-head attention mechanism Transformer was introduced to perform cross-correlation operations between the template feature map and the search feature map， generating a response map. Finally， the target’s center position in the image and the regression bounding box were obtained through the response map upsampling network and bounding box prediction network to complete the tracking of the infrared dim small targets. Test results on a dataset of 13 655 infrared images show that compared with KeepTrack tracking method， the success rate is improved by 5.9 percentage points and the precision is improved by 1.8 percentage points； compared with TransT （Transformer Tracking） method， the success rate is improved by 14.2 percentage points and the precision is improved by 14.6 percentage points. The proposed method is proved to be more accurate in tracking infrared dim small targets in complex backgrounds.

Improved image inpainting network incorporating supervised attention module and cross-stage feature fusion

Qiaoling HUANG, Bochuan ZHENG, Zicheng DING, Zedong WU

2024, 44(2): 572-579. DOI: 10.11772/j.issn.1001-9081.2023020123

Image inpainting techniques for non-regular missing regions are versatile but challenging. To address the problem that existing inpainting methods may produce artifacts， distorted structures， and blurred textures for high-resolution images， an improved image inpainting network， named Gconv_CS （Gated convolution based CSFF and SAM）， incorporating Supervised Attention Module （SAM） and Cross-Stage Feature Fusion （CSFF） was proposed. In Gconv_CS， the SAM and CSFF were introduced to Gconv， a two-stage network model with gated convolution. SAM ensured the effectiveness of the incoming feature information to the next stage by providing a real image to supervise the output features of the previous stage. CSFF fused the features from the encoder-decoder of the previous stage and fed them to the encoder of the next stage to compensate for the loss of feature information in the previous stage. The experimental results show that， at a percentage of missing regions of 1% to 10%， compared with the baseline model Gconv， on CelebA-HQ dataset， Gconv_CS improved the Peak Signal-to-Noise Ratio （PSNR） and Structural SIMilarity index （SSIM） by 1.5% and 0.5% respectively， and reduced Fréchet Inception Distance （FID） and L1 loss by 21.8% and 14.8% respectively； on Place2 dataset， the first two indicators increased by 26.7% and 0.8% respectively， and the latter two indicators decreased by 7.9% and 37.9% respectively. A good restoration effect was achieved when Gconv_CS was used to remove masks from a giant panda’s face.

Reconstruction algorithm for undersampled magnetic resonance images based on complex convolution dual-domain cascade network

Hualu QIU, Suzhen LIN, Yanbo WANG, Feng LIU, Dawei LI

2024, 44(2): 580-587. DOI: 10.11772/j.issn.1001-9081.2023020187

At present， most accelerated Magnetic Resonance Imaging （MRI） reconstruction algorithms reconstruct undersampled amplitude images and use real-value convolution for feature extraction， without considering that the MRI data itself is complex， which limits the feature extraction ability of MRI complex data. In order to improve the feature extraction ability of single slice MRI complex data， and thus reconstruct single slice MRI images with clearer details， a Complex Convolution Dual-Domain Cascade Network （ComConDuDoCNet） was proposed. The original undersampled MRI data was used as input， and Residual Feature Aggregation （RFA） blocks were used to alternately extract the dual domain features of the MRI data， ultimately reconstructing the Magnetic Resonance （MR） images with clear texture details. Complex convolution was used as a feature extractor for each RFA block. Different domains were cascaded through Fourier transform or inverse transform， and data consistency layer was added to achieve data fidelity. A large number of experiments were conducted on publicly available knee joint dataset. The comparison results with the Dual-task Dual-domain Network （DDNet） under three different sampling masks with a sampling rate of 20% show that： under the two-dimensional Gaussian sampling mask， the proposed algorithm decreases Normalized Root Mean Square Error （NRMSE） by 13.6%， increases Peak Signal-to-Noise Ratio （PSNR） by 4.3%， and increases Structural SIMilarity （SSIM） by 0.8%； under the Poisson sampling mask， the proposed algorithm decreases NRMSE by 11.0%， increases PSNR by 3.5%， and increases SSIM by 0.1%； under the radial sampling mask， the proposed algorithm decreases NRMSE by 12.3%， increases PSNR by 3.8%， and increases SSIM by 0.2%. The experimental results show that ComConDuDoCNet， combined with complex convolution and dual-domain learning， can reconstruct MR images with clearer details and more realistic visual effects.

Automatic preoperative planning algorithm for three-dimensional wedge osteotomy of radius

Zhiliang SHI, Shiqi LIAO, Zibo GAN, Shaobo ZHU

2024, 44(2): 588-594. DOI: 10.11772/j.issn.1001-9081.2022111716

For radial angulation deformity， it is difficult to accurately locate the osteotomy position only by experience， thus a three-Dimensional （3D） automatic planning algorithm for radial angulation wedge osteotomy was proposed to accurately determine the specific osteotomy position and calculate the best reset angle. Firstly， the contralateral radius mirror model with compensation difference was used as the reference template to calculate the bone deformity area. Secondly， the distal radius joint was registered based on the weight of the joint anatomical area to create the rotation axis direction vector， and the deformity contour curve of the XOZ plane was solved by the cubic spline interpolation method to determine the orientation of the rotation axis. Finally， the single-objective optimization algorithm was used to optimize the iteration， calculate the optimal osteotomy position and reset angle， and automatically generate the preoperative plan of wedge osteotomy. Six cases of radial angulation were selected to compare the registration accuracy of the joint anatomical area with the surgeon’s manual osteotomy planning method in 3D space as the experimental control group. Experimental results show that compared with manual osteotomy and reset by surgeons proposed by Miyake et al.， the Root Mean Square Error （RMSE） of the registration of the joint anatomical area obtained by the proposed algorithm is decreased by 0.09 to 0.42 mm； compared with the automatic planning method proposed by Fürnstahl et al.， the proposed algorithm can clarify the type of wedge and has higher clinical feasibility.

Competitive location model and algorithm of new energy vehicle battery recycling outlets

Yong LIU, Kun YANG

2024, 44(2): 595-603. DOI: 10.11772/j.issn.1001-9081.2023020182

To solve the competitive facility location problem of new energy vehicle battery recycling outlets considering queuing theory， an Improved Human Learning Optimization （IHLO） algorithm was proposed. First， the competitive facility location model of new energy vehicle battery recycling outlets was constructed， which included queuing time constraints， capacity constraints， threshold constraints and other constraints. Then， considering that this problem belongs to NP-hard problem， in view of the shortcomings of Human Learning Optimization （HLO） algorithm， such as low convergence speed，optimization accuracy and solving stability in the early stage， IHLO algorithm was proposed by adopting elite population reverse learning strategy， group mutual learning operator and adaptive strategy of harmonic parameter. Finally， taking Shanghai and the Yangtze River Delta as examples for numerical experiments， IHLO was compared with Improved Binary Grey Wolf Optimization （IBGWO） algorithm， Improved Binary Particle Swarm Optimization （IBPSO） algorithm， HLO and Human Learning Optimization based on Learning Psychology （LPHLO） algorithm. For large， medium and small scales， the experimental results show that IHLO algorithm has the best performance in 14 of the 15 indicators； compared with IBGWO algorithm， the solution accuracy of IHLO algorithm is improved by at least 0.13%， the solution stability is improved by at least 10.05%， and the solution speed is improved by at least 17.48%. The results show that the proposed algorithm has high computational accuracy and fast optimization speed， which can effectively solve the competitive facility location problem.

Gimbal system control algorithm of unmanned aerial vehicle based on extended state observer

Huzhen GAO, Changping DU, Yao ZHENG

2024, 44(2): 604-610. DOI: 10.11772/j.issn.1001-9081.2023020241

To handle the problem of variable coupling in Unmanned Aerial Vehicle （UAV） three-axis gimbal stabilization control， a UAV gimbal system control algorithm based on Extended State Observer （ESO） was proposed. Firstly， an attitude solution algorithm model for the desired angle of the UAV gimbal was developed. Secondly， serial PID （Proportional-Integral-Derivative） control loops of position and velocity were constructed. Finally， an ESO was introduced to estimate the angular velocity term online in real time， which solved the problem that the angular velocity term is difficult to measure directly due to high coupling and multiple external disturbances， and the control input of each channel was compensated. The experimental results show that in scenarios including without command， with command， and composite tasks， the root mean square errors of the proposed algorithm for angle measurement are 0.235 7°， 0.631 7°， and 0.946 3°， respectively. Compared to the traditional PID algorithm， the proposed algorithm achieves angle error reduction rates of 69.43%， 53.29%， and 50.43%， respectively. The proposed algorithm exhibits greater resistance to disturbances and higher control accuracy.
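The general idea of observing an unmeasured term with an ESO and compensating the control input can be illustrated with a minimal Python sketch. It assumes a generic second-order linear ESO on a hypothetical first-order plant y' = f + b0·u with an unknown constant disturbance f； the plant， gains， and the proportional feedback law are illustrative assumptions and do not reproduce the gimbal model or the serial PID loops of the paper.

```python
def simulate_leso(f=2.0, b0=1.0, omega=50.0, kp=5.0, ref=1.0,
                  dt=0.001, steps=2000):
    """Simulate plant, linear ESO and disturbance-compensated control (forward Euler).

    z1 tracks the output y; z2 tracks the unknown lumped disturbance f.
    Returns the final (y, z2).
    """
    beta1, beta2 = 2.0 * omega, omega ** 2    # both observer poles placed at -omega
    y, z1, z2 = 0.0, 0.0, 0.0
    for _ in range(steps):
        u = (kp * (ref - y) - z2) / b0        # feedback minus estimated disturbance
        err = y - z1                          # observer output error
        dz1 = z2 + b0 * u + beta1 * err
        dz2 = beta2 * err
        dy = f + b0 * u                       # true plant; f is unknown to the controller
        z1 += dt * dz1
        z2 += dt * dz2
        y += dt * dy
    return y, z2


y_final, f_estimate = simulate_leso()
```

Once z2 has converged to the disturbance， subtracting it in the control law leaves the loop behaving like the nominal plant， which is the compensation step the abstract refers to.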

Optimization strategy of tandem composite turbine energy storage based on self-adaptive particle swarm optimization algorithm

Zhen WANG, Shanshan ZHANG, Binyang WU, Wanhua SU

2024, 44(2): 611-618. DOI: 10.11772/j.issn.1001-9081.2023020197

A new Maximum Power Point Tracking （MPPT） method， based on Self-Adaptive Particle Swarm Optimization （SAPSO）， was proposed to address the energy storage challenge in engine tandem composite turbine power generation systems. A Hybrid Energy Storage System （HESS） was introduced to augment the power capture capability of the generation system and replace single battery storage， achieving efficient and stable electrical energy storage. A control simulation model of energy storage optimization based on tandem composite turbine power generation was established using Matlab/Simulink software. The power tracking performance for various control methods and the energy storage characteristics of hybrid energy storage systems were compared and analyzed under predetermined operating conditions. Simulation results reveal that the proposed SAPSO-MPPT method outperforms the conventional P&O （Perturbation and Observation） control method， increasing power generation by 190 W and reducing response time by 0.15 s. Additionally， HESS could effectively track the demand power on the busbar， achieving power recovery efficiency of 95.3%. Finally， a test platform for the tandem composite turbine power generation system was developed using a modified Y24 engine bench to validate the fuel-saving potential of the proposed energy storage optimized control strategy. The test findings indicate that the suggested SAPSO-MPPT+HESS energy storage optimization strategy improves energy recovery efficiency by 0.53 percentage points compared to the original engine.
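For context， the conventional P&O baseline mentioned above is a simple hill-climbing rule： perturb the operating point， observe the power， and reverse direction when the power drops. The Python sketch below illustrates that baseline only， not the proposed SAPSO-MPPT method； the quadratic power curve and all parameter values are hypothetical.

```python
def power_curve(v):
    """Hypothetical single-peak power curve with a 100 W maximum at v = 5.0."""
    return 100.0 - 4.0 * (v - 5.0) ** 2


def perturb_and_observe(power, v0=1.0, step=0.1, iterations=200):
    """Fixed-step P&O: keep moving while power rises, reverse when it falls."""
    v, direction = v0, 1.0
    p_prev = power(v)
    for _ in range(iterations):
        v += direction * step          # perturb the operating point
        p = power(v)                   # observe the resulting power
        if p < p_prev:                 # power dropped: reverse the search direction
            direction = -direction
        p_prev = p
    return v


v_mpp = perturb_and_observe(power_curve)
```

The fixed step makes the operating point oscillate around the maximum rather than settle on it， and a small step slows the initial climb； limitations of this kind are a common motivation for adaptive or swarm-based trackers.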

Opinion propagation model considering user initiative and mobility

Yuanyuan MA, Leilei XIE, Nan DONG, Na LIU

2024, 44(2): 619-627. DOI: 10.11772/j.issn.1001-9081.2023020154

To address the issue that existing information diffusion models overlook user subjectivity and social network dynamics, an SCBRD (Susceptible-Commented-Believed-Recovered-Defensed) opinion propagation model that considers user initiative and mobility in heterogeneous networks was proposed. Firstly, the basic reproduction number was determined using the next-generation matrix method, and the system’s dynamics and optimal control were investigated by applying Lyapunov’s stability theorem and Pontryagin’s principle. Then, a simulation analysis was performed on a BA (Barabási-Albert) scale-free network to identify the significant factors affecting opinion propagation. The results reveal that users’ curiosity, forwarding behavior, and admission rate play dominant roles in information diffusion, and that the system has an optimal control solution. Finally, the model’s rationality was validated on actual data. Compared with the SCIR (Susceptible-inCubation-Infective-Refractory) model, the SCBRD model improves fitting accuracy by 27.40% and reduces the Root Mean Square Error (RMSE) of prediction by 39.02%. Therefore, the proposed model can adapt to the complex and changing circumstances of information diffusion and provide better guidance for official public opinion regulation.

Performance evaluation of industry-university-research based on statistics and adaptive ParNet

Rui ZHANG, Siqi SONG, Jing HU, Yongmei ZHANG, Yanfeng CHAI

2024, 44(2): 628-637. DOI: 10.11772/j.issn.1001-9081.2023020196

To address the problems of existing industry-university-research performance evaluation systems and methods, namely narrow coverage of evaluation indicators, insufficient expression of evaluation sample features, and limited self-optimization ability of evaluation models, a system and method for subjective and objective intelligent evaluation of comprehensive industry-university-research performance were proposed. Firstly, for the three cooperating parties, the factors affecting performance in industry-university-research cooperation and the connections between these factors were identified, and a three-level subjective and objective performance evaluation system was constructed. Secondly, the feature expression of discrete samples was enhanced by mapping the collected discrete sequence evaluation samples to different high-dimensional spatial domains, such as polar coordinate space and the Markov transition matrix. Then, through a chaotic optimization strategy based on elite reverse somersault foraging, the redundancy compression of the deep model and the efficiency of global hyperparameter optimization were improved, and a ParNet (Parallel Network) classification model with lightweight compression and high-dimensional hyperparameter Adaptive optimization (AParNet) was constructed. Finally, the model was applied to industry-university-research performance evaluation to achieve high-performance intelligent evaluation. The experimental results show that the method is well suited to non-linear classification of discrete sequences, and that adding the optimization strategy to the model improves classification performance while reducing the computational load. Specifically, compared with ParNet, AParNet reduces the number of parameters by 10.8%, effectively achieving model compression, and its classification accuracy in performance evaluation of industry-university-research cooperation reaches 98.6%. Therefore, in intelligent performance evaluation of industry-university-research cooperation, the proposed method improves the adaptive ability of the evaluation model and achieves accurate and efficient performance evaluation.

New dish recognition network based on lightweight YOLOv5

Chenghanyu ZHANG, Yuzhe LIN, Chengke TAN, Junfan WANG, Yeting GU, Zhekang DONG, Mingyu GAO

2024, 44(2): 638-644. DOI: 10.11772/j.issn.1001-9081.2023030271

To better meet the accuracy and timeliness requirements of Chinese food dish recognition, a new dish recognition network was designed. The original YOLOv5 model was pruned by combining the Supermask method with structured channel pruning, and was finally made lightweight through Int8 quantization. This allowed the proposed model to achieve a good trade-off between accuracy and speed in dish recognition while improving model portability. Experimental results show that the proposed model achieves a mean Average Precision (mAP) of 99.00% at an Intersection over Union (IoU) of 0.5 and an average recognition speed of 59.54 ms/frame, which is 20 ms/frame faster than the original YOLOv5 model while maintaining the same level of accuracy. In addition, the new dish recognition network was ported to the Renesas RZ/G2L board using Qt. On this basis, an intelligent service system was constructed to realize the whole process of ordering, generating orders, and automatic meal distribution. A theoretical and practical foundation was provided for the future construction and application of truly intelligent service systems in restaurants.
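
The IoU threshold of 0.5 quoted above refers to the overlap between a predicted box and a ground-truth box. As a minimal sketch of the standard definition (not code from the paper), the function below computes IoU for axis-aligned boxes. The `(x_min, y_min, x_max, y_max)` box layout and the example coordinates are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Each box is assumed to be (x_min, y_min, x_max, y_max).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes offset by one unit overlap on half their area.
overlap = iou((0, 0, 2, 2), (1, 0, 3, 2))
is_match = overlap >= 0.5  # a detection counts as correct at IoU >= 0.5
```

In this example the boxes share half of each box's area, yet the IoU is only one third, so the prediction would not count as a match at the 0.5 threshold.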

Dynamic multi-domain adversarial learning method for cross-subject motor imagery EEG signals

Xuan CAO, Tianjian LUO

2024, 44(2): 645-653. DOI: 10.11772/j.issn.1001-9081.2023030286

Decoding motor imagery EEG (ElectroEncephaloGraphy) signals is one of the crucial techniques for building a Brain Computer Interface (BCI) system. Because EEG signals are costly to acquire, differ greatly between subjects, vary strongly over time, and have a low signal-to-noise ratio, constructing cross-subject pattern recognition methods has become the key problem in this field. To solve this problem, a cross-subject dynamic multi-domain adversarial learning method was proposed. Firstly, the covariance matrix alignment method was used to align the given EEG samples. Then, a global discriminator was used to adapt the marginal distribution of different domains, and multiple class-wise local discriminators were used to adapt the conditional distribution of each class. The self-adaptive adversarial factor for the multi-domain discriminators was learned automatically during training iterations. Based on the dynamic multi-domain adversarial learning strategy, the Dynamic Multi-Domain Adversarial Network (DMDAN) model could learn deep features that generalize across subject domains. Experimental results on the public BCI Competition IV 2A and 2B datasets show that the DMDAN model improves the ability to learn domain-invariant features, achieving average classification accuracy 1.80 and 2.52 percentage points higher on dataset 2A and dataset 2B, respectively, than the existing adversarial learning method Deep Representation Domain Adaptation (DRDA). The DMDAN model thus improves the decoding performance of cross-subject motor imagery EEG signals and generalizes across different datasets.

Sleep physiological time series classification method based on adaptive multi-task learning

Yudan SONG, Jing WANG, Xuehui WANG, Zhaoyang MA, Youfang LIN

2024, 44(2): 654-662. DOI: 10.11772/j.issn.1001-9081.2023020191

To exploit the correlation between sleep stages and sleep apnea hypopnea, a sleep physiological time series classification method based on adaptive multi-task learning was proposed. Single-channel electroencephalogram and electrocardiogram signals were used for sleep staging and Sleep Apnea Hypopnea Syndrome (SAHS) detection. A two-stream time dependence learning module was used to extract shared features under the joint supervision of the two tasks. The correlation between sleep stages and sleep apnea hypopnea was modeled by an adaptive inter-task correlation learning module with a channel attention mechanism. The experimental results on two public datasets indicate that the proposed method can complete sleep staging and SAHS detection simultaneously. On the UCD dataset, the accuracy, MF1 (Macro F1-score), and Area Under the receiver operating characteristic Curve (AUC) of the proposed method for sleep staging were 1.21 percentage points, 1.22 percentage points, and 0.0083 higher than those of TinySleepNet; its MF2 (Macro F2-score), AUC, and recall for SAHS detection were 11.08 percentage points, 0.0537, and 15.75 percentage points higher than those of the 6-layer CNN model, which means that more disease segments could be detected. The proposed method could be applied to home sleep monitoring or mobile healthcare to achieve efficient and convenient sleep quality assessment, assisting doctors in the preliminary diagnosis of SAHS.
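
The MF1 and MF2 metrics above differ only in how heavily recall is weighted. The sketch below uses the standard F-beta definition, averaged without weighting over classes; it assumes the paper follows this common convention, which the abstract does not state. The function names and the counts in the example are hypothetical.

```python
def f_beta(tp, fp, fn, beta):
    """Standard F-beta score from true positives, false positives, false negatives."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def macro_f_beta(y_true, y_pred, beta):
    """Unweighted mean of per-class F-beta scores (macro averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        scores.append(f_beta(tp, fp, fn, beta))
    return sum(scores) / len(scores)


# Hypothetical detector with precision 0.8 but recall of only 8/12.
f1 = f_beta(tp=8, fp=2, fn=4, beta=1)
f2 = f_beta(tp=8, fp=2, fn=4, beta=2)
```

Because beta = 2 weights recall more heavily than precision, the same detector scores lower on F2 than on F1 when its recall lags its precision. This is why a gain in MF2 together with recall indicates that more disease segments are being caught.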
