Journal of Computer Applications

Personalized federated learning method based on dual stream neural network

Zheyuan SHEN, Keke YANG, Jing LI

2024, 44(8): 2319-2325. DOI: 10.11772/j.issn.1001-9081.2023081207

Classic Federated Learning (FL) algorithms struggle to achieve good results when data is highly heterogeneous. Personalized FL (PFL) addresses data heterogeneity by “tailoring” a dedicated model to each client, which yields good performance but makes it difficult to extend federated learning to new clients. To address both performance and scalability in PFL, FedDual, an FL model with a dual-stream neural network structure, was proposed. By adding an encoder that analyzes the personalized characteristics of clients, the model retains the performance of personalized models while extending easily to new clients. Experimental results show that, compared with the classic Federated Averaging (FedAvg) algorithm, FedDual clearly improves accuracy on datasets such as MNIST and FashionMNIST, and improves accuracy by more than 10 percentage points on the CIFAR10 dataset. FedDual also achieves “plug and play” for new clients without a decrease in accuracy, solving the problem of poor scalability to new clients.
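
For readers unfamiliar with the FedAvg baseline mentioned in this abstract, the following is a minimal sketch of its aggregation step: the server averages client parameters, weighted by each client's sample count. It is not the FedDual method. The function name `fedavg` and the dictionary-of-lists parameter layout are assumptions made for illustration.

```python
from typing import Dict, List


def fedavg(client_weights: List[Dict[str, List[float]]],
           client_sizes: List[int]) -> Dict[str, List[float]]:
    """Average client parameters, weighted by local dataset size (illustrative layout)."""
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("client_sizes must sum to a positive number")
    aggregated: Dict[str, List[float]] = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        # Weighted mean of every parameter entry across clients.
        aggregated[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return aggregated


# Two hypothetical clients holding 1 and 3 samples respectively.
clients = [{"layer": [1.0, 2.0]}, {"layer": [3.0, 4.0]}]
global_model = fedavg(clients, [1, 3])
```

Because every client receives the same averaged model, heterogeneous clients can be poorly served, which is the limitation that personalized approaches target.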

Semi-supervised object detection framework guided by curriculum learning

Yingjun ZHANG, Niuniu LI, Binhong XIE, Rui ZHANG, Wangdong LU

2024, 44(8): 2326-2333. DOI: 10.11772/j.issn.1001-9081.2023081062

To improve the quality of pseudo labels, mitigate confirmation bias in Semi-Supervised Object Detection (SSOD), and address the fact that existing algorithms ignore the complexity of unlabeled data and therefore produce erroneous pseudo labels, an SSOD framework guided by Curriculum Learning (CL) was proposed. The framework consisted of two modules: the ICSD (IoU-Confidence-Standard-Deviation) difficulty measurer and the BP (Batch-Package) training scheduler. The ICSD difficulty measurer jointly considered information such as the IoU (Intersection over Union) between pseudo-bounding boxes, confidence, and class labels, and the C_IOU (Checkpoint_IOU) method was introduced to evaluate the reliability of unlabeled data. The BP training scheduler provided two efficient scheduling strategies, one from the Batch perspective and one from the Package perspective, which gave priority to unlabeled data with high reliability so that the entire unlabeled dataset was fully utilized in a curriculum learning manner. Extensive comparative experiments on the Pascal VOC and MS-COCO datasets demonstrate that the proposed framework can be applied to existing SSOD algorithms and significantly improves detection accuracy and stability.
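
The IoU quantity used by the difficulty measurer can be illustrated with a short sketch. This shows only the standard Intersection over Union between two axis-aligned boxes, not the ICSD or C_IOU computations, whose details the abstract does not give. The `(x1, y1, x2, y2)` corner format is an assumption.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))  # two 2x2 boxes sharing a 1x1 corner
```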

Proximal policy optimization algorithm based on clipping optimization and policy guidance

Yi ZHOU, Hua GAO, Yongshen TIAN

2024, 44(8): 2334-2341. DOI: 10.11772/j.issn.1001-9081.2023081079

To address two issues in the Proximal Policy Optimization (PPO) algorithm, namely the difficulty of strictly constraining the difference between old and new policies and the relatively low efficiency of exploration and exploitation, a PPO algorithm based on Clipping Optimization And Policy Guidance (COAPG-PPO) was proposed. Firstly, by analyzing the clipping mechanism of PPO, a trust-region clipping approach based on the Wasserstein distance was devised, strengthening the constraint on the difference between old and new policies. Secondly, ideas from simulated annealing and greedy algorithms were incorporated into the policy updating process, improving the exploration efficiency and learning speed of the algorithm. To validate the effectiveness of COAPG-PPO, it was compared on the MuJoCo benchmarks with PPO based on Clipping Optimization (CO-PPO), PPO with Covariance Matrix Adaptation (PPO-CMA), Trust Region-based PPO with RollBack (TR-PPO-RB), and the PPO algorithm. The experimental results indicate that COAPG-PPO demonstrates stricter constraint capabilities, higher exploration and exploitation efficiency, and higher reward values in most environments.
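
As background for the clipping mechanism analyzed in this abstract, here is a minimal sketch of the standard PPO clipped surrogate term for a single sample. It is the baseline ratio clipping, not the Wasserstein-based trust-region clipping of COAPG-PPO. The function name and the default `epsilon=0.2` are assumptions.

```python
def ppo_clip_objective(ratio: float, advantage: float, epsilon: float = 0.2) -> float:
    """Standard PPO clipped surrogate for one sample.

    ratio     -- pi_new(a|s) / pi_old(a|s)
    advantage -- estimated advantage of the action
    """
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Taking the minimum removes any incentive to push the ratio outside the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)


value = ppo_clip_objective(1.5, 2.0)  # ratio above 1 + epsilon, so the clipped term is used
```

Clipping only removes the gradient incentive beyond the range; it does not bound the policy difference itself, which is the weakness the abstract describes.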

Rumor detection by fusing ambiguity in comment sequences and generating user privacy features

Wenfan MENG, Lihua ZHOU, Xiaoxu WANG

2024, 44(8): 2342-2350. DOI: 10.11772/j.issn.1001-9081.2023081176

Existing rumor detection methods have two problems: they cannot capture text semantic features and time periodic features in comment sequences simultaneously, and therefore do not fully integrate the information within the propagation structure, and they cannot access user personal profiles in a privacy-protected environment. To address these problems, a Rumor Detection model fusing ambiguity in Comment Sequences and Generating User privacy features (RD-CSGU) was proposed. Text semantic features and time periodic features of comment sequences were considered jointly from different perspectives. Meanwhile, a heterogeneous rumor propagation network was constructed to describe the social interactions among users during propagation, and on this basis user privacy features were generated by a Generative Adversarial Network (GAN) according to the semantic relationships, overcoming the unavailability of user personal profiles. The effectiveness of the proposed model was validated on the Twitter15, Twitter16 and Weibo datasets. Compared with the second-best baseline model GLAN (Global-Local Attention Network), RD-CSGU achieved improvements of 0.9, 2.2 and 1.8 percentage points in Accuracy (Acc), and improvements of 2.6, 6.8 and 1.9 percentage points in TR (True Rumor)-F1 score. These results, together with ablation experiments and an analysis of GAN-generated embeddings, show that RD-CSGU can effectively detect rumor posts on social media platforms.

Personalized exercise recommendation based on cognitive diagnosis

Yike HAN, Bin XU, Shuo ZHANG

2024, 44(8): 2351-2356. DOI: 10.11772/j.issn.1001-9081.2023081205

A personalized exercise recommendation method combining cognitive diagnosis and a deep factorization machine was proposed to address the single modeling perspective and unreasonable recommendation results of existing exercise recommendation methods based on cognitive diagnosis. Firstly, a new method for calculating the relationship between knowledge points was designed to construct a course knowledge tree, and the concept of an enhanced Q matrix was proposed to accurately represent the relationship between the knowledge points contained in exercises. Secondly, the Neural Cognitive Diagnosis with Knowledge-based Discernment (NeuralCD-KD) model was proposed to calculate the enhanced Q matrix. In the model, second-order feature crossing and an attention mechanism were used to fuse internal and external factors of exercise difficulty, and students’ cognitive states were simulated. The effectiveness of the proposed cognitive diagnosis model was verified on private and public datasets, and the model was able to give reasonable explanations for students’ cognitive states. To personalize exercise recommendation, a Neural Knowledge-based Cognitive Diagnosis with Deep Bilinear Factorization Machine (NKD-DBFM) method was proposed by combining the diagnostic model with a deep bilinear factorization machine, and its effectiveness was verified on the private dataset. Compared with the best baseline model, Neural Cognitive Diagnosis Model (NeuralCDM), the proposed method improves the Area Under Curve (AUC) by 3.7 percentage points.

Session-based recommendation based on graph co-occurrence enhanced multi-layer perceptron

Tingjie TANG, Jiajin HUANG, Jin QIN, Hui LU

2024, 44(8): 2357-2364. DOI: 10.11772/j.issn.1001-9081.2023081063

To address the problem that the Multi-Layer Perceptron (MLP) architecture cannot capture the co-occurrence relationship in the context of a session sequence, a session-based recommendation model based on Graph Co-occurrence Enhanced MLP (GCE-MLP) was proposed. Firstly, the sequential dependency of the session sequence was captured by the MLP architecture, while the co-occurrence relationship in the sequence context was obtained through a co-occurrence relationship learning layer, and the session representation was obtained through an information fusion module. Secondly, a specific feature selection layer was designed to increase the diversity of the input features of the different relation learning layers. Finally, the representation learning of session interest was further enhanced by maximizing the mutual information between the two relational representations via a noise contrastive task. Experimental results on multiple real datasets show that GCE-MLP outperforms current mainstream models, which verifies its effectiveness. Compared with the best MLP-architecture model, FMLP-Rec (Filter-enhanced MLP for Recommendation), GCE-MLP achieves a P@20 of 54.08% and an MRR@20 of 18.87% on the Diginetica dataset, increases of about 2.14 and 1.43 percentage points respectively; on the Yoochoose dataset, it achieves a P@20 of 71.77% and an MRR@20 of 31.78%, increases of about 0.48 and 1.77 percentage points respectively.
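
The P@20 and MRR@20 figures can be made concrete with a small sketch. It assumes the convention common in session-based recommendation, where each test session has a single target item, P@K is the fraction of sessions whose target appears in the top K, and MRR@K averages the reciprocal rank (zero when the target is outside the top K). The abstract does not state the exact definition used, so this convention and the function name are assumptions.

```python
def precision_and_mrr_at_k(ranked_lists, targets, k=20):
    """Return (P@K, MRR@K) for one-target-per-session evaluation."""
    hits = 0
    reciprocal_rank_sum = 0.0
    for ranked, target in zip(ranked_lists, targets):
        top_k = ranked[:k]
        if target in top_k:
            hits += 1
            # Ranks are 1-based, so the first position contributes 1.0.
            reciprocal_rank_sum += 1.0 / (top_k.index(target) + 1)
    n = len(targets)
    return hits / n, reciprocal_rank_sum / n


# Two hypothetical sessions: the first target is ranked second, the second is missed.
p_at_2, mrr_at_2 = precision_and_mrr_at_k([[3, 1, 2], [5, 4, 6]], [1, 9], k=2)
```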

Purchase behavior prediction model based on two-stage dynamic interest recognition

Chunxue ZHANG, Liqing QIU, Cheng’ai SUN, Caixia JING

2024, 44(8): 2365-2371. DOI: 10.11772/j.issn.1001-9081.2023081201

Online purchase prediction aims to predict users’ purchase behaviors, which can generate considerable commercial value for shopping websites. To address the inability of traditional models to accurately learn implicit interest preferences from users’ historical behaviors, a two-stage dynamic interest recognition model for online purchase prediction was proposed to predict the probability of users purchasing products. In the first stage, a click frequency graph of user-product pairs was constructed, and Light-Graph Convolutional Network (LightGCN) was used to learn contextual features of the graph as the representation of users’ static interest. In the second stage, a Bidirectional Gated Recurrent Unit (Bi-GRU) with an attention mechanism was applied to explore how user preferences evolve. Finally, targeting the potential high-dimensional features, a purchase prediction model integrating dynamic interest and implicit features was built. Extensive experimental results on two real e-commerce datasets show that, compared with the Graph Convolutional Network (GCN) model, the proposed model improves accuracy by at least 0.3 percentage points and the F1 score by at least 2.05 percentage points.

Age estimation method combining improved CloFormer model and ordinal regression

Shuai FU, Xiaoying GUO, Ruyi BAI, Tao YAN, Bin CHEN

2024, 44(8): 2372-2380. DOI: 10.11772/j.issn.1001-9081.2023081199

Existing age estimation methods typically employ ordinal regression based on a Convolutional Neural Network (CNN). However, when predicting adjacent ages, a CNN has difficulty capturing global feature representations, which reduces prediction accuracy. To solve this problem, an age estimation method combining an improved CloFormer model with ordinal regression was proposed. Compared with traditional CNN-based ordinal regression, CloFormer can make better use of the self-attention mechanism to capture relationships between different regions of an image, thereby improving the learning of feature differences between adjacent ages. In the proposed method, the CloFormer model was first optimized and then combined with ordinal regression to better utilize age sequence information and achieve more precise age estimation. Subsequently, through end-to-end training of the improved CloFormer model and the ordinal regression model, the proposed method was able to better learn the relationships between facial features and age sequences. Finally, comparative experiments were conducted on multiple publicly available datasets. Experimental results show that on the CACD, AFAD, and UTKFace datasets, the Root Mean Square Error (RMSE) of the proposed method is 7.36, 4.62, and 8.28, respectively. Compared with existing age estimation methods such as Ordinal Regression with CNN (OR-CNN) and COnsistent RAnk Logits (CORAL), the RMSE is reduced by 0.25 and 0.05 respectively on the CACD dataset, 0.18 and 0.03 respectively on the AFAD dataset, and 0.97 and 0.53 respectively on the UTKFace dataset, indicating that the proposed method gives better age estimation results.
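
The ordinal regression formulation referred to here is commonly implemented, for example in OR-CNN and CORAL, by turning an age label into a series of binary "is the age greater than threshold k?" targets. The sketch below illustrates that general encoding and its decoding; it is not taken from the paper, and the function names and the 0.5 decision threshold are assumptions.

```python
def encode_rank(age: int, min_age: int, max_age: int) -> list:
    """Encode an age as binary targets: entry k is 1 if age > min_age + k."""
    return [1 if age > min_age + k else 0 for k in range(max_age - min_age)]


def decode_rank(probabilities: list, min_age: int, threshold: float = 0.5) -> int:
    """Decode by counting how many 'greater than' tasks are predicted positive."""
    return min_age + sum(1 for p in probabilities if p > threshold)


targets = encode_rank(3, min_age=0, max_age=5)
predicted_age = decode_rank([0.9, 0.8, 0.6, 0.4, 0.1], min_age=0)
```

Because adjacent ages differ in only one binary target, the encoding preserves the ordering information that plain classification discards.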

Consistency preserving age estimation method by ensemble ranking

Chun SUN, Chunlong HU, Shucheng HUANG

2024, 44(8): 2381-2386. DOI: 10.11772/j.issn.1001-9081.2023081173

Traditional age estimation methods based on ranking and regression cannot effectively utilize the evolutionary characteristics of human faces or build correlations between different ranking labels. Moreover, using binary classification methods for age estimation may lead to inconsistent rankings. To solve these problems, an age estimation method based on integrated ranking matrix encoding and consistency preservation was proposed to fully utilize the correlation between age and ranking value and to suppress ranking inconsistency. A new indicator, the proportion of samples with inconsistent ranking, was proposed to evaluate ranking inconsistency in two-class ranking methods. First, age categories were converted into ranking matrix form through a designed encoding method. Then, the ResNet34 (Residual Network) feature extraction network was used to extract facial features, which were then learned through the proposed encoding learning module. Finally, the network predictions were decoded into the predicted age of the image by a ranking decoder based on a metric method. The experimental results show that the proposed method achieves a Mean Absolute Error (MAE) of 2.18 on the MORPH Ⅱ dataset and obtains better results on other publicly available datasets than methods also based on ranking and ordinal regression, such as OR-CNN (Ordinal Regression with CNN) and CORAL (COnsistent RAnk Logits). At the same time, the proposed method decreases the proportion of samples with inconsistent ranking, improving on the OR-CNN method by about 65% on this ranking inconsistency measure.
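
One plausible reading of the "proportion of samples with inconsistent ranking" indicator is sketched below. It assumes each sample yields a sequence of binary "age greater than threshold k" decisions, and that a sample is inconsistent when a later decision is 1 after an earlier one was 0. The abstract does not give the exact definition, so both this interpretation and the function names are assumptions.

```python
def is_rank_consistent(binary_outputs) -> bool:
    """Consistent means non-increasing: once a threshold is rejected, later ones are too."""
    return all(a >= b for a, b in zip(binary_outputs, binary_outputs[1:]))


def inconsistent_proportion(batch) -> float:
    """Fraction of samples whose binary ranking decisions are not monotone."""
    if not batch:
        return 0.0
    inconsistent = sum(1 for sample in batch if not is_rank_consistent(sample))
    return inconsistent / len(batch)


# Hypothetical decisions for three samples; only the second one is inconsistent.
rate = inconsistent_proportion([[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 0, 0]])
```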

Multimodal sentiment analysis network with self-supervision and multi-layer cross attention

Kaipeng XUE, Tao XU, Chunjie LIAO

2024, 44(8): 2387-2392. DOI: 10.11772/j.issn.1001-9081.2023081209

To address incomplete intra-modal information, poor inter-modal interaction, and training difficulty in multimodal sentiment analysis, a Multimodal Sentiment analysis network with Self-supervision and Multi-layer cross Attention fusion (MSSM) was proposed, applying a Visual-and-Language Pre-training (VLP) model to the field of multimodal sentiment analysis. The visual encoder module was enhanced through self-supervised learning, and multi-layer cross attention was added to better model textual and visual features. As a result, the intra-modal information became richer and more complete, and the inter-modal interaction became more thorough. In addition, FlashAttention, a fast and memory-efficient exact attention algorithm with IO-awareness, was adopted to address the high complexity of attention computation in Transformer. Experimental results show that, compared with the current mainstream model Contrastive Language-Image Pre-training (CLIP), MSSM improves accuracy by 3.6 percentage points on the processed MVSA-S dataset and by 2.2 percentage points on the MVSA-M dataset, proving that the proposed network can effectively improve the completeness of multimodal information fusion while reducing computational cost.

Sentiment classification model of psychological counseling text based on attention over attention mechanism

Yuqing WANG, Guangli ZHU, Wenjie DUAN, Shuyu LI, Ruotong ZHOU

2024, 44(8): 2393-2399. DOI: 10.11772/j.issn.1001-9081.2023081168

Sentiment classification in psychological counseling scenarios aims to obtain the sentiment polarity of the inquirer’s utterances, which can support the development of psychological counseling Artificial Intelligence (AI) assistants. Existing methods obtain the sentiment polarity of text from contextual information but fail to consider the sentiment transmission between the current sentence and the preceding neighboring sentences in the dialogue record. To address this issue, a sentiment classification model for psychological counseling text was proposed based on the Attention Over Attention (AOA) mechanism. Historical sentiment words were assigned weights according to their temporal order, which improved the accuracy of sentiment classification for psychological counseling text. In a dialogue, the historical sentiment word sequences of both parties were extracted using a constructed mental health sentiment lexicon. Subsequently, the current sentence and the two sequences of historical sentiment words were input into a Bidirectional Long Short-Term Memory (BiLSTM) network to obtain the corresponding feature vectors. The Ebbinghaus forgetting curve was used to allocate internal weights to the sequences of historical sentiment words. Both inertia features and interaction features were captured by the AOA mechanism. Then, these two kinds of features, along with the text features, were input into the classification layer to calculate the probability of each sentiment polarity. Experimental results on the public Emotional First Aid Dataset show that the proposed model improves the F1 value by 1.55% compared with the Capsule network and Directional Graph Convolutional Network (Caps-DGCN) model. Hence, the proposed model can effectively improve sentiment classification of psychological counseling text.
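
The idea of weighting historical sentiment words by a forgetting curve can be sketched as follows. The example assumes the common exponential form of the Ebbinghaus curve, R(t) = exp(-t / S), with t measured in positions before the most recent word and the weights normalized to sum to 1. The abstract does not specify the formula or its parameters, so the function name, the `strength` parameter S and the normalization are all assumptions.

```python
import math


def forgetting_weights(num_items: int, strength: float = 1.0) -> list:
    """Weights for a chronologically ordered sequence (index 0 = oldest item)."""
    # The newest item has elapsed time 0 and therefore the highest retention.
    raw = [math.exp(-(num_items - 1 - i) / strength) for i in range(num_items)]
    total = sum(raw)
    return [r / total for r in raw]


weights = forgetting_weights(3)  # the most recent sentiment word gets the largest weight
```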

Fusion of coordinate and multi-head attention mechanisms for interactive speech emotion recognition

Pengqi GAO, Heming HUANG, Yonghong FAN

2024, 44(8): 2400-2406. DOI: 10.11772/j.issn.1001-9081.2023081160

Speech Emotion Recognition (SER) is an important and challenging task in human-computer interaction systems. To address the issues of single-feature representation and weak feature interaction in current SER systems, a Multi-input Interactive Attention Network (MIAN) was proposed. The proposed network consisted of two sub-networks, namely the specific feature coordinate residual attention network and the shared feature multi-head attention network. The former utilized Res2Net and coordinate attention modules to learn specific features extracted from raw speech and generate multiscale feature representations, enhancing the model’s ability to represent emotion-related information. The latter integrated the features obtained from the preceding network to form shared features, which were then input into the multi-head attention module via a Bidirectional Long Short-Term Memory (BiLSTM) network. This setup allowed for simultaneous attention to relevant information in different feature subspaces, enhancing the interaction among features and capturing highly discriminative features. The collaboration of the two sub-networks increased the diversity of features and improved the interaction capability among features. During training, a dual-loss function was applied for joint supervision, aiming to make samples of the same class more compact and samples of different classes more separated. The experimental results demonstrate that the proposed model achieves a weighted average accuracy of 91.43% on the EMO-DB corpus and 76.33% on the IEMOCAP corpus. Compared with other state-of-the-art models, the proposed model exhibits superior classification performance.

Construction method of voiceprint library based on multi-scale frequency-channel attention fusion

Tong CHEN, Fengyu YANG, Yu XIONG, Hong YAN, Fuxing QIU

2024, 44(8): 2407-2413. DOI: 10.11772/j.issn.1001-9081.2023081276

To address the problem that the accuracy of speaker verification is easily affected by external factors, a speaker verification algorithm was proposed based on a Multi-scale Frequency-Channel Attention fused Time-Delay Neural Network (MFCA-TDNN) model. MFCA-TDNN made three improvements to ECAPA-TDNN (Emphasized Channel Attention Propagation Aggregation Time Delay Neural Network): incorporating a multi-scale frequency-channel attention front-end to obtain high-resolution feature representations from speech, adding a multi-scale channel attention module to fuse multi-scale information by combining local and global features, and embedding a feature attention fusion module to weight the fused features of multiple scales. These improvements enabled the model to make better use of multi-scale time-frequency information and improved its recognition capability. Experimental results show that, compared with the ECAPA-TDNN model, the MFCA-TDNN model reduces the Equal Error Rate (EER) and minimum Detection Cost Function (minDCF) by 5.9% and 7.9% respectively, with a lowest EER of 3.83% and a lowest minDCF of 0.2202.
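
The Equal Error Rate reported here is the operating point at which the false acceptance rate and the false rejection rate are equal. The following sketch approximates it with a simple threshold sweep over the observed scores; the function name and the convention that a higher score means "same speaker" are assumptions, and production toolkits usually interpolate rather than pick the closest threshold.

```python
def equal_error_rate(genuine_scores, impostor_scores) -> float:
    """Approximate EER by scanning every observed score as a decision threshold."""
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap, eer = None, None
    for thr in thresholds:
        # False acceptance: impostor trials scoring at or above the threshold.
        far = sum(s >= thr for s in impostor_scores) / len(impostor_scores)
        # False rejection: genuine trials scoring below the threshold.
        frr = sum(s < thr for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical similarity scores for four genuine and four impostor trials.
eer = equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.6, 0.2, 0.1, 0.4])
```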

Multivariate controllable text generation based on diffusion sequences

Chenyang LI, Long ZHANG, Qiusheng ZHENG, Shaohua QIAN

2024, 44(8): 2414-2420. DOI: 10.11772/j.issn.1001-9081.2023081137

With the emergence of large-scale pre-trained language models, text generation technology has made breakthrough progress. However, in open text generation, the generated content lacks human-like emotional features, making it difficult for the generated text to resonate and connect emotionally. Controllable text generation is of great significance in compensating for these shortcomings. Firstly, the ChnSentiCorp dataset was extended with theme and emotional attributes. Meanwhile, to construct a multivariate controllable text generation model that could generate fluent text with rich emotion, DiffuSeq-PT, a controllable text generation model based on diffusion sequences, was proposed on a diffusion model architecture. Theme and emotion attributes and text data were used to perform the diffusion process on the sequences without classifier guidance. The encoding and decoding capabilities of the pre-trained model ERNIE 3.0 (Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation) were used to fit the noising and denoising processes of the diffusion model, and ultimately target text matching the relevant theme and multiple sentiment granularities was generated. Compared with the benchmark model DiffuSeq, the proposed model improved BERTScore by 0.13 and 0.01 on two publicly available real datasets (ChnSentiCorp and the Debate dataset) and decreased perplexity by 14.318 and 9.46, respectively.

Relation extraction between discipline knowledge entities based on improved piecewise convolutional neural network and knowledge distillation

Yubo ZHAO, Liping ZHANG, Sheng YAN, Min HOU, Mao GAO

2024, 44(8): 2421-2429. DOI: 10.11772/j.issn.1001-9081.2023081065

Relation extraction is an important means of organizing discipline knowledge as well as an important step in the construction of educational knowledge graphs. In current research, most pre-trained language models based on the Transformer architecture, such as Bidirectional Encoder Representations from Transformers (BERT), suffer from a large number of parameters and excessive complexity, which make them difficult to deploy on end devices and limit their application in real educational scenarios. In addition, most traditional lightweight relation extraction models do not model the data through text structure and therefore tend to ignore the structural information between entities; the word embedding vectors they generate struggle to capture the contextual features of the text, handle polysemy poorly, and fit poorly with the unstructured nature of discipline knowledge texts and their high proportion of proper nouns, which is not conducive to high-quality relation extraction. To solve the above problems, a relation extraction method between discipline knowledge entities based on an improved Piecewise Convolutional Neural Network (PCNN) and Knowledge Distillation (KD) was proposed. Firstly, BERT was used to generate high-quality domain text word vectors to improve the input layer of the PCNN model, so as to effectively capture text context features and alleviate the polysemy problem to a certain extent. Then, convolution and piecewise max pooling operations were used to deeply mine inter-entity structural information, constructing the BERT-PCNN model and achieving high-quality relation extraction. Lastly, considering the demand for efficient and lightweight models in educational scenarios, the knowledge of the output layer and middle layer of the BERT-PCNN model was distilled to guide the PCNN model, completing the construction of the KD-PCNN model. The experimental results show that the weighted-average F1 of the BERT-PCNN model reaches 94%, which is 1 and 2 percentage points higher than those of the R-BERT and EC_BERT models respectively; the weighted-average F1 of the KD-PCNN model reaches 92%, which is the same as that of the EC_BERT model, and the number of parameters of the KD-PCNN model is 3 orders of magnitude lower than those of the BERT-PCNN and KD-RB-l models. The proposed method can thus achieve a better trade-off between performance and the number of network parameters, which is conducive to improving the automated construction of educational knowledge graphs and to the development and deployment of new educational applications.
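
Output-layer knowledge distillation, one of the two distillation sources named in this abstract, is often implemented as a temperature-scaled KL divergence between teacher and student predictions. The sketch below shows that generic soft-label loss for a single example; it does not cover middle-layer distillation, and the function names and the default temperature of 2.0 are assumptions rather than the paper's settings.

```python
import math


def softmax(logits, temperature: float = 1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by temperature squared."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl


loss = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

In practice this term is usually combined with the ordinary cross-entropy loss on the ground-truth relation labels.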

Aspect-opinion pair extraction of new energy vehicle complaint text based on context enhancement

Caiqin WANG, Yuhao ZHOU, Shunxiang ZHANG, Yanhui WANG, Xiaolong WANG

2024, 44(8): 2430-2436. DOI: 10.11772/j.issn.1001-9081.2023081167

Mining users’ multi-dimensional opinions on products from the complaint texts of new energy vehicles can provide support for product design decisions. Because complaint texts have high entity density and lengthy sentence structures, existing methods for Aspect-Opinion Pair Extraction (AOPE) suffer from weak correlations between aspect terms and opinion terms. To address this problem, an Aspect-Opinion pair Extraction model based on Context Enhancement (AOE-CE) was proposed, which fused topic features and text features as a contextual representation to enhance the correlations between entities. The model consisted of an entity recognition module and a relation detection module. Firstly, in the entity recognition module, the text was encoded using a pre-trained model and a part-of-speech tagging tool. Secondly, a Bi-directional Long Short-Term Memory (Bi-LSTM) network combined with multi-head attention was employed to capture contextual information and derive text features. Subsequently, these text features were input into a Conditional Random Field (CRF) model to obtain the entity set. In the relation detection module, the topic features were obtained through BERT (Bidirectional Encoder Representations from Transformers) and fused with the text features to obtain the enhanced contextual representation. Then, a tri-affine mechanism was used to enhance the correlations between entities with the help of the contextual representation. Finally, the extraction result was obtained through a Sigmoid function. The experimental results show that the precision, recall, and F1 value of AOE-CE are 2.19, 1.08, and 1.60 percentage points higher than those of the SDRN (Synchronous Double-channel Recurrent Network) model respectively, indicating that AOE-CE performs better on AOPE.

Help-seeking information extraction model for flood event in social media data

Huanliang SUN, Siyi WANG, Junling LIU, Jingke XU

2024, 44(8): 2437-2445. DOI: 10.11772/j.issn.1001-9081.2023081080

Because of data inconsistency and differing information importance, extracting desired information from social media precisely and automatically is a challenging task. To solve this problem, a knowledge system of flood events was built through Formal Concept Analysis (FCA), word co-occurrence relationships and contextual semantics. Using the constructed knowledge system, ChatFlowFlood, a fine-tuned Large Language Model (LLM) for information extraction based on the TencentPretrain framework, was developed. Live disaster information such as locations and material shortages could be extracted with only a few manual annotations. Based on the information extraction model, the Fuzzy Analytic Hierarchy Process (FAHP) and CRITIC (CRiteria Importance Through Intercriteria Correlation) methods were combined to evaluate the rescue priority of help-seeking information both subjectively and objectively, which helped decision makers understand the urgency of the disaster. The experimental results show that on Chinese social media data, compared with the ChatFlow-7B model, the F_BERT index of the ChatFlowFlood model is improved by 73.09%.
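
The objective half of the priority evaluation, the CRITIC method, derives criterion weights from the data itself: a criterion counts for more when its values vary widely and correlate weakly with the other criteria. The sketch below follows the textbook form of CRITIC and is not the paper's implementation; it assumes every criterion is benefit-type, uses min-max normalization and the sample standard deviation, and the function name is illustrative. The FAHP half and the way the two are combined are not shown.

```python
import math


def critic_weights(matrix):
    """Textbook CRITIC weights for a rows=alternatives, columns=criteria matrix."""
    n, m = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(m)]
    normalized = []
    for col in columns:
        lo, hi = min(col), max(col)
        if hi == lo:
            raise ValueError("a constant criterion carries no information")
        normalized.append([(v - lo) / (hi - lo) for v in col])
    means = [sum(col) / n for col in normalized]
    stds = [math.sqrt(sum((v - mu) ** 2 for v in col) / (n - 1))
            for col, mu in zip(normalized, means)]

    def corr(a, b):
        cov = sum((x - means[a]) * (y - means[b])
                  for x, y in zip(normalized[a], normalized[b])) / (n - 1)
        return cov / (stds[a] * stds[b])

    # Information content: contrast intensity times total conflict with other criteria.
    info = [stds[j] * sum(1.0 - corr(j, k) for k in range(m)) for j in range(m)]
    total = sum(info)
    if total == 0:
        raise ValueError("all criteria are perfectly correlated")
    return [c / total for c in info]


# Three hypothetical help requests scored on three criteria.
weights = critic_weights([[1, 10, 3], [2, 8, 1], [3, 9, 2]])
```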

Temporal shortest path counting query algorithm based on tree decomposition

Yuan LI, Qiulan LIN, Anzhi CHEN, Guoli YANG, Wei SONG, Guoren WANG

2024, 44(8): 2446-2454. DOI: 10.11772/j.issn.1001-9081.2023081128

Shortest path counting is an important research problem in graph computing. It aims to query the number of shortest paths between vertices and is widely used in path planning and recommendation, social network analysis, betweenness centrality calculation and so on. At present, more and more networks can be modeled as temporal graphs, but there has been no research on the shortest path counting query problem on temporal graphs. Compared with a static graph, a temporal graph adds time information and has a more complex structure, and the activation time of each edge must be considered when querying the number of paths between vertices. Therefore, shortest path counting methods for static graphs are no longer applicable to temporal graphs, and querying on large-scale temporal graphs is more challenging. To solve the shortest path counting problem on temporal graphs, a TG-TL (Temporal Graph-Tree Label) index method based on tree decomposition was proposed. The method consists of two stages: index construction and online query. In the index construction stage, a temporal tree decomposition algorithm was designed according to the attributes of the temporal graph, and the temporal graph was transformed into a tree structure. Then, according to the structural information of the tree decomposition and the definition of convex path, an efficient index building algorithm was proposed. In the online query stage, an efficient temporal shortest path counting query algorithm was proposed based on the TG-TL index. Experiments were carried out on 4 real datasets, and the results showed that, compared with the query algorithm based on the TG-base (Temporal Graph-base) index, the proposed algorithm improved query efficiency by at least 61%. Therefore, the proposed algorithm is efficient and effective for the shortest path counting problem on temporal graphs.
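
To make the underlying query concrete, the sketch below counts shortest paths in a static, unweighted graph with a single breadth-first search. This is the classical index-free approach for static graphs, not the TG-TL index, and it ignores edge activation times entirely; the adjacency-dictionary input and the function name are assumptions.

```python
from collections import deque


def count_shortest_paths(adj, source, target):
    """Return (distance, number of shortest paths) from source to target via BFS."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                # First time v is reached: it inherits u's path count.
                dist[v] = dist[u] + 1
                count[v] = count[u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                # Another shortest route into v through u.
                count[v] += count[u]
    return dist.get(target), count.get(target, 0)


# Diamond graph: two shortest paths of length 2 from vertex 0 to vertex 3.
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
distance, paths = count_shortest_paths(graph, 0, 3)
```

Each query costs a full traversal, which motivates precomputed indexes for large graphs.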

Top-K optimal route query problem with keyword search support

Haoyu ZHAO, Ziqiang YU, Xiaomeng CHEN, Guoxiang CHEN, Hui ZHU, Bohan LI

2024, 44(8): 2455-2465. DOI: 10.11772/j.issn.1001-9081.2023081267

The top-K optimal route query with keyword search support is a route query over a given road network, a set of points of interest, a starting point, and multiple keywords. The goal of the query is to find the k optimal routes that pass through multiple points of interest matching the query keywords. However, some existing research simplifies the algorithm by using the order in which the user enters the keywords as the order in which the points of interest are visited, which is unsuitable for scenarios that impose no visiting order and thereby reduces practicality. In addition, some research aims to improve query efficiency by setting distance thresholds to prune points of interest that do not meet the requirements, but such algorithms cannot guarantee that the pruned points of interest do not form part of the optimal route. To address these problems, a Keyword-aware top-K optimal Routes Search (KKRS) algorithm was proposed. Firstly, the entire road network was divided into multiple subnetworks. Then, a heuristic search strategy was employed to gradually expand the search scope, starting from the subnetwork containing the query's starting point, until the top-K optimal routes were found or the entire road network was traversed. During the expansion process, a subgraph pruning strategy was introduced to remove subnetworks that do not contain the top-K optimal routes, thus reducing the search scope. Furthermore, to avoid computing each potentially optimal set of points of interest one by one, a pruning strategy for sequences of points of interest was designed to quickly filter out those sequences that cannot form the optimal route, thereby reducing the computational cost. Finally, experiments were conducted on real and synthetic datasets with the two proposed pruning algorithms. Subgraph pruning achieved pruning rates of over 70%, and point-of-interest sequence pruning maintained pruning rates of over 60% on all datasets. Among the advanced algorithms compared, namely Keyword-aware Optimal Route query on Large-scale Road Networks (KORL), ROSE-GM (Recurrent Optimal Subroute Expansion using Greedy Merge Strategy), OSSCaling, and StarKOSR (finding Top-K Optimal Sequenced Routes with A*), StarKOSR has the highest query efficiency, and the KKRS algorithm is 40% more efficient than StarKOSR.

Oversampling method for imbalanced data based on sample potential and noise evolution

Qiangkui LENG, Xuezi SUN, Xiangfu MENG

2024, 44(8): 2466-2475. DOI: 10.11772/j.issn.1001-9081.2023081145

Oversampling is an effective strategy for imbalanced data classification. Most existing methods employ the K-Nearest Neighbor (KNN) technique to select oversampling seed samples, but changes in the KNN parameter values often make most oversampling methods significantly unstable. The Radial-Basis Oversampling (RBO) method can address this issue, but it tends to introduce a substantial amount of noise after oversampling. An imbalanced data oversampling method based on sample potential and noise evolution was proposed to iteratively refine the oversampled dataset. Firstly, the RBO method was used to synthesize minority class samples and reduce the imbalance of the original data by calculating sample potential. Secondly, Natural Neighbor (NaN) was employed as an error detection technique to identify suspected noise samples in the oversampled dataset. Finally, an improved Differential Evolution (DE) method was applied to iteratively refine the detected suspected noise samples. Compared to traditional oversampling methods, the proposed method can better explore important boundary information in the dataset, thus giving classifiers more help in improving their classification performance. Extensive comparative experiments were conducted on 22 benchmark datasets against seven classical sampling methods (combined with three different classifiers). The experimental results show that the proposed method achieves higher F1 values and G-mean values and handles noise better than sampling methods with post-filters, and can therefore deal with imbalanced data classification more effectively. In addition, statistical analysis indicates that the proposed method achieves a higher Friedman ranking.
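The two reported metrics can be computed from a binary confusion matrix as in the sketch below. It uses the standard definitions (F1 as the harmonic mean of precision and recall, G-mean as the geometric mean of recall and specificity); the function name and the example counts are illustrative assumptions, not values from the paper.

```python
import math


def f1_and_gmean(tp, fp, fn, tn):
    """Compute F1 and G-mean for the minority (positive) class.

    tp, fp, fn, tn are counts from a binary confusion matrix.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0     # true negative rate
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    gmean = math.sqrt(recall * specificity)
    return f1, gmean


# Hypothetical imbalanced test set: 10 minority and 90 majority samples.
f1, gmean = f1_and_gmean(tp=8, fp=2, fn=2, tn=88)
print(round(f1, 3), round(gmean, 3))
```

Both metrics stay low when the minority class is ignored, which is why they are preferred over plain accuracy on imbalanced data.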

Automatic international classification of disease coding method incorporating heterogeneous information

Quanmei ZHANG, Runping HUANG, Fei TENG, Haibo ZHANG, Nan ZHOU

2024, 44(8): 2476-2482. DOI: 10.11772/j.issn.1001-9081.2023081166

To address the structural diversity of medical Electronic Health Records (EHR) and the complicated correlations between codes in the automatic International Classification of Disease (ICD) coding task, an Automatic ICD Coding method integrating Heterogeneous Information (AIC-HI) was proposed. Firstly, different feature extractors were designed for the distinctive characteristics of the structured codes, semi-structured descriptions, and unstructured medical text in the coding task. At the same time, a coding knowledge graph was constructed to fit the hierarchical relationships among codes, and the association relationships between different branches were transformed into triples containing head and tail codes. Then, representation learning was used to fuse coding and description information to calculate label features. Finally, an attention mechanism was used to extract the most relevant feature representations from unstructured documents. The experimental results show that, compared with the suboptimal baseline model MARN (Multitask bAlanced and Recalibrated Network), the micro F1-score of AIC-HI on the real clinical dataset MIMIC-Ⅲ is increased by 4.3 percentage points.

Graph data generation approach for graph neural network model extraction attacks

Ying YANG, Xiaoyan HAO, Dan YU, Yao MA, Yongle CHEN

2024, 44(8): 2483-2492. DOI: 10.11772/j.issn.1001-9081.2023081110

Data-free model extraction attacks are a class of machine learning security problems in which the attacker has no knowledge of the training data needed to carry out the attack. To fill the research gap on data-free model extraction attacks in the field of Graph Neural Network (GNN), a GNN model extraction attack method was proposed. The graph node feature information and edge information were optimized with the GNN interpretability method GNNExplainer and the graph data augmentation method GAUG-M, respectively, so as to generate the required graph data and achieve the final GNN model extraction. Firstly, the GNNExplainer method was used to obtain the important graph node features through interpretability analysis of the target model's responses. Secondly, the graph node feature information was optimized as a whole by upweighting the important node features and downweighting the unimportant ones. Then, a graph autoencoder was used as the edge information prediction module to obtain the connection probabilities between nodes from the optimized node features. Finally, the edge information was optimized by adding or deleting the corresponding edges according to these probabilities. Three GNN model architectures trained on five graph datasets were used as target models for extraction attacks. The obtained substitute models achieve 73% to 87% accuracy on the node classification task and 76% to 89% fidelity with the target models, which verifies the effectiveness of the proposed method.

Class-imbalanced traffic abnormal detection based on 1D-CNN and BiGRU

Hong CHEN, Bing QI, Haibo JIN, Cong WU, Li’ang ZHANG

2024, 44(8): 2493-2499. DOI: 10.11772/j.issn.1001-9081.2023081112

Network traffic anomaly detection is a network security defense method that analyzes and classifies network traffic to identify potential attacks. A new approach was proposed to address the low detection accuracy and high false positive rate caused by imbalanced, high-dimensional network traffic data with different attack categories. A One-Dimensional Convolutional Neural Network (1D-CNN) and a Bidirectional Gated Recurrent Unit (BiGRU) were combined to construct a traffic anomaly detection model. The class-imbalanced data were balanced by using an improved Synthetic Minority Oversampling TEchnique (SMOTE), namely Borderline-SMOTE, together with a clustering-based undersampling technique built on the Gaussian Mixture Model (GMM). Subsequently, the 1D-CNN was used to extract local features from the data, and the BiGRU was used to better extract time series features. Finally, the proposed model was evaluated on the UNSW-NB15 dataset, achieving an accuracy of 98.12% and a false positive rate of 1.28%. The experimental results demonstrate that the proposed model outperforms other classic machine learning and deep learning models, improves the recognition rate for minority attacks, and achieves higher detection accuracy.

Dynamic ciphertext sorting and retrieval scheme based on blockchain

Xiaoling SUN, Danhui WANG, Shanshan LI

2024, 44(8): 2500-2505. DOI: 10.11772/j.issn.1001-9081.2023081114

To address the problem of untrusted cloud storage servers, a dynamic ciphertext sorting and retrieval scheme based on blockchain was proposed. A balanced binary tree was used as the index tree to achieve sublinear search efficiency. A vector space model was employed to reduce text complexity. Search results for multiple keywords were sorted by the TF-IDF (Term Frequency-Inverse Document Frequency) weighted statistical algorithm. By employing a separate index tree for newly added files and maintaining a revocation list for deleted files, dynamic updating was enabled for the blockchain-based searchable encryption solution. Using leakage functions, the proposed scheme is proven secure against adaptive chosen keyword attacks. Performance tests demonstrate that, compared to the {key, value} index structure, the tree index structure adopted in the proposed scheme reduces index tree generation time by 98%, file search time by 7%, and dynamic updating time by 99% on average, a significant efficiency improvement in each step.
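The ranking step can be illustrated on plaintext with the sketch below. It assumes one common TF-IDF variant (term frequency normalized by document length, IDF as the natural log of N over document frequency); the abstract does not state which variant the scheme uses, and the actual scheme ranks over an encrypted index tree rather than raw text. The function name and sample documents are hypothetical.

```python
import math
from collections import Counter


def rank_documents(docs, query_terms):
    """Rank document names by summed TF-IDF score over the query terms.

    docs maps a document name to its plaintext (illustrative only).
    """
    n = len(docs)
    tokenized = {name: text.lower().split() for name, text in docs.items()}

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    scores = {}
    for name, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if tf[term]:
                score += (tf[term] / len(tokens)) * math.log(n / df[term])
        scores[name] = score
    return sorted(scores, key=lambda name: scores[name], reverse=True)


docs = {
    "d1": "cloud storage cloud",
    "d2": "blockchain index tree",
    "d3": "cloud blockchain ledger",
}
print(rank_documents(docs, ["cloud"]))
```

Here "d1" ranks first because "cloud" makes up a larger share of its terms, and "d2" ranks last because it does not contain the keyword at all.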

Application of improved hunter-prey optimization algorithm in WSN coverage

Le YANG, Damin ZHANG, Qing HE, Jiaxin DENG, Fengqin ZUO

2024, 44(8): 2506-2513. DOI: 10.11772/j.issn.1001-9081.2023081208

To tackle the coverage blind areas and uneven node distribution that arise in the node deployment of conventional Wireless Sensor Network (WSN), an Improved Hunter-Prey Optimization (IHPO) algorithm was proposed to improve network coverage. Firstly, to improve information exchange within the population, Differential Evolution (DE) was introduced, and cross-variation with dynamic proportional factors was implemented during the prey position update stage. Secondly, adaptive α variation based on the α-stable distribution was proposed to perturb the global optimal location when it is updated, so as to balance the algorithm's performance demands over time. Finally, the population was guided to complete dynamic reverse learning by using the global optimal location perturbed by adaptive α variation, thereby increasing the population's diversity and capacity for global search. In the WSN coverage problem, the network nodes optimized by IHPO were distributed more uniformly and had a higher coverage rate. When the sensor perception capacity was insufficient, the coverage rate increased to 92.56%, which was 25.74% higher than that of the nodes optimized by the original HPO algorithm, and 13.98% and 16.41% higher than those of the nodes optimized by Improved Particle Swarm Optimization (IPSO) and Improved Grey Wolf Optimizer (IGWO), respectively. At the same time, the energy consumption of the nodes optimized by IHPO was more evenly distributed, and the network working duration of those nodes increased to 2,500 cycles in the routing test.
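The coverage rate that such an optimizer maximizes can be estimated as in the following sketch. It assumes a Boolean disk sensing model (a point is covered if it lies within the sensing radius of at least one node) evaluated on a regular grid; the abstract does not specify the sensing model or the monitored area, so the function name, grid step, and example layout are all illustrative assumptions.

```python
import math


def coverage_rate(nodes, radius, width, height, step=1.0):
    """Fraction of grid points covered by at least one sensor node.

    nodes is a list of (x, y) positions inside a width-by-height area.
    Assumes a Boolean disk sensing model with the given radius.
    """
    nx = int(width / step) + 1
    ny = int(height / step) + 1
    covered = 0
    for i in range(nx):
        for j in range(ny):
            px, py = i * step, j * step
            if any(math.hypot(px - x, py - y) <= radius for x, y in nodes):
                covered += 1
    return covered / (nx * ny)


# Hypothetical deployment: four nodes in a 20 x 20 area, sensing radius 6.
layout = [(5, 5), (15, 5), (5, 15), (15, 15)]
print(coverage_rate(layout, radius=6, width=20, height=20))
```

An optimizer such as IHPO would treat the node positions as the decision variables and a function of this kind as the fitness to maximize.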

Channel estimation method for low earth orbit satellite MIMO-OTFS system based on improved generalized orthogonal matching pursuit

Fang LEI, Yongcai NIU

2024, 44(8): 2514-2520. DOI: 10.11772/j.issn.1001-9081.2023081170

To address the difficulty of channel estimation caused by the complexity of Low Earth Orbit (LEO) satellite systems based on Multiple-Input Multiple-Output (MIMO) technology and Orthogonal Time-Frequency Space (OTFS) modulation, a channel estimation method based on improved Generalized Orthogonal Matching Pursuit (GOMP) was proposed. According to the input-output relationship of the Single-Input Single-Output (SISO)-OTFS system and the propagation characteristics of the LEO satellite channel, a LEO satellite channel model based on MIMO-OTFS was established, and the channel estimation problem of the system was transformed into a sparse signal recovery problem. Because the traditional GOMP algorithm depends excessively on sparsity and reconstructs sparse signals with poor accuracy, the proposed method combined the weak selection idea of Stagewise Weak Orthogonal Matching Pursuit (SWOMP) with a similarity criterion based on the generalized Jaccard coefficient to reconstruct sparse signals quickly and accurately. The simulation results show that when the number of antennas is 16 and the pilot overhead ratio is 0.5, compared with the Orthogonal Matching Pursuit (OMP) algorithm, the proposed method reduces the Normalized Mean Square Error (NMSE) by about 2.5 dB, and reduces the Bit Error Rate (BER) by approximately 5 dB.

Adversarial sample attack algorithm of modulation signal based on equalization of feature gradient

Rui SHI, Yong LI, Yanhan ZHU

2024, 44(8): 2521-2527. DOI: 10.11772/j.issn.1001-9081.2023081165

Modulation aiming jamming reduces communication performance by identifying the modulation mode of a signal through a Deep Neural Network (DNN). To counter it, an adversarial attack algorithm for modulation signals based on equalization of feature gradient was proposed. Unlike the traditional approach of obtaining the gradient by back-propagating from the label, the rich space-time features of the modulation signal in the high-dimensional space of the DNN were used to calculate the gradients, and the local average feature gradient replaced the single-point feature gradient in the algorithm's iterations, which solved the problem of unreliable gradients caused by local oscillation of the loss function surface. Based on the processed gradient and the existing momentum attack method, a more subtle adversarial perturbation was generated and superimposed on the normal communication signal to construct the adversarial sample, so as to reduce the DNN's recognition rate on the communication signal and weaken the effect of modulation aiming jamming. The experimental results on the RADIOML 2016.10A dataset showed that, compared to FGSM (Fast Gradient Sign Method) and MI-FGSM (Momentum Iterative Fast Gradient Sign Method), although the running time of the proposed algorithm on the VTCNN2 (Visual Transformer Convolutional Neural Network) model increased by 1.36 h and 0.58 h respectively, the attack effect of the untargeted adversarial samples generated by the proposed algorithm was significant: at a signal-to-noise ratio of 10 dB, the success rate of the white box attack improved by 36 and 26 percentage points respectively; when the samples were directly transferred to the CLDNN (Convolutional Long Short-Term Memory-Deep Neural Network) model, the success rate of the black box attack increased by 19 and 14 percentage points respectively. The proposed algorithm improves the attack success rate of adversarial samples and has good transferability.

Coverage-guided fuzzing based on adaptive sensitive region mutation

Hang XU, Zhi YANG, Xingyuan CHEN, Bing HAN, Xuehui DU

2024, 44(8): 2528-2535. DOI: 10.11772/j.issn.1001-9081.2023081177

To address the large number of invalid mutations and the resulting waste of performance in Coverage-Guided Fuzzing (CGF), an adaptive sensitive region mutation algorithm was proposed. Firstly, the mutation locations were divided into an effective mutation location set and an invalid mutation location set according to whether the mutated test case executed a new path. Then, the sensitive region was determined based on the effective mutation locations, and subsequent mutations were concentrated in the sensitive region. In the subsequent fuzzing process, the sensitive region of the corresponding seed was adjusted adaptively according to the execution results of test cases, so as to reduce invalid mutations. In addition, a new seed selection strategy was designed to assist the sensitive region mutation algorithm. The adaptive sensitive region mutation algorithm was integrated into American Fuzzy Lop (AFL) to form Sensitive-region-based Mutation American Fuzzy Lop (SMAFL). SMAFL was evaluated on 12 popular applications, and the experimental results showed that, compared to AFL with one initial seed, SMAFL found 31.4% more paths on average, increased the number of fuzzed counts by 3.4 times, and achieved higher code coverage across all 12 programs. In tests on the LAVA-M dataset, SMAFL found 2 more bugs than AFL and found the same bugs in a shorter time. Overall, the adaptive sensitive region mutation algorithm can improve the exploration efficiency of fuzzers.

Architecture design of data fusion pipeline for unmanned systems

Yi LIU, Guoli YANG, Qibin ZHENG, Xiang LI, Yangsen ZHOU, Depeng CHEN

2024, 44(8): 2536-2543. DOI: 10.11772/j.issn.1001-9081.2023081184

Sensors are the basis on which unmanned systems perform intelligent actions. Fusing multi-sensor data can enhance the intelligent perception and autonomous decision-making capabilities of unmanned systems and improve their reliability and robustness. Data fusion in unmanned systems faces many challenges, such as diverse sensor types, heterogeneous data formats, real-time requirements for data fusion and analysis, and complex, rapidly evolving algorithm models. Traditional methods that develop customized fusion models on the front end, and approaches based on a fusion platform running on the back end, are difficult to apply in these cases. Therefore, a pipeline platform for data fusion was proposed. The platform supports automatic data transformation, flexible algorithm combination, dynamic model configuration, and rapid iteration of functions, so that data fusion models can be constructed dynamically and quickly to provide information services for different tasks. Based on an analysis of the data fusion process and techniques, the pipeline framework and its key functions and components were characterized, the key technologies that urgently need breakthroughs were analyzed, the operating mode and a practical case of the framework were presented, and research directions for future development were pointed out.

Review of end-to-end person search algorithms based on images

Cui WANG, Miaolei DENG, Dexian ZHANG, Lei LI, Xiaoyan YANG

2024, 44(8): 2544-2550. DOI: 10.11772/j.issn.1001-9081.2023081195

Person search is an important research direction in computer vision. Its goal is to detect and identify persons in uncropped image galleries. To provide an in-depth understanding of person search algorithms, a large body of related literature was summarized and analyzed. First, according to network structure, person search algorithms were divided into two categories: two-step methods and end-to-end one-step methods. The key technologies of the one-step methods, feature learning and metric learning, were introduced and analyzed. The datasets and evaluation metrics in the field of person search were discussed, and a performance comparison and analysis of the mainstream algorithms was given. The experimental results show that, although the two-step methods perform well, most of them have high computational costs and long running times; the one-step methods can solve the two sub-tasks, pedestrian detection and person re-identification, in a more efficient learning framework and achieve better results. Finally, the person search algorithms were summarized and their future development directions were discussed.

Video pedestrian anomaly detection method based on skeleton graph and mixed attention

Yuhan LIU, Genlin JI, Hongping ZHANG

2024, 44(8): 2551-2557. DOI: 10.11772/j.issn.1001-9081.2023081157

In recent years, many studies that use the human skeleton graph for video anomaly detection consider only directly connected nodes when describing the strength of human skeleton connections, focusing on a small moving region and ignoring local features, so accurately detecting abnormal pedestrian events remains very difficult. To solve these problems, a video pedestrian anomaly detection method called PAD-SGMA (video Pedestrian Anomaly Detection method based on Skeleton Graph and Mixed Attention) was proposed. Firstly, the association between skeleton points was extended: the root node was connected with the nodes that were not directly connected to it, and the human skeleton graph was partitioned to obtain the local features of the human skeleton. In the graph convolution module, the static global skeleton, the local region skeleton, and an attention-based adjacency matrix were used to capture the hierarchical representation. Secondly, a new convolutional network with spatio-temporal channel mixed attention was proposed, in which a mixed attention module was added to focus on spatial and channel relationships, helping the model enhance discriminative features and pay different degrees of attention to different joints. To verify the proposed model, experiments were carried out on the large-scale public benchmark ShanghaiTech Campus dataset, and the experimental results showed that the AUC (Area Under Curve) of PAD-SGMA was 0.018 higher than that of GEPC (Graph Embedded Pose Clustering).

Correlation filtering based target tracking with nonlinear temporal consistency

Wentao JIANG, Wanxuan LI, Shengchong ZHANG

2024, 44(8): 2558-2570. DOI: 10.11772/j.issn.1001-9081.2023081121

Existing target tracking algorithms mainly use a linear constraint mechanism, as in LADCF (Learning Adaptive Discriminative Correlation Filters), which easily causes model drift. To address this problem, a correlation filtering based target tracking algorithm with nonlinear temporal consistency was proposed. First, a nonlinear temporal consistency term was proposed based on Stevens' Law, which aligns closely with the characteristics of human visual perception. The nonlinear temporal consistency term allowed the model to track the target relatively smoothly, thus ensuring tracking continuity and preventing model drift. Next, the Alternating Direction Method of Multipliers (ADMM) was employed to compute the optimal function value, ensuring real-time tracking. Lastly, Stevens' Law was used for nonlinear filter updating, enabling the filter update factor to enhance or suppress the filter according to changes in the target, thereby adapting to target changes and preventing filter degradation. Comparison experiments with mainstream correlation filtering and deep learning algorithms were performed on four standard datasets. Compared with the baseline algorithm LADCF, the tracking precision and success rate of the proposed algorithm were improved by 2.4 and 3.8 percentage points on the OTB100 dataset, and by 1.5 and 2.5 percentage points on the UAV123 dataset. The experimental results show that the proposed algorithm effectively avoids tracking model drift, reduces the likelihood of filter degradation, achieves higher tracking precision and success rate, and is more robust in complicated situations such as occlusion and illumination changes.

Image denoising network based on local and global feature decoupling

Yuwei DING, Hongbo SHI, Jie LI, Min LIANG

2024, 44(8): 2571-2579. DOI: 10.11772/j.issn.1001-9081.2023081131

Current Transformer-based algorithms focus on capturing the global features of images but ignore the key role of local features in restoring image details. To address this problem, an image denoising network based on local and global feature decoupling was proposed. The proposed network included two multi-scale branches based on the Hybrid Transformer Block (HTB) and a single-scale branch based on the Convolutional Neural Network (CNN), aiming to combine the powerful global modeling capability of HTB with the local modeling advantage of CNN and to yield outputs with enriched contextual information and precise spatial details. Within the HTB, a self-attention mechanism was employed to adaptively model dependencies along the spatial and channel dimensions, activating a wider range of input pixels for reconstruction. Given the potential information conflicts across different branches, a feature transfer block was designed to facilitate cross-branch propagation of global features and suppress low-frequency information, thereby ensuring collaborative interactions among the branches. Experimental results showed that: on the real-world image dataset SIDD, compared with the Transformer-based denoising network Uformer, the proposed network improved Peak Signal-to-Noise Ratio (PSNR) by 0.09 dB and Structural SIMilarity (SSIM) by 0.001; on the synthetic image dataset Urban100, compared with the multi-stage denoising network MSPNet (Multi-Stage Progressive denoising Network), the average PSNR of the proposed network was improved by 0.41 dB. The proposed network thus effectively removes image noise and reconstructs finer texture details.
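To put the reported PSNR gains in context, the sketch below computes PSNR with the standard definition, 10·log10(MAX² / MSE), for images given as nested lists of pixel values. The function name, the 8-bit peak value of 255, and the sample pixel values are assumptions for illustration and are unrelated to the paper's data.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized images.

    Images are nested lists (rows of pixel intensities); max_value is the
    peak intensity, assumed here to be 255 for 8-bit images.
    """
    ref = [v for row in reference for v in row]
    res = [v for row in restored for v in row]
    mse = sum((a - b) ** 2 for a, b in zip(ref, res)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


clean = [[100, 100], [100, 100]]
denoised = [[110, 90], [100, 100]]
print(round(psnr(clean, denoised), 2))
```

Because the scale is logarithmic, a gain of 0.41 dB corresponds to roughly a 9% reduction in mean squared error.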

Logo detection algorithm based on improved YOLOv5

Yeheng LI, Guangsheng LUO, Qianmin SU

2024, 44(8): 2580-2587. DOI: 10.11772/j.issn.1001-9081.2023081113

To address the challenges posed by the complex backgrounds and varying sizes of logo images, an improved detection algorithm based on YOLOv5 was proposed. Firstly, the Convolutional Block Attention Module (CBAM) was incorporated to apply compression along both the channel and spatial dimensions of the image, extracting critical information and salient regions. Subsequently, Switchable Atrous Convolution (SAC) was employed to allow the network to adaptively adjust the receptive field size in feature maps at different scales, improving the detection of objects across multiple scales. Finally, the Normalized Wasserstein Distance (NWD) was embedded into the loss function. Bounding boxes were modeled as 2D Gaussian distributions, and the similarity between the corresponding Gaussian distributions was calculated to better measure the similarity among objects, thereby enhancing detection performance for small objects and improving model robustness and stability. Compared to the original YOLOv5 algorithm: on the small dataset FlickrLogos-32, the improved algorithm achieved a mean Average Precision (mAP@0.5) of 90.6%, an increase of 1 percentage point; on the large dataset QMULOpenLogo, the improved algorithm achieved an mAP@0.5 of 62.7%, an increase of 2.3 percentage points; on LogoDet3K, for three types of logos, the improved algorithm increased the mAP@0.5 by 1.2, 1.4, and 1.4 percentage points respectively. Experimental results demonstrate that the improved algorithm detects small objects in logo images better.
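The NWD step can be sketched as follows, using the formulation commonly given for this measure: each box (cx, cy, w, h) is modeled as a 2D Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4), the squared 2-Wasserstein distance between two such Gaussians reduces to a squared Euclidean distance between the vectors (cx, cy, w/2, h/2), and the result is normalized with an exponential. The constant `c` is dataset-dependent; the default of 12.8 and the function name are assumptions here, not values taken from the paper.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance between two boxes (cx, cy, w, h).

    c is a dataset-dependent normalizing constant (assumed value here).
    Returns a similarity in (0, 1]; 1 means identical boxes.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two Gaussians.
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    return math.exp(-math.sqrt(w2_squared) / c)


def nwd_loss(box_a, box_b, c=12.8):
    """A regression loss term that decreases as the boxes become similar."""
    return 1.0 - nwd(box_a, box_b, c)


# Two small, non-overlapping boxes still get a smooth, non-zero similarity.
print(nwd((10, 10, 4, 4), (13, 14, 4, 4)))
```

Unlike IoU, which drops to zero as soon as two small boxes stop overlapping, this similarity changes smoothly with the offset, which is the usual motivation for using it on small objects.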

Low illumination face detection based on image enhancement

Zhonghua LI, Yunqi BAI, Xuejin WANG, Leilei HUANG, Chujun LIN, Shiyu LIAO

2024, 44(8): 2588-2594. DOI: 10.11772/j.issn.1001-9081.2023081198

To address the significant drop in the performance of face detection models under low-light conditions, a low-light face detection method based on image enhancement was developed. Firstly, image enhancement techniques were applied to preprocess low-light images, enhancing the effective facial features. Secondly, an attention mechanism was introduced after the model's backbone network to increase the network's focus on facial regions while reducing the negative impact of non-uniform lighting and noise. Furthermore, an attention-based bounding box loss function, Wise Intersection over Union (WIoU), was incorporated to improve the network's accuracy in detecting low-quality faces. Finally, a more efficient feature fusion module was used to replace the original model structure. Experimental results on the low-light face dataset DARK FACE indicate that, compared to the original YOLOv7 model, the improved method achieves an increase of 2.4 percentage points in average detection precision AP@0.5 and an increase of 1.4 percentage points in mean average precision AP@0.5:0.95, without introducing additional parameters or computational complexity. Additionally, the results on two other low-light face datasets confirm the effectiveness and robustness of the proposed method, demonstrating its applicability to low-light face detection in diverse scenarios.

Industrial defect detection method with improved masked autoencoder

Kaili DENG, Weibo WEI, Zhenkuan PAN

2024, 44(8): 2595-2603. DOI: 10.11772/j.issn.1001-9081.2023081122

To address missed detection and over-detection in existing defect detection methods that need only normal samples, a method combining an improved masked autoencoder with an improved Unet was constructed to achieve pixel-level defect detection. Firstly, a defect fitting module was used to generate the defect mask image and the defect image corresponding to the normal image. Secondly, the defect image was randomly masked to remove most of the defect information from it. The autoencoder with Transformer structure was thereby prompted to learn representations from the unmasked normal regions and to repair the defect image based on context. To improve the model's ability to repair image details, a new loss function was designed. Finally, to achieve pixel-level defect detection, the defect image and the repaired image were concatenated and input into the Unet with the channel cross-fusion Transformer structure. Experimental results on the MVTec AD dataset show that the average image-based and pixel-based Area Under the Receiver Operating Characteristic Curve (ROC AUC) of the proposed method reached 0.984 and 0.982 respectively, which is 2.9 and 3.2 percentage points higher than DRAEM (Discriminatively trained Reconstruction Anomaly Embedding Model) and 3.1 and 0.8 percentage points higher than CFLOW-AD (Anomaly Detection via Conditional normalizing FLOWs). These results verify that the proposed method has a high recognition rate and detection accuracy.

Ultrasound carotid plaque segmentation method based on semi-supervision and multi-scale cascaded attention

Chenqian LI, Jun LIU

2024, 44(8): 2604-2610. DOI: 10.11772/j.issn.1001-9081.2023081197

Obtaining reliable labels for ultrasonic images is time-consuming and laborious because of their strong noise, low quality and blurred boundaries. Therefore, an ultrasound carotid plaque segmentation method based on semi-supervision and multi-scale cascaded attention was proposed. Firstly, the semi-supervised segmentation method Uncertainty Rectified Pyramid Consistency (URPC) was used to make full use of unlabeled data to train the model, thereby reducing the labeling burden. Then, a dual encoder structure based on edge detection was proposed, in which the edge detection encoder assisted the ultrasonic plaque image feature encoder in fully acquiring edge information. In addition, a Multi-Scale Fusion Module (MSFM) was designed to improve the extraction of irregularly shaped plaques through adaptive fusion of multi-scale features, and a Cascaded Channel Spatial Attention (CCSA) module was added to better focus on the plaque region. Finally, the proposed method was evaluated on an ultrasonic carotid plaque image dataset. Experimental results show that the Dice index and IoU (Intersection over Union) index of the proposed method on the dataset are 2.8 and 6.3 percentage points higher than those of the supervised method CA-Net (Comprehensive Attention convolutional neural Network) respectively, and 1.8 and 1.3 percentage points higher than those of the semi-supervised method Cyclic Prototype Consistency Learning (CPCL) respectively. These results indicate that the method effectively improves the segmentation accuracy of ultrasound carotid plaque images.
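
As a reminder of what the two reported indices measure, here is a minimal sketch of Dice and IoU for a pair of binary segmentation masks given as flat 0/1 lists. The function name and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not details from the paper.

```python
def dice_and_iou(pred, target):
    """Dice and IoU for two equal-length binary masks (flat lists of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    n_pred = sum(1 for p in pred if p)
    n_target = sum(1 for t in target if t)
    union = n_pred + n_target - inter
    # Assumed convention: two empty masks count as a perfect match.
    dice = 2 * inter / (n_pred + n_target) if (n_pred + n_target) else 1.0
    iou = inter / union if union else 1.0
    return dice, iou


dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0])
```

For the same pair of masks Dice is never lower than IoU, which is one reason the two indices can improve by different amounts in the comparison above.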

Single-channel speech enhancement based on multi-channel information aggregation and collaborative decoding

Shangbin MO, Wenjun WANG, Ling DONG, Shengxiang GAO, Zhengtao YU

2024, 44(8): 2611-2617. DOI: 10.11772/j.issn.1001-9081.2023081141

To address the insufficient acoustic feature extraction and severe decoding feature loss in single-channel speech enhancement networks based on the convolutional encoder-decoder architecture, a single-channel speech enhancement network called Multi-Channel Information Aggregation and Collaborative Decoding (MIACD) was proposed. A dual-channel encoder was utilized to extract speech magnitude spectrum and complex spectrum features, which were enriched with Self-Supervised Learning (SSL) representations. A four-layer Conformer block was employed to model the extracted features in the time and frequency domains. Through residual connections, the speech magnitude and complex features extracted by the dual-channel encoder were introduced into a three-channel information aggregation decoder. Additionally, a Channel-Time-Frequency Attention (CTF-Attention) mechanism was proposed to adjust the aggregated information in the decoder according to the distribution of speech energy, effectively alleviating the severe loss of acoustic information during decoding. Experimental results on the publicly available dataset Voice Bank DEMAND demonstrate that, compared with Glance and Gaze: a collaborative learning framework for Single-channel speech enhancement (GaGNet), the proposed method improves the objective metric WB-PESQ (Wide Band Perceptual Evaluation of Speech Quality) by 5.1% and reaches 96.7% on STOI (Short-Time Objective Intelligibility), validating that the proposed method effectively utilizes speech information for signal reconstruction, noise suppression, and speech intelligibility enhancement.
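
The two input views named above, the magnitude spectrum and the complex spectrum, can be illustrated for a single frame with the sketch below. It uses a naive discrete Fourier transform with no windowing or overlap, and the function names are hypothetical; it shows only what the two feature types are, not how MIACD computes or encodes them.

```python
import cmath
import math


def dft_half(frame):
    """Naive DFT of a real frame, keeping bins 0..n/2 (illustration only)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def spectral_features(frame):
    """Return (magnitude spectrum, complex spectrum as (real, imag) pairs)."""
    spec = dft_half(frame)
    magnitude = [abs(z) for z in spec]
    complex_parts = [(z.real, z.imag) for z in spec]
    return magnitude, complex_parts


# A pure tone that completes two cycles in an 8-sample frame.
frame = [math.cos(2 * math.pi * 2 * t / 8) for t in range(8)]
magnitude, complex_parts = spectral_features(frame)
```

The magnitude view discards phase, while the real and imaginary parts retain it, which is the usual motivation for feeding both to an enhancement network.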

Traffic flow forecasting via spatial-temporal multi-graph fusion

Yanjie GU, Yingjun ZHANG, Xiaoqian LIU, Wei ZHOU, Wei SUN

2024, 44(8): 2618-2625. DOI: 10.11772/j.issn.1001-9081.2023081226

Traffic prediction is a fundamental task in Intelligent Transportation Systems (ITS), as accurate Traffic Flow Forecasting (TFF) can significantly improve the utilization efficiency of public resources. To address the insufficient use of contextual information, the imbalanced graph fusion techniques, and the consideration of only static spatial relationships in existing multi-graph neural network models, a TFF model based on Spatio-Temporal Multi-Graph Fusion (STMGF) was proposed. Firstly, spatial correlations across different regions were extracted by fusing spatial graphs, semantic graphs, and spatial-semantic graphs. A spatial attention mechanism and a graph attention mechanism were utilized to dynamically learn the importance of different graph structures for different neighbors. Then, a multi-kernel temporal attention mechanism was employed to capture both local and global temporal dependencies. Finally, a multi-layer perceptron was utilized to produce the final traffic flow predictions. The validity of the model was verified on the NYCTaxi and NYCBike datasets. Experimental results show that, in the 36-step forecasting task on the NYCBike dataset, the Root Mean Square Error (RMSE) of the proposed STMGF model is 8.46%, 2.70%, and 2.20% lower than those of Spatio-Temporal Graph Convolutional Network (STGCN), Attention based Spatial-Temporal Graph Neural Network (ASTGNN), and Meta-graph Convolutional Recurrent Network (MegaCRN), respectively.
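
The comparison above is stated as percentage reductions in RMSE. The sketch below shows how RMSE and such a reduction would be computed, assuming the percentages are relative to the baseline's RMSE; the numeric values used are made up for illustration and are not results from the paper.

```python
import math


def rmse(predicted, actual):
    """Root Mean Square Error between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return (baseline - proposed) / baseline * 100.0


# Hypothetical RMSE values, chosen only to illustrate the arithmetic.
example = relative_reduction(baseline=10.0, proposed=9.154)
```

Under this reading, a baseline RMSE of 10.0 and a proposed-model RMSE of 9.154 would correspond to a reduction of about 8.46%.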

Multi-robot path following and formation based on deep reinforcement learning

Haodong HE, Hao FU, Qiang WANG, Shuai ZHOU, Wei LIU

2024, 44(8): 2626-2633. DOI: 10.11772/j.issn.1001-9081.2023081120

To address the obstacle avoidance and trajectory smoothness problems of multi-robot path following and formation in crowded environments, a multi-robot path following and formation algorithm based on deep reinforcement learning was proposed. Firstly, a pedestrian danger priority mechanism was established and combined with reinforcement learning to design a danger awareness network that enhances the safety of the multi-robot formation. Subsequently, a virtual robot was introduced as the reference target for the robots, transforming path following into tracking control of the virtual robot by the multiple robots and thereby improving the smoothness of the robot trajectories. Finally, quantitative and qualitative analyses were conducted through simulation experiments to compare the proposed algorithm with existing ones. The experimental results show that, compared with existing point-to-point path following algorithms, the proposed algorithm has excellent obstacle avoidance performance in crowded environments while ensuring the smoothness of multi-robot motion trajectories.
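
One common way to use a virtual robot as a shared reference is to give each real robot a fixed offset in the virtual robot's body frame and rotate that offset by the virtual robot's heading. The sketch below illustrates that geometry only; the offset-based formation, the function name, and the coordinate conventions are assumptions, and the paper's learned tracking controller is not represented.

```python
import math


def formation_targets(virtual_x, virtual_y, virtual_heading, offsets):
    """Target positions for followers around a virtual reference robot.

    offsets: list of (dx, dy) in the virtual robot's body frame, where
    +dx points along its heading. All names here are hypothetical.
    """
    c, s = math.cos(virtual_heading), math.sin(virtual_heading)
    return [
        (virtual_x + c * dx - s * dy, virtual_y + s * dx + c * dy)
        for dx, dy in offsets
    ]


# Two followers one unit behind the virtual robot, left and right of it.
targets = formation_targets(1.0, 2.0, 0.0, [(-1.0, 1.0), (-1.0, -1.0)])
```

As the virtual robot advances smoothly along the path, the targets move with it, so each robot tracks a continuously moving point instead of jumping between discrete waypoints.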

Credit card fraud detection model based on graph attention Transformation neural network

Fan YANG, Yao ZOU, Mingzhi ZHU, Zhenwei MA, Dawei CHENG, Changjun JIANG

2024, 44(8): 2634-2642. DOI: 10.11772/j.issn.1001-9081.2023081153

To address the inability of existing models to accurately identify the intricate and diverse patterns of gang fraud, a new practical credit card fraud detection model based on a complex transaction graph was proposed. Firstly, an association transaction graph was constructed from the users' original transaction information. Then, a graph Transformer neural network module was employed to mine gang fraud characteristics directly from the transaction network without cumbersome feature engineering. Finally, high-precision detection of fraudulent transactions was achieved by jointly optimizing topological features and sequential transaction features in the fraud detection network. Credit card anti-fraud experiments show that the proposed model outperforms seven benchmark models on all evaluation indexes. In transaction fraud detection tasks, the Average Precision (AP) improved by 20% and the Area Under the ROC Curve (AUC) increased by an average of 2.7% over the best benchmark, the Graph Attention Network (GAT) model. These results indicate that the proposed model is effective in detecting fraudulent credit card transactions.
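
The abstract does not specify how the association transaction graph is built. As one plausible reading, the sketch below links two transactions whenever they share a value in a chosen field; the field names ("card", "merchant"), the linking rule, and the function name are all assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_transaction_graph(transactions, keys=("card", "merchant")):
    """Return sorted undirected edges (i, j), i < j, between transactions
    that share a value in any of `keys`. Hypothetical construction rule."""
    groups = defaultdict(list)
    for idx, tx in enumerate(transactions):
        for key in keys:
            groups[(key, tx[key])].append(idx)
    edges = set()
    for members in groups.values():
        # Indices were appended in increasing order, so a < b here.
        for a, b in combinations(members, 2):
            edges.add((a, b))
    return sorted(edges)


transactions = [
    {"card": "A", "merchant": "m1"},
    {"card": "A", "merchant": "m2"},
    {"card": "B", "merchant": "m2"},
]
edges = build_transaction_graph(transactions)
```

In such a graph, transactions by different cards become indirectly connected through shared merchants, which is the kind of structure a graph network could exploit to surface coordinated activity.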

Multi-granularity abrupt change fitting network for air quality prediction

Qianhong SHI, Yan YANG, Yongquan JIANG, Xiaocao OUYANG, Wubo FAN, Qiang CHEN, Tao JIANG, Yuan LI

2024, 44(8): 2643-2650. DOI: 10.11772/j.issn.1001-9081.2023081169

Air quality data, as typical spatio-temporal data, exhibit complex multi-scale intrinsic characteristics and are subject to abrupt changes. To address the poor performance of existing air quality prediction methods on tasks containing a large number of abrupt changes, a Multi-Granularity abrupt Change Fitting Network (MACFN) for air quality prediction was proposed. Firstly, multi-granularity feature extraction was performed on the input data according to the temporal periodicity of air quality data. Then, a graph convolution network and a temporal convolution network were used to extract the spatial correlation and temporal dependence of the air quality data, respectively. Finally, to reduce the prediction error, an abrupt change fitting network was designed to adaptively learn the abrupt change part of the data. The proposed network was experimentally evaluated on three real air quality datasets, and its Root Mean Square Error (RMSE) decreased by about 11.6%, 6.3%, and 2.2% respectively compared with the Multi-Scale Spatial Temporal Network (MSSTN). The experimental results show that MACFN can efficiently capture complex spatio-temporal relationships and performs better when predicting air quality that is prone to abrupt changes of large magnitude.
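
The abstract does not describe how the multi-granularity features are extracted. One simple way to view a series at several time granularities is non-overlapping average pooling, sketched below; the pooling rule, the window sizes, and the function names are assumptions for illustration only.

```python
def average_pool(series, window):
    """Non-overlapping mean over consecutive windows; drops a partial tail."""
    return [
        sum(series[i:i + window]) / window
        for i in range(0, len(series) - window + 1, window)
    ]


def multi_granularity(series, windows=(1, 2, 4)):
    """Map each window size to the series averaged at that granularity.

    For hourly readings, windows such as 24 (daily) or 168 (weekly)
    would be natural choices; those values are assumptions as well.
    """
    return {w: average_pool(series, w) for w in windows}


views = multi_granularity([1, 2, 3, 4, 5, 6, 7, 8])
```

Coarser views smooth out short spikes and expose periodic trends, while the finest view keeps the abrupt changes that the fitting network is meant to learn.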
