
Table of Contents

    10 June 2024, Volume 44 Issue 6
    The 38th CCF National Conference of Computer Applications (CCF NCCA 2023)
    Technology application prospects and risk challenges of large language models
    Yuemei XU, Ling HU, Jiayi ZHAO, Wanze DU, Wenqing WANG
    2024, 44(6):  1655-1662.  DOI: 10.11772/j.issn.1001-9081.2023060885

    In view of the rapid development of Large Language Model (LLM) technology, a comprehensive analysis was conducted on its technical application prospects and risk challenges, which is of great reference value for the development and governance of Artificial General Intelligence (AGI). Firstly, taking representative language models such as Multi-BERT (Multilingual Bidirectional Encoder Representations from Transformer), GPT (Generative Pre-trained Transformer) and ChatGPT (Chat Generative Pre-trained Transformer) as examples, the development process, key technologies and evaluation systems of LLM were reviewed. Then, a detailed analysis of the technical limitations and security risks of LLM was conducted. Finally, suggestions were put forward for the technical improvement and policy follow-up of LLM. The analysis indicates that the current LLMs are still in a developing stage: they produce non-truthful and biased output, lack real-time autonomous learning ability, require huge computing power, rely highly on data quality and quantity, and tend towards a monotonous language style. They carry security risks related to data privacy, information security, ethics, and other aspects. Their future development can continue to improve technically, from “large-scale” to “lightweight”, from “single-modal” to “multi-modal”, and from “general-purpose” to “vertical”; for real-time policy follow-up, their applications and developments should be regulated by targeted regulatory measures.

    Review on security threats and defense measures in federated learning
    Xuebin CHEN, Zhiqiang REN, Hongyang ZHANG
    2024, 44(6):  1663-1672.  DOI: 10.11772/j.issn.1001-9081.2023060832

    Federated learning is a distributed learning approach for solving the data sharing and privacy protection problems in machine learning, in which multiple parties jointly train a machine learning model while protecting the privacy of data. However, federated learning has inherent security threats, which pose great challenges to its practical applications. Therefore, analyzing the attacks faced by federated learning and the corresponding defensive measures is crucial for its development and application. First, the definition, process and classification of federated learning were introduced, along with the attacker model in federated learning. Then, the possible attacks on both the robustness and the privacy of federated learning systems were described, together with the corresponding defense measures, and the shortcomings of these defense schemes were pointed out. Finally, a secure federated learning system was envisioned.
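The joint training described above is commonly realized with federated averaging. The following is a minimal sketch of that aggregation step (a FedAvg-style scheme; the function name and size-proportional weighting are illustrative, not taken from the surveyed paper):

```python
from typing import List

def fed_avg(client_weights: List[List[float]], client_sizes: List[int]) -> List[float]:
    """Aggregate client model parameters into a global model,
    weighting each client by the size of its local dataset."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    global_w = [0.0] * dim
    for w, n in zip(client_weights, client_sizes):
        for i in range(dim):
            global_w[i] += w[i] * n / total
    return global_w

# Two clients with equal data sizes contribute equally to the global model.
print(fed_avg([[1.0, 2.0], [3.0, 4.0]], [10, 10]))  # [2.0, 3.0]
```

In a real federated round, each client would first train locally on private data and upload only its parameters, so raw data never leaves the client.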

    Survey of incomplete multi-view clustering
    Yao DONG, Yixue FU, Yongfeng DONG, Jin SHI, Chen CHEN
    2024, 44(6):  1673-1682.  DOI: 10.11772/j.issn.1001-9081.2023060813

    Multi-view clustering has recently been a hot topic in graph data mining. However, due to the limitations of data collection technology or human factors, multi-view data often suffers from missing views or samples. Reducing the impact of incomplete views on clustering performance is a major challenge currently faced by multi-view clustering. In order to better understand the development of Incomplete Multi-view Clustering (IMC) in recent years, a comprehensive review is of great theoretical significance and practical value. Firstly, the missing types of incomplete multi-view data were summarized and analyzed. Secondly, four types of IMC methods, based on Multiple Kernel Learning (MKL), Matrix Factorization (MF) learning, deep learning, and graph learning, were compared, and the technical characteristics and differences among these methods were analyzed. Thirdly, from the perspectives of dataset types, numbers of views and categories, and application fields, twenty-two public incomplete multi-view datasets were summarized. Then, the evaluation metrics were outlined, and the performance of existing IMC methods on homogeneous and heterogeneous datasets was evaluated. Finally, the existing problems, future research directions, and current application fields of IMC were discussed.

    Review of online education learner knowledge tracing
    Yajuan ZHAO, Fanjun MENG, Xingjian XU
    2024, 44(6):  1683-1698.  DOI: 10.11772/j.issn.1001-9081.2023060852

    Knowledge Tracing (KT) is a fundamental and challenging task in online education: it builds a model of a learner’s knowledge state from the learning history, by which learners can better understand their knowledge states and teachers can better understand the learning situation of learners. The KT research for learners in online education was summarized. Firstly, the main tasks and historical progress of KT were introduced. Subsequently, traditional KT models and deep learning KT models were explained. Furthermore, relevant datasets and evaluation metrics were summarized, alongside a compilation of KT applications. In conclusion, the current status of KT was summarized, and its limitations and future prospects were discussed.
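A representative traditional KT model is Bayesian Knowledge Tracing, whose per-answer update of the mastery probability can be sketched as follows (the default slip/guess/learn parameter values are illustrative assumptions, not values from the survey):

```python
def bkt_update(p_know: float, correct: bool,
               slip: float = 0.1, guess: float = 0.2, learn: float = 0.3) -> float:
    """One Bayesian Knowledge Tracing step: compute the posterior mastery
    probability after observing one answer, then apply a learning transition."""
    if correct:
        # P(known | correct): correct answers can also come from guessing.
        cond = p_know * (1 - slip) / (p_know * (1 - slip) + (1 - p_know) * guess)
    else:
        # P(known | wrong): wrong answers can also come from slipping.
        cond = p_know * slip / (p_know * slip + (1 - p_know) * (1 - guess))
    # Chance that the learner acquires the skill after this opportunity.
    return cond + (1 - cond) * learn
```

Deep learning KT models replace this hand-specified update with learned recurrent or attention-based state transitions, but track the same quantity: the evolving probability that a learner has mastered a skill.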

    Cognitive graph based on business process
    Yao LIU, Yumeng LI, Miaomiao SONG
    2024, 44(6):  1699-1705.  DOI: 10.11772/j.issn.1001-9081.2023060850

    Concerning the inability to make full use of existing business resources in the current software project development process, which leads to low development efficiency and weak development capabilities, a cognitive graph based on the software development process was proposed by studying the interrelations among business resources. First, a method for building a knowledge hierarchy by extracting business knowledge from formal documents was developed and refined. Second, a network representation model for software code was constructed through code feature extraction and code entity similarity investigation. Finally, the model was tested on real business data and compared with three other methods: Vector Space Model (VSM), a diverse ranking method, and deep learning. Experimental results show that the established cognitive graph method based on business process is superior to current text matching and deep learning algorithms in code retrieval; compared with the ranking-based code search method, it improves precision@5, mean Average Precision (mAP) and α-NDCG (α-Normalized Discounted Cumulative Gain) by 4.30, 0.38 and 2.74 percentage points respectively, effectively solving problems such as potential business vocabulary identification and business cognitive reasoning representation, and improving the code retrieval effect and business resource utilization.

    Chinese named entity recognition model incorporating multi-granularity linguistic knowledge and hierarchical information
    Youren YU, Yangsen ZHANG, Yuru JIANG, Gaijuan HUANG
    2024, 44(6):  1706-1712.  DOI: 10.11772/j.issn.1001-9081.2023060833

    Aiming at the problem that most current Named Entity Recognition (NER) models only use character-level information encoding and lack text hierarchical information extraction, a Chinese NER (CNER) model incorporating Multi-granularity linguistic knowledge and Hierarchical information (CMH) was proposed. First, the text was encoded using a model pre-trained with multi-granularity linguistic knowledge, so that the model could capture both fine-grained and coarse-grained linguistic information of the text and thus better characterize the corpus. Second, hierarchical information was extracted using the ON-LSTM (Ordered Neurons Long Short-Term Memory network) model, in order to utilize the hierarchical structural information of the text itself and enhance the temporal relationships between encodings. Finally, at the decoding end of the model, the word segmentation information of the text was incorporated and the entity recognition problem was transformed into a table filling problem, in order to better solve the entity overlapping problem and obtain more accurate entity recognition results. Meanwhile, in order to solve the problem of the poor transferability of current models across domains, the concept of universal entity recognition was proposed, and a universal NER dataset MDNER (Multi-Domain NER dataset) was constructed by filtering the universal entity types in multiple domains, enhancing the generalization ability of the model across domains. To validate the effectiveness of the proposed model, experiments were conducted on the Resume, Weibo, and MSRA datasets, and the F1 values were improved by 0.94, 4.95 and 1.58 percentage points, respectively, compared to the MECT (Multi-metadata Embedding based Cross-Transformer) model. To verify the proposed model’s entity recognition effect in multiple domains, experiments were conducted on MDNER, and the F1 value reached 95.29%. The experimental results show that pre-training with multi-granularity linguistic knowledge, extraction of the hierarchical structural information of the text, and the efficient pointer decoder are crucial to the performance improvement of the model.

    Relation extraction method based on mask prompt and gated memory network calibration
    Chao WEI, Yanping CHEN, Kai WANG, Yongbin QIN, Ruizhang HUANG
    2024, 44(6):  1713-1719.  DOI: 10.11772/j.issn.1001-9081.2023060818

    To tackle the difficulty of mining the semantics of entity relations and the problem of biased relation prediction in Relation Extraction (RE) tasks, an RE method based on Mask prompt and Gated Memory Network Calibration (MGMNC) was proposed. First, the latent semantics between entities within the Pre-trained Language Model (PLM) semantic space was learned through masks in prompts. By constructing a mask attention weight matrix, the discrete masked semantic spaces were interconnected. Then, gated calibration networks were used to integrate the masked representations containing entity and relation semantics into the global semantics of the sentence. These calibrated representations then served as prompts to adjust the relation information, and the final representation of the calibrated sentence was mapped to the corresponding relation class. Finally, the potential of the PLM was fully exploited by harnessing masks in prompts and combining them with the advantages of traditional fine-tuning methods. The experimental results highlight the effectiveness of the proposed method. On the SemEval (SemEval-2010 Task 8) dataset, the F1 score reached an impressive 91.4%, outperforming the RELA (Relation Extraction with Label Augmentation) generative method by 1.0 percentage point. The F1 scores on the SciERC (Entities, Relations, and Coreference for Scientific knowledge graph construction) and CLTC (Chinese Literature Text Corpus) datasets were also remarkable, reaching 91.0% and 82.8% respectively. The proposed method consistently outperformed the comparative methods on all three datasets, and achieved superior extraction performance compared to generative methods.

    Keyword extraction method for scientific text based on improved TextRank
    Dongju YANG, Chengfu HU
    2024, 44(6):  1720-1726.  DOI: 10.11772/j.issn.1001-9081.2023060845

    Aiming at the poor extraction of words that appear infrequently but better express the theme of the text in keyword extraction for scientific text, a keyword extraction method based on improved TextRank was proposed. Firstly, the Term Frequency-Inverse Document Frequency (TF-IDF) statistical features and positional features of words were used to optimize the probability transfer matrix between words in the co-occurrence graph, and the initial scores of the words were obtained through iterative computation. Then, the K-Core (K-Core decomposition) algorithm was used to mine K-Core subgraphs to obtain the hierarchical features of the words, and the average information entropy feature was used to measure the thematic representation ability of the words. Finally, on the basis of the initial score of each word, the hierarchical feature and the average information entropy feature were fused to determine the keywords. The experimental results show that: on the public dataset, compared with the TextRank method and the OTextRank (Optimized TextRank) method, the proposed method increases the average F1 by 6.5 and 3.3 percentage points respectively for extracting different numbers of keywords; on the science and technology service project dataset, compared with the TextRank and OTextRank methods, it increases the average F1 by 7.4 and 3.2 percentage points respectively. The experimental results verify the effectiveness of the proposed method for extracting keywords that have low frequency but better express the theme of the text.
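The iterative score computation over an optimized transition matrix can be sketched as a PageRank-style update (a simplified illustration; the TF-IDF and positional weighting that actually build the matrix in the paper are omitted here):

```python
def textrank_scores(transition, d=0.85, iters=50):
    """Iterate PageRank-style scores over a row-stochastic transition matrix:
    transition[i][j] is the probability of moving from word i to word j
    in the co-occurrence graph."""
    n = len(transition)
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1 - d) / n + d * sum(transition[i][j] * scores[i] for i in range(n))
                  for j in range(n)]
    return scores

# Word 0 receives no inbound transition weight, so it keeps the lowest score.
t = [[0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
s = textrank_scores(t)
```

Replacing the uniform transition probabilities with TF-IDF- and position-weighted ones is what lets low-frequency but thematically important words accumulate higher scores.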

    Distributed observation point classifier for big data with random sample partition
    Xu LI, Yulin HE, Laizhong CUI, Zhexue HUANG, Fournier‑Viger PHILIPPE
    2024, 44(6):  1727-1733.  DOI: 10.11772/j.issn.1001-9081.2023060847

    Observation Point Classifier (OPC) is a supervised learning model that transforms a multi-dimensional linearly-inseparable problem in the original data space into a one-dimensional linearly-separable problem in a projective distance space, and it is good at high-dimensional data classification. In order to alleviate the high training complexity of applying OPC to big data classification, a Random Sample Partition (RSP)-based Distributed OPC (DOPC) for big data was designed under the Spark framework. First, RSP data blocks were generated and transformed into Resilient Distributed Datasets (RDDs) in the distributed computation environment. Second, a set of OPCs was collaboratively trained on the RSP data blocks with high Spark parallelizability. Finally, the different OPCs were fused into a DOPC to predict the final labels of unknown samples. Extensive experiments on eight big datasets were conducted to validate the feasibility, rationality and effectiveness of the designed DOPC. Experimental results show that DOPC trained on multiple computation nodes achieves higher testing accuracy than OPC trained on a single computation node, with less time consumption; meanwhile, compared with the RSP-model-based Neural Network (NN), Decision Tree (DT), Naive Bayesian (NB), and K-Nearest Neighbor (KNN) classifiers under the Spark framework, DOPC obtains stronger generalization capability. The superior testing performance demonstrates that DOPC is a highly effective and low-consumption supervised learning algorithm for big data classification problems.
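The transformation from the original data space into a projective distance space can be sketched as follows (a minimal illustration of the idea only; the choice of observation points and the downstream one-dimensional classification are omitted):

```python
import math

def to_distance_space(samples, observation_points):
    """Map each multi-dimensional sample to its Euclidean distances
    from a set of observation points, producing distance-space features."""
    return [[math.dist(x, o) for o in observation_points] for x in samples]

# Two different 2-D samples projected against one observation point
# become one-dimensional distance features.
print(to_distance_space([[0.0, 0.0], [6.0, 8.0]], [[3.0, 4.0]]))  # [[5.0], [5.0]]
```

In the distributed setting, each Spark worker would apply this projection to its own RSP data block before training its local OPC.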

    Deep event clustering method based on event representation and contrastive learning
    Xiaoxia JIANG, Ruizhang HUANG, Ruina BAI, Lina REN, Yanping CHEN
    2024, 44(6):  1734-1742.  DOI: 10.11772/j.issn.1001-9081.2023060851

    Aiming at the problem that existing deep clustering methods do not consider event information and its structural characteristics and therefore cannot divide event types efficiently, a Deep Event Clustering method based on Event Representation and Contrastive Learning (DEC_ERCL) was proposed. Firstly, information recognition was utilized to identify structured event information from unstructured text, thus avoiding the impact of redundant information on event semantics. Secondly, the structural information of the event was integrated into an autoencoder to learn a low-dimensional dense event representation, which was used as the basis for downstream clustering. Finally, in order to effectively model the subtle differences between events, a contrastive loss with multiple positive examples was added to the feature learning process. Experimental results on the DuEE, FewFC, Military and ACE2005 datasets show that the proposed method performs better than other deep clustering methods on the accuracy and Normalized Mutual Information (NMI) evaluation indexes. Compared with the suboptimal method, the accuracy of DEC_ERCL is increased by 17.85%, 9.26%, 7.36% and 33.54% respectively, indicating that DEC_ERCL has a better event clustering effect.
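One common formulation of a contrastive loss with multiple positive examples can be sketched minimally as below (the cosine similarity, temperature value, and explicit positive/negative lists are illustrative assumptions, not the paper's exact loss):

```python
import math

def contrastive_loss(anchor, positives, negatives, tau=0.5):
    """InfoNCE-style loss: negative log of the probability mass the anchor
    assigns to its positive examples among all candidate examples."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))
    pos = [math.exp(cos(anchor, p) / tau) for p in positives]
    neg = [math.exp(cos(anchor, n) / tau) for n in negatives]
    return -math.log(sum(pos) / (sum(pos) + sum(neg)))
```

Minimizing this loss pulls an event representation towards all events of the same type (the multiple positives) while pushing it away from events of other types, which is what sharpens the subtle inter-event differences mentioned above.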

    Industrial multivariate time series data quality assessment method
    Hongtao SONG, Jiangsheng YU, Qilong HAN
    2024, 44(6):  1743-1750.  DOI: 10.11772/j.issn.1001-9081.2023060824

    Existing Data Quality Assessment (DQA) methods often only analyze the basic concept of a specific Data Quality Dimension (DQD), ignoring the influence on assessment results of fine-grained sub-dimensions that reflect key information of Data Quality (DQ). To address this problem, an Industrial Multivariate Time Series Data Quality Assessment (IMTSDQA) method was proposed. Firstly, the DQDs to be evaluated, such as completeness, normativeness, consistency, uniqueness, and accuracy, were divided at a fine granularity, and the correlations of sub-dimensions within the same DQD or between different DQDs were considered to determine the measurements of these sub-dimensions. Secondly, the sub-dimensions of attribute completeness, record completeness, numerical completeness, type normativeness, precision normativeness, sequential consistency, logical consistency, attribute uniqueness, record uniqueness, range accuracy, and numerical accuracy were weighted to fully mine the deep-level information of DQDs, so as to obtain evaluation results reflecting the details of DQ. Experimental results show that, compared with existing approaches based on qualitative analysis of frameworks and on model construction from the basic definitions of DQDs, the proposed method can assess DQ more effectively and comprehensively, and its assessment results for different DQDs reflect DQ problems more objectively and accurately.
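The weighted fusion of fine-grained sub-dimension scores into an overall assessment can be sketched as a simple weighted sum (the sub-dimension names and weight values below are illustrative, not the weights derived in the paper):

```python
def overall_quality(sub_scores, weights):
    """Combine fine-grained sub-dimension scores (each in [0, 1])
    into one data-quality score using normalized weights."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(sub_scores[k] * w for k, w in weights.items())

scores = {"record_completeness": 0.9, "type_normativeness": 0.8, "range_accuracy": 1.0}
weights = {"record_completeness": 0.5, "type_normativeness": 0.3, "range_accuracy": 0.2}
print(overall_quality(scores, weights))  # ≈ 0.89
```

Keeping the sub-dimension scores visible alongside the fused score is what lets an assessment report show which fine-grained aspect of quality is responsible for a low overall value.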

    Artificial intelligence
    Multi-relation approximate reasoning model based on uncertain knowledge graph embedding
    Jianjing LI, Guanfeng LI, Feizhou QIN, Weijun LI
    2024, 44(6):  1751-1759.  DOI: 10.11772/j.issn.1001-9081.2023060762

    Because the uncertain embedding models of large-scale Knowledge Graphs (KGs) cannot perform approximate reasoning over multiple logical relationships, a multi-relation approximate reasoning model based on Uncertain KG Embedding (UKGE) named UDConEx (Uncertainty DistMult (Distance Multiplicative) and complex Convolution Embedding) was proposed. Firstly, UDConEx combined the characteristics of DistMult and ComplEx (Complex Embedding), enabling it to infer symmetric and asymmetric relationships. Subsequently, a Convolutional Neural Network (CNN) was employed to capture the interactive information in the uncertain KG, enabling UDConEx to reason about inverse and transitive relationships. Lastly, a neural network was employed to carry out confidence learning of uncertain KG information, enabling UDConEx to perform approximate reasoning in the UKGE space. The experimental results on three public datasets, CN15k, NL27k, and PPI5k, show that, compared with the MUKGE (Multiplex UKGE) model, the Mean Absolute Error (MAE) of confidence prediction is reduced by 6.3%, 30.1% and 44.9% on CN15k, NL27k and PPI5k respectively; in the relation fact ranking task, the linear-based Normalized Discounted Cumulative Gain (NDCG) is improved by 5.8% and 2.6% on CN15k and NL27k respectively; and the multi-relation approximate reasoning task verifies that UDConEx has the ability of approximate reasoning over multiple logical relationships. UDConEx compensates for the inability of traditional embedding models to predict confidence, achieves approximate reasoning over multiple logical relationships, and offers enhanced accuracy and interpretability in uncertain KG reasoning.

    Information diffusion prediction model based on Transformer and relational graph convolutional network
    Xiting LYU, Jinghua ZHAO, Haiying RONG, Jiale ZHAO
    2024, 44(6):  1760-1766.  DOI: 10.11772/j.issn.1001-9081.2023060884

    Aiming at the problem that, in the dynamic evolution of information diffusion, it is difficult to effectively capture structural features, temporal features, and the interactions between them, an information diffusion prediction model based on Transformer and Relational Graph Convolutional Network (TRGCN) was proposed. Firstly, a dynamic heterogeneous graph composed of the social network graph and the diffusion cascade graph was constructed, and the structural features of each node in this graph were extracted using a Relational Graph Convolutional Network (RGCN). Secondly, the time embedding of each node was re-encoded using a Bi-directional Long Short-Term Memory (Bi-LSTM) network, and a time decay term was introduced to give different weights to nodes at different time positions, so as to obtain the temporal features of nodes. Finally, the structural and temporal features were input into Transformer and merged to obtain spatial-temporal features for information diffusion prediction. The experimental results on three real datasets, Twitter, Douban and Memetracker, show that, compared with the best model in the comparison experiments, TRGCN improves Hits@100 by 3.18%, 5.96% and 3.34% respectively and MAP@100 by 11.60%, 19.72% and 8.47% respectively, proving its validity and rationality.
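The time decay term that weights nodes by their position in the diffusion sequence can be sketched as exponential decay over elapsed time (the decay rate `lam` and the normalization are illustrative assumptions, not the paper's exact formulation):

```python
import math

def time_decay_weights(timestamps, now, lam=0.1):
    """Assign each diffusion event a weight that decays exponentially
    with its age relative to `now`, then normalize the weights to sum to 1."""
    raw = [math.exp(-lam * (now - t)) for t in timestamps]
    total = sum(raw)
    return [w / total for w in raw]

# The most recent event (t = 9.0) receives the largest weight.
w = time_decay_weights([1.0, 5.0, 9.0], now=10.0)
```

Weighting recent nodes more heavily reflects the intuition that a user's next forwarding decision is influenced most by the latest activity in the cascade.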

    Few-shot news topic classification method based on knowledge enhancement and prompt learning
    Xinyan YU, Cheng ZENG, Qian WANG, Peng HE, Xiaoyu DING
    2024, 44(6):  1767-1774.  DOI: 10.11772/j.issn.1001-9081.2023050709

    Classification methods based on fine-tuning pre-trained models usually require a large amount of annotated data, which makes them unsuitable for few-shot classification tasks. Therefore, a Knowledge enhancement and Prompt Learning (KPL) method was proposed for Chinese few-shot news topic classification. Firstly, an optimal prompt template was learned from the training set using a pre-trained model. Then the template was integrated with the input text, effectively transforming the classification task into a cloze-filling task; meanwhile, external knowledge was utilized to expand the label word space, enhancing the semantic richness of the label words. Finally, the predicted label words were mapped back to the original labels. Experiments were conducted on few-shot training and validation sets randomly sampled from three news datasets: THUCNews, SHNews and Toutiao. The experimental results show that the proposed method improves the overall performance on the 1-shot, 5-shot, 10-shot and 20-shot tasks on these datasets, with a particularly significant improvement on the 1-shot task, where the accuracy increases by at least 7.59, 2.11 and 3.10 percentage points respectively compared to baseline few-shot classification methods, confirming the effectiveness of KPL in few-shot news topic classification.
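Mapping a predicted cloze word back to its original label through an expanded label-word space can be sketched as follows (the label words shown are illustrative examples of knowledge-based expansion, not the paper's actual verbalizer):

```python
def map_to_label(predicted_word, label_word_space):
    """Return the class label whose expanded label-word set contains
    the word predicted at the masked position of the prompt."""
    for label, words in label_word_space.items():
        if predicted_word in words:
            return label
    return None

# External knowledge expands each label into a set of related words.
space = {
    "sports": {"sports", "football", "match"},
    "finance": {"finance", "stock", "market"},
}
print(map_to_label("stock", space))  # finance
```

Expanding the label word space this way means the model is rewarded for predicting any semantically related word, which is especially helpful when only one training example per class is available.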

    Information retrieval method based on multi-granularity semantic fusion
    Zhengyu ZHAO, Jing LUO, Xinhui TU
    2024, 44(6):  1775-1780.  DOI: 10.11772/j.issn.1001-9081.2023050646

    Information Retrieval (IR) is a process that organizes and processes information using specific techniques and methods to meet users’ information needs. In recent years, dense retrieval methods based on pre-trained models have achieved significant success. However, these methods only utilize vector representations of text and words to calculate the relevance between query and document, ignoring the semantic information at the phrase level. To address this issue, an IR method called MSIR (Multi-Scale Information Retrieval) was proposed. IR performance was enhanced by integrating semantic information of different granularities from the query and the document. First, semantic units of three different granularities — word, phrase, and text — were constructed in the query and the document. Then, the pre-trained model was used to encode these three semantic units separately to obtain their semantic representations. Finally, these semantic representations were used to calculate the relevance between the query and the document. Comparison experiments were conducted on three classic datasets of different sizes, including Corvid-19, TREC2019 and Robust04. Compared with ColBERT (ranking model based on Contextualized late interaction over BERT (Bidirectional Encoder Representation from Transformers)), MSIR shows an approximately 8% improvement in the P@10, P@20, NDCG@10 and NDCG@20 indicators on Robust04 dataset, as well as some improvements on Corvid-19 and TREC2019 datasets. Experimental results demonstrate that MSIR can effectively integrate multi-granularity semantic information, thereby enhancing retrieval accuracy.
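Fusing relevance computed at the word, phrase, and text granularities can be sketched as a weighted combination of per-granularity similarities (the cosine measure and the fusion weights are illustrative assumptions, not MSIR's actual scoring function):

```python
import math

def fused_relevance(query_vecs, doc_vecs, weights=(0.4, 0.3, 0.3)):
    """Combine cosine similarities between query and document representations
    at three granularities (word, phrase, text) into one relevance score."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))
    return sum(w * cos(q, d) for w, q, d in zip(weights, query_vecs, doc_vecs))

# Identical representations at every granularity give (approximately) the
# maximum score of 1.0.
q = [[1.0, 0.0], [0.5, 0.5], [0.2, 0.8]]
print(fused_relevance(q, q))  # ≈ 1.0
```

In the full method, the three vector sets would come from encoding the word-, phrase-, and text-level semantic units with the pre-trained model before this fusion step.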

    Generative label adversarial text classification model
    Xun YAO, Zhongzheng QIN, Jie YANG
    2024, 44(6):  1781-1785.  DOI: 10.11772/j.issn.1001-9081.2023050662

    Text classification is a fundamental task in Natural Language Processing (NLP), aiming to assign text data to predefined categories. The combination of Graph Convolutional neural Network (GCN) and the large-scale pre-trained model BERT (Bidirectional Encoder Representations from Transformer) has achieved excellent results in text classification tasks. However, the undirected information transmission of GCN in large-scale heterogeneous graphs produces information noise, which affects the judgment of the model and reduces its classification ability. To solve this problem, a generative label adversarial model, the Class Adversarial Graph Convolutional Network (CAGCN) model, was proposed to reduce the interference of irrelevant information during classification and improve the classification performance of the model. Firstly, the composition method in TextGCN (Text Graph Convolutional Network) was used to construct the adjacency matrix, which was combined with the GCN and BERT models as a Class Generator (CG). Secondly, a pseudo-label feature training method was used during model training to construct a cluster, and the cluster and the class generator were jointly trained. Finally, experiments were carried out on several widely used datasets. The experimental results show that the classification accuracy of the CAGCN model is 1.2, 0.1, 0.5, 1.7 and 0.5 percentage points higher than that of the RoBERTaGCN model on the widely used classification datasets 20NG, R8, R52, Ohsumed and MR, respectively.

    Aspect-level sentiment analysis model combining strong association dependency and concise syntax
    Tianci KE, Jianhua LIU, Shuihua SUN, Zhixiong ZHENG, Zijie CAI
    2024, 44(6):  1786-1795.  DOI: 10.11772/j.issn.1001-9081.2023050638

    In response to the interference of multiple aspect words in the syntactic dependency tree, the redundant information caused by invalid words and punctuation marks, and the weak correlations between aspect words and their corresponding sentiment words, an aspect-level sentiment analysis model combining Strong Association Dependencies and Concise Syntax (SADCS) was proposed. Firstly, a sentiment Part-Of-Speech (POS) list was constructed to enhance the association between aspect words and the corresponding sentiment words. Then, a joint list incorporating the POS list and dependency relationships was constructed to eliminate the redundant information of invalid words and punctuation marks in the optimized dependency tree. Next, the optimized dependency tree was combined with a Graph ATtention network (GAT) to model and extract contextual features. Finally, the contextual feature information and the feature information of dependency relationship types were learned and fused to enhance the feature representation, enabling the classifier to efficiently predict the sentiment polarity of each aspect word. The proposed model was experimentally analyzed on four public datasets. Compared with the DMF-GAT-BERT (Dynamic Multichannel Fusion mechanism based on the GAT and BERT (Bidirectional Encoder Representations from Transformers)) model, the accuracy of the proposed model increased by 1.48, 1.81, 0.09 and 0.44 percentage points, respectively. Experimental results demonstrate that the proposed model effectively enhances the association between aspect words and sentiment words, resulting in more accurate prediction of aspect-word sentiment polarity.

    Dual-channel sentiment analysis model based on improved prompt learning method
    Junfeng SHEN, Xingchen ZHOU, Can TANG
    2024, 44(6):  1796-1806.  DOI: 10.11772/j.issn.1001-9081.2023060733

    Aiming at the problems of the long template iterative update cycle and poor generalization ability in previous prompt learning methods, a dual-channel sentiment analysis model based on an improved prompt learning method was proposed. First, the serialized prompt templates and the input word vectors were introduced into the attention mechanism structure, and the templates were iteratively updated as the input word vectors were updated in the multi-layer attention mechanism. Then, semantic information was extracted by the ALBERT (A Lite BERT (Bidirectional Encoder Representations from Transformers)) model in the other channel. Finally, the extracted semantic features were integrated to improve the generalization ability of the overall model. The model was tested on the Laptop and Restaurants datasets from SemEval2014, the ACL (Association for Computational Linguistics) Twitter dataset, and the SST-2 dataset created by Stanford University, achieving classification accuracies of 80.88%, 91.78%, 76.78% and 95.53% respectively; compared with the baseline model BERT_Large, it increases the classification accuracy by 0.99%, 1.13%, 3.39% and 2.84% respectively; compared with P-tuning v2, it achieves improvements of 2.88%, 3.60% and 2.06% in classification accuracy on the Restaurants, Twitter, and SST-2 datasets respectively, and reaches convergence earlier than the original method.

    Fast adversarial training method based on random noise and adaptive step size
    Jinfu WU, Yi LIU
    2024, 44(6):  1807-1815.  DOI: 10.11772/j.issn.1001-9081.2023060774

    Adversarial Training (AT) and its variants have been proven to be the most effective methods for defending against adversarial attacks. However, generating adversarial examples requires extensive computational resources, resulting in low model training efficiency and limited feasibility. Fast AT (Fast-AT) uses single-step instead of multi-step adversarial attacks to accelerate training, but its robustness is much lower than that of multi-step AT methods, and it is susceptible to Catastrophic Overfitting (CO). To address these issues, a Fast-AT method based on random noise and adaptive step size was proposed. Firstly, in each iteration of adversarial example generation, random noise was added to the original input images for data augmentation. Then, the gradients of each adversarial example during training were accumulated, and the step size of the adversarial examples was adaptively adjusted based on the gradient information. Finally, adversarial attacks were performed according to the perturbation step size and gradient information to generate adversarial examples for model training. Various adversarial attacks were conducted on the CIFAR-10 and CIFAR-100 datasets; compared to N-FGSM (Noise Fast Gradient Sign Method), the proposed method achieved at least a 0.35 percentage point improvement in robust accuracy. The experimental results demonstrate that the proposed method avoids the CO issue of Fast-AT and enhances the robustness of deep learning models.
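    The single-step generation loop can be sketched in NumPy as follows (an illustrative sketch only: the specific adaptive rule below, which shrinks the step size as the accumulated gradient magnitude grows, is an assumption, not the paper's exact formula):

```python
import numpy as np

def fast_at_step(x, grad, grad_accum, eps=8 / 255, base_alpha=10 / 255, rng=None):
    """One single-step adversarial example generation step (sketch).

    x: clean input batch; grad: loss gradient w.r.t. x (assumed given by the
    model); grad_accum: running per-example gradient magnitude accumulator.
    """
    rng = rng or np.random.default_rng(0)
    # Data augmentation: add uniform random noise inside the eps-ball.
    x_noisy = x + rng.uniform(-eps, eps, size=x.shape)
    # Accumulate gradient magnitudes across iterations.
    grad_accum = grad_accum + np.abs(grad)
    # Adaptive step size: larger accumulated gradients -> smaller step
    # (assumed adaptation rule for illustration).
    alpha = base_alpha / (1.0 + grad_accum.mean(
        axis=tuple(range(1, x.ndim)), keepdims=True))
    # Single-step sign attack, projected back into the eps-ball around x.
    x_adv = x_noisy + alpha * np.sign(grad)
    x_adv = np.clip(x_adv, x - eps, x + eps)
    return np.clip(x_adv, 0.0, 1.0), grad_accum
```

    The projection step at the end is what keeps the perturbation bounded even after the random-noise augmentation.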

    Semi-supervised heterophilic graph representation learning model based on Graph Transformer
    Shibin LI, Jun GONG, Shengjun TANG
    2024, 44(6):  1816-1823.  DOI: 10.11772/j.issn.1001-9081.2023060811

    Existing Graph Convolutional Network (GCN) methods are based on the homophily assumption and cannot be directly applied to heterophilic graph representation learning, while many studies on heterophilic graph representation learning are limited by the message-passing mechanism, whose confusion and over-squeezing of node features lead to over-smoothing. To address these issues, a semi-supervised heterophilic graph representation learning model based on Graph Transformer, named HPGT (HeteroPhilic Graph Transformer), was proposed. Firstly, the path neighborhood of a node was sampled using the degree connection probability matrix; the heterophilic connection patterns of nodes on the path were then adaptively aggregated through the self-attention mechanism and encoded to obtain the structural information of nodes, and the original attribute information and structural information of nodes were used to construct the self-attention module of the Transformer layer. Secondly, the hidden-layer representation of each node was separated from those of its neighboring nodes and updated independently, preventing a node from aggregating too much of its own information through the self-attention module; the node representation and the neighborhood representation were then concatenated to obtain the output of a single Transformer layer. In addition, the outputs of all Transformer layers were concatenated to form the final node hidden-layer representation, preventing the loss of information from middle layers. Finally, a linear layer and a Softmax layer were used to map the hidden-layer representations of nodes to their predicted labels. Comparison experiments with the model variant without Structural Encoding (SE) show that SE based on degree connection probability provides effective bias information for the self-attention modules of Transformer layers and improves the average accuracy of HPGT by 0.99% to 11.98%. Compared with the comparative models, on the heterophilic datasets (Texas, Cornell, Wisconsin, and Actor), the node classification accuracies of HPGT are improved by 0.21% to 1.69%, and on the homophilic datasets (Cora, CiteSeer, and PubMed), the node classification accuracies reach 0.837 9, 0.746 7 and 0.886 2, respectively. The experimental results show that HPGT has a strong ability for heterophilic graph representation learning and is particularly suitable for node classification tasks on strongly heterophilic graphs.

    Data science and technology
    Shorter long-sequence time series forecasting model
    Zexin XU, Lei YANG, Kangshun LI
    2024, 44(6):  1824-1831.  DOI: 10.11772/j.issn.1001-9081.2023060799

    Aiming at the problem that most existing research studies short-sequence and long-sequence time series forecasting separately, which leads to poor forecasting accuracy on shorter long-sequence time series, a Shorter Long-sequence Time Series Forecasting Model (SLTSFM) was proposed. Firstly, a Sequence-to-Sequence (Seq2Seq) structure was constructed using a Convolutional Neural Network (CNN) and the PBUSM (Probsparse Based on Uniform Selection Mechanism) self-attention mechanism to extract features of the long-sequence input. Secondly, a "far light, near heavy" strategy was designed to reallocate the features of each time period extracted by multiple Long Short-Term Memory (LSTM) modules, which are better suited to short-sequence feature extraction. Finally, the reallocated features were used to strengthen the extracted long-sequence input features, improving forecasting accuracy and realizing time series forecasting. Four publicly available time series datasets were used to verify the effectiveness of the proposed model. The experimental results demonstrate that, compared with Gated Recurrent Unit (GRU), the model with the second-best comprehensive performance, the Mean Absolute Error (MAE) of SLTSFM was reduced by 61.54%, 13.48%, 0.92% and 19.58% for univariate time series forecasting, and by 17.01%, 18.13%, 3.24% and 6.73% for multivariate time series forecasting on the four datasets. This verifies that SLTSFM effectively improves the accuracy of shorter long-sequence time series forecasting.
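    The "far light, near heavy" reallocation can be illustrated with a simple weighting sketch (the linear weights below are an assumed instance of the strategy for illustration, not the paper's exact scheme):

```python
import numpy as np

def far_light_near_heavy(features):
    """Reallocate per-period features with a 'far light, near heavy' scheme.

    features: array of shape (num_periods, dim), ordered oldest -> newest.
    Older periods receive smaller weights, recent periods larger ones;
    the linear ramp is an illustrative assumption.
    """
    n = features.shape[0]
    w = np.arange(1, n + 1, dtype=float)  # 1, 2, ..., n: newest is heaviest
    w /= w.sum()                          # normalize to a convex combination
    fused = (w[:, None] * features).sum(axis=0)
    return fused, w
```

    With four periods the weights become [0.1, 0.2, 0.3, 0.4], so the most recent period contributes four times as much as the oldest.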

    Early classification model of multivariate time series based on orthogonal locality preserving projection and cost optimization
    Zixuan YUAN, Xiaoqing WENG, Ningzhen GE
    2024, 44(6):  1832-1841.  DOI: 10.11772/j.issn.1001-9081.2023060761

    Early Time Series Classification (ETSC) has two contradictory goals: earliness and accuracy, and early classification is always achieved at the expense of accuracy. Existing optimization-based early classification methods for Multivariate Time Series (MTS) consider the costs of misclassification and delayed decision-making in the cost function, but ignore the influence of the local structure between samples in an MTS dataset on classification performance. To solve this problem, an early classification model of MTS based on Orthogonal Locality Preserving Projection (OLPP) and cost Optimization for Accuracy and Earliness (OLPPMOAE) was proposed. First, MTS sample prefixes were mapped to a low-dimensional space using OLPP to preserve the local structure of the original dataset. Then, a group of Gaussian Process (GP) classifiers were trained in the low-dimensional space to generate the class probabilities of the training set at each moment. Finally, the Particle Swarm Optimization (PSO) algorithm was used to learn the optimal parameters of the stopping rule from these class probabilities. The experimental results on six MTS datasets show that the accuracy of OLPPMOAE is significantly higher than that of the cost-based model R1_Clr (stopping Rule and Cost function with regularization terms l1 and l2) with essentially the same earliness: the average accuracy is improved by 11.33% to 15.35%, and the Harmonic Mean (HM) is improved by 4.71% to 9.01%. Therefore, the proposed model can classify MTS as early as possible with high accuracy.

    Time series classification method based on multi-scale cross-attention fusion in time-frequency domain
    Mei WANG, Xuesong SU, Jia LIU, Ruonan YIN, Shan HUANG
    2024, 44(6):  1842-1847.  DOI: 10.11772/j.issn.1001-9081.2023060731

    To address the problem of low classification accuracy caused by insufficient interaction of the potential information between time series subsequences, a time series classification method based on multi-scale cross-attention fusion in the time-frequency domain, called TFFormer (Time-Frequency Transformer), was proposed. First, the time-domain and frequency-domain representations of the original time series were each divided into subsequences of the same length, and positional embeddings were added after linear projection to solve the point-value coupling problem. Then, the long-term time series dependency problem was alleviated by an Improved Multi-Head self-Attention (IMHA) mechanism that made the model focus on more important time series features. Finally, a multi-scale Cross-Modality Attention (CMA) module was proposed to enhance the interaction between the time and frequency domains, so that the model could further mine the frequency information of the time series. The experimental results show that, compared with the Fully Convolutional Network (FCN), the classification accuracy of the proposed method on the Trace, StarLightCurves and UWaveGestureLibraryAll datasets increased by 0.3, 0.9 and 1.4 percentage points respectively. This proves that enhancing the information interaction between the time and frequency domains of a time series improves the convergence speed and classification accuracy of the model.

    Distributed temporal index for temporal aggregation range query
    Fanjun MENG, Bin HAN, Shucheng HUANG, Xiangdong MEI
    2024, 44(6):  1848-1854.  DOI: 10.11772/j.issn.1001-9081.2023060830

    In the era of big data and cloud computing, querying and analyzing temporal big data faces many important challenges. Focusing on issues such as poor query performance and ineffective utilization of indexes in temporal aggregation range queries, a Distributed Temporal Index (DTI) for temporal aggregation range queries was proposed. Firstly, a random or round-robin strategy was used to partition the temporal data. Secondly, an intra-partition index construction algorithm based on timestamp bit-array prefixes was used to build the intra-partition index, and partition statistics including the time span were recorded. Thirdly, the data partitions whose time spans overlapped the query time interval were selected by a predicate pushdown operation and pre-aggregated by index scan. Finally, the pre-aggregated values obtained from each partition were merged and aggregated by time. The experimental results show that the execution time of the intra-partition index construction algorithm for data with a density of 2 400 entries per unit of time is similar to that for data with a density of 0.001 entries per unit of time. Compared to ParTime, the indexed temporal aggregation range query algorithm takes at least 22% less time per step when querying the data in the first 75% of the timeline, and at least 11% less time per step when executing selective aggregation. Therefore, the indexed algorithm is faster in most temporal aggregation range query tasks, and its intra-partition index construction algorithm handles the data sparsity problem efficiently.
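    The predicate pushdown step that prunes partitions by their recorded time-span statistics can be sketched as follows (a minimal sketch; the `(partition_id, span_start, span_end)` tuple layout is an assumed representation of the recorded statistics):

```python
def select_partitions(partitions, q_start, q_end):
    """Predicate pushdown (sketch): keep only partitions whose recorded time
    span overlaps the query interval [q_start, q_end], so that pre-aggregation
    by index scan touches only relevant partitions.

    partitions: iterable of (partition_id, span_start, span_end) statistics.
    """
    # Two closed intervals [s, e] and [q_start, q_end] overlap iff
    # s <= q_end and e >= q_start.
    return [pid for pid, s, e in partitions if s <= q_end and e >= q_start]
```

    Non-overlapping partitions are never scanned, which is where the query-time savings over a full scan come from.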

    Academic anomaly citation group detection based on local extended community detection
    Xinrui LIN, Xiaofei WANG, Yan ZHU
    2024, 44(6):  1855-1861.  DOI: 10.11772/j.issn.1001-9081.2023050702

    Some scholars in academic social networks may form anomaly citation groups and excessively cite each other's papers for profit. Most existing anomaly group detection algorithms separate community detection from node representation learning, which limits anomaly group detection performance. To deal with this issue, a Group Anomaly Detection based on Local extended community detection (GADL) algorithm was proposed. Author anomaly citation features were extracted using semantic information such as the research field and title content of papers. An extension metric function based on node transition similarity, node community membership, citation anomaly, and BFS (Breadth-First Search) depth was defined. Optimal anomaly detection performance was obtained by combining anomaly community detection with anomaly node detection and jointly optimizing them in a unified framework. Compared with the ALP algorithm, the proposed algorithm improved the Area Under Curve (AUC) by 6.07%, 5.35% and 3.38% on the ACM, DBLP1 and DBLP2 datasets, respectively. Experimental results on real datasets show that GADL can effectively detect academic anomaly citations.

    Cyber security
    Multi-object cache side-channel attack detection model based on machine learning
    Zihao YAO, Yuanming LI, Ziqiang MA, Yang LI, Lianggen WEI
    2024, 44(6):  1862-1871.  DOI: 10.11772/j.issn.1001-9081.2023060787

    Current cache side-channel attack detection technology mainly targets a single attack mode; detection methods for two or three attack modes are limited and cannot cover them fully. In addition, although detection accuracy for a single attack is high, accuracy decreases and false positives arise easily as the number of attack modes increases. To detect cache side-channel attacks effectively, a multi-object cache side-channel attack detection model based on machine learning was proposed, which utilized Hardware Performance Counters (HPCs) to collect the features of various cache side-channel attacks. Firstly, feature analysis was conducted on each cache side-channel attack mode, key features were selected, and datasets were collected. Then, a detection model was trained independently for each attack mode. Finally, during detection, test data was input into the multiple models in parallel, and the detection results of the models were combined to ascertain the presence of any cache side-channel attack. Experimental results show that the proposed model reaches high accuracies of 99.91%, 98.69% and 99.54% respectively when detecting three cache side-channel attacks: Flush+Reload, Flush+Flush and Prime+Probe. Even when multiple attacks exist at the same time, the attack modes can be accurately identified.
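    The parallel multi-model decision rule can be sketched as follows (the detector dictionary and the threshold predicates on HPC features are hypothetical stand-ins for the trained per-attack models, not the paper's actual classifiers):

```python
def detect(sample, detectors):
    """Run per-attack-pattern detectors on one sample and flag an attack if
    any model fires (sketch of the parallel multi-object decision rule).

    sample: dict of HPC feature name -> value;
    detectors: dict of attack name -> predicate over the sample.
    """
    hits = [name for name, model in detectors.items() if model(sample)]
    # Any single firing model is enough to report an attack; the hit list
    # identifies which attack modes were recognized.
    return (len(hits) > 0), hits
```

    Returning the full hit list (rather than a single label) is what lets the scheme identify several simultaneous attack modes.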

    Dual vertical federated learning framework incorporating secret sharing technology
    Wei LUO, Jinquan LIU, Zheng ZHANG
    2024, 44(6):  1872-1879.  DOI: 10.11772/j.issn.1001-9081.2023060862

    To address the issues of cross-media data fusion modeling and privacy protection in the hydropower industry, a dual vertical federated learning framework incorporating secret sharing technology was proposed. First, the participant nodes were stratified: lower-tier nodes were responsible for preliminary modeling, intermediate-tier nodes oversaw pre-model aggregation and optimization, and central nodes generated the final model. Then, to strengthen data privacy protection and prevent inference attacks, an intermediate parameter protection mechanism based on secret sharing technology was introduced: the communication data between the data owner and the model trainers was fragmented, concealing the correspondence between model parameters and trainers and thereby increasing the complexity of inference attacks. Finally, to optimize the model aggregation process of federated learning, a node evaluation mechanism based on the disparity in information quantities was introduced, in which node dissimilarity and data volume were comprehensively assessed: the weights of different nodes in model aggregation were finely adjusted and the contribution of suspected malicious nodes was eliminated, improving the performance and convergence speed of the model. Experiments were conducted on real data of Guodian Dadu River Basin Hydropower Development Company Limited. The results showed that the intermediate parameter protection mechanism based on secret sharing technology was more stable during convergence and improved the convergence speed by approximately 14.6% compared with the differential privacy protection mechanism, and that incorporating the node evaluation mechanism based on information disparity increased the convergence speed by approximately 13.5% compared with the federated averaging algorithm. It is verified that the proposed framework addresses cross-media data fusion modeling for hydropower data and offers both data privacy protection and accelerated model convergence.
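    The additive secret sharing idea behind the intermediate parameter protection mechanism can be sketched as follows (a generic additive scheme over a prime field; the field size and the integer encoding of model parameters are assumptions, not the framework's exact construction):

```python
import random

PRIME = 2 ** 61 - 1  # assumed field modulus for illustration

def share(value, n, rng=None):
    """Split an integer-encoded parameter into n additive shares.

    Any n-1 shares are uniformly random, so a single share leaks nothing
    about the parameter; only the sum of all shares reconstructs it.
    """
    rng = rng or random.Random(0)
    shares = [rng.randrange(PRIME) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recover the parameter by summing all shares modulo the field size."""
    return sum(shares) % PRIME
```

    In the framework, fragmenting intermediate parameters this way before communication is what hides the parameter-to-trainer correspondence from any single observer.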

    Efficient reversible data hiding scheme based on two-dimensional modulo operations
    Yue LI, Dan TANG, Minjun SUN, Xie WANG, Hongliang CAI, Qiong ZENG
    2024, 44(6):  1880-1888.  DOI: 10.11772/j.issn.1001-9081.2023060900

    Aiming at the low embedding efficiency and weak anti-detection capability of Reversible Data Hiding (RDH) in scenarios with large data volumes, an efficient reversible data hiding scheme based on two-dimensional modulo operations was proposed. Firstly, a larger amount of information was embedded with smaller pixel modifications by modulo operations; then, the number of embedded bits was increased by combining an enhanced base conversion system; finally, reversibility was achieved by using the halving method in combination with a pair of steganographic images. Simulation experiments on the USC-SIPI standard image library show that the Peak Signal-to-Noise Ratio (PSNR) of the steganographic image produced by the proposed scheme is about 40 dB when embedding up to 1 million bits of secret information, and the steganographic image can effectively resist the static attacks of RS (Regular Singular) steganalysis, Pixel-Difference Histogram (PDH) steganalysis, and bit-plane steganalysis. Therefore, the proposed scheme effectively improves embedding efficiency while providing good anti-detection capability.
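    A single-pixel modulo embedding step can be sketched as follows (a generic one-dimensional illustration of the modulo idea only; the paper's actual scheme applies two-dimensional modulo operations to pixel pairs and recovers the original via dual steganographic images):

```python
def embed_digit(pixel, digit, base=4):
    """Embed one base-`base` digit into a pixel by the smallest modulo
    adjustment, so pixel % base encodes the digit (generic sketch; the
    base value is an assumed parameter).
    """
    r = pixel % base
    delta = (digit - r) % base
    if delta > base // 2:      # move in the closer direction
        delta -= base
    new = pixel + delta
    if new < 0:                # keep the result a valid 8-bit pixel
        new += base
    elif new > 255:
        new -= base
    return new

def extract_digit(pixel, base=4):
    """Recover the embedded digit from the stego pixel."""
    return pixel % base
```

    Each pixel carries log2(base) bits while changing by at most `base` gray levels, which is the source of the high embedding efficiency of modulo-based schemes.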

    Advanced computing
    Blockchain sharding method for reducing cross-shard transaction proportion
    Jiao LI, Xiushan ZHANG, Yuanhang NING
    2024, 44(6):  1889-1896.  DOI: 10.11772/j.issn.1001-9081.2023060757

    To address the high proportion of cross-shard transactions and the complexity of cross-shard transaction validation in blockchain performance optimization, a blockchain sharding method for reducing the cross-shard transaction proportion was proposed. Firstly, from the perspective of data sharding, a blockchain transaction sharding model was constructed, and evaluation indicators for sharding performance were given. Then, for the long-term historical transaction data in the blockchain, sets of transaction frequencies for senders and receivers were constructed from the perspective of account correlation. Finally, a Frequency-considered Blockchain Transaction Sharding algorithm (FBTS) was designed to solve the problem of a high cross-shard proportion in transaction sharding. The proposed algorithm was compared with the Random Sharding Algorithm (RSA) and the Modular Sharding Algorithm (MSA) under sharding sizes of 2, 3, 5, 7, 15, 20, 30 and 50, and outperformed both on performance indicators such as the cross-shard transaction proportion, the average cross-shard number of accounts, and the weighted average cross-shard number of accounts. In addition, most accounts and transactions were concentrated at low cross-shard numbers, indicating that completing a transaction rarely involves multiple shards. The experimental results show that the proposed algorithm can effectively reduce the cross-shard transaction proportion and shorten the delay of cross-shard transactions.
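    The cross-shard transaction proportion indicator can be computed as follows (a minimal sketch; the account names and shard assignment are hypothetical):

```python
def cross_shard_proportion(transactions, shard_of):
    """Evaluation indicator (sketch): fraction of transactions whose sender
    and receiver accounts are assigned to different shards.

    transactions: iterable of (sender, receiver) account pairs;
    shard_of: dict mapping account -> shard id.
    """
    total = cross = 0
    for sender, receiver in transactions:
        total += 1
        if shard_of[sender] != shard_of[receiver]:
            cross += 1
    return cross / total if total else 0.0
```

    A frequency-aware assignment lowers this indicator by placing accounts that frequently transact with each other into the same shard.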

    Improved adaptive large neighborhood search algorithm for multi-depot vehicle routing problem with time window
    Yan LI, Dazhi PAN, Siqing ZHENG
    2024, 44(6):  1897-1904.  DOI: 10.11772/j.issn.1001-9081.2023060760

    Aiming at the Multi-Depot Vehicle Routing Problem with Time Window (MDVRPTW), an Improved Adaptive Large Neighborhood Search algorithm (IALNS) was proposed. Firstly, an improved path segmentation algorithm was used in the stage of constructing the initial solution. Then, in the optimization stage, the designed removal and repair heuristic operators competed with each other, a scoring mechanism was introduced for the operators, and heuristic operators were selected by roulette wheel. Meanwhile, the iteration cycle was segmented and the operator weight information was dynamically adjusted in each cycle to effectively prevent the algorithm from falling into a local optimum. Finally, a simulated annealing mechanism was adopted as the acceptance criterion for solutions. The relevant parameters of IALNS were determined by experiments on the Cordeau benchmark instances, and the solution results of the proposed algorithm were compared with other representative research results in this field. The experimental results show that the solution error between IALNS and the Variable Neighborhood Search (VNS) algorithm does not exceed 0.8%, and IALNS is even better in some cases; compared with the multi-phase improved shuffled frog leaping algorithm, the average runtime of the proposed algorithm is reduced by 12.8%, and the runtime is shorter on most instances. These results verify that IALNS is an effective algorithm for solving MDVRPTW.
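    The roulette-wheel operator selection and segment-wise weight update can be sketched as follows (the reaction factor and the score/count bookkeeping are assumed illustration values, not the paper's tuned parameters):

```python
import random

def roulette_select(weights, rng):
    """Select an operator index with probability proportional to its weight."""
    total = sum(weights)
    pick = rng.random() * total
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if pick <= acc:
            return i
    return len(weights) - 1  # guard against floating-point drift

def update_weights(weights, scores, counts, reaction=0.5):
    """End-of-segment adaptive update (sketch): blend each operator's old
    weight with its average score this segment; unused operators keep
    their weight. The reaction factor 0.5 is an assumed value.
    """
    return [
        (1 - reaction) * w + reaction * (s / c if c else w)
        for w, s, c in zip(weights, scores, counts)
    ]
```

    Operators that repeatedly improve the solution accumulate score, gain weight, and are therefore drawn more often in later segments.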

    Hybrid classical-quantum classification model based on DenseNet
    Feiyu ZHAI, Handa MA
    2024, 44(6):  1905-1910.  DOI: 10.11772/j.issn.1001-9081.2023050656

    Existing image classification models are becoming increasingly complex, and the hardware resources and computation time they require keep growing. A hybrid Classical-Quantum classification model based on DenseNet (CQDenseNet) was proposed to address this problem. Firstly, a Variational Quantum Circuit (VQC) that can operate on a Noisy Intermediate-Scale Quantum (NISQ) device was used as a classifier to replace the fully connected layer of DenseNet. Secondly, through transfer learning, a DenseNet model pre-trained on the ImageNet dataset was utilized as the pre-training model of CQDenseNet. Finally, the CQDenseNet model was compared with the benchmark models AlexNet, GoogLeNet, VGG19, ResNet and DenseNet-169 on the Chinese Medicine and CIFAR-100 datasets. Experimental results show that CQDenseNet outperforms the best-performing benchmark model, with improvements of 2.2 and 7.4 percentage points in accuracy, 2.2 and 7.3 percentage points in precision, 2.2 and 7.1 percentage points in recall, and 2.3 and 6.4 percentage points in F1-score on the two datasets, respectively. This shows that the hybrid classical-quantum model performs better than the classical models.

    Multimedia computing and computer simulation
    Progressive enhancement algorithm for low-light images based on layer guidance
    Mengyuan HUANG, Kan CHANG, Mingyang LING, Xinjie WEI, Tuanfa QIN
    2024, 44(6):  1911-1919.  DOI: 10.11772/j.issn.1001-9081.2023060736

    Low-Light Image Enhancement (LLIE) aims to improve the visual quality of low-light images, which is otherwise poor. Most LLIE algorithms focus on enhancing luminance and contrast while neglecting details. To solve this issue, a Progressive Enhancement algorithm for low-light images based on Layer Guidance (PELG) was proposed, which enhances images to a suitable illumination level and reconstructs clear details. First, to reduce task complexity and improve efficiency, the image was decomposed into several frequency components by Laplacian Pyramid (LP) decomposition. Secondly, since different frequency components exhibit correlation, a Transformer-based fusion model and a lightweight fusion model were respectively proposed for layer guidance: the Transformer-based model was applied between the low-frequency component and the lowest high-frequency component, the lightweight model was applied between neighbouring high-frequency components, and the components were thus enhanced in a coarse-to-fine manner. Finally, the LP was used to reconstruct the image with uniform brightness and clear details. The experimental results show that the proposed algorithm achieves a Peak Signal-to-Noise Ratio (PSNR) 2.3 dB higher than DSLR (Deep Stacked Laplacian Restorer) on LOL (LOw-Light dataset)-v1 and 0.55 dB higher than UNIE (Unsupervised Night Image Enhancement) on LOL-v2. Compared with other state-of-the-art LLIE algorithms, the proposed algorithm has shorter runtime and achieves significant improvements in objective and subjective quality, making it more suitable for real scenes.
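    The Laplacian pyramid decomposition and reconstruction at the core of the algorithm can be illustrated in one dimension (a simplified sketch: a 2-tap average stands in for the Gaussian filter, and the input length is assumed divisible by 2^levels):

```python
import numpy as np

def lp_decompose(x, levels=3):
    """1-D Laplacian pyramid (sketch): each level stores the high-frequency
    residual between the signal and its blurred, downsampled-then-upsampled
    version; the last entry is the lowest-frequency component.
    """
    bands = []
    cur = x.astype(float)
    for _ in range(levels):
        low = cur.reshape(-1, 2).mean(axis=1)  # blur + downsample
        up = np.repeat(low, 2)                 # upsample back
        bands.append(cur - up)                 # high-frequency residual
        cur = low
    bands.append(cur)
    return bands

def lp_reconstruct(bands):
    """Invert the decomposition by upsampling and adding residuals."""
    cur = bands[-1]
    for high in reversed(bands[:-1]):
        cur = np.repeat(cur, 2) + high
    return cur
```

    Reconstruction is exact, which is why the algorithm can enhance each frequency component separately and still recover a coherent image.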

    Six degrees of freedom object pose estimation algorithm based on filter learning network
    Yaxing BING, Yangping WANG, Jiu YONG, Haomou BAI
    2024, 44(6):  1920-1926.  DOI: 10.11772/j.issn.1001-9081.2023060866

    A Six Degrees of freedom (6D) object pose estimation algorithm based on a filter learning network was proposed to improve the accuracy and real-time performance of pose estimation for weakly textured objects in complex scenes. Firstly, standard convolutions were replaced with Blueprint Separable Convolutions (BSConv) to reduce model parameters, and the GeLU (Gaussian error Linear Unit) activation function was used to better approximate the normal distribution, thereby improving the performance of the network model. Secondly, an Upsampling Filtering And Encoding information Module (UFAEM) was proposed to compensate for the loss of key upsampling information. Finally, a Global Attention Mechanism (GAM) was proposed to increase contextual information and extract information from input feature maps more effectively. The experimental results on the publicly available LineMOD, YCB-Video and Occlusion LineMOD datasets show that the proposed algorithm significantly reduces network parameters while improving accuracy, cutting the network parameter count by nearly three quarters. Under the ADD(-S) metric, the accuracy of the proposed algorithm is improved by about 1.2 percentage points compared to the Dual-Stream algorithm on the LineMOD dataset, by about 5.2 percentage points compared to the DenseFusion algorithm on the YCB-Video dataset, and by about 6.6 percentage points compared to the Pixel-wise Voting Network (PVNet) algorithm on the Occlusion LineMOD dataset. These results show that the proposed algorithm performs excellently in estimating the pose of weakly textured objects and has a certain degree of robustness in estimating the pose of occluded objects.

    Lightweight infrared road scene detection model based on multiscale and weighted coordinate attention
    Xiaohui CHENG, Yuntian HUANG, Ruifang ZHANG
    2024, 44(6):  1927-1934.  DOI: 10.11772/j.issn.1001-9081.2023060775

    In view of the occlusion and lack of texture details of infrared targets in road scenes, which lead to false and missed detections, a lightweight infrared road scene detection YOLO (You Only Look Once) model based on Multi-Scale and weighted Coordinate attention (MSC-YOLO) was proposed, with YOLOv7-tiny as the baseline model. Firstly, a multi-scale pyramid attention module, PSA (Pyramid Split Attention), was introduced into different intermediate feature layers of MobileNetV3, and a lightweight backbone network for multi-scale feature extraction, MSM-Net (Multi-Scale Mobile Network), was designed to solve the feature pollution problem caused by fixed-size convolution kernels and to improve the fine-grained extraction of targets at different scales. Secondly, a Weighted Coordinate Attention (WCA) mechanism was integrated into the feature fusion network, superimposing the target position information obtained from the vertical and horizontal spatial directions of the intermediate feature maps to enhance the fusion of target features in different dimensions. Finally, the localization loss function was replaced with Efficient Intersection over Union (EIoU), which calculates the length and width influence factors of the predicted box and the ground-truth box separately, accelerating convergence. Verification experiments were carried out on the FLIR dataset. Compared with the YOLOv7-tiny model, the number of parameters is reduced by 67.3%, the number of floating-point operations is reduced by 54.6%, and the model size is reduced by 60.5%, while mAP(IoU=0.5) (mean Average Precision (IoU=0.5)) is reduced by only 0.7 percentage points. The Frames Per Second (FPS) reaches 101 on an RTX 2080Ti, achieving a balance between detection performance and lightweight design and meeting the real-time detection requirements of infrared road scenes.
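    The EIoU localization loss mentioned above can be sketched as follows (the standard EIoU formulation with an IoU term plus separate normalized center-distance, width, and height penalties; boxes are assumed axis-aligned in (x1, y1, x2, y2) form):

```python
def eiou_loss(box_p, box_g):
    """EIoU loss for a predicted box and a ground-truth box (sketch).

    Unlike CIoU's combined aspect-ratio term, EIoU penalizes the width and
    height differences separately, each normalized by the enclosing box.
    """
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    # Intersection over union.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    area_p = (px2 - px1) * (py2 - py1)
    area_g = (gx2 - gx1) * (gy2 - gy1)
    iou = inter / (area_p + area_g - inter)
    # Smallest enclosing box dimensions.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    # Normalized squared center distance.
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4
    dist = rho2 / (cw ** 2 + ch ** 2)
    # Separate width and height penalties.
    w_pen = ((px2 - px1) - (gx2 - gx1)) ** 2 / cw ** 2
    h_pen = ((py2 - py1) - (gy2 - gy1)) ** 2 / ch ** 2
    return 1.0 - iou + dist + w_pen + h_pen
```

    Because each geometric discrepancy contributes its own gradient term, the predicted box converges faster than with a plain IoU loss.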

    Ship identification model based on ResNet50 and improved attention mechanism
    Yuanjiong LIU, Maozheng HE, Yibin HUANG, Cheng QIAN
    2024, 44(6):  1935-1941.  DOI: 10.11772/j.issn.1001-9081.2023060859

    Automatic identification of marine ships plays an important role in alleviating marine traffic pressure. To address the problem of low automatic ship identification rates, a ship identification model based on ResNet50 (Residual Network50) and an improved attention mechanism was proposed. Firstly, a ship dataset was built in-house and divided into training, validation and test sets, which were augmented by blurring and adding noise. Secondly, an improved attention module, the Efficient Spatial Pyramid Attention Module (ESPAM), and the ship type recognition model ResNet50_ESPAM were designed. Finally, ResNet50_ESPAM was trained and validated on the ship dataset and compared with other commonly used neural network models. The experimental results show that on the validation set, the highest accuracy of ResNet50_ESPAM is 95.5% and the initial accuracy is 81.2%; compared with AlexNet (Alex Krizhevsky Network), GoogleNet (Google Inception Net), ResNet34 (Residual Network34), ResNet50 and ResNet50_CBAM (ResNet50_Convolutional Block Attention Module), the maximum validation accuracy of the model increases by 5.1, 4.9, 2.6, 1.6 and 1.4 percentage points respectively, and the initial validation accuracy increases by 49.4, 44.7, 27.7, 3.0 and 2.1 percentage points respectively, indicating that ResNet50_ESPAM achieves high recognition accuracy in ship type recognition and that the improved attention module ESPAM is highly effective.

    Rectified cross pseudo supervision method with attention mechanism for stroke lesion segmentation
    Yan ZHOU, Yang LI
    2024, 44(6):  1942-1948.  DOI: 10.11772/j.issn.1001-9081.2023060742

    The automatic segmentation of brain lesions provides a reliable basis for the timely diagnosis and treatment of stroke patients and the formulation of treatment plans, but obtaining large-scale labeled data is expensive and time-consuming. Semi-Supervised Learning (SSL) methods alleviate this problem by utilizing a large number of unlabeled images together with a limited number of labeled images. Aiming at two problems, pseudo-label noise in SSL and the poor ability of existing Three-Dimensional (3D) networks to focus on smaller objects, a semi-supervised method for stroke lesion segmentation, Rectified Cross Pseudo Supervision with Project & Excite modules (RPE-CPS), was proposed. Firstly, the data was input into two 3D U-Net segmentation networks with the same structure but different initializations, and the resulting pseudo-segmentation maps were used for cross-supervised training of the two networks, making full use of the pseudo-label data to expand the training set and encouraging high similarity between the predictions of the differently initialized networks for the same input image. Secondly, a rectification strategy for the cross pseudo supervision approach, based on uncertainty estimation, was designed to reduce the impact of noise in the pseudo-labels. Finally, to improve segmentation performance on small object classes, Project & Excite (PE) modules were added behind each encoder module, decoder module and bottleneck module of the 3D U-Net segmentation network. To verify the effectiveness of the proposed method, evaluation experiments were carried out on the Acute Ischemic Stroke (AIS) dataset of a cooperating hospital and the Ischemic Stroke Lesion Segmentation challenge (ISLES2022) dataset. The experimental results show that when using only 20% of the labeled data in the training set, the Dice Similarity Coefficient (DSC), 95% Hausdorff Distance (HD95) and Average Surface Distance (ASD) reach 73.87%, 6.08 mm and 1.31 mm on the public ISLES2022 dataset, and 67.74%, 15.38 mm and 1.05 mm on the AIS dataset, respectively. Compared with the state-of-the-art semi-supervised method Uncertainty Rectified Pyramid Consistency (URPC), DSC improves by 2.19 and 3.43 percentage points on the two datasets, respectively. The proposed method effectively utilizes unlabeled data to improve segmentation accuracy, outperforms other semi-supervised methods, and is robust.
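The cross pseudo supervision step can be sketched as follows. Here each network is supervised by the argmax pseudo-label of the other, and a simple confidence-threshold mask stands in for the paper's uncertainty-based rectification; the array shapes and the threshold value are illustrative assumptions:

```python
import numpy as np

def cps_loss(p_a, p_b, conf_thresh=0.7):
    """Rectified cross pseudo supervision on two softmax probability maps.

    p_a, p_b: arrays of shape (num_classes, N), per-voxel class probabilities
    from two identically structured but differently initialized networks.
    """
    eps = 1e-9
    y_a = p_a.argmax(axis=0)                  # pseudo-labels from network A
    y_b = p_b.argmax(axis=0)                  # pseudo-labels from network B

    # Rectification stand-in: keep only voxels with confident pseudo-labels
    mask_a = p_a.max(axis=0) > conf_thresh
    mask_b = p_b.max(axis=0) > conf_thresh

    # Cross supervision: B learns from A's labels and vice versa
    ce_b = -np.log(p_b[y_a, np.arange(y_a.size)] + eps)
    ce_a = -np.log(p_a[y_b, np.arange(y_b.size)] + eps)
    n = max(mask_a.sum() + mask_b.sum(), 1)
    return ((ce_b * mask_a).sum() + (ce_a * mask_b).sum()) / n
```

The loss is small when both networks agree confidently and grows when their confident predictions disagree, which is what pushes the two initializations toward consistent outputs.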

    Frontier and comprehensive applications
    Review of YOLO algorithm and its applications to object detection in autonomous driving scenes
    Yaping DENG, Yingjiang LI
    2024, 44(6):  1949-1958.  DOI: 10.11772/j.issn.1001-9081.2023060889

    Object detection in autonomous driving scenes is one of the important research directions in computer vision, with research focusing on ensuring real-time and accurate object detection by autonomous vehicles. Recently, the rapid development of deep learning technology and its wide application in the field of autonomous driving have prompted substantial progress in this field. The research status of object detection by YOLO (You Only Look Once) algorithms in the field of autonomous driving was analyzed from the following four aspects. Firstly, the ideas and improvement methods of the single-stage YOLO series of detection algorithms were summarized, and the advantages and disadvantages of the YOLO series of algorithms were analyzed. Secondly, YOLO algorithm-based object detection applications in autonomous driving scenes were introduced, and the research status and applications for the detection and recognition of traffic vehicles, pedestrians, and traffic signals were expounded and summarized respectively. Additionally, the commonly used evaluation indicators in object detection, as well as object detection datasets and autonomous driving scene datasets, were summarized. Lastly, the problems and future development directions of object detection were discussed.
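Of the evaluation indicators mentioned, mean Average Precision is the most common; it averages per-class AP values. A minimal sketch of all-point interpolated AP for one class, assuming detections have already been matched to ground truth at a given IoU threshold:

```python
def average_precision(scored_preds, num_gt):
    """All-point interpolated AP for one class.

    scored_preds: list of (confidence, is_true_positive) pairs;
    num_gt: number of ground-truth boxes of that class.
    """
    preds = sorted(scored_preds, key=lambda p: p[0], reverse=True)
    tp = fp = 0
    points = []
    for _, is_tp in preds:
        if is_tp:
            tp += 1
        else:
            fp += 1
        points.append((tp / (tp + fp), tp / num_gt))  # (precision, recall)

    # Integrate precision over recall, using the maximum precision
    # at or to the right of each recall level (interpolation)
    ap, prev_recall = 0.0, 0.0
    for i, (_, r) in enumerate(points):
        p_right = max(pt[0] for pt in points[i:])
        ap += p_right * (r - prev_recall)
        prev_recall = r
    return ap
```

mAP(IoU=0.5), as reported in the abstracts above, is this quantity averaged over classes with the true-positive matching done at an IoU threshold of 0.5.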

    Trajectory planning for autonomous vehicles based on model predictive control
    Chao GE, Jiabin ZHANG, Lei WANG, Zhixin LUN
    2024, 44(6):  1959-1964.  DOI: 10.11772/j.issn.1001-9081.2023050725

    To help autonomous vehicles plan safe, comfortable and efficient driving trajectories, a trajectory planning approach based on model predictive control was proposed. Firstly, to simplify the planning environment, a safe and feasible "three-circle" expansion of the safety zone was introduced, which also eliminates the collision issues caused by an idealized vehicle model. Secondly, trajectory planning was decoupled into lateral and longitudinal spaces: a model predictive method was applied to lateral planning to generate a series of candidate trajectories meeting the driving requirements, and a dynamic programming approach was utilized for longitudinal planning, improving the efficiency of the planning process. Finally, the factors affecting the selection of the optimal trajectory were considered comprehensively, and an optimal trajectory evaluation function was proposed to make path planning and speed planning more compatible with the driving requirements. The effectiveness of the proposed algorithm was verified by joint simulation with Matlab/Simulink, PreScan and CarSim. Experimental results indicate that the vehicle achieves the expected effects in terms of comfort metrics, steering wheel angle variation and localization accuracy, and the planned curve closely matches the tracked curve, validating the advantages of the proposed algorithm.
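An optimal-trajectory evaluation function of the kind described typically combines safety, comfort and efficiency terms over each candidate. The terms, weights and data layout below are illustrative assumptions, not the paper's actual function:

```python
def trajectory_cost(traj, ref_speed, obstacles,
                    w_safe=1.0, w_comfort=0.5, w_eff=0.2):
    """Hypothetical weighted evaluation of one candidate trajectory.

    traj: list of (x, y, v, a) samples; obstacles: list of (x, y) centers
    of the inflated "three-circle" obstacle model.
    """
    safety = comfort = efficiency = 0.0
    prev_a = traj[0][3]
    for x, y, v, a in traj:
        # Safety: penalize proximity to the nearest obstacle circle
        d_min = min(((x - ox) ** 2 + (y - oy) ** 2) ** 0.5
                    for ox, oy in obstacles)
        safety += 1.0 / (d_min + 1e-3)
        # Comfort: penalize jerk (change of acceleration between samples)
        comfort += (a - prev_a) ** 2
        prev_a = a
        # Efficiency: penalize deviation from the reference speed
        efficiency += (v - ref_speed) ** 2
    return w_safe * safety + w_comfort * comfort + w_eff * efficiency

def best_trajectory(candidates, ref_speed, obstacles):
    """Select the candidate with the lowest evaluation cost."""
    return min(candidates,
               key=lambda t: trajectory_cost(t, ref_speed, obstacles))
```

In practice the weights trade off the three objectives, so tuning them shifts the planner between cautious and assertive behavior.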

    Gait recognition method based on two-branch convolutional network
    Xiaolu WANG, Wangfei QIAN
    2024, 44(6):  1965-1971.  DOI: 10.11772/j.issn.1001-9081.2023060897

    Aiming at the problem that gait recognition is easily affected by changes in shooting angle and appearance, a gait recognition method based on a two-branch convolutional network was proposed. Firstly, a data augmentation method combining random cropping and random occlusion, named RRDA (Restricted Random Data Augmentation), was proposed to expand the data samples with appearance changes and improve the model's robustness to occlusion. Secondly, the attention mechanism was used to form a two-branch Composite-Convolutional (C-Conv) layer to extract gait features: one branch extracted the global and most recognizable information of pedestrian appearance through Horizontal Pyramid Mapping (HPM), while the other used multiple parallel Micro-Motion Capture Modules (MCMs) to extract short-term spatio-temporal gait information. Finally, the feature information of the two branches was added and fused, and gait recognition was achieved through a fully connected layer. A joint loss function was constructed to balance the discriminative ability of sample features against model convergence, thereby accelerating convergence. Experiments were conducted on the gait recognition dataset CASIA-B. The recognition accuracies of the proposed method in the three walking states are 97.40%, 93.67% and 81.19%, higher than those of the GaitSet, CapsNet, two-stream gait and GaitPart methods; compared with GaitSet, the recognition accuracy of the proposed method is 1.30 percentage points higher for normal walking, 2.87 percentage points higher when carrying a backpack, and 10.89 percentage points higher when wearing a jacket. Experimental results show that the proposed method is feasible and effective.
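The Horizontal Pyramid Mapping branch can be sketched as follows; the pooling choice (max plus average per strip) and the scale set are assumptions for illustration:

```python
import numpy as np

def horizontal_pyramid_mapping(feat, scales=(1, 2, 4)):
    """Pool a (C, H, W) feature map into part-level vectors.

    At each scale the map is cut into that many horizontal strips, and each
    strip is reduced to a C-dimensional vector by global max plus average
    pooling, capturing body-part appearance at several granularities.
    """
    c, h, w = feat.shape
    parts = []
    for s in scales:
        strip_h = h // s                      # assumes h is divisible by s
        for i in range(s):
            strip = feat[:, i * strip_h:(i + 1) * strip_h, :]
            parts.append(strip.max(axis=(1, 2)) + strip.mean(axis=(1, 2)))
    return np.stack(parts)                    # (1 + 2 + 4, C) part features
```

Coarse strips capture global appearance while fine strips isolate local parts, which is why the pyramid is more robust to partial occlusion than a single global pooling.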

    3D object detection network based on self-attention mechanism and graph convolution
    Yue LIU, Fang LIU, Aoyun WU, Qiuyue CHAI, Tianxiao WANG
    2024, 44(6):  1972-1977.  DOI: 10.11772/j.issn.1001-9081.2023060767

    Aiming at the problems that the detection accuracy for small objects such as cyclists and pedestrians in Three-Dimensional (3D) object detection is low and that it is difficult to adapt to complex urban road conditions, a 3D object detection network based on a self-attention mechanism and graph convolution was proposed. Firstly, to obtain more discriminative small object features, the self-attention mechanism was introduced into the backbone network to make it more sensitive to small object features and improve its feature extraction ability. Secondly, a feature fusion module was constructed based on the self-attention mechanism to further enrich the information of the shallow network and enhance the feature expression ability of the deep network. Finally, dynamic graph convolution was used to predict the bounding boxes of objects, improving prediction accuracy. The proposed network was tested on the KITTI dataset and compared with eight mainstream networks such as TANet (Triple Attention Network) and IA-SSD (Instance-Aware Single-Stage Detector). The experimental results show that, across the easy, moderate and hard difficulty levels, the pedestrian detection accuracy of the proposed network is 12.12, 13.82 and 11.03 percentage points higher than that of TANet, which has the second-best pedestrian detection accuracy; the cyclist detection accuracy of the proposed network is 3.06 and 5.34 percentage points higher than that of IA-SSD at the moderate and hard levels. In summary, the proposed network is well suited to small object detection tasks.
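The self-attention mechanism introduced into the backbone is, at its core, scaled dot-product attention over a set of features; a minimal single-head sketch with illustrative shapes:

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over N point/voxel features.

    x: (N, d) feature set; wq, wk, wv: (d, d) projection matrices.
    Each output feature is a weighted mix of all input features, so even
    a small object can borrow context from the whole scene.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])          # (N, N) affinities
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)         # row-wise softmax
    return attn @ v                                 # reweighted features
```

Because the affinity matrix is global, features of sparse small objects such as pedestrians are re-expressed in terms of their most related neighbors rather than only their local receptive field.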

    Human vital signs detection algorithm based on frequency modulated continuous wave radar
    Mu LI, Yu LUO, Xizheng KE
    2024, 44(6):  1978-1986.  DOI: 10.11772/j.issn.1001-9081.2023060737

    To address the low accuracy and poor real-time performance of existing non-contact radar vital signs detection, a human vital signs detection algorithm based on Frequency Modulated Continuous Wave (FMCW) radar was proposed. Firstly, the vital signs signal was acquired by millimeter wave radar. Then, adaptive decomposition and reconstruction of the vital signs signal were achieved using an improved Empirical Wavelet Transform (EWT) algorithm, in which the optimal spectrum division boundaries were found by introducing the Sparrow Search Algorithm (SSA) and Fuzzy Entropy (FE). Finally, the heart rate and respiratory rate were calculated using an estimation algorithm with improved frequency interpolation. The superiority and robustness of the proposed algorithm were verified through comparative experiments against a medical critical care monitor. The experimental results show that compared with the Wavelet Transform (WT) algorithm, the Complementary Ensemble Empirical Mode Decomposition (CEEMD) algorithm and the Variational Mode Decomposition (VMD) algorithm, the Mean Square Error (MSE) is reduced by 77.65, 27.25 and 21.05, the Mean Absolute Percentage Error (MAPE) is reduced by 7.33, 4.33 and 3.42 percentage points, and the running time is reduced by 0.72 s, 16.74 s and 1.87 s, respectively. The proposed algorithm also enables detection of Heart Rate Variability (HRV).
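Frequency-interpolation rate estimation of the kind mentioned refines the FFT peak beyond the bin resolution; a generic sketch using parabolic peak interpolation (the 0.25 Hz synthetic "breathing" tone and the sampling rate are illustrative, not the paper's data):

```python
import numpy as np

def estimate_rate(signal, fs):
    """Estimate the dominant rate (Hz) of a quasi-periodic vital-sign signal.

    Takes the windowed FFT magnitude, locates the peak bin, then refines
    the peak position by parabolic interpolation over neighboring bins,
    giving sub-bin frequency resolution from a short observation window.
    """
    n = len(signal)
    mag = np.abs(np.fft.rfft(signal * np.hanning(n)))
    k = int(np.argmax(mag[1:]) + 1)                 # skip the DC bin
    if 0 < k < len(mag) - 1:
        a, b, c = mag[k - 1], mag[k], mag[k + 1]
        k += 0.5 * (a - c) / (a - 2 * b + c)        # parabolic refinement
    return k * fs / n
```

With a 30 s window sampled at 20 Hz the raw bin spacing is 0.033 Hz, so interpolation is what makes breathing-rate estimates usable at short observation times.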

Honorary Editor-in-Chief: ZHANG Jingzhong
Editor-in-Chief: XU Zongben
Associate Editor: SHEN Hengtao XIA Zhaohui
Domestic Post Distribution Code: 62-110
Foreign Distribution Code: M4616
Address:
No. 9, 4th Section of South Renmin Road, Chengdu 610041, China
Tel: 028-85224283-803
  028-85222239-803
Website: www.joca.cn
E-mail: bjb@joca.cn