Table of Contents

    10 January 2025, Volume 45 Issue 1
    Artificial intelligence
    Survey of fairness in federated learning
    Shufen ZHANG, Hongyang ZHANG, Zhiqiang REN, Xuebin CHEN
    2025, 45(1):  1-14.  DOI: 10.11772/j.issn.1001-9081.2023121881

    Federated Learning (FL) has developed rapidly owing to its distributed structure and its advantages in privacy and security. However, the fairness issues that arise in large-scale FL affect the sustainability of FL systems. In response to these issues, recent research on fairness in FL was reviewed systematically and analyzed in depth. Firstly, the workflow and definitions of FL were explained, and the concepts of bias and fairness in FL were summarized. Secondly, the datasets commonly used in FL fairness research were detailed, and the challenges faced by fairness research were discussed. Finally, the advantages, disadvantages, applicable scenarios, and experimental settings of relevant work were summarized from four aspects: data source selection, model optimization, contribution evaluation, and incentive mechanism, and future research directions and trends in FL fairness were outlined.

    Link prediction model based on directed hypergraph adaptive convolution
    Wenbo ZHAO, Zitong MA, Zhe YANG
    2025, 45(1):  15-23.  DOI: 10.11772/j.issn.1001-9081.2023121847

    Although diverse solutions for link prediction have been provided by Graph Neural Networks (GNN), the recent models have significant shortcomings in fully utilizing high-order and asymmetric information between vertices caused by the structural constraints of ordinary graphs. To address the above issues, a link prediction model based on directed hypergraph adaptive convolution was proposed. Firstly, the directed hypergraph structure was employed to represent high-order and directional information between vertices more sufficiently, possessing advantages of both hypergraphs and directed graphs. Secondly, an adaptive information propagation method was adopted by directed hypergraph adaptive convolution to replace the directional information propagation method in traditional directed hypergraphs, thereby solving the problem of ineffective updating of embeddings for tail vertices of directed hyperedges, and solving the problem of excessive smoothing of vertices caused by multi-layer convolution. Experimental results based on explicit vertex features on Citeseer dataset show that the proposed model achieves a 2.23 percentage points increase in the Area Under the ROC (Receiver Operating Characteristic) Curve (AUC) and a 1.31 percentage points increase in Average Precision (AP) compared to the Directed Hypergraph Neural Network (DHNN) model in link prediction task. Therefore, it can be concluded that this model expresses the relationships between vertices adequately and improves the accuracy of link prediction task effectively.

    Potential relation mining in internet of things threat intelligence knowledge graph
    Zidong CHENG, Peng LI, Feng ZHU
    2025, 45(1):  24-31.  DOI: 10.11772/j.issn.1001-9081.2024010136

    Knowledge graph plays a crucial role in the sharing and utilization of Internet of Things Threat Intelligence (ITI). Graph Neural Network (GNN) can be applied to tasks of knowledge representation in ITI Knowledge Graph (ITIKG), thereby mining potential relations in ITIKG. However, most existing GNNs fail to consider the influence of node types on node representation capability and employ random sampling strategies for node sampling during node information aggregation, leading to an inability to distinguish neighbors at different distances and a lack of consideration for correlations among or importance of nodes. To address these issues, firstly ITIKG was constructed on the basis of various data sources. Subsequently, a deterministic sampling method was designed to sample the neighbors of root node based on node importance, and consider the distance between neighbors and root node, as well as the centrality measurement of neighbors in the graph, namely Katz centrality and betweenness centrality. Finally, embedding and aggregation methods of node, node modality, and node type were devised. On this basis, a Deterministic Multimodal Heterogeneous Graph Neural Network (DM-HGNN) model was proposed. Experimental results on link prediction in the constructed ITIKG demonstrate that the performance of DM-HGNN model is better than that of knowledge representation models such as metapath2vec, Multi-modal Knowledge Graph Representation Learning (MMKRL), and Complex Graph Convolutional Network (ComplexGCN). Compared to the suboptimal model MMKRL, DM-HGNN model exhibits an improvement of 6.8% in Area Under the Curve (AUC) and 7.1% in F1-score, indicating the effectiveness and advancement of DM-HGNN model in link prediction tasks.
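
    The difference between random and importance-based deterministic neighbor sampling can be illustrated with a minimal sketch. It is not the DM-HGNN implementation: only a truncated Katz centrality is computed (betweenness centrality is omitted), and the scoring rule (centrality divided by hop distance), the function names, and the toy graph are all assumptions made for illustration.

```python
from typing import Dict, List


def truncated_katz(adj: Dict[str, List[str]], alpha: float = 0.1, steps: int = 20) -> Dict[str, float]:
    """Truncated Katz centrality: sum over k of alpha**k * (walks of length k ending at the node)."""
    walks = {n: 1.0 for n in adj}  # one walk of length 0 ends at every node
    score = {n: 0.0 for n in adj}
    for k in range(1, steps + 1):
        nxt = {n: 0.0 for n in adj}
        for u, neighbors in adj.items():
            for v in neighbors:
                nxt[v] += walks[u]  # extend every walk ending at u by the edge u -> v
        walks = nxt
        for n in adj:
            score[n] += (alpha ** k) * walks[n]
    return score


def sample_neighbors(distance: Dict[str, int], centrality: Dict[str, float], k: int) -> List[str]:
    """Pick the k highest-scoring candidates; ties are broken by name, so the result is deterministic."""
    # Hypothetical importance score: centrality discounted by hop distance to the root.
    ranked = sorted(distance, key=lambda n: (-centrality[n] / distance[n], n))
    return ranked[:k]


# Toy star graph (assumption): "c" is the hub, the root node is "a".
star = {"c": ["a", "b", "d"], "a": ["c"], "b": ["c"], "d": ["c"]}
katz = truncated_katz(star)
hops_from_a = {"c": 1, "b": 2, "d": 2}
chosen = sample_neighbors(hops_from_a, katz, k=2)
```

    Unlike random sampling, repeated calls return the same neighbors, and nearer, more central nodes are preferred.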

    Knowledge graph multi-hop reasoning model fusing path and subgraph features
    Rui LI, Guanfeng LI, Dezhou HU, Wenxin GAO
    2025, 45(1):  32-39.  DOI: 10.11772/j.issn.1001-9081.2024010050

    To address the difficulty that knowledge reasoning models have in capturing multi-level semantic information, and their lack of consideration for the different weights with which individual interpretable paths influence the correct answer, a Knowledge Graph (KG) multi-hop reasoning model fusing path and subgraph features, PS-HAM (Hierarchical Attention Model fusing Path-Subgraph features), was proposed. In PS-HAM, entity neighborhood information and connection path information were fused, and multi-granularity features were explored for different paths. Firstly, the path-level feature extraction module was used to extract the connection paths between each entity pair, and a hierarchical attention mechanism was employed to capture information at different granularities, which was used as the path-level representation. Secondly, the subgraph feature extraction module was used to aggregate each entity's neighborhood information through a Relational Graph Convolutional Network (RGCN). Finally, the path-subgraph feature fusion module was employed to fuse the path-level and subgraph-level feature vectors for fusion reasoning. Experimental results on two public datasets show that PS-HAM improves both the Mean Reciprocal Rank (MRR) and Hit@k (k=1, 3, 10) indices. Compared with the MemoryPath model, PS-HAM increases MRR by 1.5 and 1.2 percentage points on the FB15k-237 and WN18RR datasets, respectively. In addition, parameter verification of the subgraph hop number shows that the best results are achieved on both datasets when the hop number is 3.
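
    For readers unfamiliar with the two evaluation indices named in this abstract, the following sketch shows how MRR and Hit@k are conventionally computed from the rank of the correct entity for each test query. The ranks used here are made-up values, not results from the paper.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all test queries (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hit_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose correct entity is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the correct answer for four test queries.
example_ranks = [1, 2, 4, 10]
mrr = mean_reciprocal_rank(example_ranks)  # (1 + 1/2 + 1/4 + 1/10) / 4
hits = {k: hit_at_k(example_ranks, k) for k in (1, 3, 10)}
```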

    HTLR: named entity recognition framework with hierarchical fusion of multi-knowledge
    Xueqiang LYU, Tao WANG, Xindong YOU, Ge XU
    2025, 45(1):  40-47.  DOI: 10.11772/j.issn.1001-9081.2023111699

    Chinese Named Entity Recognition (NER) tasks aim to extract entities from unstructured text and assign them to predefined entity categories. To address the insufficient semantic learning caused by the lack of contextual information in most Chinese NER methods, an NER framework with hierarchical fusion of multi-knowledge, named HTLR (Chinese NER method based on Hierarchical Transformer fusing Lexicon and Radical), was proposed to use hierarchically fused multi-knowledge to help the model learn richer and more comprehensive contextual and semantic information. Firstly, the lexicon contained in the corpus was identified and vectorized by using a publicly available Chinese lexicon table and word vector table. At the same time, knowledge about the Chinese lexicon was learned by modeling semantic relationships between lexicon and related characters through optimized position encoding. Secondly, the corpus was converted into the corresponding coding sequences to represent character form information, using the coding based on Chinese character radicals provided by the Han Dian website, and an RFE-CNN (Radical Feature Extraction-Convolutional Neural Network) model was proposed for extracting radical information. Finally, the Hierarchical Transformer model was proposed, in which semantic relationships between characters and lexicon and between characters and radical forms were learned in lower-level modules, and multi-knowledge about characters, lexicon, and radical forms was learned in higher-level modules, which helped the model acquire character representations with richer semantics. Experimental results on the public datasets Weibo, Resume, MSRA, and OntoNotes4.0 show that the F1 values of the proposed method are improved by 9.43, 0.75, 1.76, and 6.45 percentage points, respectively, compared with those of the mainstream method NFLAT (Non-Flat-LAttice Transformer for Chinese named entity recognition), reaching the optimal level. It can be seen that multi-semantic knowledge, hierarchical fusion, the RFE-CNN structure, and the Hierarchical Transformer structure are effective for learning rich semantic knowledge and improving model performance.

    Multi-objective exam paper generation guided by reinforcement learning and matrix completion
    Changzheng XING, Junfeng LIANG, Haibo JIN, Jiayu XU, Hairong WU
    2025, 45(1):  48-58.  DOI: 10.11772/j.issn.1001-9081.2024010010

    To address the problem that existing exam paper generation technologies pay too much attention to the difficulty of generated exam papers while ignoring other related objectives, such as quality, score distribution, and skill coverage, a multi-objective exam paper generation method guided by reinforcement learning and matrix completion was proposed to optimize the objectives specific to exam paper generation. Firstly, a deep knowledge tracing method was used to model the interaction information among students and response logs in order to obtain the skill proficiency of the student group. Secondly, matrix factorization and matrix completion methods were used to predict students' scores on exercises they had not yet attempted. Finally, based on the multi-objective exam paper generation strategy, and in order to improve the update efficiency of the Q-network, an Exam Q-Network function approximator was designed to select appropriate question sets automatically for updating the composition of the exam paper. Experimental results show that, compared with models such as DEGA (Diseased-Enhanced Genetic Algorithm) and SSA-GA (Sparrow Search Algorithm - Genetic Algorithm), the proposed model is significantly more effective at resolving the multiple dilemmas of exam paper generation scenarios in terms of three indicators: difficulty, rationality and accuracy.

    Bearings fault diagnosis method based on multi-pathed hierarchical mixture-of-experts model
    Xinran XU, Shaobing ZHANG, Miao CHENG, Yang ZHANG, Shang ZENG
    2025, 45(1):  59-68.  DOI: 10.11772/j.issn.1001-9081.2024010043

    To address the low accuracy of rolling bearing fault diagnosis under complex working conditions, a Multi-Task Learning (MTL) model named Multi-pathed Hierarchical Mixture-of-Experts (MHMoE) and a corresponding hierarchical training mode were proposed. In this model, multi-stage, multi-task joint training was combined to achieve a hierarchical information sharing mode. The model's generalization and fault recognition accuracy were further improved over the ordinary MTL mode, enabling the model to perform well on both complex and simple datasets. Meanwhile, by incorporating the bottleneck layer structure of one-dimensional ResNet, the depth of the network was ensured while issues such as vanishing and exploding gradients were avoided, so that relevant features of the dataset could be extracted fully. Experimental results with the Paderborn University bearing fault dataset (PU) as the test dataset demonstrate that, under varying degrees of working-condition complexity, the proposed model improves accuracy by 5.45 to 9.30 percentage points compared to the OMoE (One-gate Mixture-of-Experts)-ResNet18 model without MTL. Compared to models such as Ensemble Empirical Mode Decomposition Hilbert spectral transform (EEMD-Hilbert), MMoE (Multi-gate Mixture-of-Experts), and Multi-Scale multi-Task Attention Convolutional Neural Network (MSTACNN), the proposed model improves accuracy by at least 3.21 to 16.45 percentage points.

    Chinese-Vietnamese neural machine translation model incorporating entity translation
    Shengxiang GAO, Zhe HOU, Zhengtao YU, Hua LAI
    2025, 45(1):  69-74.  DOI: 10.11772/j.issn.1001-9081.2023121880

    In low-resource Chinese-Vietnamese translation tasks, translating entity words in sentences accurately is a significant challenge. In order to solve the problems such as the low frequency of entity words in training corpus and the inability of the model to construct the mapping relationship between bilingual entity words, a Chinese-Vietnamese neural machine translation model that incorporates entity translation was constructed. Firstly, the translation results of entity words in the source sentence were obtained through a Chinese-Vietnamese bilingual entity dictionary. Then, these results were concatenated at the end of the source sentence as input to the model, and the “constraint prompt information” was introduced at the encoding end to enhance representation. Finally, a pointer network mechanism was integrated at the decoding end to ensure that the model was able to replicate the vocabulary of the source sentence. Experimental results show that this model achieves increases of 1.37 and 0.21 points in BiLingual Evaluation Understudy (BLEU) for Chinese-Vietnamese translation and Vietnamese-Chinese translation compared to the cross-lingual language model — XLM-R (Cross-lingual Language Model-RoBERTa) and shortens training time by 3.19% and 3.50% compared to Transformer for Chinese-Vietnamese translation and Vietnamese-Chinese translation, enhancing the comprehensive performance of entity word translation in sentences effectively.
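
    The first two steps described above (dictionary lookup of entity words, then concatenation of the translations to the end of the source sentence) can be sketched as follows. This is only an illustration of the input construction: the separator token, the function name, and the one-entry dictionary are assumptions, and the constraint prompt and pointer network are not modeled.

```python
from typing import Dict, List

SEP = "<sep>"  # hypothetical separator token; the abstract does not name one


def augment_source(tokens: List[str], entity_dict: Dict[str, str]) -> List[str]:
    """Append dictionary translations of entity words to the end of the source sentence."""
    translations = [entity_dict[t] for t in tokens if t in entity_dict]
    if not translations:
        return list(tokens)  # no known entity: the input is left unchanged
    return list(tokens) + [SEP] + translations


# Toy Chinese-Vietnamese entity dictionary with a single entry (assumption).
toy_dict = {"河内": "Hà Nội"}
augmented = augment_source(["我", "去", "河内"], toy_dict)
```

    With the target-language entity present in the encoder input, a copy mechanism at the decoder has something to point at, which is the role the abstract assigns to the pointer network.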

    Joint entity-relation extraction method for ancient Chinese books based on prompt learning and global pointer network
    Bin LI, Min LIN, Siriguleng, Yingjie GAO, Yurong WANG, Shujun ZHANG
    2025, 45(1):  75-81.  DOI: 10.11772/j.issn.1001-9081.2023121843

    Joint entity-relation extraction methods based on the “pre-training + fine-tuning” paradigm rely on large-scale annotated data. In the small-sample scenarios of ancient Chinese books, where data annotation is difficult and costly, fine-tuning efficiency is low and extraction performance is poor; entity nesting and relation overlapping are common in ancient Chinese books, which limits the effect of joint entity-relation extraction; and pipeline extraction methods suffer from error propagation, which degrades extraction. In response to the above problems, a joint entity-relation extraction method for ancient Chinese books based on prompt learning and global pointer network was proposed. Firstly, the prompt learning method of span extraction reading comprehension was used to inject domain knowledge into the Pre-trained Language Model (PLM) to unify the optimization goals of pre-training and fine-tuning, and the input sentences were encoded. Then, global pointer networks were used to predict and jointly decode the boundaries of subjects and objects and the boundaries of subjects and objects under different relations, so as to align them into entity-relation triples and complete the construction of the PTBG (Prompt Tuned BERT with Global pointer) model. As a result, the problems of entity nesting and relation overlapping were solved, and the error propagation problem of pipeline decoding was avoided. Finally, based on the above work, the influence of different prompt templates on extraction performance was analyzed. Experimental results on the Records of the Grand Historian dataset show that, compared with the OneRel model before and after injecting domain knowledge, the PTBG model increases the F1-value by 1.64 and 1.97 percentage points respectively. It can be seen that the PTBG model performs joint entity-relation extraction on ancient Chinese books better, and provides new research ideas and approaches for low-resource, small-sample deep learning scenarios.

    Hierarchical multi-label classification model for public complaints with long-tailed distribution
    Xin LIU, Dawei YANG, Changheng SHAO, Haiwen WANG, Mingjiang PANG, Yanru LI
    2025, 45(1):  82-89.  DOI: 10.11772/j.issn.1001-9081.2024010085

    Swift response to public complaints is an important measure for realizing intelligent social governance and improving people’s satisfaction. Accurate analysis of public complaints is particularly crucial for matching work orders to processing departments intelligently and for responding to and handling complaints swiftly and efficiently. However, vague complaint descriptions, confusable categories, and imbalanced category proportions in public complaint data make it difficult to analyze complaint categories, thus reducing the efficiency and accuracy of intelligent order dispatching. To solve the above problems, a hierarchical multi-label classification model (HMCHotline) for complaints with an encoder-decoder structure was proposed. Firstly, fine-grained keyword prior knowledge from the complaint domain was introduced into the text encoder to suppress noise interference, and the spatio-temporal information in complaints was fused to improve the discriminative ability of semantic features. Secondly, the label hierarchy was used to generate label embeddings with hierarchy-awareness and semantic-awareness, and a label decoder based on the Transformer model was constructed to decode labels using the semantic features from the complaints and the label features. At the same time, a dynamic label table strategy based on the hierarchical dependency was introduced to limit the decoding range of labels, solving the problem of label inconsistency. Finally, a Softmax grouping strategy was used to place label categories of similar size into the same group for the Softmax operation, which alleviated the low classification accuracy caused by the long-tailed distribution of labels. Experimental results on the Hotline, RCV1 (Reuters Corpus Volume I)-v2 and WOS (Web Of Science) datasets show that, compared with Hierarchy-aware label semantics Matching network (HiMatch), the proposed model improves Micro-F1 by 1.65, 2.06 and 0.43 percentage points respectively, proving the effectiveness of the proposed model.
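
    The Softmax grouping idea (labels of similar size share one Softmax, so frequent labels do not suppress rare ones) can be sketched as below. The bin boundaries, function names, and toy counts are assumptions; the abstract does not specify how groups are formed in HMCHotline.

```python
import math
from typing import List, Sequence


def group_by_size(counts: Sequence[int], bounds: Sequence[int]) -> List[int]:
    """Group index of a label = number of size boundaries its training count reaches."""
    return [sum(1 for b in bounds if c >= b) for c in counts]


def grouped_softmax(logits: Sequence[float], groups: Sequence[int]) -> List[float]:
    """Apply Softmax separately inside each group of labels."""
    probs = [0.0] * len(logits)
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        peak = max(logits[i] for i in idx)  # subtract the max for numerical stability
        exps = [math.exp(logits[i] - peak) for i in idx]
        total = sum(exps)
        for i, e in zip(idx, exps):
            probs[i] = e / total
    return probs


# Hypothetical label frequencies with a long tail, and arbitrary size boundaries.
label_counts = [1000, 800, 12, 9, 3]
groups = group_by_size(label_counts, bounds=(10, 100))
probs = grouped_softmax([2.0, 2.0, 0.5, 1.0, 1.0], groups)
```

    Here the two tail labels compete only with each other, so each keeps a probability of 0.5 despite having lower logits than the head labels.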

    Deep temporal event detection algorithm based on signal temporal logic
    Siqi ZHANG, Jinjun ZHANG, Tianyi WANG, Xiaolin QIN
    2025, 45(1):  90-97.  DOI: 10.11772/j.issn.1001-9081.2024010131

    Aiming at the issues of insufficient accuracy in detecting complex temporal events and the neglect of inter-event correlations of deep event detection models, a deep temporal event detection algorithm based on temporal logic, DSTL (Deep Signal Temporal Logic), was proposed. In the algorithm, for one thing, a framework of signal temporal logic was introduced, and events in time series were modeled using Signal Temporal Logic (STL) formulae to consider the logicality and temporality of events in time series comprehensively. For another, a neural network-based base classifier was utilized to detect the occurrence of atomic events, and detection of complex events was aided by structures and semantics of STL formulae. Additionally, neural network modules were employed to replace the corresponding logical conjunctions and temporal logic operators to provide neural network modules supporting GPU acceleration and gradient descent. Through experiments on six time series datasets, the effectiveness of the proposed algorithm in temporal event detection was validated, and the model using DSTL algorithm was compared with deep temporal event detection models using MLP (Multilayer Perceptron), Long Short-Term Memory (LSTM) network and Transformer without using this algorithm. The results indicate that the model using DSTL algorithm has an approximate 12% improvement in average F1 score on five event categories, with an approximate 14% improvement in average F1 score for three categories of cross-time point events, and it has better interpretability.
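
    As background for the logical conjunctions and temporal operators mentioned above, the sketch below implements the standard quantitative (robustness) semantics of STL on a discrete-time signal, where "always" is a windowed minimum and "eventually" a windowed maximum. It shows the classical operators that DSTL is said to replace with neural network modules, not the DSTL modules themselves; the function names and the sample signal are assumptions.

```python
from typing import List, Sequence


def predicate_gt(signal: Sequence[float], threshold: float) -> List[float]:
    """Robustness of the atomic event x > threshold: positive means satisfied."""
    return [x - threshold for x in signal]


def always(rho: Sequence[float], a: int, b: int) -> List[float]:
    """G_[a,b]: worst-case robustness over the window [t+a, t+b] (requires 0 <= a <= b)."""
    return [min(rho[t + a : t + b + 1]) for t in range(len(rho) - b)]


def eventually(rho: Sequence[float], a: int, b: int) -> List[float]:
    """F_[a,b]: best-case robustness over the window [t+a, t+b] (requires 0 <= a <= b)."""
    return [max(rho[t + a : t + b + 1]) for t in range(len(rho) - b)]


def conjunction(rho1: Sequence[float], rho2: Sequence[float]) -> List[float]:
    """Logical AND of two formulae: the weaker robustness wins."""
    return [min(x, y) for x, y in zip(rho1, rho2)]


# Hypothetical time series and the atomic event "x > 1.5".
series = [0.0, 2.0, 3.0, 1.0, 4.0]
rho = predicate_gt(series, 1.5)
holds_two_steps = always(rho, 0, 1)      # "x > 1.5 now and at the next step"
occurs_within_two = eventually(rho, 0, 1)  # "x > 1.5 now or at the next step"
```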

    Deep spatio-temporal network model for multi-time step wind power prediction
    Jianpeng HU, Lichen ZHANG
    2025, 45(1):  98-105.  DOI: 10.11772/j.issn.1001-9081.2023121750

    Accurate wind power prediction can provide reliable guidance and a foundation for decision-making in the wind power industry. However, traditional modeling methods mainly transform the wind power prediction problem into a time series prediction problem, ignoring the spatial information among turbines. Therefore, a deep spatio-temporal network model with an encoder-decoder architecture was proposed for multi-time step wind power prediction. Firstly, a graph was constructed by the encoder on the basis of historical power information, and turbine features integrating the spatial information of the wind farm were extracted using Graph ATtention network (GAT). Secondly, the temporal characteristics of the input data were extracted by Gated Recurrent Unit (GRU), thereby obtaining the temporal features of wind energy of each turbine. Finally, after the spatio-temporal features output by the encoder were fused in the decoder, Sample Convolution and Interaction Network (SCINet) was used to integrate spatio-temporal features at different time scale resolutions, and predictions of future wind power over multiple time steps were output. Experimental results on the WindFarm1 dataset show that, with 72 prediction steps, the proposed model has the Mean Absolute Error (MAE) reduced to 42.38, representing a 4.25% improvement over Bidirectional Gated Recurrent Unit (Bi-GRU); the proposed model has the Root Mean Square Error (RMSE) reduced to 42.71, showing an 8.70% improvement over Autoformer. The results of the generalization experiments on the WindFarm2 dataset demonstrate the proposed model’s applicability to different wind farms, providing a new way to predict future wind power accurately.

    Data science and technology
    Sequential recommendation model based on multi-level graph contrastive learning
    Xiaosheng YU, Zhixin WANG
    2025, 45(1):  106-114.  DOI: 10.11772/j.issn.1001-9081.2024010126

    To address the problem that current sequential recommendation models based on contrastive learning only consider representation learning at the item-to-item or sequence-to-sequence level, and are incapable of capturing fine-grained and distinguishable user and item representations, a Multi-Level Graph Contrastive Learning based Sequential Recommendation (MLGCL-SR) model was proposed. Firstly, an item transition graph was constructed on the basis of the sequential order of user-clicked items and then embedded. Secondly, user representation learning was conducted on the embedded item transition graph using an optimized Bidirectional Gated Graph Neural Network (BI-GGNN). Finally, parameters for the primary recommendation prediction task were updated through a cross-entropy loss function, and multi-level contrastive learning tasks were used to assist these parameter updates at the embedding layer, node layer, and sequential layer; specifically, the tasks on the embedding layer and node layer targeted better item representations, while the task on the sequential layer targeted better user representations. Experimental results on three benchmark datasets, Sports, Beauty, and Toys, demonstrate that, compared to the optimal baseline model MCLRec (Meta-optimized Contrastive Learning for sequential Recommendation), the MLGCL-SR model achieves significant improvements on the Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG) metrics, and increases NDCG@10, the metric that better reflects recommendation effectiveness, by 14.2%, 19.1%, and 23.1% respectively, validating the effectiveness of the model.
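
    The HR and NDCG metrics reported here are commonly computed, in next-item recommendation with a single ground-truth item per user, from the rank of that item in the recommended list. The sketch below follows that common convention, which is an assumption about the paper's evaluation protocol; the ranks are made-up values.

```python
import math
from typing import Sequence


def hit_ratio_at_k(ranks: Sequence[int], k: int) -> float:
    """Share of users whose ground-truth item appears in the top k (ranks are 1-based)."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def ndcg_at_k(ranks: Sequence[int], k: int) -> float:
    """With one relevant item per user, the ideal DCG is 1, so NDCG is 1 / log2(rank + 1)."""
    gains = [1.0 / math.log2(r + 1) if r <= k else 0.0 for r in ranks]
    return sum(gains) / len(ranks)


# Hypothetical ranks of the ground-truth item for three users.
user_ranks = [1, 3, 12]
hr10 = hit_ratio_at_k(user_ranks, 10)
ndcg10 = ndcg_at_k(user_ranks, 10)
```

    NDCG rewards placing the target item near the top of the list, whereas HR only checks that it appears within the top k.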

    Self-adaptive multi-view clustering algorithm with complementarity based on weighted anchors
    Zhuoyue OU, Xiuqin DENG, Lei CHEN
    2025, 45(1):  115-126.  DOI: 10.11772/j.issn.1001-9081.2023121724

    In multi-view clustering, how to mine the correlation information among views fully while reducing the influence of redundant information on clustering performance is an urgent problem. However, existing algorithms ignore the complementary information and differences among views, and do not consider the interference brought by redundant information, resulting in poor clustering performance. To address these issues, a Self-adaptive Multi-view clustering algorithm with Complementarity based on Weighted Anchors (SMCWA) was proposed. To deal with the challenges of high-dimensional multi-view data, firstly, feature concatenation was transferred to the anchor mechanism, so that the anchor graphs were fused to utilize the complementary information among views. Secondly, to weaken the expression of redundant information, the weight of each anchor was determined dynamically through a weight matrix during the iteration process. Finally, to utilize the differences among views, an auto-weighted mechanism was used to assign an appropriate weight to each view adaptively. Within one integrated algorithm, the complementarity among views, the weakening of redundant information, and the differences among views promoted and learned from one another over multiple iterations to obtain a better clustering effect. Experimental results show that the proposed algorithm improves Matthews Correlation Coefficient (MCC) by 41.75% on dataset BDGP (Berkeley Drosophila Genome Project) compared to the spectral clustering algorithm SC-Concat, improves MCC by 11.83% on dataset CCV (Columbia Consumer Video) compared to the Large-scale Multi-View Subspace Clustering in linear time (LMVSC) algorithm, and improves MCC by 19.57% on dataset Caltech101-all compared to the spectral clustering algorithm SC-Best, demonstrating that the proposed algorithm takes full account of the complementary information, the differences among views, and the redundant information to obtain better clustering performance.

    Cyber security
    Federated learning-based statistical prediction and differential privacy protection method for location big data
    Yan YAN, Xingying QIAN, Pengbin YAN, Jie YANG
    2025, 45(1):  127-135.  DOI: 10.11772/j.issn.1001-9081.2024010068

    To address the information silo problem and the risk of location privacy leakage caused by distributed location big data collection, a statistical prediction and privacy protection method for location big data was proposed on the basis of federated learning. Firstly, a horizontal federated learning-based statistical prediction release framework was constructed for location big data. The framework allowed data collectors in each administrative region to keep their raw data, and multiple participants to collaborate on the prediction model’s training task by exchanging training parameters. Secondly, to address the problem of statistically predicting the density of location big data with spatiotemporal sequence characteristics, PVTv2-CBAM was developed to improve the accuracy of prediction results at clients. Finally, combined with the MMA (Modified Moments Accountant) mechanism, a dynamic allocation and adjustment algorithm for the differential privacy budget was proposed to achieve differential privacy protection of the client models. Experimental results show that, compared to models such as Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Convolutional LSTM (ConvLSTM), the proposed PVTv2-CBAM improves the prediction accuracy by 0 to 62% on the Yellow_tripdata dataset and by 39% to 44% on the T-Driver trajectory dataset; the proposed differential privacy budget dynamic allocation and adjustment algorithm enhances the model prediction accuracy by about 5% and 6% at adjustment thresholds of 0.3 and 0.7, respectively, compared with no dynamic adjustment. These results validate the feasibility and effectiveness of the proposed method.

    Location privacy-preserving recommendation scheme based on federated graph neural network
    Liang ZHU, Jingzhe MU, Hongqiang ZUO, Jingzhong GU, Fubao ZHU
    2025, 45(1):  136-143.  DOI: 10.11772/j.issn.1001-9081.2024010044

    Traditional location service recommendation schemes lack consideration of user preferences and potential social relationships, resulting in recommendation results that fail to meet users’ personalized needs. Graph Neural Networks (GNNs) are widely used in the field of location recommendation owing to their strong ability to process graph-structured data. However, the centralized data paradigm of previous studies easily leads to location privacy leakage. Therefore, a Location Privacy-preserving Recommendation scheme based on Federated Graph Neural Network (FedGNN-LPR) was proposed. Firstly, the user’s social relationship embedding and Point-Of-Interest (POI) embedding were learned through the graph attention network. Secondly, a POI-based pseudo labelling model was developed to predict the number of user visits to an unknown location, so as to protect user privacy and alleviate the cold-start problem. Finally, a clustered federated learning strategy based on differential privacy was proposed to protect client interaction data and solve the problem of data heterogeneity. Experiments were conducted on two publicly available real datasets, and the results demonstrate that the proposed scheme reduces Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) by 7.89% and 9.29% respectively compared to the Federated Averaging (FedAvg) algorithm, and by 2.32% and 2.75% respectively compared to the FL+HC algorithm. Moreover, FedGNN-LPR is shown to perform better on federated learning location recommendation. Therefore, FedGNN-LPR not only protects user location privacy, but also improves location recommendation performance.

    Dynamic network defense scheme based on programmable software defined networks
    Zhibin ZUO, Kai YANG, Miaolei DENG, Demin WANG, Mimi MA
    2025, 45(1):  144-152.  DOI: 10.11772/j.issn.1001-9081.2024010090

    Sniffing attacks and flooding attacks are two common attack methods in the Internet of Things (IoT): sniffing attacks are highly concealed and aim to steal user data, while flooding attacks are destructive and can disrupt normal network communication and services. Attackers may use sniffing attacks to find their targets and then attack them through flooding attacks, which poses a serious security threat to IoT. However, defense measures such as endpoint information hopping, false IP hopping, and dual IP hopping focus on single-type attacks and struggle to respond effectively to such combined attacks. To address the security issues faced in the IoT environment, a dynamic network defense scheme based on programmable Software Defined Network (SDN) was proposed. In the attack investigation stage, the protocol number was changed dynamically and the four-tuple in the data packet was hopped periodically, so that the endpoint information was obfuscated and sniffing attacks were resisted effectively. In the attack implementation stage, first-packet dropping and source authentication were used to resist flooding attacks and improve network security significantly. Simulation results show that compared with traditional defense schemes against single-type attacks, the proposed scheme can resist sniffing attacks and flooding attacks effectively at different stages of a network attack, while maintaining lower communication latency and CPU load.

    Smart contract vulnerability detection method based on echo state network
    Chunxia LIU, Hanying XU, Gaimei GAO, Weichao DANG, Zilu LI
    2025, 45(1):  153-161.  DOI: 10.11772/j.issn.1001-9081.2024010025

    Smart contracts on blockchain platforms are decentralized applications that provide secure and trusted services to multiple parties on the chain. Smart contract vulnerability detection can ensure the security of these contracts. However, the existing methods for detecting smart contract vulnerabilities suffer from insufficient feature learning and low detection accuracy when dealing with imbalanced sample sizes and incomplete mining of semantic information. Moreover, these methods cannot detect new vulnerabilities in contracts. To address the above problems, a smart contract vulnerability detection method based on Echo State Network (ESN) was proposed. Firstly, different semantic and syntactic edges were learned on the basis of the contract graph, and feature vectors were obtained through Skip-Gram model training. Then, ESN was combined with transfer learning to achieve transfer and extension to new contract vulnerabilities, so as to improve the vulnerability detection rate. Finally, experiments were conducted on a smart contract dataset collected from the Etherscan platform. Experimental results show that the accuracy, precision, recall, and F1-score of the proposed method reach 94.30%, 97.54%, 91.68%, and 94.52%, respectively. Compared with the Bidirectional Long Short-Term Memory (BLSTM) network and the Bidirectional Long Short-Term Memory with ATTention mechanism (BLSTM-ATT), the proposed method increases accuracy by 5.93 and 11.75 percentage points respectively, and achieves better vulnerability detection performance. The ablation experiments further validate the effectiveness of ESN for smart contract vulnerability detection.
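
    As a quick consistency check on the figures reported above, the F1-score is the harmonic mean of precision and recall. The snippet below is an illustrative calculation only and is not taken from the paper; the function name `f1_score` is hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (97.54% and 91.68%).
f1 = f1_score(0.9754, 0.9168)
print(f"{100 * f1:.2f}%")  # prints 94.52%
```

    The result agrees with the reported F1-score of 94.52%.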

    Access control model for government collaboration
    Dayan ZHAO, Huajun HE, Yuping LI, Junbo ZHANG, Tianrui LI, Yu ZHENG
    2025, 45(1):  162-169.  DOI: 10.11772/j.issn.1001-9081.2024010133

    To address the characteristics of government collaborative scenarios, such as diverse and complex requirements, difficulty in managing personnel turnover, high data privacy levels, and large data volumes, a Government-Based Access Control (GBAC) model for government collaboration was proposed. Access control in government collaborative scenarios must meet the requirement that multiple departments perform different operations on the same resource. Existing access control technologies suffer from inadequate granularity and high maintenance costs, and lack a secure, flexible, and precise access control model. Therefore, in combination with the operating mechanisms of government departments, firstly, the government organizational structure and administrative division structure were integrated into the access control model, and a belonging relationship tree of government staff, organizations, resources, and administrative divisions was constructed. Secondly, combined with the attributes of the organizations and positions to which government staff belong, a joint subject was constructed to grant and revoke permissions automatically. Thirdly, based on organizational functions and administrative division levels, a subject-object attribute matching strategy was designed to break data barriers and improve authentication efficiency. Finally, by introducing the idea of permission hierarchy, data levels and functional levels were set for resources to control the access threshold of the subject, which enhanced model flexibility and further ensured data security. Experimental results show that compared with benchmark models such as Role-Based Access Control (RBAC) and Attribute-Based Access Control (ABAC), the GBAC model reduces memory consumption and access latency significantly. It can be seen that the proposed model implements access management in government collaborative scenarios securely, effectively and flexibly.

    Credit based committee consensus mechanism
    Min SUN, Shihang JIAO, Chenyan WANG
    2025, 45(1):  170-177.  DOI: 10.11772/j.issn.1001-9081.2024010003

    Delegated Proof of Stake (DPoS), a mainstream consensus mechanism, suffers from key problems including a time-consuming election process, a lack of active node participation in voting, and difficulty in handling malicious nodes, which impede its rapid development. In response to these problems, a credit-based committee consensus mechanism, Proof of Luck and Credit (PoLaC), was proposed. Firstly, credit value was used as the evaluation criterion for nodes’ historical behaviors, and nodes with high credit value were selected as committee members, thereby simplifying the election process significantly. Secondly, the concept of lucky value was introduced to raise the probability of ordinary nodes being elected, thereby stimulating the participation of ordinary nodes in network consensus. Finally, a delayed forking method was employed to rectify the behavior of malicious nodes. Experimental results demonstrate that in terms of consensus communication overhead, PoLaC network has 30% less communication overhead than DPoS network with 50% voting intention. In terms of low-weighted node revenue, revenues in PoLaC network are three times higher than those in DPoS network. In terms of the percentage of malicious nodes in the committee, during the credit stabilization period, the number of malicious nodes in PoLaC network’s committee is approximately one-fifth of that in DPoS. Compared to other similar credit-based consensus mechanisms, PoLaC exhibits certain advantages in network communication overhead, node activity, and malicious node handling.

    Advanced computing
    Addressing robot path planning issues using S-shaped growth curve integrated grasshopper optimization algorithm
    Yi RAN, Yongsheng LI, Ye JIANG
    2025, 45(1):  178-185.  DOI: 10.11772/j.issn.1001-9081.2024010033

    Given that heuristic algorithms have low convergence accuracy, low search path efficiency and a tendency to fall into local optima in solving the robot path planning problem, an S-shaped Growth Curve Integrated Grasshopper Optimization Algorithm (SGCIGOA) was proposed. Firstly, the initial population of grasshoppers was optimized by introducing Logistic chaotic sequences, which enhanced the diversity of the grasshopper population in the early stages of iteration. Secondly, a non-linear inertia weight following an S-shaped growth curve was introduced to adjust how the decline parameter decreases, thus improving the convergence speed and optimization accuracy of the algorithm. Finally, a t-distribution based position disturbance mechanism was introduced during the iteration, enabling full utilization of the effective information of the current population, thereby balancing global search and local exploitation and reducing the probability of the algorithm being trapped in a local optimum. Experimental results show that compared with 10 comparison algorithms such as MOGOA (Multi-Objective Grasshopper Optimization Algorithm), IGOA (Improved Grasshopper Optimization Algorithm), and IAACO (Improvement Adaptive Ant Colony Optimization), the proposed algorithm reduces the optimal path length by an average of 0-14.78% and the average number of iterations by 56.60%-90.00% in the simple environment, and shortens the optimal path length by an average of 0-11.58% and the average number of iterations by 45.00%-92.76% in the complex environment. It can be seen that SGCIGOA is an efficient approach to solving the path planning problem for mobile robots.
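
    The first two steps above can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption: the logistic map parameter `mu = 4.0`, the seed `x0`, the sigmoid form of the decline curve, its `steepness`, and the bounds `c_max` and `c_min` are illustrative choices, and the function names are hypothetical.

```python
import math


def logistic_chaos_population(n_agents, dim, lower, upper, x0=0.7, mu=4.0):
    """Initial positions from a logistic map x <- mu * x * (1 - x), scaled to [lower, upper]."""
    x = x0
    population = []
    for _ in range(n_agents):
        agent = []
        for _ in range(dim):
            x = mu * x * (1.0 - x)  # chaotic value in [0, 1]
            agent.append(lower + x * (upper - lower))
        population.append(agent)
    return population


def s_shaped_decline(t, t_max, c_max=1.0, c_min=1e-5, steepness=10.0):
    """Assumed sigmoid-style decline of the parameter c from near c_max to near c_min."""
    s = 1.0 / (1.0 + math.exp(steepness * (t / t_max - 0.5)))
    return c_min + (c_max - c_min) * s


population = logistic_chaos_population(n_agents=5, dim=2, lower=0.0, upper=10.0)
schedule = [s_shaped_decline(t, 100) for t in range(0, 101, 25)]
```

    Compared with a linear schedule, a curve of this shape keeps the parameter high for longer early on and low for longer near the end, with a fast transition in between.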

    Optimization algorithm entropy based on quantum dynamics
    Quan TANG, Peng WANG, Gang XIN
    2025, 45(1):  186-195.  DOI: 10.11772/j.issn.1001-9081.2023121760

    Entropy is a common descriptive tool in the analysis and study of optimization systems. To address the lack of in-depth analysis of the inherent relationship between the dynamic behavior and entropy of different optimization systems, an optimization algorithm entropy based on quantum dynamics was proposed. Firstly, based on the similarity between Brownian motion and sampling behavior in physics, a Brownian motion description method for optimization problems was proposed. The mechanical expression of optimization problems was transformed into the form of energy and introduced into the Schrödinger equation, and an optimization algorithm based on quantum dynamics was proposed. Then, the probability expression of optimization problems under the Schrödinger equation was combined to obtain the optimization algorithm entropy. Finally, the random behavior of particles under the constraint of the objective function was analyzed, and the relationship between the basic search behavior of optimization systems under quantum dynamics and entropy was given. By tracking and analyzing the dynamic behavior and entropy change trend of optimization systems from three different aspects (reference energy, free particle kinetic energy, and objective function disturbance), the correlation between entropy and the search behavior of optimization systems was verified through experiments. Experimental results show that the optimization algorithm entropy based on quantum dynamics supports in-depth analysis of the optimization process, providing a new idea and method for studying optimization algorithms.

    Computer software technology
    Causal intervention-based root cause analysis method for microservice system faults
    Jianli DING, Yufeng HE, Jing WANG
    2025, 45(1):  196-203.  DOI: 10.11772/j.issn.1001-9081.2024010054

    To address the causality loss, low analysis efficiency in complex environments and lack of analytical capability for non-machine-indicator fault types in the existing fault root cause analysis methods, a Causal Intervention-based Microservice system Fault Root Cause Analysis (CIMF-RCA) method was proposed. Firstly, the call chains and microservices were filtered by the Markov assumption and call patterns, which reduced the search space for intervention recognition and improved the efficiency of root cause analysis in complex environments. Secondly, the joint analysis of machine indicator data and log data was achieved by parsing and integrating unstructured log data. Finally, an improved intervention recognition algorithm and a divide-and-conquer method for fault root cause analysis were proposed by introducing Causal Bayesian Network (CBN) and intervention data. Experimental results on Train-Ticket, a large-scale microservice benchmark platform, show that compared to the best-performing Root Cause Discovery (RCD) method, the proposed method increases the Top-5 average accuracy by 26.33 percentage points and reduces the required time by 41.61%. For non-machine-indicator faults, which RCD cannot recognize, the proposed method reaches a Top-5 accuracy of 77.00%. It can be seen that the proposed method can analyze root causes of faults in microservice systems effectively.

    Multimedia computing and computer simulation
    Dual-branch network guided by local entropy for dynamic scene high dynamic range imaging
    Ying HUANG, Changsheng LI, Hui PENG, Su LIU
    2025, 45(1):  204-213.  DOI: 10.11772/j.issn.1001-9081.2023121726

    To address the motion artifacts and exposure distortion that arise in High Dynamic Range (HDR) imaging from a sequence of multiple exposed images when there is camera shake or subject movement, a dual-branch network guided by local entropy for dynamic scene HDR imaging was proposed. Firstly, the Discrete Wavelet Transform (DWT) was employed to separate the low-frequency illumination-related information and high-frequency motion-related information from the input images, enabling the network to handle exposure and subject movement in a targeted way. Secondly, for the low-frequency illumination-related information branch, a module was designed to calculate attention using image local entropy, thereby guiding the network to reduce the extraction of exposure features lacking details. For the high-frequency motion-related information branch, a lightweight feature alignment module was introduced for consistent alignment of the scene, thereby reducing the extraction of motion features. Finally, a time-domain self-attention module was constructed by integrating channel attention, thereby enhancing the mutual dependence of the exposure image sequence in the temporal domain, so as to further improve the quality of the results. Evaluation was performed on the public datasets Kalantari, Sen, and Tursun. Experimental results on Kalantari dataset show that the proposed network achieves the first place in PSNR-l (42.20 dB) and the third place in SSIM-l (0.9889) compared to some latest methods. Taken together with the experimental results on the remaining datasets, it can be seen that the proposed network can reduce exposure distortion and motion artifacts effectively, and generate images with abundant details and excellent visual effect.
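
    The image local entropy used to guide attention in the low-frequency branch can be illustrated as follows. This is a generic sketch of windowed Shannon entropy, not the paper's module: the window `radius`, the border handling (windows are clipped at the image edge), and the function name `local_entropy` are assumptions.

```python
import math
from collections import Counter


def local_entropy(image, radius=1):
    """Shannon entropy (in bits) of the grey levels in a square window around each pixel."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Collect the grey levels in the window, clipped at the image border.
            window = [
                image[y][x]
                for y in range(max(0, i - radius), min(h, i + radius + 1))
                for x in range(max(0, j - radius), min(w, j + radius + 1))
            ]
            n = len(window)
            counts = Counter(window)
            out[i][j] = -sum(c / n * math.log2(c / n) for c in counts.values())
    return out


flat = [[7] * 4 for _ in range(4)]                       # no detail
textured = [[0, 255, 0], [255, 0, 255], [0, 255, 0]]     # strong detail
print(local_entropy(flat)[1][1] == 0)
print(local_entropy(textured)[1][1] > 0.9)
```

    Flat regions, such as over- or under-exposed areas with no detail, score zero, while textured regions score higher, which is why such a map is a plausible signal for down-weighting detail-poor exposure features.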

    Point cloud registration method based on coordinate geometric sampling
    Jietao LIANG, Bing LUO, Lanhui FU, Qingling CHANG, Nannan LI, Ningbo YI, Qi FENG, Xin HE, Fuqin DENG
    2025, 45(1):  214-222.  DOI: 10.11772/j.issn.1001-9081.2024010045

    To improve the accuracy, robustness, and generalization of point cloud registration and address the problem that the Iterative Closest Point (ICP) algorithm easily falls into local optimal solutions, a point cloud registration method of coordinate Geometric Sampling based on Deep Closest Point (GSDCP) was proposed. Firstly, the curvature at each point was estimated using the coordinates of its surrounding points, and points that preserved the geometric features of the point cloud were selected according to curvature, so as to realize downsampling of the point cloud. Secondly, a Dynamic Graph Convolutional Neural Network (DGCNN) was applied to the downsampled point cloud to learn point cloud features that incorporated local geometry information, contextual information was captured using a Transformer, and soft pointers were used to approximate combinatorial matching between the two feature embeddings. Finally, a differentiable Singular Value Decomposition (SVD) layer was utilized to estimate the final rigid transformation. Point cloud registration experimental results on ModelNet40 dataset show that compared with ICP, Globally optimal ICP (Go-ICP), PointNetLK, Fast Global Registration (FGR), ADGCNNLK (Attention Dynamic Graph Convolutional Neural Network Lucas-Kanade), Deep Closest Point (DCP), and Multi-Features Guidance Network (MFGNet), GSDCP achieves the best registration accuracy and robustness in scenarios with and without noise, as well as when the point cloud category is unseen. In the noise-free scenario, GSDCP reduces rotational Mean Square Error (MSE) by 31.3% and translational MSE by 58.3% compared to MFGNet. In the noisy scenario, GSDCP reduces rotational MSE by 33.9% and translational MSE by 73.4% compared to MFGNet. When the point cloud category is unseen, GSDCP reduces rotational MSE by 57.7% and translational MSE by 77.9% compared to MFGNet. Additionally, when dealing with incomplete point cloud data (including random occlusion and fragmentary point clouds), GSDCP reduces rotational MSE by 35.1% and translational MSE by 39.8% compared to MFGNet when point cloud integrity is below 75%.
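
    The SVD step that estimates the rigid transformation can be sketched with the standard closed-form least-squares solution for paired points. This is a plain NumPy illustration, not the paper's differentiable layer: it assumes the correspondences are already known and exact (row `i` of `src` matches row `i` of `dst`), and the function name `rigid_transform_svd` is hypothetical.

```python
import numpy as np


def rigid_transform_svd(src, dst):
    """Least-squares rotation R and translation t such that dst ~ src @ R.T + t."""
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)        # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


# Synthetic check: rotate 30 degrees about the z-axis and translate.
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 3))
theta = np.deg2rad(30.0)
R_true = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
])
t_true = np.array([0.5, -1.0, 2.0])
dst = src @ R_true.T + t_true
R_est, t_est = rigid_transform_svd(src, dst)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))
```

    In a learned pipeline the hard correspondences used here would be replaced by the soft matches produced by the network.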

    Lightweight human pose estimation based on decoupled attention and ghost convolution
    Junying CHEN, Shijie GUO, Lingling CHEN
    2025, 45(1):  223-233.  DOI: 10.11772/j.issn.1001-9081.2024010099

    With the development of lightweight networks, human pose estimation tasks can be performed on devices with limited computational resources. However, improving accuracy has become more challenging. The difficulty arises mainly from the conflict between network complexity and computational resources, which means that representation capability is sacrificed when the model is simplified. To address these issues, a Decoupled attention and Ghost convolution based Lightweight human pose estimation Network (DGLNet) was proposed. Specifically, in DGLNet, with the Small High-Resolution Network (Small HRNet) model as the basic architecture, a DFDbottleneck module was constructed by introducing a decoupled attention mechanism. The basic modules were redesigned with a shuffleblock structure, in which computationally intensive point convolutions were replaced with lightweight ghost convolutions, and the decoupled attention mechanism was utilized to enhance module performance, leading to the creation of the DGBblock module. Additionally, the original transition layer modules were replaced with redesigned depthwise separable convolution modules that incorporated ghost convolution and decoupled attention, resulting in the construction of the GSCtransition module. This modification further reduced computational complexity while enhancing feature interaction and performance. Experimental results on the COCO validation set show that DGLNet outperforms the state-of-the-art Lite-High-Resolution Network (Lite-HRNet) model, achieving a maximum accuracy of 71.9% without increasing computational complexity or the number of parameters. Compared to common lightweight pose estimation networks such as MobileNetV2 and ShuffleNetV2, DGLNet achieves precision improvements of 4.6 and 8.3 percentage points respectively, while using only 21.2% and 25.0% of their computational resources. Furthermore, under the AP50 evaluation criterion, DGLNet surpasses the large High-Resolution Network (HRNet) while requiring significantly less computation and far fewer parameters.

    Action recognition algorithm based on attention mechanism and energy function
    Lifang WANG, Jingshuang WU, Pengliang YIN, Lihua HU
    2025, 45(1):  234-239.  DOI: 10.11772/j.issn.1001-9081.2024010004

    To address the insufficient structural guidance in the framework of Zero-Shot Action Recognition (ZSAR) algorithms, an Action Recognition Algorithm based on Attention mechanism and Energy function (ARAAE) was proposed, with the Energy-Based Model (EBM) guiding the framework design. Firstly, to obtain the input for EBM, a combination of optical flow and Convolutional 3D (C3D) architecture was designed to extract visual features, achieving spatial non-redundancy. Secondly, Vision Transformer (ViT) was utilized for visual feature extraction to reduce temporal redundancy, and ViT was used together with the combination of optical flow and C3D architecture to reduce spatial redundancy, resulting in a non-redundant visual space. Finally, to measure the correlation between visual space and semantic space, an energy score evaluation mechanism was realized, and a joint loss function was designed for optimization experiments. Experimental results on HMDB51 and UCF101 datasets against six classical ZSAR algorithms and algorithms from recent literature show that on the HMDB51 dataset with average grouping, the average recognition accuracy of ARAAE is (22.1±1.8)%, which is better than those of CAGE (Coupling Adversarial Graph Embedding), Bi-dir GAN (Bi-directional Generative Adversarial Network) and ETSAN (Energy-based Temporal Summarized Attentive Network). On UCF101 dataset with average grouping, the average recognition accuracy of ARAAE is (22.4±1.6)%, which is slightly better than those of all comparison algorithms. On UCF101 with the 81/20 dataset segmentation method, the average recognition accuracy of ARAAE is (40.2±2.6)%, which is higher than those of the comparison algorithms. It can be seen that ARAAE improves the recognition performance in ZSAR effectively.

    Weakly supervised video anomaly detection with local-global temporal dependency
    Pengcheng SONG, Lijun GUO, Rong ZHANG
    2025, 45(1):  240-246.  DOI: 10.11772/j.issn.1001-9081.2024010104

    Weakly Supervised Video Anomaly Detection (WS-VAD) is of great significance to the field of intelligent security. Currently, WS-VAD tasks face the following problems: the existing methods focus more on the discrimination of the video snippets themselves, ignoring the local and global temporal dependency among the snippets; the temporal structure of anomalous events is ignored in loss function design; and a large amount of normal snippet noise exists in anomalous videos, which interferes with training convergence. Therefore, a WS-VAD method based on a Local-Global Temporal Dependency (LGTD) network was proposed. In this method, the LGTD network utilized a Multi-scale Temporal Feature Fusion (MTFF) module to capture the local temporal correlation of snippets within different time spans. At the same time, a Multi-Head Self-Attention (MHSA) module was employed to integrate the information of all snippets within the video and model the temporal correlation of the whole video sequence. After that, a Squeeze-and-Excitation (SE) module was used to optimize the internal feature weights of the snippets, so as to capture the temporal and spatial features of the snippets more accurately and improve the detection performance significantly. In addition, the existing loss function was improved by introducing a complementary K-maxmin inner bag loss and Top-K outer bag loss to increase the probability of selecting anomalous snippets from the anomalous video for optimization training. Experimental results show that the proposed method achieves average Area Under the Curve (AUC) values of 83.18% and 95.41% on UCF-Crime and ShanghaiTech datasets respectively, which are 0.08 and 7.21 percentage points higher than those of the Collaborative Normality Learning (CNL) method. It can be seen that the proposed method can improve the detection performance effectively.
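
    The general idea behind Top-K selection in weakly supervised training can be sketched as follows. This is a generic multiple-instance ranking example, not the paper's K-maxmin inner bag loss or Top-K outer bag loss, whose definitions the abstract does not give; the hinge form, the `margin`, the value of `k`, and both function names are assumptions.

```python
def top_k_mean(scores, k):
    """Video-level score: mean of the k highest snippet anomaly scores."""
    k = max(1, min(k, len(scores)))
    return sum(sorted(scores, reverse=True)[:k]) / k


def bag_ranking_hinge(anomalous_scores, normal_scores, k=3, margin=1.0):
    """Hinge loss that pushes the anomalous bag's top-k score above the normal bag's."""
    gap = top_k_mean(anomalous_scores, k) - top_k_mean(normal_scores, k)
    return max(0.0, margin - gap)


# Snippet scores for one anomalous video (mostly normal snippets) and one normal video.
anomalous = [0.1, 0.9, 0.8, 0.2]
normal = [0.1, 0.2, 0.1, 0.0]
loss = bag_ranking_hinge(anomalous, normal, k=2)
```

    Averaging over only the k highest-scoring snippets means the many normal snippets inside an anomalous video contribute little to the training signal.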

    Non-iterative graph capsule network for remote sensing scene classification
    Shun YANG, Xiaoyong BIAN, Xi CHEN
    2025, 45(1):  247-252.  DOI: 10.11772/j.issn.1001-9081.2024010111

    Most of the current capsule network methods improve classification accuracy by modifying iterative routing, while ignoring the computational burden of iterative routing itself. Although some methods train the capsule network with non-iterative routing, their accuracy is unsatisfactory. To address the above problem, a non-iterative routing graph capsule network method for remote sensing scene classification was proposed. Firstly, the preliminary features of the input image were extracted using a simple convolutional layer. Then, by performing dual attention between channels and capsules sequentially, a global attention module with dual fusion between channels and capsules was presented to generate global coefficients that weighted high-level capsule features. As a result, the weighted high-level capsule features became more discriminative and highlighted the important capsules, thereby improving the classification performance. Meanwhile, an equivariant regularization term that could compute the similarity among the input images was introduced to model the explicit equivariance of the capsule network, thereby potentially improving network performance. Finally, the whole network was trained with a loss function combining margin loss and equivariance loss to obtain a discriminative classification model. Experimental results on multiple benchmark scene datasets verify the effectiveness and efficiency of the proposed method. The proposed method reaches a classification accuracy of 90.38% on the Canadian Institute For Advanced Research-10 image dataset (CIFAR-10), which is 15.74 percentage points higher than that of the Dynamic Routing Capsule network (DRCaps) method, and achieves classification accuracies of 98.21% and 86.96% on the Affine extended National Institute of Standards and Technology dataset (AffNIST) and the Aerial Image Dataset (AID), respectively. It can be seen that the proposed method can improve the performance of remote sensing scene classification effectively.

    Facial attribute estimation and expression recognition based on contextual channel attention mechanism
    Jie XU, Yong ZHONG, Yang WANG, Changfu ZHANG, Guanci YANG
    2025, 45(1):  253-260.  DOI: 10.11772/j.issn.1001-9081.2024010098

    Facial features contain rich information and hold significant value in facial attribute and expression analysis tasks, but their diversity and complexity make facial analysis tasks difficult. To address this issue, a model of Facial Attribute estimation and Expression Recognition based on contextual channel attention mechanism (FAER) was proposed from the perspective of fine-grained facial features. Firstly, a local feature encoding backbone network based on ConvNext was constructed, and by utilizing the effectiveness of the backbone network in encoding local features, the differences among facial local features were represented adequately. Secondly, a Contextual Channel Attention (CC Attention) mechanism was introduced. By adjusting the weight information on feature channels dynamically and adaptively, both global and local features of deep features were represented, so as to address the limited ability of the backbone network to encode global features. Finally, different classification strategies were designed. For Facial Attribute Estimation (FAE) and Facial Expression Recognition (FER) tasks, different combinations of loss functions were employed to encourage the model to learn more fine-grained facial features. Experimental results show that the proposed model achieves an average accuracy of 91.87% on facial attribute dataset CelebA (CelebFaces Attributes), surpassing the suboptimal model SwinFace (Swin transformer for Face) by 0.55 percentage points, and achieves accuracies of 91.75% and 66.66% respectively on facial expression datasets RAF-DB and AffectNet, surpassing the suboptimal model TransFER (Transformers for Facial Expression Recognition) by 0.84 and 0.43 percentage points respectively.

    Interpretability study on deformable convolutional network and its application in butterfly species recognition models
    Lu WANG, Dong LIU, Weiguang LIU
    2025, 45(1):  261-274.  DOI: 10.11772/j.issn.1001-9081.2023121776

    In recent years, Deformable Convolutional Network (DCN) has been widely applied in fields such as image recognition and classification. However, research on the interpretability of this model is relatively limited, and its applicability lacks sufficient theoretical support. To address these issues, this paper proposed an interpretability study of DCN and its application in butterfly species recognition model. Firstly, deformable convolution was introduced to improve the VGG16, ResNet50, and DenseNet121 (Dense Convolutional Network121) classification models. Secondly, visualization methods such as deconvolution and Class Activation Mapping (CAM) were used to compare the feature extraction capabilities of deformable convolution and standard convolution. The results of ablation experiments show that deformable convolution performs better when used in the lower layers of the neural network and not continuously. Thirdly, the Saliency Removal (SR) method was proposed to uniformly evaluate the performance of CAM and the importance of activation features. By setting different removal thresholds and other perspectives, the objectivity of the evaluation is improved. Finally, based on the evaluation results, the FullGrad (Full Gradient-weighted) explanation model was used as the basis for the recognition judgment. Experimental results show that on the Archive_80 dataset, the accuracy of the proposed D_v2-DenseNet121 reaches 97.03%, which is 2.82 percentage points higher than that of DenseNet121 classification model. It can be seen that the introduction of deformable convolution endows the neural network model with the ability to extract invariant features and improves the accuracy of the classification model.

    Cross-modal dual-stream alternating interactive network for infrared-visible image classification
    Zongsheng ZHENG, Jia DU, Yuhe CHENG, Zecheng ZHAO, Yuewei ZHANG, Xulong WANG
    2025, 45(1):  275-283.  DOI: 10.11772/j.issn.1001-9081.2024010026

    When multiple feature modalities are fused, noise is superimposed, and the cascaded structure used to reduce the differences between modalities does not fully utilize the feature information between modalities. To address these issues, a cross-modal Dual-stream Alternating Interactive Network (DAINet) method was proposed. Firstly, a Dual-stream Alternating Enhancement (DAE) module was constructed to fuse modal features in an interactive dual-branch way; by learning the mapping relationships between modalities and employing the bidirectional feedback adjustments of InFrared-VISible-InFrared (IR-VIS-IR) and VISible-InfRared-VISible (VIS-IR-VIS), cross suppression of inter-modal noise was realized. Secondly, a Cross-Modal Feature Interaction (CMFI) module was constructed, and a residual structure was introduced to integrate low-level and high-level features within and between the infrared and visible modalities, thereby minimizing inter-modal differences and maximizing inter-modal feature utilization. Finally, the effectiveness of the DAE module and the CMFI module was verified on a self-constructed infrared-visible multi-modal typhoon dataset and a publicly available RGB-NIR multi-modal dataset. Experimental results demonstrate that, compared to the simple cascading fusion method on the self-constructed typhoon dataset, the proposed DAINet-based feature fusion method improves the overall classification accuracy by 6.61 and 3.93 percentage points for the infrared and visible modalities, respectively, and increases the G-mean values by 6.24 and 2.48 percentage points, respectively. These results highlight the generalizability of the proposed method for class-imbalanced classification tasks. On the RGB-NIR dataset, the proposed method improves the overall classification accuracy by 13.47 and 13.90 percentage points, respectively, for the two test modalities. In addition, comparisons with the IFCNN (general Image Fusion framework based on Convolutional Neural Network) and DenseFuse methods on the self-constructed typhoon dataset show that the proposed method improves the overall classification accuracy by 9.82 and 6.02 percentage points, and by 17.38 and 1.68 percentage points, respectively, for the two test modalities.
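
    For class-imbalanced classification, the G-mean is commonly defined as the geometric mean of the per-class recalls. The following Python sketch works under that assumption; the function name `g_mean` and the toy labels are illustrative only, and the exact definition used in the paper may differ.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (an assumed, commonly used definition)."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        # Number of samples whose true class is c
        total = sum(1 for t in y_true if t == c)
        # Number of those samples that were predicted as c
        hit = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hit / total)
    return math.prod(recalls) ** (1.0 / len(recalls))


# Toy example: class 0 is the majority class, class 1 the minority class.
labels = [0, 0, 0, 0, 1, 1]
preds = [0, 0, 0, 1, 1, 0]
score = g_mean(labels, preds)
```

    Because a single poorly recalled class pulls the product down, the G-mean penalizes a classifier that ignores minority classes, which plain accuracy does not.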

    Infrared small target detection method based on information compensation
    Boran YANG, Suzhen LIN, Dawei LI, Xiaofei LU, Chenhui CUI
    2025, 45(1):  284-291.  DOI: 10.11772/j.issn.1001-9081.2024010102

    An infrared small target detection method based on information compensation was proposed to address the problem that infrared small targets are prone to losing texture detail information during network iteration, which decreases the accuracy of target localization and contour segmentation. Firstly, an Image Feature Extraction (IFE) module was used to encode the shallow details and deep semantic features of the infrared image. Secondly, a Multi-level Information Compensation (MIC) module was constructed to compensate the down-sampled features in the encoding stage by aggregating features from adjacent levels. Thirdly, a Global Target Response (GTR) module was introduced to compensate for the limitation of convolutional locality by incorporating the global contextual information of the feature map. Finally, an Asymmetric Cross-Fusion (ACF) module was constructed to fuse shallow and deep features, thereby preserving texture and positional information during target decoding and achieving detection of infrared small targets. Experimental results of training and testing on a mixture of the publicly available NUAA-SIRST (Nanjing University of Aeronautics and Astronautics-Single-frame InfraRed Small Target) and NUDT-SIRST (National University of Defense Technology-Single-frame InfraRed Small Target) datasets show that, compared to methods such as UIUNet (U-Net in U-Net Network), LSPM (Local Similarity Pyramid Modules), and DNANet (Dense Nested Attention Network), the proposed method achieves improvements of 9.2, 8.9, and 5.5 percentage points in Intersection over Union (IoU), respectively, and 6.0, 5.4, and 3.1 percentage points in F1-Score, respectively. These results demonstrate that the proposed method enables accurate detection and effective segmentation of small targets in complex infrared background images.
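
    The two metrics quoted above can be illustrated with a minimal Python sketch. It assumes pixel-level evaluation on binary masks, which is a common convention for infrared small target segmentation but is not stated in the abstract; the function name `iou_and_f1` and the toy masks are hypothetical.

```python
def iou_and_f1(pred, target):
    """Pixel-level IoU and F1 for two binary masks given as nested lists of 0/1."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1  # target pixel correctly detected
            elif p and not t:
                fp += 1  # background pixel marked as target
            elif t and not p:
                fn += 1  # target pixel missed
    union = tp + fp + fn
    # Convention chosen here: two empty masks count as a perfect match.
    iou = tp / union if union else 1.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 1.0
    return iou, f1


predicted = [[1, 1, 0],
             [0, 0, 0]]
ground_truth = [[1, 0, 0],
                [1, 0, 0]]
iou, f1 = iou_and_f1(predicted, ground_truth)
```

    For small targets that cover only a few pixels, one misplaced pixel changes both scores noticeably, which is why preserving texture and position information matters for these metrics.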

    Small target detection algorithm in remote sensing images integrating attention and contextual information
    Shang LIU, Yuwei ZHOU, Rao DAI, Linfang DONG, Meng LIU
    2025, 45(1):  292-300.  DOI: 10.11772/j.issn.1001-9081.2024010125

    When detecting small targets in multi-scale remote sensing images, target detection algorithms based on deep learning are prone to false detection and missed detection. One reason is that the feature extraction module performs multiple down-sampling operations; another is the failure to attend to the contextual information required by targets of different categories and scales. To solve these problems, a small target detection algorithm for remote sensing images integrating attention and contextual information, ACM-YOLO (Attention-Context-Multiscale YOLO), was proposed. Firstly, to reduce the loss of small target feature information, fine-grained query-aware sparse attention was applied, thereby avoiding missed detection. Secondly, to pay more attention to the contextual information required by different categories of remote sensing targets, the Local Contextual Enhancement (LCE) function was designed, thereby avoiding false detection. Finally, to strengthen the multi-scale feature fusion capability of the feature fusion module on small targets in remote sensing images, the weighted Bi-directional Feature Pyramid Network (BiFPN) was adopted, thereby improving the detection performance of the algorithm. Comparison experiments and ablation experiments were performed on the DOTA dataset and the NWPU VHR-10 dataset to verify the effectiveness and generalization of the proposed algorithm. Experimental results show that, on the two datasets, the proposed algorithm achieves a mean Average Precision (mAP) of 77.33% and 96.12%, respectively, and increases the Recall by 10.00 and 7.50 percentage points, respectively, compared with the YOLOv5 algorithm. It can be seen that the proposed algorithm improves mAP and recall effectively, reducing false detection and missed detection.
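
    The weighted fusion in BiFPN can be sketched as follows. The sketch uses the "fast normalized fusion" rule from the original BiFPN design, in which each input feature receives a learnable non-negative weight and the weights are normalized by their sum. Whether ACM-YOLO uses exactly this variant is an assumption, and the function name, the flat-list feature representation, and the sample values are illustrative only.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-shaped feature vectors with normalized non-negative weights.

    features: list of equally long lists of floats (flattened feature maps)
    weights:  one learnable scalar per input feature
    """
    # ReLU keeps each weight non-negative, so the normalized weights lie in [0, 1].
    w = [max(0.0, x) for x in weights]
    norm = eps + sum(w)  # eps avoids division by zero when all weights are 0
    length = len(features[0])
    return [
        sum(wi * f[k] for wi, f in zip(w, features)) / norm
        for k in range(length)
    ]


# Two inputs at the same resolution, e.g. a top-down feature and a lateral feature.
top_down = [1.0, 2.0]
lateral = [3.0, 4.0]
fused = fast_normalized_fusion([top_down, lateral], weights=[1.0, 1.0])
```

    In a real network the inputs would first be resized to a common resolution and the weights would be trained with the rest of the model; the sketch only shows the fusion rule itself.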

    Super-resolution reconstruction for low-quality license plate information based on multi-dimensional spatial convolutional information enhancement
    Rui ZHANG, Yongke HUI, Yanjun ZHANG, Lihu PAN
    2025, 45(1):  301-307.  DOI: 10.11772/j.issn.1001-9081.2024010121

    Vehicle images collected by existing traffic monitoring terminals often have low resolution at long range and are affected by uncertain pixel-level factors such as strong noise, blur, overexposure, and underexposure, making it difficult to ensure the accuracy of intelligent recognition of license plate information. In response to this issue, a Super-Resolution reconstruction network for Low-quality License plate information based on multi-dimensional spatial convolutional information enhancement (LL-SR) was proposed. Firstly, the correlations of feature points across space and channels, mined by convolution, were used to aggregate shallow features. Secondly, the correlations between feature maps were mined from different receptive fields and different dimensions, so as to recover the high-frequency details of license plate information. Finally, the obtained features of different scales were fused and corrected at pixel level across channels to reduce the propagation of useless features in context, thus achieving super-resolution reconstruction of low-quality license plate information. Experimental results on the License plate of Taiyuan (LT) and License plates of the United States of America (LU) datasets show that the proposed network achieves a Peak Signal-to-Noise Ratio (PSNR) of 26.6824 dB and a Structural SIMilarity (SSIM) of 0.8203 on LT, and a PSNR of 22.3567 dB and an SSIM of 0.7813 on LU. Compared with NGramSwin (N-Gram in Swin transformers) and CARN (CAscading Residual Network), respectively, the PSNR is improved by 0.2109 dB and 1.7361 dB and the SSIM by 0.0057 and 0.0330 on LT, and the PSNR is improved by 0.4728 dB and 1.4192 dB and the SSIM by 0.0196 and 0.0399 on LU. Moreover, the license plate information reconstructed by the proposed network has better visual effects.
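
    For reference, PSNR follows the standard definition 10·log10(MAX² / MSE), where MSE is the mean squared error between the reference and the reconstructed image. A minimal Python sketch follows; the function name `psnr`, the nested-list image representation, and the 8-bit peak value of 255 are assumptions made for illustration, not details from the paper.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """PSNR in dB between two equally sized images given as nested lists."""
    ref = [v for row in reference for v in row]
    rec = [v for row in reconstructed for v in row]
    if len(ref) != len(rec):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels
    mse = sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Tiny 1x2 "image": each pixel is off by 10 grey levels, so MSE = 100.
value_db = psnr([[100, 100]], [[110, 90]])
```

    Because PSNR is logarithmic, a gain of a few tenths of a decibel corresponds to a small but measurable reduction in mean squared error.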

    Speaker verification system utilizing global-local feature dependency for anti-spoofing
    Jialin ZHANG, Qinghua REN, Qirong MAO
    2025, 45(1):  308-317.  DOI: 10.11772/j.issn.1001-9081.2023121877

    To address the problem that existing speaker verification systems for anti-spoofing, which are built mainly on convolutional models, cannot capture global feature dependency well, a speaker verification system utilizing global-local feature dependency for anti-spoofing was proposed. Firstly, for the speech spoofing detection module, two filter combination schemes were designed to filter the original speech, and sample augmentation was achieved by masking the frequency sub-bands. Secondly, a multi-dimensional global attention mechanism was proposed, in which the global dependencies of each dimension were obtained by pooling over the channel, frequency, and time dimensions, respectively, and the global information was fused with the original features by weighting. Finally, for the speaker verification part, a Statistical Pyramid Dense Time Delay Neural Network (SPD-TDNN) was introduced to compute the standard deviation of the features and add the global information while obtaining multi-scale time-frequency features. Experimental results show that, on the ASVspoof2019 dataset, the proposed speech spoofing detection system reduces the Equal Error Rate (EER) by 65.4% compared to the Audio Anti-Spoofing using Integrated Spectro-Temporal graph attention network (AASIST) model, and the proposed speaker verification system for anti-spoofing reduces the spoofing-aware speaker verification EER by 97.8% compared to the separate pyramid pooling speaker verification system. These results verify that the two proposed modules achieve better classification results with the help of global feature dependency.

    End-to-end Chinese speech recognition method with byte-level byte pair encoding
    Qiang FU, Zhenping XU, Wenxing SHENG, Qing YE
    2025, 45(1):  318-324.  DOI: 10.11772/j.issn.1001-9081.2023121878

    To address the problems of large vocabulary size and low training efficiency in speech recognition for complex and large character sets such as Chinese, a method for end-to-end Chinese speech recognition based on Byte-Level Byte Pair Encoding (BBPE) was proposed. Firstly, 256 different bytes were used to initialize the vocabulary. Then, the frequency with which each vocabulary unit appears in the corpus was counted, and the units with the highest frequency were merged. Finally, this process was repeated until no further merging was possible, resulting in the final vocabulary. On the Chinese speech dataset AISHELL-1, the vocabulary generated by this method is 88.5% smaller than the character-level vocabulary, thereby lowering the complexity of model training. Moreover, considering the outstanding performance of the Conformer-Transducer (Conformer-T) model in end-to-end speech recognition, the latest Zipformer model was combined with the Transducer model to propose the Zipformer-Transducer (Zipformer-T) model for better recognition performance, and the BBPE method was validated on this model. Experimental results show that the Zipformer-T model using the BBPE method reduces the Character Error Rate (CER) by 0.12 and 0.08 percentage points on the AISHELL-1 test set and validation set, respectively, compared to the character-level tokenization method, reaching the lowest CERs of 4.26% and 3.98%, respectively, which demonstrates the effectiveness of the method in enhancing Chinese speech recognition performance.
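
    The vocabulary construction described above can be sketched in Python as follows. This is a generic byte-level BPE trainer written under stated assumptions, not the authors' implementation: it merges the most frequent adjacent pair of units in each round, and it stops either after a hypothetical `num_merges` budget or when no pair occurs at least twice. The function names and the toy corpus are illustrative.

```python
from collections import Counter


def merge_pair(seq, a, b):
    """Replace every adjacent occurrence of (a, b) in seq with the merged unit a + b."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def train_bbpe(corpus, num_merges):
    """Learn a byte-level BPE vocabulary from a list of strings."""
    # Every sentence becomes a sequence of single-byte units (UTF-8 encoding).
    seqs = [[bytes([b]) for b in text.encode("utf-8")] for text in corpus]
    vocab = [bytes([i]) for i in range(256)]  # the 256 possible byte values
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))  # count adjacent unit pairs
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break  # assumed stopping rule: nothing left worth merging
        merges.append((a, b))
        vocab.append(a + b)
        seqs = [merge_pair(seq, a, b) for seq in seqs]
    return vocab, merges


# Each Chinese character below occupies three bytes in UTF-8.
vocab, merges = train_bbpe(["中中中"], num_merges=2)
```

    Starting from bytes rather than characters keeps the base vocabulary at a fixed size of 256 regardless of how many distinct characters the language has, which is the source of the vocabulary reduction reported for AISHELL-1.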

    Frontier and comprehensive applications
    Cooperative visual positioning method of multiple unmanned surface vehicles in subterranean closed water body
    Wenbo CHE, Jianhua WANG, Xiang ZHENG, Gongxing WU, Shun ZHANG, Haozhu WANG
    2025, 45(1):  325-336.  DOI: 10.11772/j.issn.1001-9081.2023121827

    To address the lack of satellite positioning signals, limited communication, and weak ambient light faced by Unmanned Surface Vehicles (USVs) in subterranean closed water bodies, a cooperative visual positioning method for multiple USVs in subterranean closed water bodies was proposed. Firstly, a vehicle-borne light source cooperative marker was designed, and the marker structure was optimized according to the vehicle structure and the application scene. Secondly, monocular vision was used to collect the marker images, and the image coordinates of the feature points were solved. Thirdly, on the basis of the camera imaging model, the relative positions between adjacent vehicles were calculated with an improved direct linear transformation method, using the relationship between the spatial coordinates of the marker feature points and the corresponding image coordinates. Fourthly, the cameras of the front and rear vehicles were arranged to face each other, and the relative positions calculated from the camera images of the front and rear vehicles were fused by the minimum variance algorithm to improve the relative positioning accuracy. Finally, the absolute location of each USV was obtained by using the known absolute coordinates in the scene. The factors influencing positioning error were analyzed through simulation, and the proposed method was compared with the traditional direct linear transformation method. The results show that the advantage of the proposed method becomes more obvious as the distance increases. At a distance of 15 m, the position variance solved by the proposed method is stable within 0.2 m², verifying the accuracy of the method. Static experimental results show that the proposed method keeps the relative error within 10.0%; dynamic experimental results in underground river courses show that the absolute positioning navigation trajectory solved by the proposed method achieves accuracy similar to that of satellite positioning, which verifies the feasibility of the method.
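
    The fusion step can be illustrated with the classical minimum-variance (inverse-variance weighted) combination of two estimates. The Python sketch below assumes that the two relative-position estimates are unbiased and have independent errors, and treats a single coordinate axis at a time; the function name and the numeric values are hypothetical, and the paper's exact formulation may differ.

```python
def fuse_min_variance(x1, var1, x2, var2):
    """Combine two independent, unbiased estimates of the same quantity.

    The weights are chosen so that the variance of the fused estimate is minimal:
    each estimate is weighted by the other estimate's share of the total variance.
    """
    total = var1 + var2
    w1 = var2 / total  # the noisier estimate 2 is, the more weight estimate 1 gets
    w2 = var1 / total
    fused = w1 * x1 + w2 * x2
    fused_var = var1 * var2 / total  # never larger than min(var1, var2)
    return fused, fused_var


# Relative distance along one axis (metres) as seen by the front and rear cameras.
front_estimate, front_var = 10.0, 0.4
rear_estimate, rear_var = 10.6, 0.2
position, variance = fuse_min_variance(front_estimate, front_var,
                                       rear_estimate, rear_var)
```

    The fused variance is always at most the smaller of the two input variances, which is why combining the front and rear views can improve on either camera alone.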

    Traffic signal control algorithm based on overall state prediction and fair experience replay
    Zijun MIAO, Fei LUO, Weichao DING, Wenbo DONG
    2025, 45(1):  337-344.  DOI: 10.11772/j.issn.1001-9081.2024010066

    To cope with traffic congestion, efficient traffic signal control algorithms have been designed, which can significantly improve the traffic efficiency of vehicles in the existing transportation network. Although deep reinforcement learning algorithms have shown excellent performance in single-intersection traffic signal control problems, their application in multi-intersection environments still faces a major challenge: the non-stationarity problem caused by the spatiotemporal partial observability of Multi-Agent Reinforcement Learning (MARL) algorithms, as a result of which deep reinforcement learning algorithms cannot guarantee stable convergence. To this end, a multi-intersection traffic signal control algorithm based on overall state prediction and fair experience replay, IS-DQN, was proposed. On the one hand, to avoid the non-stationarity caused by spatial partial observability, the state space of IS-DQN was expanded by predicting the overall state of multiple intersections based on historical traffic flow information from different lanes. On the other hand, to cope with the temporal partial observability brought by traditional experience replay strategies, a reservoir sampling algorithm was adopted to ensure the fairness of the experience replay pool, so as to avoid non-stationarity in it. Experimental results from simulations of three different traffic pressure levels in complex multi-intersection environments show that, under different traffic pressure conditions, and especially under low and medium traffic flow, the IS-DQN algorithm achieves lower average vehicle travel time, better convergence performance, and better convergence stability than independent deep reinforcement learning algorithms.
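
    Reservoir sampling keeps a fixed-size sample in which every item seen so far has the same probability of being retained, which is one way to read the "fair" replay pool above. The Python sketch below applies the classical Algorithm R to a replay buffer; the class name, its interface, and the placeholder transitions are hypothetical and are not taken from the IS-DQN implementation.

```python
import random


class ReservoirReplayBuffer:
    """Fixed-capacity replay buffer filled by reservoir sampling (Algorithm R)."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0  # total number of transitions offered so far
        self.rng = random.Random(seed)

    def add(self, transition):
        self.seen += 1
        if len(self.buffer) < self.capacity:
            # The buffer is not full yet: always keep the new transition.
            self.buffer.append(transition)
        else:
            # Keep the new transition with probability capacity / seen,
            # overwriting a uniformly chosen slot.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = transition

    def sample(self, batch_size):
        """Draw a training batch uniformly from the stored transitions."""
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(self.buffer, k)


replay = ReservoirReplayBuffer(capacity=5, seed=0)
for step in range(100):
    replay.add(("state", "action", "reward", step))  # placeholder transition
batch = replay.sample(3)
```

    Unlike a first-in-first-out buffer, which holds only the most recent transitions, this buffer retains old and new experience with equal probability, so the training distribution is not dominated by the latest policy's behaviour.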

2025 Vol.45 No.4

Honorary Editor-in-Chief: ZHANG Jingzhong
Editor-in-Chief: XU Zongben
Associate Editor: SHEN Hengtao XIA Zhaohui
Domestic Post Distribution Code: 62-110
Foreign Distribution Code: M4616
Address: No. 9, 4th Section of South Renmin Road, Chengdu 610041, China
Tel: 028-85224283-803, 028-85222239-803
Website: www.joca.cn
E-mail: bjb@joca.cn