
Table of Contents

    10 September 2024, Volume 44 Issue 9
    Artificial intelligence
    Overview of research and application of knowledge graph in equipment fault diagnosis
    Jie WU, Ansi ZHANG, Maodong WU, Yizong ZHANG, Congbao WANG
    2024, 44(9):  2651-2659.  DOI: 10.11772/j.issn.1001-9081.2023091280

    Useful knowledge can be extracted from equipment fault diagnosis data to construct a knowledge graph, which manages complex equipment fault diagnosis information effectively in the form of triples (entity, relationship, entity) and thereby enables rapid diagnosis of equipment faults. Firstly, the concepts related to knowledge graphs for equipment fault diagnosis were introduced, and the framework of a knowledge graph for the equipment fault diagnosis domain was analyzed. Secondly, the research status in China and abroad of several key technologies for equipment fault diagnosis knowledge graphs, such as knowledge extraction, knowledge fusion, and knowledge reasoning, was summarized. Finally, the applications of knowledge graphs in equipment fault diagnosis were summarized, the shortcomings and challenges in constructing knowledge graphs in this field were identified, and new ideas for future research on equipment fault diagnosis were offered.
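To make the triple representation concrete, the following minimal sketch stores hypothetical fault-diagnosis triples (the entity and relation names such as "causes" and "repaired_by" are invented for illustration) and answers a simple diagnostic query: which faults explain all observed symptoms. It is an in-memory toy index, not a graph database or any method from the surveyed work.

```python
from collections import defaultdict

# Hypothetical (entity, relationship, entity) triples; names are illustrative only.
TRIPLES = [
    ("bearing wear", "causes", "abnormal vibration"),
    ("shaft misalignment", "causes", "abnormal vibration"),
    ("shaft misalignment", "causes", "bearing overheating"),
    ("lubricant degradation", "causes", "bearing overheating"),
    ("bearing wear", "repaired_by", "replace bearing"),
]

class FaultKG:
    """Tiny triple store indexed by (relation, tail) and (head, relation)."""

    def __init__(self, triples):
        self.by_tail = defaultdict(set)  # (relation, tail) -> set of heads
        self.by_head = defaultdict(set)  # (head, relation) -> set of tails
        for h, r, t in triples:
            self.by_tail[(r, t)].add(h)
            self.by_head[(h, r)].add(t)

    def candidate_faults(self, symptoms):
        """Faults linked by a 'causes' edge to every observed symptom."""
        sets = [self.by_tail[("causes", s)] for s in symptoms]
        return set.intersection(*sets) if sets else set()

    def remedies(self, fault):
        """Maintenance actions linked to a fault by 'repaired_by'."""
        return self.by_head[(fault, "repaired_by")]

kg = FaultKG(TRIPLES)
print(sorted(kg.candidate_faults(["abnormal vibration", "bearing overheating"])))
# ['shaft misalignment']
```

Intersecting head sets is the simplest form of multi-symptom reasoning; knowledge reasoning in practice would also rank partial matches.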

    Time series causal inference method based on adaptive threshold learning
    Qinzhuang ZHAO, Hongye TAN
    2024, 44(9):  2660-2666.  DOI: 10.11772/j.issn.1001-9081.2023091278

    Time-series data exhibit a recency characteristic, i.e., variable values generally depend on recent historical information. Existing time-series causal inference methods do not fully consider this characteristic: they use a uniform threshold when testing causal relationships at different delays through hypothesis testing, which makes it difficult to infer weaker causal relationships effectively. To address this issue, a time-series causal inference method based on adaptive threshold learning was proposed. Firstly, data characteristics were extracted. Then, based on the data characteristics at different delays, a combination of thresholds for the hypothesis testing process was learned automatically. Finally, this threshold combination was applied to the hypothesis testing processes of the PC (Peter-Clark) algorithm, the PCMCI (Peter-Clark and Momentary Conditional Independence) algorithm, and the VAR-LINGAM (Vector AutoRegressive LINear non-Gaussian Acyclic Model) algorithm to obtain more accurate causal structures. Experimental results on the simulated dataset show that the F1 values of the adaptive PC, adaptive PCMCI, and adaptive VAR-LINGAM algorithms using the proposed method are all improved.
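The sketch below illustrates only the mechanism of applying a different significance threshold at each delay, under simplifying assumptions: it uses a pairwise lagged Pearson correlation with a Fisher z-test rather than the conditional independence tests of PC or PCMCI, and the per-lag thresholds are made-up numbers, not values learned by the proposed method.

```python
import math

def lagged_corr(x, y, lag):
    """Pearson correlation between x[t - lag] and y[t]."""
    a, b = x[: len(x) - lag], y[lag:]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def fisher_z_pvalue(r, n):
    """Two-sided p-value for H0: rho = 0 using the Fisher z-transform."""
    r = max(min(r, 0.999999), -0.999999)       # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))    # P(|Z| >= |z|), Z ~ N(0, 1)

def detect_lags(pvalues, alphas):
    """Keep lag k when its p-value is below that lag's own threshold alphas[k]."""
    return sorted(k for k, p in pvalues.items() if p < alphas[k])

pvals = {1: 0.001, 2: 0.03, 3: 0.08}
uniform = {k: 0.05 for k in pvals}
per_lag = {1: 0.01, 2: 0.05, 3: 0.10}  # hypothetical values, not learned ones
print(detect_lags(pvals, uniform))  # [1, 2]
print(detect_lags(pvals, per_lag))  # [1, 2, 3]
```

With a single threshold the weaker lag-3 dependence is discarded, whereas a lag-specific threshold can retain it; deciding which thresholds to use is exactly what the adaptive learning step addresses.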

    Unsupervised text sentiment transfer method based on generation prompt
    Yuxin HUANG, Jialong XU, Zhengtao YU, Shukai HOU, Jiaqi ZHOU
    2024, 44(9):  2667-2673.  DOI: 10.11772/j.issn.1001-9081.2023091302

    Text sentiment transfer aims to change the sentiment attribute of a text while preserving its content. Due to the lack of parallel corpora, most existing unsupervised text sentiment transfer methods construct latent representations of sentiment and content through text reconstruction and classification losses and then perform sentiment transfer. However, this weakly supervised training strategy causes significant model performance degradation under the prompt learning paradigm. To address this issue, an unsupervised text sentiment transfer method based on generation prompt was proposed. Firstly, textual content prompts were generated by a prompt generator. Secondly, these were fused with the target sentiment prompts to form the final prompt. Finally, a two-stage training strategy was formulated to provide smooth gradients for model training, thereby solving the problem of performance degradation. Experimental results on the public sentiment transfer dataset Yelp show that the proposed method significantly outperforms the generation-based method UnpairedRL in content preservation, sentiment transfer score, and BLEU (BiLingual Evaluation Understudy), with improvements of 39.1%, 62.3%, and 14.5%, respectively.

    Multi-domain fake news detection model enhanced by APK-CNN and Transformer
    Jinjin LI, Guoming SANG, Yijia ZHANG
    2024, 44(9):  2674-2682.  DOI: 10.11772/j.issn.1001-9081.2023091359

    To solve the problems of domain shift and incomplete domain labeling in social media news, and to explore more efficient networks for multi-domain news feature extraction and fusion, a multi-domain fake news detection model enhanced by APK-CNN (Adaptive Pooling Kernel Convolutional Neural Network) and Transformer, namely Transm3, was proposed. Firstly, a three-channel network was designed to extract and represent the semantic, emotional, and stylistic information of the text, and these features were combined into views by a multi-granularity cross-domain interactor. Secondly, the news domain labels were refined by an optimized soft-shared memory network and domain adapters. Then, Transformer was combined with the multi-granularity cross-domain interactor to aggregate the interaction features of different domains dynamically with learned weights. Finally, the fused features were fed into the classifier to discriminate true from fake news. Experimental results show that compared with M3FEND (Memory-guided Multi-view Multi-domain FakE News Detection) and EANN (Event Adversarial Neural Networks for multi-modal fake news detection), Transm3 improves the overall F1 value by 3.68% and 6.46% on the Chinese dataset and by 6.75% and 11.93% on the English dataset, respectively, and the F1 values on individual sub-domains are also significantly improved, which validates the effectiveness of Transm3 for multi-domain fake news detection.

    Chinese story ending generation model based on bidirectional contrastive training
    Qi SHUAI, Hairui WANG, Guifu ZHU
    2024, 44(9):  2683-2688.  DOI: 10.11772/j.issn.1001-9081.2023091244

    Chinese Story Ending Generation (SEG) is one of the downstream tasks in Natural Language Processing (NLP). CLSEG (Contrastive Learning of Story Ending Generation), which is based on completely wrong endings, performs well in terms of story consistency. However, since a wrong ending may also share content with the original ending, using only wrong endings for contrastive training may cause the main part of the correct ending to be stripped from the generated text. Therefore, forward ending enhancement training was added on the basis of CLSEG to preserve the correct parts lost in contrastive training. At the same time, introducing forward endings gives the generated endings stronger diversity and relevance. The proposed Chinese story ending generation model based on bidirectional contrastive training consisted of two main parts: 1) multi-ending sampling, in which positively enhanced endings and reversely contrasted wrong endings were obtained by different models; 2) contrastive training, in which the loss function was modified during training to draw the generated ending close to the positive ending and push it away from the wrong ending. Experimental results on the publicly available story dataset OutGen show that compared with models such as GPT2.ft and Della (Deeply fused layer-wise latent variable), the proposed model achieves better results in BERTScore, METEOR, and other metrics, generating more diverse and relevant endings.
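The abstract states only that the loss draws the generated ending toward the positive ending and away from the wrong ending. A common way to express that goal is an InfoNCE-style loss over similarity scores, sketched below; the cosine similarity on toy vectors, the temperature of 0.1, and the function names are all assumptions, not the paper's exact loss.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def bidirectional_contrastive_loss(sim_pos, sim_neg, temperature=0.1):
    """-log softmax of the positive score against the negative scores:
    small when the generated ending is closer to the enhanced (positive)
    ending than to the wrong endings."""
    logits = [sim_pos / temperature] + [s / temperature for s in sim_neg]
    m = max(logits)  # subtract the max for numerical stability
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]

# Toy embeddings of a generated, a positive, and a wrong ending (hypothetical).
gen, pos, neg = [1.0, 0.2], [0.9, 0.3], [-0.5, 1.0]
loss_close = bidirectional_contrastive_loss(cosine(gen, pos), [cosine(gen, neg)])
loss_far = bidirectional_contrastive_loss(cosine(gen, neg), [cosine(gen, pos)])
print(loss_close < loss_far)  # True
```

Minimizing this quantity raises the positive score relative to the negatives, which corresponds to the "close to the positive, away from the wrong" objective described above.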

    Text-to-SQL model based on semantic enhanced schema linking
    Xianglan WU, Yang XIAO, Mengying LIU, Mingming LIU
    2024, 44(9):  2689-2695.  DOI: 10.11772/j.issn.1001-9081.2023091360

    To optimize the Text-to-SQL generation performance of models based on heterogeneous graph encoders, the SELSQL model was proposed. Firstly, the model adopted an end-to-end learning framework and used the Poincaré distance metric in hyperbolic space instead of the Euclidean distance metric to optimize the semantically enhanced schema linking graph constructed from the pre-trained language model by probing technology. Secondly, K-head weighted cosine similarity and graph regularization were used to learn a similarity metric graph, so that the initial schema linking graph was iteratively optimized during training. Finally, the improved Relational Graph ATtention network (RGAT) graph encoder and a multi-head attention mechanism were used to encode the joint semantic schema linking graphs of the two modules, and Structured Query Language (SQL) statements were decoded using a grammar-based neural semantic decoder and a predefined structured language. Experimental results on the Spider dataset show that with the ELECTRA-large pre-trained model, the accuracy of SELSQL is 2.5 percentage points higher than that of the best baseline model, with a marked improvement in generating complex SQL statements.
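The two similarity measures named above have standard forms: the Poincaré-ball distance d(u, v) = arcosh(1 + 2||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2))), and a K-head weighted cosine similarity that averages cos(w_k ⊙ x, w_k ⊙ y) over K elementwise weight vectors. The sketch below implements these textbook formulas on plain lists; the weight vectors are hypothetical, and how SELSQL learns them and plugs the scores into graph construction is not modelled.

```python
import math

def _sqnorm(v):
    return sum(x * x for x in v)

def poincare_distance(u, v):
    """Geodesic distance in the Poincare ball; points must satisfy ||p|| < 1."""
    nu, nv = _sqnorm(u), _sqnorm(v)
    if nu >= 1.0 or nv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = _sqnorm([a - b for a, b in zip(u, v)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv)))

def multi_head_weighted_cosine(x, y, heads):
    """Mean over K heads of cos(w_k * x, w_k * y) with elementwise weights w_k."""
    total = 0.0
    for w in heads:
        a = [wi * xi for wi, xi in zip(w, x)]
        b = [wi * yi for wi, yi in zip(w, y)]
        total += sum(p * q for p, q in zip(a, b)) / math.sqrt(_sqnorm(a) * _sqnorm(b))
    return total / len(heads)

# The same Euclidean gap of 0.1 is "longer" near the boundary of the ball.
print(poincare_distance([0.0, 0.0], [0.1, 0.0]) < poincare_distance([0.8, 0.0], [0.9, 0.0]))  # True
```

Distances grow toward the boundary of the ball, which is why hyperbolic space is often used to embed hierarchical structures such as database schemas.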

    Data science and technology
    Incomplete multi-view clustering algorithm based on self-attention fusion
    Shunyong LI, Shiyi LI, Rui XU, Xingwang ZHAO
    2024, 44(9):  2696-2703.  DOI: 10.11772/j.issn.1001-9081.2023091253

    Multi-view clustering of incomplete data has become one of the research hotspots in unsupervised learning. However, most multi-view clustering algorithms based on "shallow" models struggle to extract and characterize the latent feature structures within views when dealing with large-scale high-dimensional data. At the same time, fusing multi-view information by stacking or averaging ignores the differences between views and does not fully consider the different contributions of each view to a common consensus representation. To address these issues, an Incomplete Multi-View Clustering algorithm based on Self-Attention Fusion (IMVCSAF) was proposed. Firstly, the latent features of each view were extracted with a deep autoencoder, and the consistency information among views was maximized by contrastive learning. Secondly, a self-attention mechanism was adopted to re-encode and fuse the latent representations of the views, comprehensively mining the inherent causality and feature complementarity between different views. Thirdly, based on the common consensus representation, the latent representations of missing instances were predicted and recovered, thereby completing the multi-view clustering process. Experimental results on the Scene-15, LandUse-21, Caltech101-20, and Noisy-MNIST datasets show that the accuracy of IMVCSAF is higher than those of the comparison algorithms while meeting the convergence requirements. On the Noisy-MNIST dataset with a 50% missing rate, the accuracy of IMVCSAF is 6.58 percentage points higher than that of the second-best algorithm, COMPLETER (inCOMPlete muLti-view clustEring via conTrastivE pRediction).
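As a rough illustration of fusing views by self-attention instead of plain averaging, the sketch below lets each available view's latent vector attend to all views with scaled dot-product weights and then averages the re-encoded vectors into one consensus vector. It assumes identity query/key/value projections (a real model would learn them) and simply skips missing views, marked `None`; the recovery of missing views in IMVCSAF is not modelled.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention_fuse(views):
    """views: one latent vector per view (None = missing view).
    Each present view attends to all present views; the re-encoded
    vectors are averaged into a consensus representation."""
    present = [v for v in views if v is not None]
    d = len(present[0])
    recoded = []
    for q in present:
        # scaled dot-product attention weights of view q over all views
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in present])
        recoded.append([sum(w * v[i] for w, v in zip(weights, present)) for i in range(d)])
    return [sum(r[i] for r in recoded) / len(recoded) for i in range(d)]

print(self_attention_fuse([[1.0, 0.0], [0.9, 0.1], None]))
```

Unlike a plain mean, the attention weights depend on how strongly each view agrees with the others, so views contribute unequally to the consensus.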

    Recommendation model combining self-features and contrastive learning
    Xingyao YANG, Yu CHEN, Jiong YU, Zulian ZHANG, Jiaying CHEN, Dongxiao WANG
    2024, 44(9):  2704-2710.  DOI: 10.11772/j.issn.1001-9081.2023091264

    Aiming at the over-smoothing and noise problems of embedding representations during the message passing of graph convolution in graph neural network based recommendation, a Recommendation model combining Self-features and Contrastive Learning (SfCLRec) was proposed. The model was trained with a pre-training and formal-training architecture. Firstly, the embedding representations of users and items were pre-trained by fusing node self-features to maintain the feature uniqueness of the nodes themselves, and a hierarchical contrastive learning task was introduced to mitigate noisy information from higher-order neighboring nodes. Then, in the formal training stage, the collaborative graph adjacency matrix was reconstructed according to a scoring mechanism. Finally, the predicted scores were obtained from the final embeddings. Compared with existing graph neural network recommendation models such as LightGCN and Simple Graph Contrastive Learning (SimGCL), SfCLRec achieves better recall and NDCG (Normalized Discounted Cumulative Gain) on three public datasets, ML-latest-small, Last.FM, and Yelp, validating its effectiveness.
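The recall and NDCG figures above are top-K ranking metrics. A minimal sketch with binary relevance is shown below; the cutoff K and the toy item lists are assumptions, since the abstract does not state the K used.

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of the user's relevant items that appear in the top-k list."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k: DCG of the list divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 2) for i, item in enumerate(ranked[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

ranked = ["a", "b", "c", "d"]   # items sorted by predicted score
relevant = {"a", "c"}           # held-out interactions of the user
print(recall_at_k(ranked, relevant, 3), round(ndcg_at_k(ranked, relevant, 3), 4))
```

Recall ignores positions inside the top-K list, while NDCG discounts hits logarithmically by rank, so a model can improve one without the other.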

    Session-based recommendation with graph auxiliary learning
    Tingjie TANG, Jiajin HUANG, Jin QIN
    2024, 44(9):  2711-2718.  DOI: 10.11772/j.issn.1001-9081.2023091257

    Aiming at the problems that existing self-supervised contrastive tasks fail to make full use of the rich semantic information in the original data and lack universality, a Session-based Recommendation with Graph Auxiliary Learning (SR-GAL) model was proposed. Firstly, an encoding channel with Representation Consistency (RC) was introduced on the basis of a Graph Neural Network (GNN) to mine more valuable self-supervised signals from the original data. Secondly, to make full use of these signals, two auxiliary tasks closely related to the target task, a predictive task and a constraint task, were designed. Finally, a simple auxiliary learning framework independent of the specific GNN model was developed to unify the two auxiliary tasks with the recommendation task and improve the recommendation performance of the GNN model. Compared with the second-best comparison model CGSNet (Contrastive Graph Self-attention Network), the proposed model increases Precision (P@20) and Mean Reciprocal Rank (MRR@20) by 0.58% and 1.61% on the Diginetica dataset and by 12.65% and 8.41% on the Tmall dataset, respectively, verifying its effectiveness. Experimental results on multiple real datasets show that SR-GAL outperforms advanced models and has good extensibility and universality.
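In session-based recommendation, P@20 is usually reported as the share of test sessions whose next item appears in the top-20 list, and MRR@20 as the mean reciprocal rank of that item, counted as 0 outside the top 20. The sketch below follows that common convention, which is assumed rather than confirmed by the abstract; the toy rankings use K = 3 only to keep the example small.

```python
def precision_and_mrr_at_k(rankings, targets, k=20):
    """P@k: share of sessions whose next item is in the top-k list.
    MRR@k: mean of 1/rank of the next item (0 if outside the top k)."""
    hits, rr = 0, 0.0
    for ranked, target in zip(rankings, targets):
        top = ranked[:k]
        if target in top:
            hits += 1
            rr += 1.0 / (top.index(target) + 1)
    n = len(targets)
    return hits / n, rr / n

rankings = [[3, 7, 1], [5, 2, 9], [4, 8, 6]]  # top lists for three test sessions
targets = [7, 9, 1]                           # the actual next clicks
print(precision_and_mrr_at_k(rankings, targets, k=3))
```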

    Multi-layer information interactive fusion algorithm based on graph neural network for session-based recommendation
    Hang YANG, Wanggen LI, Gensheng ZHANG, Zhige WANG, Xin KAI
    2024, 44(9):  2719-2725.  DOI: 10.11772/j.issn.1001-9081.2023091255

    To address the insufficient exploration of item-transition information within the current session and the limited use of information from other sessions in current session-based recommendation, a multi-layer information interactive fusion algorithm based on graph neural network was proposed for session-based recommendation. Based on the current session, firstly, neighborhood node information was aggregated by assigning different weights to the connection relationships between nodes, mining the explicit item-transition information in the current session. Secondly, neighborhood node information was aggregated by a stacked residual graph attention network to mine the implicit item-transition information in the current session. Thirdly, the sequence-dependent information in the timestamp-based session was mined through a single gated graph neural network. Based on other sessions, the entire set of sessions was linked through the first-order neighbors of nodes to learn a global information encoding. The embedding representations of these four levels were then integrated to obtain more comprehensive item-transition information. At the same time, a soft attention mechanism and reverse position embedding were used to fuse the obtained item-transition information more effectively. Experimental results show that compared with the second-best model GCE-GNN (Global Context Enhanced Graph Neural Network), the proposed algorithm increases precision P@20 and mean reciprocal rank MRR@20 by 0.79% and 0.84% respectively on the Diginetica dataset; compared with the second-best model HyperS2Rec, it increases P@20 and MRR@20 by 8.23% and 7.86% respectively on the Tmall dataset, and by 1.33% and 7.16% respectively on the Nowplaying dataset.
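The item-transition information mined from the current session starts from a session graph. The sketch below builds one in the widely used SR-GNN style: consecutive clicks become directed edges, and edge weights are row-normalized transition counts in outgoing and incoming directions. This weighting is an assumption standing in for the paper's own relation weights, and the reverse position helper (last click = 1) only shows the indices a reverse position embedding would look up.

```python
from collections import defaultdict

def build_session_graph(session):
    """Directed item-transition graph for one session.
    Returns unique items (first-click order) and row-normalized
    outgoing / incoming adjacency matrices as nested lists."""
    items = list(dict.fromkeys(session))   # unique items, order preserved
    index = {item: i for i, item in enumerate(items)}
    n = len(items)
    counts = defaultdict(int)
    for a, b in zip(session, session[1:]):  # consecutive clicks a -> b
        counts[(index[a], index[b])] += 1
    out_adj = [[0.0] * n for _ in range(n)]
    in_adj = [[0.0] * n for _ in range(n)]
    for (i, j), c in counts.items():
        out_adj[i][j] = float(c)
        in_adj[j][i] = float(c)
    for m in (out_adj, in_adj):
        for row in m:
            s = sum(row)
            if s:
                row[:] = [v / s for v in row]
    return items, out_adj, in_adj

def reverse_positions(session):
    """Position counted from the end of the session (last click = 1)."""
    return list(range(len(session), 0, -1))

items, out_adj, in_adj = build_session_graph([1, 2, 3, 2, 4])
```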

    Constructing pre-trained dynamic graph neural network to predict disappearance of academic cooperation behavior
    Yu DU, Yan ZHU
    2024, 44(9):  2726-2731.  DOI: 10.11772/j.issn.1001-9081.2023091325

    Some existing research on link disappearance only focuses on discovering and analyzing its causes, while other research uses only static network representations for prediction; few studies analyze link disappearance from the perspective of dynamic network evolution. In response, a pre-trained dynamic graph neural network based model for predicting the disappearance of academic cooperation behavior, PreDGN (Pre-trained Dynamic Graph neural Network), was proposed. In PreDGN, firstly, the temporal information of the dynamic network was captured by generating pre-training tasks from the dynamic graph, and the topological information of the network was supplemented with edge features constructed from temporal motifs. Then, node representations were learned more accurately by combining attention-based node embedding with time encoding. The pre-trained model learned the historical information of the dynamic graph and could be fine-tuned on specific tasks to predict the disappearance of academic cooperation behavior. Experiments were conducted on subsets of the publicly available academic cooperation dataset HepTh with different time spans and data scales. On the 1996, 1997, 94-96, and 97-99 subsets, compared with the second-best method, the dynamic graph neural network DyRep, the proposed model increases the Area Under the ROC Curve (AUC) by 10.47, 8.16, 13.41, and 3.27 percentage points, and the Average Precision (AP) by 5.87, 2.15, 8.26, and 3.01 percentage points, respectively.
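The abstract does not specify PreDGN's time encoding. For illustration, the sketch swaps in a harmonic encoding in the spirit of TGAT: a timestamp is mapped to cosine/sine pairs at several frequencies, so the inner product of two encodings depends only on their time difference. Here the frequencies are fixed powers of a hypothetical base of 10, whereas TGAT learns them.

```python
import math

def time_encoding(t, dim=8, base=10.0):
    """[cos(w_1 t), sin(w_1 t), ..., cos(w_m t), sin(w_m t)] / sqrt(m),
    with m = dim // 2 fixed frequencies w_i = base ** (-i)."""
    m = dim // 2
    scale = 1.0 / math.sqrt(m)
    feats = []
    for i in range(m):
        w = base ** (-i)
        feats += [scale * math.cos(w * t), scale * math.sin(w * t)]
    return feats

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Translation invariance: only the gap t1 - t2 matters.
print(round(dot(time_encoding(3.0), time_encoding(5.0)), 6),
      round(dot(time_encoding(10.0), time_encoding(12.0)), 6))
```

The identity cos(a)cos(b) + sin(a)sin(b) = cos(a - b) is what makes the inner product depend only on elapsed time, which suits attention over past interactions.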

    Multivariate time series prediction model based on decoupled attention mechanism
    Liting LI, Bei HUA, Ruozhou HE, Kuang XU
    2024, 44(9):  2732-2738.  DOI: 10.11772/j.issn.1001-9081.2023091301

    Aiming at the difficulty of fully utilizing sequence contextual semantic information and implicit correlation information among variables in multivariate time-series prediction, a model based on a decoupled attention mechanism, Decformer, was proposed for multivariate time-series prediction. Firstly, a novel decoupled attention mechanism was proposed to fully utilize the embedded semantic information, thereby improving the accuracy of attention weight allocation. Secondly, a pattern correlation mining method that does not rely on explicit variable relationships was proposed to mine and utilize implicit pattern correlation information among variables. On three real datasets of different types, TTV, ECL, and PeMS-Bay, covering call traffic volume, electricity consumption, and road traffic, Decformer achieves the highest prediction accuracy across all prediction lengths compared with excellent open-source multivariate time-series prediction models such as Long- and Short-term Time-series Network (LSTNet), Transformer, and FEDformer. Compared with LSTNet, Decformer reduces the Mean Absolute Error (MAE) by 17.73%-27.32%, 10.89%-17.01%, and 13.03%-19.64%, and the Mean Squared Error (MSE) by 23.53%-58.96%, 16.36%-23.56%, and 15.91%-26.30% on the TTV, ECL, and PeMS-Bay datasets, respectively. Experimental results indicate that Decformer can significantly enhance the accuracy of multivariate time-series prediction.

    Multivariate long-term series forecasting method with DFT-based frequency-sensitive dual-branch Transformer
    Liehong REN, Lyuwen HUANG, Xu TIAN, Fei DUAN
    2024, 44(9):  2739-2746.  DOI: 10.11772/j.issn.1001-9081.2023091320

    In multivariate long-term time series forecasting, relying only on time-domain analysis often fails to capture long-term dependencies, leading to insufficient information utilization and limited prediction accuracy. To solve these problems, a method combining time- and frequency-domain analyses, the Frequency-Sensitive Dual-branch Transformer with Discrete Fourier Transform (DFT) for multivariate long-term series forecasting (FSDformer), was proposed. Firstly, DFT was used to transform between the time and frequency domains, decomposing complex time-series data into three structurally simple components: a low-frequency trend term, a medium-frequency seasonal term, and a high-frequency residual term. Then, a dual-branch structure was adopted: one branch predicts the medium- and high-frequency components with an Encoder-Decoder structure that incorporates a periodicity-enhanced attention mechanism, and the other predicts the low-frequency trend component with a MultiLayer Perceptron (MLP). Finally, the predictions from both branches were aggregated to obtain the final multivariate long-term forecast. FSDformer was compared with five classical algorithms on two datasets. On the Electricity dataset, with a historical sequence length of 96 and a prediction length of 336, FSDformer reduces the Mean Absolute Error (MAE) by 11.5%-29.1% and the Mean Square Error (MSE) by 20.9%-43.7% compared with algorithms such as Autoformer, achieving the best prediction accuracy. Experimental results show that FSDformer captures the dependencies within long-term time series data efficiently and improves prediction stability while enhancing prediction accuracy and computational efficiency.
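The frequency-band decomposition step can be sketched as follows: take the DFT, assign each frequency bin to a low, medium, or high band, and invert each band separately. Because the DFT is linear, the three parts sum back to the original series. The cutoffs `low_cut` and `high_cut` are hypothetical, as the abstract does not state FSDformer's band boundaries, and the naive O(n^2) DFT is used only for clarity (an FFT would be used in practice).

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def band_decompose(x, low_cut, high_cut):
    """Split x into trend (low), seasonal (medium) and residual (high) parts.
    Bin k is classified by min(k, n - k) so a frequency and its mirror bin
    stay together and each inverse transform is real-valued."""
    n = len(x)
    X = dft(x)
    bands = {"trend": [0j] * n, "seasonal": [0j] * n, "residual": [0j] * n}
    for k, value in enumerate(X):
        f = min(k, n - k)
        name = "trend" if f < low_cut else ("seasonal" if f < high_cut else "residual")
        bands[name][k] = value
    return {name: idft(spec) for name, spec in bands.items()}

n = 32
signal = [0.05 * t + math.sin(2 * math.pi * 4 * t / n) + 0.2 * math.sin(2 * math.pi * 12 * t / n)
          for t in range(n)]
parts = band_decompose(signal, low_cut=2, high_cut=8)
```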

    Intermittent demand forecasting method based on adaptive matching of demand patterns
    Lilin FAN, Fukang CAO, Wanting WANG, Kai YANG, Zhaoyu SONG
    2024, 44(9):  2747-2755.  DOI: 10.11772/j.issn.1001-9081.2023091372

    The demand for after-sales parts in large manufacturing enterprises is sparsely distributed and highly volatile, with high uncertainty in both demand frequency and demand quantity, and the demand sequences exhibit typical intermittent characteristics. Moreover, in actual operation and maintenance, the demand for parts fluctuates greatly in both frequency and quantity, resulting in various demand patterns. Existing intermittent demand prediction mainly uses a single model or a static combination of fixed prediction models, which can hardly explore the evolution laws of demand sequences under different demand patterns, so prediction accuracy and stability are hard to guarantee. To solve these problems, an intermittent demand forecasting method based on adaptive matching of demand patterns was proposed, which improves the prediction of intermittent sequences by dynamically identifying demand patterns and matching them with suitable models. The method includes two stages. In the model training stage, firstly, according to the intermittent characteristics of the historical demand data of parts, the data were divided into demand sequences and interval sequences, and the two types of sequences were clustered separately to capture their different demand and interval patterns. Secondly, a prediction model library containing statistical analysis models, shallow machine learning models, and deep learning models was established, and the prediction performance of different models on each demand pattern was tested to identify and mark the optimal prediction model for each pattern. In the prediction stage, the sequence to be predicted was divided into a demand sequence and an interval sequence, its demand pattern was identified and matched with the optimal prediction model, and the predicted demand and interval values were combined to form the final prediction. Experimental validation on the intermittent parts demand datasets of the American Automobile Company and the Royal Air Force showed that the proposed method is applicable to historical parts data with different demand patterns and effectively improves prediction accuracy by adaptively matching demand patterns with optimal prediction models.
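The split into demand and interval sequences, followed by a recombination of their forecasts, mirrors the classic Croston approach. The sketch below uses Croston's method as one example of a statistical model such a library might contain; it is a swapped-in illustration, not the proposed pattern-matching method, and initializing the smoothed levels with the first observation is a simplifying assumption.

```python
def split_intermittent(history):
    """Split a per-period demand history into non-zero demand sizes and the
    number of periods between consecutive demands (the first interval is
    counted from the start of the history)."""
    sizes, intervals = [], []
    gap = 0
    for d in history:
        gap += 1
        if d > 0:
            sizes.append(d)
            intervals.append(gap)
            gap = 0
    return sizes, intervals

def ses(values, alpha):
    """Simple exponential smoothing; returns the final smoothed level."""
    level = values[0]
    for v in values[1:]:
        level = alpha * v + (1 - alpha) * level
    return level

def croston_forecast(history, alpha=0.1):
    """Croston-style per-period demand rate: smoothed size / smoothed interval."""
    sizes, intervals = split_intermittent(history)
    if not sizes:
        return 0.0
    return ses(sizes, alpha) / ses(intervals, alpha)

history = [0, 0, 5, 0, 3, 0, 0, 0, 4]
print(split_intermittent(history))  # ([5, 3, 4], [3, 2, 4])
```

In the proposed method, each of the two sequences would instead be matched to whichever model in the library performs best for its identified pattern.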

    Cyber security
    Low-cost adversarial example defense algorithm based on example preprocessing
    Xiao CHEN, Yan CHANG, Danchen WANG, Shibin ZHANG
    2024, 44(9):  2756-2762.  DOI: 10.11772/j.issn.1001-9081.2023091249

    To defend against existing attacks on artificial intelligence algorithms (especially artificial neural networks) as fully as possible while reducing additional costs, the rattan algorithm based on example preprocessing was proposed. By cropping the unimportant parts of an image, normalizing neighboring pixel values, and scaling the image, the examples were preprocessed to destroy adversarial perturbations and generate new examples that pose less threat to the model, ensuring high recognition accuracy. Experimental results show that the rattan algorithm can defend against some adversarial attacks on the MNIST and CIFAR10 datasets and on neural network models such as squeezenet1_1, mnasnet1_3, and mobilenet_v3_large with less overhead than similar algorithms, and the minimum example accuracy after defense reaches 88.50%. Meanwhile, it does not noticeably reduce accuracy when processing clean examples, and its defense effect and defense cost are better than those of comparison algorithms such as Fast Gradient Sign Method (FGSM) and Momentum Iterative Method (MIM).
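The three preprocessing operations can be sketched on a grayscale image stored as nested lists. The concrete choices here are assumptions, since the abstract does not describe them: a fixed border crop stands in for removing unimportant regions, a 3x3 mean filter stands in for normalizing neighboring pixel values, and nearest-neighbor resizing restores the model's input size.

```python
def crop_border(img, margin):
    """Drop `margin` pixels on each side (stand-in for cutting unimportant regions)."""
    return [row[margin:len(row) - margin] for row in img[margin:len(img) - margin]]

def mean_smooth(img):
    """Replace each pixel with the mean of its 3x3 neighbourhood (clipped at borders)."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            vals = [img[a][b]
                    for a in range(max(0, i - 1), min(h, i + 2))
                    for b in range(max(0, j - 1), min(w, j + 2))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out

def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour rescaling to new_h x new_w."""
    h, w = len(img), len(img[0])
    return [[img[i * h // new_h][j * w // new_w] for j in range(new_w)] for i in range(new_h)]

def preprocess(img, margin=1):
    """Crop, smooth, then scale back to the original size."""
    h, w = len(img), len(img[0])
    return resize_nearest(mean_smooth(crop_border(img, margin)), h, w)
```

Smoothing spreads a small, high-frequency perturbation over its neighborhood, and the crop-and-rescale step shifts pixel positions, which is the kind of disruption the defense relies on.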

    Hybrid internet of vehicles intrusion detection system for zero-day attacks
    Jiepo FANG, Chongben TAO
    2024, 44(9):  2763-2769.  DOI: 10.11772/j.issn.1001-9081.2023091328

    Existing machine learning methods rely too heavily on sample data and are insensitive to anomalous data when detecting zero-day attacks, which makes it difficult for an Intrusion Detection System (IDS) to defend against such attacks effectively. Therefore, a hybrid internet of vehicles intrusion detection system based on Transformer and ANFIS (Adaptive-Network-based Fuzzy Inference System) was proposed. Firstly, a data enhancement algorithm was designed that solves the problem of imbalanced data samples by denoising first and then generating samples. Secondly, a feature engineering module was designed that introduces non-linear feature interactions into complex feature combinations. Finally, the self-attention mechanism of Transformer was combined with the adaptive learning method of ANFIS, enhancing feature representation and reducing dependence on sample data. The proposed system was compared with other SOTA (State-Of-The-Art) algorithms such as Dual-IDS on the CICIDS-2017 and UNSW-NB15 intrusion datasets. Experimental results show that for zero-day attacks, the proposed system achieves 98.64% detection accuracy and a 98.31% F1 value on the CICIDS-2017 dataset, and 93.07% detection accuracy and a 92.43% F1 value on the UNSW-NB15 dataset, validating its high accuracy and strong generalization ability for zero-day attack detection.
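ANFIS implements first-order Takagi-Sugeno fuzzy inference as a network. The sketch below shows only that forward pass, with Gaussian membership functions, product firing strengths, and normalized linear consequents; the two rules, their parameters, and reading the output as an anomaly score are hypothetical, and neither ANFIS training nor the combination with Transformer is modelled.

```python
import math

def gaussian_mf(x, center, sigma):
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))

def sugeno_infer(x, rules):
    """First-order Takagi-Sugeno inference (the ANFIS forward pass).
    Each rule is (list of (center, sigma) per input, [a_1, ..., a_n, bias]).
    Firing strength = product of memberships; output = normalized weighted
    sum of the rules' linear consequents."""
    strengths, outputs = [], []
    for mfs, coeffs in rules:
        w = 1.0
        for xi, (c, s) in zip(x, mfs):
            w *= gaussian_mf(xi, c, s)
        strengths.append(w)
        outputs.append(sum(a * xi for a, xi in zip(coeffs[:-1], x)) + coeffs[-1])
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    return sum(w * f for w, f in zip(strengths, outputs)) / total

# Two hypothetical rules over two normalized traffic features.
RULES = [
    ([(0.0, 1.0), (0.0, 1.0)], [0.0, 0.0, 0.1]),  # "normal-looking" region
    ([(1.0, 1.0), (1.0, 1.0)], [0.0, 0.0, 0.9]),  # "anomalous-looking" region
]
print(round(sugeno_infer([0.0, 0.0], RULES), 3), round(sugeno_infer([1.0, 1.0], RULES), 3))
```

Because the output is a smooth blend of rules rather than a hard lookup, inputs that fall between the regions seen in training still receive graded scores.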

    Random validation blockchain construction for federated learning
    Tingwei CHEN, Jiacheng ZHANG, Junlu WANG
    2024, 44(9):  2770-2776.  DOI: 10.11772/j.issn.1001-9081.2023091254

    A random verification blockchain construction and privacy protection method for federated learning was proposed to address issues in existing federated learning models such as gradient leakage from local device models, centralized server devices being able to exit at will, and global models being unable to resist malicious user attacks. Firstly, blockchain leader nodes were elected randomly by introducing verifiable hash functions, ensuring fairness in selecting the node that creates each block. Secondly, a cross-detection mechanism among verification nodes was designed to defend against malicious node attacks. Finally, blockchain nodes were trained based on differential privacy, and incentive rules were constructed according to each node's contribution to the model to enhance the training accuracy of the federated learning model. Experimental results show that under poisoning attacks with 20% malicious nodes, the proposed method achieves 80% accuracy, 61 percentage points higher than that of Google FL, and when the noise variance is 10^-3, the gradient matching loss of the proposed method is 14 percentage points higher than that of Google FL. Thus, compared with federated learning methods such as Google FL, the proposed method maintains good accuracy while improving model security, showing better security and robustness.
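The idea of hash-based random leader election can be sketched as follows. Every node derives a ticket from a public per-round seed (for example, the previous block hash; this choice is assumed) and its node ID, and the smallest ticket wins, so any participant can recompute and verify the result. This is a simplification. Unlike a real verifiable random function (VRF), which binds each output to the node's private key and comes with a proof, these tickets are predictable by anyone once the seed is known.

```python
import hashlib

def election_ticket(seed: bytes, node_id: str) -> int:
    """Deterministic pseudo-random ticket: SHA-256(seed || node_id) as an integer."""
    digest = hashlib.sha256(seed + node_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")

def elect_leader(seed: bytes, node_ids):
    """The node with the smallest ticket creates the next block."""
    return min(node_ids, key=lambda n: election_ticket(seed, n))

def verify_leader(seed: bytes, node_ids, claimed) -> bool:
    """Any node can recompute the election from public data."""
    return elect_leader(seed, node_ids) == claimed

nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node IDs
seed = b"previous-block-hash"                      # hypothetical public seed
leader = elect_leader(seed, nodes)
print(leader, verify_leader(seed, nodes, leader))
```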

    Federated spatial data publication method with differential privacy and secure aggregation
    Zhizheng ZHANG, Xiaojian ZHANG, Junqing WANG, Guanghui FENG
    2024, 44(9):  2777-2784.  DOI: 10.11772/j.issn.1001-9081.2023091296

    Aiming at the problems of federated spatial data isolation, spatial data indexing, and privacy in publishing spatial data, a Federated Spatial data Publishing (FSP) method based on a dynamic quad-tree was proposed. Firstly, in each iteration of the FSP method, the server shared a quad-tree replica with each client participating in the round; each client encoded its own location data using the replica and generated discrete noise from the Polya distribution to perturb the encoding results locally. Secondly, local masks were generated through LWE (Learning With Errors) to encrypt the noisy results. Thirdly, the aggregator combined the reported values from all clients in the iteration to perform secure aggregation and mask elimination, and then sent the aggregated results to the server. Finally, the server dynamically pruned the quad-tree bottom-up based on the collected encoding vectors and the noise variance. Experimental results on four spatial datasets, Beijing, Checkin, NYC, and Landmark, show that the FSP method not only protects client privacy but also reduces the Mean Squared Error (MSE) of federated spatial data publication by 3.80%, 2.96%, 7.51%, and 14.13%, respectively, at a privacy budget of 1.8 compared with the existing better-performing federated spatial data publication method AHH (Adaptive Hierarchical Histograms). This indicates that the FSP method achieves higher precision than similar methods in federated spatial data publishing.
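The encode-then-aggregate pipeline can be sketched in simplified form. A client location is encoded as a one-hot vector over the leaves of a complete quad-tree, masked so that only the sum across clients can be recovered, and then summed. Several pieces are assumptions or omissions: the quadrant numbering and modulus `Q` are arbitrary choices, the Polya noise is left out, and zero-sum additive masks stand in for the LWE-based masks. A single function generating all masks is done only to keep the sketch short, since in a real protocol no single party would know every mask.

```python
import random

Q = 2 ** 32  # hypothetical modulus for masked arithmetic

def quadtree_path(x, y, depth, bounds=(0.0, 0.0, 1.0, 1.0)):
    """Quadrant indices (0=SW, 1=SE, 2=NW, 3=NE) from the root down to `depth`."""
    x0, y0, x1, y1 = bounds
    path = []
    for _ in range(depth):
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        path.append((1 if x >= mx else 0) + (2 if y >= my else 0))
        x0, x1 = (mx, x1) if x >= mx else (x0, mx)
        y0, y1 = (my, y1) if y >= my else (y0, my)
    return path

def encode_leaf(x, y, depth):
    """One-hot vector over the 4**depth leaves of a complete quad-tree."""
    index = 0
    for q in quadtree_path(x, y, depth):
        index = index * 4 + q
    vec = [0] * (4 ** depth)
    vec[index] = 1
    return vec

def zero_sum_masks(num_clients, length, rng):
    """Random masks that sum to zero mod Q (simplified stand-in for LWE masks)."""
    masks = [[rng.randrange(Q) for _ in range(length)] for _ in range(num_clients - 1)]
    last = [(-sum(col)) % Q for col in zip(*masks)] if masks else [0] * length
    return masks + [last]

def masked_sum(vectors, masks):
    """Aggregator sums masked vectors; the masks cancel, leaving the true counts."""
    masked = [[(v + m) % Q for v, m in zip(vec, mask)] for vec, mask in zip(vectors, masks)]
    return [sum(col) % Q for col in zip(*masked)]

vectors = [encode_leaf(0.9, 0.9, 1), encode_leaf(0.8, 0.7, 1), encode_leaf(0.1, 0.1, 1)]
print(masked_sum(vectors, zero_sum_masks(3, 4, random.Random(0))))  # [1, 0, 0, 2]
```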

    Differential property evaluation method based on GPU for large-state cryptographic S-boxes
    Runlian ZHANG, Mi ZHANG, Xiaonian WU, Rui SHU
    2024, 44(9):  2785-2790.  DOI: 10.11772/j.issn.1001-9081.2023091268

    Large-state cryptographic S-boxes can provide better confusion for symmetric encryption algorithms, but evaluating their properties is very costly. To evaluate the differential properties of large-state cryptographic S-boxes efficiently, a GPU-based evaluation method was proposed. Building on an existing differential uniformity calculation method, GPU parallel schemes were designed to evaluate the differential uniformity of 16-bit S-boxes and the differential properties of 32-bit S-boxes, respectively. By optimizing GPU parallel granularity and load balancing, the schemes improved the execution efficiency of the kernel functions and the GPU and reduced the time cost. Test results show that, compared with CPU methods and other GPU parallel methods, the proposed schemes greatly reduce the time needed to evaluate the differential properties of large-state cryptographic S-boxes. Computing the differential uniformity of a 16-bit S-box takes 0.3 min. For a 32-bit S-box, computing the maximum output differential probability for a single input difference takes about 5 min, and evaluating the differential properties takes about 2.6 h.
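
    To make the quantities concrete, the following is a CPU reference sketch of the standard difference distribution table (DDT) computation, assuming a bijective n-bit S-box stored as a lookup list. Each row of the table corresponds to one input difference, and those rows are the independent units of work that a GPU scheme can distribute. The function names are illustrative. In pure Python this is practical only for small S-boxes: a 16-bit S-box already needs 2³² table updates.

```python
def output_difference_counts(sbox: list[int], delta_in: int) -> list[int]:
    """One DDT row: how often each output difference occurs for a fixed input difference."""
    counts = [0] * len(sbox)
    for x in range(len(sbox)):
        counts[sbox[x] ^ sbox[x ^ delta_in]] += 1
    return counts


def differential_uniformity(sbox: list[int]) -> int:
    """Largest DDT entry over all non-zero input differences."""
    return max(max(output_difference_counts(sbox, a)) for a in range(1, len(sbox)))


def max_output_probability(sbox: list[int], delta_in: int) -> float:
    """Maximum output differential probability for a single input difference."""
    return max(output_difference_counts(sbox, delta_in)) / len(sbox)


# The 4-bit PRESENT S-box, a convenient test case with differential uniformity 4.
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
```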

    Multi-key page-level encryption system for SQLite
    Xudong LI, Yukang FENG, Junsheng CHEN
    2024, 44(9):  2791-2801.  DOI: 10.11772/j.issn.1001-9081.2023091362

    At present, research on SQLite encryption, both in China and abroad, operates at the file level with a single key, resulting in coarse encryption granularity and making decryption relatively easy. To address these security shortcomings of SQLite, a multi-key page-level encryption system was proposed. Firstly, an independent page key was assigned to each physical page so that each page could be encrypted and decrypted individually, and a key file was introduced to store all page keys. Secondly, a page key cache module, KeyCache, was designed to generate and cache page keys, reducing the performance loss caused by frequent I/O reads and writes. Thirdly, an encryption and decryption module, Crypto, was proposed to implement the encryption and decryption functions. Crypto retrieves page keys quickly through KeyCache, improving overall system performance. The proposed system was compared experimentally with the representative SQLCipher. In read and update tests, the execution time of the proposed system was on average 1.5% and 3.0% lower than that of SQLCipher, achieving better performance at a higher security level. In create and delete tests, the proposed system incurs only a small performance loss relative to SQLCipher while significantly enhancing the security level, verifying the effectiveness of the proposed system.
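
    The KeyCache role described above can be sketched as a least-recently-used cache in front of the key file. Several details here are assumptions rather than the paper's design: the LRU eviction policy, 32-byte random page keys, and the in-memory `KeyFile` stand-in are all hypothetical. The page cipher itself is omitted because it would require a third-party cryptography library.

```python
import secrets
from collections import OrderedDict


class KeyFile:
    """Hypothetical in-memory stand-in for the key file holding one key per page."""

    def __init__(self):
        self._keys = {}
        self.reads = 0  # counts simulated key-file I/O

    def load(self, page_no):
        self.reads += 1
        return self._keys.get(page_no)

    def store(self, page_no, key):
        self._keys[page_no] = key


class KeyCache:
    """LRU cache of page keys; misses fall through to the key file."""

    def __init__(self, key_file, capacity=128):
        self.key_file = key_file
        self.capacity = capacity
        self._cache = OrderedDict()

    def get(self, page_no):
        if page_no in self._cache:
            self._cache.move_to_end(page_no)  # mark as most recently used
            return self._cache[page_no]
        key = self.key_file.load(page_no)
        if key is None:  # first use of this page: generate and persist its key
            key = secrets.token_bytes(32)
            self.key_file.store(page_no, key)
        self._cache[page_no] = key
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict the least recently used page key
        return key
```

    Repeated accesses to hot pages are served from memory, so key-file reads grow only with cache misses. This is the I/O saving the abstract attributes to KeyCache.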

    Advanced computing
    Optimization of tensor virtual machine operator fusion based on graph rewriting and fusion exploration
    Na WANG, Lin JIANG, Yuancheng LI, Yun ZHU
    2024, 44(9):  2802-2809.  DOI: 10.11772/j.issn.1001-9081.2023091252

    When the Tensor Virtual Machine (TVM) performs operator fusion for computation-intensive neural networks, it explores the computational graph layer by layer, which leads to excessive memory accesses and low memory resource utilization. Therefore, an optimization method for TVM operator fusion based on graph rewriting and fusion exploration was proposed. Firstly, the mapping types of operators were analyzed. Secondly, the computational graph was rewritten according to operation laws to simplify its structure, reducing the generation of intermediate results, lowering memory resource consumption, and improving fusion efficiency. Thirdly, a fusion exploration algorithm was employed to identify operators with lower fusion costs for prioritized fusion, avoiding data redundancy and register spilling. Finally, neural network operator fusion was implemented on the CPU, and the fusion acceleration performance was tested. Experimental results indicate that the proposed method effectively reduces the numbers of computational graph layers and operators, as well as memory access frequency and the amount of data transferred. Compared with the TVM operator fusion method, the proposed method reduces the number of computational graph layers by 18% on average and increases inference speed by 23% on average during fusion, confirming its effectiveness in optimizing the computational graph fusion process.
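
    As an illustration of rewriting a graph "according to operation laws", the sketch below applies one such law, the distributive law add(mul(x, a), mul(x, b)) → mul(x, add(a, b)), to a toy expression tree and counts the operators before and after. The `Op` node type and the choice of rule are hypothetical. TVM's own rewriting works on Relay/TIR graphs, and the paper's rule set is not given.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Op:
    """A toy operator node; leaves are plain strings naming tensors."""
    name: str
    args: tuple


Node = Union[Op, str]


def rewrite(node: Node) -> Node:
    """Bottom-up rewrite applying the distributive law wherever it matches."""
    if isinstance(node, str):
        return node
    node = Op(node.name, tuple(rewrite(a) for a in node.args))
    if node.name == "add" and len(node.args) == 2:
        left, right = node.args
        if (isinstance(left, Op) and isinstance(right, Op)
                and left.name == right.name == "mul"
                and left.args[0] == right.args[0]):
            # x*a + x*b  ->  x*(a + b): one fewer operator and one fewer intermediate tensor
            return Op("mul", (left.args[0], Op("add", (left.args[1], right.args[1]))))
    return node


def count_ops(node: Node) -> int:
    """Number of operator nodes in the expression tree."""
    if isinstance(node, str):
        return 0
    return 1 + sum(count_ops(a) for a in node.args)
```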

    Improved KLEIN algorithm and its quantum analysis
    Yanjun LI, Yaodong GE, Qi WANG, Weiguo ZHANG, Chen LIU
    2024, 44(9):  2810-2817.  DOI: 10.11772/j.issn.1001-9081.2023091333

    Since it was proposed, KLEIN has been subjected to attacks such as truncated differential cryptanalysis and integral cryptanalysis. Its encryption structure provides practical security, but a weakness in its key expansion algorithm enables full-round key recovery attacks. Firstly, the key expansion algorithm was modified, yielding an improved algorithm, N-KLEIN. Secondly, an efficient quantum circuit for the S-box was implemented using the in-place method, reducing the circuit width and depth and improving the implementation efficiency of the quantum circuit. Thirdly, quantum implementations of the obfuscation operations were obtained using LUP decomposition. Then, these circuits were combined into an efficient quantum circuit for full-round N-KLEIN. Finally, the resources required for the quantum implementation of full-round N-KLEIN were evaluated and compared with those of existing quantum implementations of lightweight block ciphers such as PRESENT and HIGHT. In addition, the cost of key search attacks based on Grover's algorithm was studied in depth, the cost of Grover key search on N-KLEIN-{64,80,96} under the Clifford+T model was given, and the quantum security of N-KLEIN was evaluated. Comparative results indicate that the quantum implementation cost of N-KLEIN is significantly lower.
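
    The Grover key-search cost mentioned above follows from the standard iteration count of about ⌊(π/4)·√(2^k)⌋ for a k-bit key, assuming a single matching key. The sketch below shows only this arithmetic. The per-iteration gate count is a caller-supplied placeholder, not N-KLEIN's measured circuit cost.

```python
import math


def grover_iterations(key_bits: int) -> int:
    """Near-optimal number of Grover iterations for one marked key among 2**key_bits."""
    return math.floor(math.pi / 4 * math.sqrt(2 ** key_bits))


def log2_grover_iterations(key_bits: int) -> float:
    """log2 of the iteration count; convenient when 2**key_bits is very large."""
    return math.log2(math.pi / 4) + key_bits / 2


def log2_total_gates(key_bits: int, log2_gates_per_iteration: float) -> float:
    """Total gates = iterations x gates per iteration (oracle + diffusion), in log2."""
    return log2_grover_iterations(key_bits) + log2_gates_per_iteration
```

    For instance, `log2_grover_iterations(64)` is about 31.65, so a 64-bit key needs roughly 2^31.65 oracle calls. The oracle, which contains two full encryptions for compute and uncompute, then determines the overall gate count.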

    Reptile search algorithm based on multi-hunting coordination strategy
    Shanglong LI, Jianhua LIU, Heming JIA
    2024, 44(9):  2818-2828.  DOI: 10.11772/j.issn.1001-9081.2023091304

    Reptile Search Algorithm (RSA) has strong global exploration ability, but its exploitation ability is relatively weak and it converges poorly in the late stage of iteration. To address these issues, a Reptile Search Algorithm based on Multi-Hunting Coordination Strategy (MHCS-RSA) was proposed, combining the Teaching-Learning-Based Optimization (TLBO) algorithm, the Beetle Antennae Search (BAS) algorithm based on quadratic interpolation, and the lens opposition-based learning strategy. MHCS-RSA retains RSA's position update formulas for hunting cooperation in the encircling phase (global exploration) and the hunting phase (local exploitation). For hunting coordination in the hunting phase, the learner phase of TLBO and the quadratic-interpolation-based BAS were integrated into the position update to improve the exploitation and convergence abilities of the algorithm. In addition, the lens opposition-based learning strategy was introduced to enhance the algorithm's ability to escape local optima. Experimental results on the CEC 2020 test functions show that MHCS-RSA has good optimization ability, convergence ability, and robustness. Solving the tension/compression spring design problem and the speed reducer design problem further verifies the validity of MHCS-RSA on practical problems.
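
    The sketch below illustrates the lens opposition-based learning step, assuming the common lens-imaging formula x* = (a + b)/2 + (a + b)/(2k) − x/k for bounds [a, b] and scale factor k. This formula reduces to the classic opposite point a + b − x when k = 1. The greedy keep-the-better rule and the default k = 2 are illustrative choices, not values taken from the paper.

```python
def lens_opposite(x: list[float], lower: list[float], upper: list[float], k: float) -> list[float]:
    """Lens-imaging opposite point of x, clipped to [lower, upper] in each dimension."""
    out = []
    for xi, a, b in zip(x, lower, upper):
        xo = (a + b) / 2 + (a + b) / (2 * k) - xi / k
        out.append(min(max(xo, a), b))
    return out


def greedy_lens_step(f, x, lower, upper, k=2.0):
    """Keep whichever of x and its lens-opposite point has the lower objective value."""
    candidate = lens_opposite(x, lower, upper, k)
    return candidate if f(candidate) < f(x) else x
```

    Jumping to the mirrored point lets an individual trapped near a local optimum sample a distant region at the cost of a single extra evaluation.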

    Flower pollination algorithm based on neural network optimization
    Guanglei YAO, Juxia XIONG, Guowu YANG
    2024, 44(9):  2829-2837.  DOI: 10.11772/j.issn.1001-9081.2023081143

    To reduce repeated exploration and improve the population diversity and spatial search ability of the Flower Pollination Algorithm (FPA), a Flower Pollination Algorithm based on Neural Network optimization (NNFPA) was proposed. In the algorithm, an adaptive control factor was used to switch dynamically between global and local search. A global search strategy using multi-party information was employed to speed up convergence, maintain the diversity of the pollen population, and reduce the population's dependence on social attributes in later iterations. A neural-network-based local search strategy gave the algorithm a memory function so that it could follow a stable search strategy, reducing its uncertainty and allowing it to explore the solution space more fully. Nine common test functions and selected functions from the CEC2014 test set were used for simulation. The results show that, compared with the standard FPA and the variant Flower Pollination Algorithm based on Hybrid Strategy (HSFPA), NNFPA achieves higher search accuracy and faster convergence on the chosen test functions, indicating better optimization ability.
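
    For reference, the following sketches the standard FPA on which NNFPA builds. It uses global pollination with Lévy flights (Mantegna's method) toward the current best solution and local pollination x + ε(x_j − x_k). The linearly decreasing switch probability is a hypothetical stand-in for NNFPA's adaptive control factor, and the neural-network local search is not reproduced.

```python
import math
import random


def levy_step(beta: float, rng: random.Random) -> float:
    """One Levy-distributed step length via Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    return rng.gauss(0, sigma) / abs(rng.gauss(0, 1)) ** (1 / beta)


def fpa_minimize(f, dim, n=20, iters=200, lo=-5.0, hi=5.0, beta=1.5, gamma=0.1, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    fit = [f(x) for x in pop]
    best = min(range(n), key=fit.__getitem__)
    g, g_fit = pop[best][:], fit[best]
    for t in range(iters):
        p = 0.8 - 0.6 * t / iters  # hypothetical adaptive switch: global search fades out
        for i in range(n):
            x = pop[i]
            if rng.random() < p:  # global (biotic) pollination toward the best flower
                cand = [xi + gamma * levy_step(beta, rng) * (gi - xi) for xi, gi in zip(x, g)]
            else:  # local (abiotic) pollination between two random flowers
                j, k = rng.sample(range(n), 2)
                eps = rng.random()
                cand = [xi + eps * (a - b) for xi, a, b in zip(x, pop[j], pop[k])]
            cand = [min(max(c, lo), hi) for c in cand]
            fc = f(cand)
            if fc < fit[i]:  # greedy replacement
                pop[i], fit[i] = cand, fc
                if fc < g_fit:
                    g, g_fit = cand[:], fc
    return g, g_fit
```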

    Automatic design of optical systems based on correctable reinforced search genetic algorithm
    Dong LIU, Chenhang LI, Changmao WU, Faxin RU, Yuanyuan XIA
    2024, 44(9):  2838-2847.  DOI: 10.11772/j.issn.1001-9081.2023081156

    Both Damped Least Squares (DLS) and the Genetic Algorithm (GA) are applicable to the automatic design of optical systems. Although DLS has high search efficiency, it easily falls into local optima; conversely, GA has strong global search capability in the parameter space of optical structures but weak local search capability. To address these challenges, a Correctable Reinforced Search GA (CRSGA) was proposed. Firstly, DLS was introduced after the GA crossover operation to enhance local search capability. Additionally, a correction strategy was introduced that rolls back individuals whose fitness values deteriorated before the next iteration, yielding corrected evolutionary results. Together, these two improvements combine the strengths of both methods and compensate for their weaknesses. Three typical optical system design experiments, Double Gaussian (DG), Reversed Telephoto (RT), and Finite Conjugate Distance Imaging (FCDI), were conducted to validate the effectiveness of CRSGA. CRSGA outperforms both DLS and GA, and its optimization outcomes are about 8.92%, 12.19%, and 9.39% better, respectively, than those of DLS in the commercial optical design software Zemax. In particular, compared with the Zemax HAMMER algorithm, the optimization outcomes improve significantly, by 99.98%, 94.33%, and 88.45% respectively. These results show that the proposed algorithm is effective for optical system optimization and can be used for automatic optical system design.
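
    The "refine, then roll back if worse" idea can be sketched on a toy problem, assuming a scalar design variable with merit r(x)². In this setting the DLS update (JᵀJ + λI)Δ = −Jᵀr reduces to Δ = −J·r/(J² + λ). The `vary` argument is a hypothetical stand-in for GA crossover, and a real lens merit function involves many variables and aberration residuals.

```python
def dls_refine(x: float, residual, jacobian, damping: float = 0.1) -> float:
    """One damped least-squares step for a scalar parameter: x - J*r / (J**2 + damping)."""
    r, j = residual(x), jacobian(x)
    return x - j * r / (j * j + damping)


def corrective_generation(population, merit, vary, refine):
    """Vary each individual, refine it locally with DLS, then roll back if its merit
    got worse (lower merit is better)."""
    next_population = []
    for parent in population:
        child = refine(vary(parent))
        next_population.append(parent if merit(child) > merit(parent) else child)
    return next_population
```

    The rollback guarantees that no individual's merit deteriorates from one generation to the next, while crossover can still move individuals into new basins that DLS alone would not reach.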

    Network and communications
    Interference analysis and performance research of LoRa signals
    Min HUA, Jia’nan WEI, Wei ZHAO, Shuo MENG
    2024, 44(9):  2848-2854.  DOI: 10.11772/j.issn.1001-9081.2023091233

    The LoRa (Long Range radio) system is one of the leading technologies in the currently developing Low Power Wide Area Networks (LPWAN). Its MAC layer adopts an ALOHA-based access protocol. Although this access mechanism is simple and easy to implement, it readily causes conflicts and collisions, degrading the communication performance of the whole system. It is therefore necessary to study the mutual interference that arises when multiple terminals occupy channel resources at the same time. For a LoRa signal, the SF (Spreading Factor) determines its communication coverage. Thus, the effect of an interference signal on the demodulation performance of the target signal was analyzed for two cases: the interference signal has the same SF as the target signal, or a different SF. Experimental results show that interference between signals with the same SF is relatively strong, whereas interference is relatively weak when the SFs differ. The SIR (Signal-to-Interference Ratio) required for correct demodulation at the receiver was obtained through theoretical analysis. These results show that LoRa signals with different SFs can be viewed as pseudo-orthogonal.
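
    The demodulation referred to above can be sketched with an idealized discrete LoRa model. The assumptions are one sample per chip at a sample rate equal to the bandwidth, and no noise, timing offset, or frequency offset. A symbol s is an up-chirp cyclically shifted by s. Multiplying by the conjugate base chirp turns it into a pure tone, so the DFT peak bin gives s back. The second test case mixes in a time-aligned same-SF interferer at about 6 dB SIR: its energy lands in a single competing bin, which is why same-SF collisions are the harmful case.

```python
import cmath


def chirp(symbol: int, sf: int) -> list[complex]:
    """Baseband LoRa symbol: exp(j*2*pi*(n^2/(2N) + (symbol/N - 1/2)*n)), N = 2**sf."""
    n_samples = 2 ** sf
    return [cmath.exp(2j * cmath.pi * (n * n / (2 * n_samples) + (symbol / n_samples - 0.5) * n))
            for n in range(n_samples)]


def demodulate(samples: list[complex], sf: int) -> int:
    """Dechirp with the conjugate base chirp, then pick the strongest DFT bin."""
    n_samples = 2 ** sf
    dechirped = [s * r.conjugate() for s, r in zip(samples, chirp(0, sf))]
    magnitudes = []
    for k in range(n_samples):  # naive DFT, fine for a sketch at SF7
        acc = sum(d * cmath.exp(-2j * cmath.pi * k * n / n_samples) for n, d in enumerate(dechirped))
        magnitudes.append(abs(acc))
    return max(range(n_samples), key=magnitudes.__getitem__)
```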

    Distributed power allocation algorithm based on graph convolutional network for D2D communication systems
    Chuanlin PANG, Rui TANG, Ruizhi ZHANG, Chuan LIU, Jia LIU, Shibo YUE
    2024, 44(9):  2855-2862.  DOI: 10.11772/j.issn.1001-9081.2023081221

    To effectively control co-channel interference in Device-to-Device (D2D) communication systems while reducing implementation complexity, a Graph Convolutional Network (GCN)-based distributed power allocation algorithm was proposed to maximize the weighted sum rate of all D2D links. Firstly, the system topology was modeled as a graph, and the features of nodes and edges as well as the message-passing scheme were defined. Then, the GCN parameters were trained with unsupervised learning. After offline training, each D2D link could obtain its optimal power allocation strategy in a distributed manner, based on local channel state information and interaction with neighboring nodes. Experimental results show that, compared with an optimization theory-based algorithm, the proposed algorithm reduces running time by 97.41% with only a 3.409% loss in weighted sum rate. Compared with a deep reinforcement learning-based algorithm, it generalizes better and remains stable under different parameter settings.
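
    The objective being maximized can be written down directly. The sketch below computes the weighted sum rate Σᵢ wᵢ log₂(1 + pᵢgᵢᵢ / (σ² + Σⱼ≠ᵢ pⱼgⱼᵢ)) for given powers. The gain-matrix indexing convention is an assumption. In unsupervised training of this kind, the negative of this value would typically serve as the loss. The GCN itself is not shown.

```python
import math


def weighted_sum_rate(powers, gains, weights, noise):
    """Weighted sum rate of all D2D links.

    gains[j][i] is the channel gain from transmitter j to receiver i, so gains[i][i]
    is link i's direct channel and the other entries in column i are interference.
    """
    total = 0.0
    for i, (p_i, w_i) in enumerate(zip(powers, weights)):
        interference = sum(powers[j] * gains[j][i] for j in range(len(powers)) if j != i)
        sinr = p_i * gains[i][i] / (noise + interference)
        total += w_i * math.log2(1 + sinr)
    return total
```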

    Computer software technology
    Mutant generation strategy based on program dependencies
    Tian TIAN, Yangyang SHAO, Miaomiao WANG, Huan YANG
    2024, 44(9):  2863-2870.  DOI: 10.11772/j.issn.1001-9081.2023091319

    To address the high cost of mutation testing caused by large numbers of mutants, a Program Dependency based Mutant Generation (PDMG) strategy was proposed, which selects mutation targets satisfying certain constraints for mutant generation. Firstly, program dependency graphs were generated from data dependencies and control dependencies. Then, based on a mutation object selection strategy and the program dependency graphs, dependent statements were selected as mutation objects. Finally, mutation operators were applied to the selected objects to generate mutants. The proposed method was applied to mutation testing of 8 benchmark programs. Experimental results show that, compared with the Random Selection (RS) and Mutation Operator Selection (MOS) strategies, PDMG reduces the number of mutants by 52.20% on average, improving the execution efficiency of mutation testing without reducing its effectiveness.
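
    As a rough sketch of dependency-guided target selection, the code below assumes straight-line code given as (defined variable, used variables, text) tuples. It covers data dependencies only, with no control dependencies. The selection rule "keep statements that take part in some dependency edge" is hypothetical, and only the arithmetic operator replacement (AOR) operator `+` → `-` is applied.

```python
def data_dependencies(stmts):
    """Edges (i, j) meaning statement j reads a value last written by statement i."""
    last_def, edges = {}, set()
    for j, (defined, used, _text) in enumerate(stmts):
        for var in used:
            if var in last_def:
                edges.add((last_def[var], j))
        if defined is not None:
            last_def[defined] = j
    return edges


def select_targets(stmts):
    """Keep only statements that take part in at least one dependency edge."""
    return sorted({k for edge in data_dependencies(stmts) for k in edge})


def aor_mutants(stmts, targets):
    """Arithmetic Operator Replacement ('+' -> '-') on each selected statement."""
    return [(i, stmts[i][2].replace("+", "-", 1)) for i in targets if "+" in stmts[i][2]]
```

    On a five-statement program where `e = 4 + 5` feeds nothing, the selection drops that statement. AOR then yields two mutants instead of three.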

    Multimedia computing and computer simulation
    Optimization model for small object detection based on multi-level feature bidirectional fusion
    Yexin PAN, Zhe YANG
    2024, 44(9):  2871-2877.  DOI: 10.11772/j.issn.1001-9081.2023091274

    Because small objects carry few inherent features and these features are easily lost as network depth increases, small object detection has long been a challenging problem in object detection. To address this issue, an optimization model for small object detection based on multi-level feature enhancement of the network structure was proposed. Firstly, gradient calculation was optimized by replacing Spatial Pyramid Pooling (SPP) in the backbone network. Secondly, multi-level bidirectional fusion at the feature level was employed in the network neck, and an Adaptive Feature Fusion (AFF) module was added to the output head, achieving multi-level feature enhancement. Experimental results show that on the COCO2017-val dataset, with an IoU (Intersection over Union) threshold of 0.5, the average precision of the proposed model reaches 61.4%, 4.7 percentage points higher than that of the popular YOLOv7 model. Meanwhile, the detection frame rate of the proposed model on a single GPU is 78.2 frame/s, which meets industrial-level detection speed requirements.
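
    The IoU criterion used in the AP@0.5 result above can be computed as follows, assuming boxes given as (x1, y1, x2, y2) corner coordinates. The function names are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, gt, threshold=0.5):
    """At the AP50 setting a detection matches a ground-truth box when IoU >= 0.5."""
    return iou(pred, gt) >= threshold
```

    The metric also shows why small objects are hard. A 2-pixel localization error on an 8×8 box drops IoU to about 0.39, below the 0.5 threshold, while the same error on an 80×80 box still gives about 0.91.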

    Siamese mixed information fusion algorithm for RGBT tracking
    Ying HUANG, Jiayu YANG, Jiahao JIN, Bangrui WAN
    2024, 44(9):  2878-2885.  DOI: 10.11772/j.issn.1001-9081.2023081223

    The core of visible-light and thermal infrared tracking (RGBT (RGB-Thermal) tracking for short) lies in effectively using information from different modalities. In decision-level fusion, low-quality results produced by a single branch can degrade the algorithm's object decision-making. To address this problem, a Siamese mixed information fusion algorithm, SiamMIF, was proposed for RGBT tracking. Firstly, a Siamese Backbone Network (SBN) was used for multi-modal feature extraction. Secondly, the effect of low-quality images on dual-branch parallel decision-making was analyzed from the perspective of Signal-to-Noise Ratio (SNR), and an SNR-driven Information Interaction Module (IIM) was designed to complement low-SNR information. Thirdly, a Dual-stream Anchor-free Head (DAH) was employed to classify and regress the compensated features. Finally, an Adaptive Lightweight Decision Module (ALDM) was used to fuse the tracking results and determine the object's position quickly. Experiments were conducted on four RGBT benchmark datasets: GTOT, RGBT234, VOT-RGBT2019, and LasHeR. On LasHeR, the success rate and precision of the proposed method are 0.396 and 0.518 respectively, improvements of 9.4% and 3.6% over APFNet (Attribute-based Progressive Fusion Network). SiamMIF also achieves good results on the other three datasets, and its frame rate on GPU reaches 40 frame/s.
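
    The general idea of letting signal quality steer the fusion can be sketched as follows. This is a hypothetical illustration, not the paper's IIM. It assumes that each modality's SNR in dB is already known and maps the SNRs to modality weights with a softmax whose temperature is an illustrative parameter. The higher-SNR branch then dominates the fused feature, so a degraded branch has less influence on the decision.

```python
import math


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels."""
    return 10 * math.log10(signal_power / noise_power)


def snr_weighted_fusion(rgb_feat, tir_feat, rgb_snr_db, tir_snr_db, temperature=10.0):
    """Fuse two feature vectors with softmax(SNR / temperature) weights.

    Returns the fused vector and the weight given to the RGB branch.
    """
    a = math.exp(rgb_snr_db / temperature)
    b = math.exp(tir_snr_db / temperature)
    w_rgb = a / (a + b)
    fused = [w_rgb * r + (1 - w_rgb) * t for r, t in zip(rgb_feat, tir_feat)]
    return fused, w_rgb
```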

    Crowd counting method based on dual attention mechanism
    Zhiqiang ZHAO, Peihong MA, Xinhong HEI
    2024, 44(9):  2886-2892.  DOI: 10.11772/j.issn.1001-9081.2023091269

    To address challenges such as scale variation, background interference, and partial occlusion in crowd counting in complex scenes, a DA-DCCNN (Dual Attention based Dilated Contextual Convolutional Neural Network) was proposed. Firstly, the convolutional layers of VGG16 were used as a feature extractor to obtain abstract, deep feature maps of the crowd image. Subsequently, a Dilated Context Module (DCM) was constructed with dilated convolutions to connect features obtained from different layers, and a Spatial Attention Module (SAM) and a Channel Attention Module (CAM) were introduced to acquire contextual information. Finally, a loss function combining Euclidean distance and cross entropy was formulated to measure the disparity between the predicted attention map and the ground-truth attention map. Experimental results on three public datasets, ShanghaiTech, UCF_CC_50, and UCF-QNRF, demonstrate that DA-DCCNN effectively captures multi-scale features and enhances the perception of important regions and channels in the image, achieving the lowest Mean Absolute Error (MAE) among the compared methods. The dual-attention-based feature fusion network efficiently recognizes spatial structures and local features in images, so the generated density maps allow crowd regions to be predicted and counted more accurately.
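
    The sketch below shows how a density map yields a count, how MAE is computed, and one plausible form of the combined loss. The count is the sum of the density map, and MAE averages the absolute count errors over images. For the loss, this sketch assumes Euclidean loss on the density map plus α times binary cross entropy on the attention map. The split between the two terms and the weight α are assumptions, since the abstract does not give the exact formulation.

```python
import math


def count_from_density(density):
    """Predicted crowd count: the sum (integral) of the density map."""
    return sum(sum(row) for row in density)


def mae(predicted_counts, true_counts):
    """Mean Absolute Error of per-image counts."""
    return sum(abs(p - t) for p, t in zip(predicted_counts, true_counts)) / len(true_counts)


def combined_loss(pred_density, gt_density, pred_attention, gt_attention, alpha=0.1, eps=1e-7):
    """Half squared L2 (Euclidean) density loss + alpha * mean binary cross entropy
    between predicted and ground-truth attention maps (hypothetical weighting)."""
    l2 = sum((p - g) ** 2 for pr, gr in zip(pred_density, gt_density) for p, g in zip(pr, gr))
    bce = [-(g * math.log(p + eps) + (1 - g) * math.log(1 - p + eps))
           for pr, gr in zip(pred_attention, gt_attention) for p, g in zip(pr, gr)]
    return l2 / 2 + alpha * sum(bce) / len(bce)
```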

    Unsupervised person re-identification based on self-distilled vision Transformer
    Jieru JIA, Jianchao YANG, Shuorui ZHANG, Tao YAN, Bin CHEN
    2024, 44(9):  2893-2902.  DOI: 10.11772/j.issn.1001-9081.2024040425

    Since the lack of inductive bias in Vision Transformer (ViT) makes it hard to learn meaningful visual representations on relatively small datasets, an unsupervised person re-identification method based on a self-distilled vision Transformer was proposed. Firstly, because ViT's modular architecture gives the features produced by every intermediate block the same dimension, an intermediate Transformer block was selected at random and its output was fed into the classifier to obtain predictions. Secondly, the Kullback-Leibler divergence between the output distribution of the classifier on the randomly selected intermediate block and that of the final classifier was minimized, constraining the intermediate predictions to be consistent with the final classifier's results, and a self-distillation loss function was constructed on this basis. Finally, the model was optimized by jointly minimizing the cluster-level contrastive loss, instance-level contrastive loss, and self-distillation loss. By providing soft supervision from the final classifier to the intermediate block, inductive bias was effectively introduced into the ViT model, enabling it to learn more robust and generalizable visual representations. Compared with Transformer-based Object Re-IDentification Self-Supervised Learning (TransReID-SSL), the proposed method improves the mean Average Precision (mAP) and Rank-1 accuracy by 1.2 and 0.8 percentage points respectively on the Market-1501 dataset, and by 3.4 and 3.1 percentage points respectively on the MSMT17 dataset. Experimental results demonstrate that the proposed method effectively improves the precision of unsupervised person re-identification.
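
    The self-distillation term described above can be sketched in plain Python. The sketch assumes the direction KL(teacher ‖ student), with the final classifier as teacher, and an optional softmax temperature. Both choices are assumptions, since the abstract names the divergence but not its direction or scaling. In a deep-learning framework the teacher distribution would also be detached from the gradient.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_i p_i * log(p_i / q_i)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def self_distillation_loss(final_logits, intermediate_logits, temperature=1.0):
    """Final classifier = teacher, classifier on the random intermediate block = student."""
    teacher = softmax(final_logits, temperature)
    student = softmax(intermediate_logits, temperature)
    return kl_divergence(teacher, student)


def total_loss(cluster_loss, instance_loss, distill_loss, lam=1.0):
    """Joint objective; the weight lam is a hypothetical hyperparameter."""
    return cluster_loss + instance_loss + lam * distill_loss
```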

    Uncertainty-based frame associated short video event detection method
    Yun LI, Fuyou WANG, Peiguang JING, Su WANG, Ao XIAO
    2024, 44(9):  2903-2910.  DOI: 10.11772/j.issn.1001-9081.2023091242

    To combine frame uncertainty and temporal correlation in short videos and thereby enhance event detection, a frame-associated short video event detection method based on uncertainty perception was proposed. Firstly, a 2D Convolutional Neural Network (CNN) was used to extract features from each frame of a short video, and the extracted features were forward-propagated several times through Bayesian variational layers to obtain the feature mean and the corresponding uncertainty information. Secondly, an uncertainty perception module fused the feature mean and uncertainty information, and a temporal correlation module then strengthened the time-domain correlations among the fused frame features. Finally, the temporally correlated features were passed through a classification network to detect short video events. Experimental comparisons were carried out on a short video event detection dataset crawled from the Flickr platform. They show that subspace learning methods such as Support Vector Machine (SVM) classify poorly and do not sufficiently explore high-level semantic representations, while deep learning methods achieve significantly better event detection accuracy. Compared with the Sparse Video-Text Transformer (SViTT) method, the proposed method improves accuracy, Average Recall (AR), and Average Precision (AP) by 3.37%, 2.55%, and 2.09% respectively, verifying its effectiveness for short video event detection.
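
    The "forward-propagate several times to get a mean and an uncertainty" step can be sketched with a hypothetical Bayesian linear layer. In this layer every weight is sampled from N(μ, σ²) on each pass. Repeating the pass gives a per-dimension mean, and the variance across passes serves as that feature's uncertainty. The paper's layer structure and fusion module are not reproduced.

```python
import random
import statistics


def bayesian_linear(x, weight_mu, weight_sigma, rng):
    """One stochastic forward pass: each weight is drawn from N(mu, sigma^2)."""
    return [sum(xi * rng.gauss(m, s) for xi, m, s in zip(x, mu_row, sigma_row))
            for mu_row, sigma_row in zip(weight_mu, weight_sigma)]


def mc_feature_stats(x, weight_mu, weight_sigma, passes=50, seed=0):
    """Repeat the stochastic pass; return per-dimension mean and variance (uncertainty)."""
    rng = random.Random(seed)
    samples = [bayesian_linear(x, weight_mu, weight_sigma, rng) for _ in range(passes)]
    columns = list(zip(*samples))
    means = [statistics.fmean(c) for c in columns]
    variances = [statistics.pvariance(c) for c in columns]
    return means, variances
```

    With all σ set to zero the layer becomes deterministic and the reported uncertainty is exactly zero. Frames whose features vary more across passes are the ones a downstream module could down-weight.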

    Unsupervised cross-domain transfer network for 3D/2D registration in surgical navigation
    Xiyuan WANG, Zhancheng ZHANG, Shaokang XU, Baocheng ZHANG, Xiaoqing LUO, Fuyuan HU
    2024, 44(9):  2911-2918.  DOI: 10.11772/j.issn.1001-9081.2023091332

    3D/2D registration is a key technique for intraoperative guidance. Existing deep learning-based registration methods extract image features with a network and regress the corresponding pose transformation parameters. Such methods rely on real samples and their 3D labels for training, but expert-annotated medical data of this kind are scarce. An alternative is to train the network on Digitally Reconstructed Radiographs (DRRs), but such networks struggle to keep their accuracy on X-ray images because image features differ across domains. To address these problems, an Unsupervised Cross-Domain Transfer Network (UCDTN) based on self-attention was designed. Without using X-ray images and their 3D spatial labels as training samples, UCDTN transfers the correspondence between image features and spatial transformations learned in the source domain to the target domain. It uses features common to both domains to reduce the disparity between them and minimize the negative impact of the domain shift. Experimental results show that the mTRE (mean Target Registration Error) of UCDTN's predictions is 2.66 mm, a 70.61% reduction compared with the model without cross-domain transfer training, indicating the effectiveness of UCDTN in cross-domain registration tasks.
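
    The mTRE figure quoted above measures how far target points land under the predicted pose compared with the ground-truth pose. The following sketch computes it, assuming rigid poses given as (3×3 rotation, translation) pairs and target points in millimetres. The helper names are illustrative.

```python
import math


def apply_rigid(rotation, translation, point):
    """Map a 3D point with a rigid transform: R @ p + t."""
    return tuple(sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
                 for r in range(3))


def rotation_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def mean_tre(targets, predicted, ground_truth):
    """mTRE: mean distance between targets mapped by the predicted and true poses."""
    total = 0.0
    for p in targets:
        total += math.dist(apply_rigid(*predicted, p), apply_rigid(*ground_truth, p))
    return total / len(targets)
```

    Because rotation errors are amplified with distance from the rotation centre, mTRE depends on where the target points lie. A pure 1 mm translation error, by contrast, yields exactly 1 mm everywhere.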

    Adaptive hybrid network for affective computing in student classroom
    Yan RONG, Jiawen LIU, Xinlei LI
    2024, 44(9):  2919-2930.  DOI: 10.11772/j.issn.1001-9081.2023091303

    Affective computing can improve teaching effectiveness and the learning experience in intelligent education. Current research on affective computing in the classroom still suffers from limited adaptability and weak perception in complex scenarios. To address these challenges, a novel hybrid architecture, SC-ACNet, was proposed for accurate affective computing of students in the classroom. The architecture includes three modules. The first is a multi-scale student face detection module that adapts to small targets. The second is an affective computing module with an adaptive spatial structure that adapts to different facial postures and recognizes five student emotions in the classroom (calm, confused, jolly, sleepy, and surprised). The third is a self-attention module that visualizes the regions contributing most to the model's results. In addition, a new student classroom dataset, SC-ACD, was constructed to alleviate the shortage of facial emotion image datasets for classrooms. Experimental results on SC-ACD show that, compared with the baseline YOLOv7, SC-ACNet improves the mean Average Precision (mAP) by 4.2 percentage points and the accuracy of affective computing by 9.1 percentage points. Furthermore, SC-ACNet achieves accuracies of 0.972 and 0.994 on the common facial expression datasets KDEF and RaFD, supporting the proposed method as a promising way to improve the quality of teaching and learning in intelligent classrooms.
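
    The mAP gain reported above averages per-class AP values. A common way to compute AP is all-point interpolation in the style of PASCAL VOC, sketched below. The sketch assumes that each class's detections are already sorted by confidence and marked as true or false positives at the chosen IoU threshold. The paper may instead use a COCO-style 101-point variant.

```python
def average_precision(tp_flags, num_ground_truth):
    """All-point interpolated AP for one class.

    tp_flags: detections sorted by descending confidence, 1 = true positive, 0 = false positive.
    """
    tp = fp = 0
    recalls, precisions = [], []
    for flag in tp_flags:
        tp += flag
        fp += 1 - flag
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):  # make precision non-increasing in recall
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


def mean_average_precision(per_class):
    """per_class: list of (tp_flags, num_ground_truth) pairs, one per class."""
    return sum(average_precision(f, n) for f, n in per_class) / len(per_class)
```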

    Multi-level color restoration of mural image based on gated positional encoding
    Zhigang XU, Chuang ZHANG
    2024, 44(9):  2931-2937.  DOI: 10.11772/j.issn.1001-9081.2023081220

    In recent years, color restoration of mural images has become an active research topic in the protection and display of mural cultural heritage. During color restoration, the overall feature information of mural images is difficult to extract and preserve effectively, and local color restoration is prone to false colors and color spill. To address these problems, a multi-level color restoration method for mural images based on gated positional encoding was proposed. Firstly, an encoder network based on global feature constraints was constructed. An improved multi-kernel maximum-average-minimum pooling algorithm extracted the global feature gradient of the image as the downsampling criterion and built a mural image feature pyramid, reducing the loss of overall features during encoding. Secondly, to restore local color information accurately, a color transfer module based on gated positional encoding was designed. It constrains the learning of the similarity kernel between content features and color features in the spatial domain so that color features are mapped accurately onto the mural image to be restored, reducing false colors and color spill in the restored image. Experimental results show that, compared with AdaIN (Adaptive Instance Normalization), AST (Arbitrary Style Transfer), and other methods, the proposed method achieves the best NIQE (Natural Image Quality Evaluator) and PIQE (Perception based Image Quality Evaluator) scores on the restored mural images. The proposed method therefore performs well in restoring the color information of mural images while preserving their global structural and textural characteristics.
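
    The abstract does not say how the max, average, and min responses are combined into a downsampling criterion. The sketch below therefore shows only the raw ingredient. It computes non-overlapping max, average, and min pooling for each of several kernel sizes, assuming a single-channel feature map given as a list of rows. The "multi-kernel" wrapper and the range map (max − min) are hypothetical illustrations of a local-gradient proxy.

```python
def max_avg_min_pool(feature_map, k):
    """Non-overlapping k x k pooling returning (max_map, avg_map, min_map)."""
    rows, cols = len(feature_map), len(feature_map[0])
    out = ([], [], [])
    for r in range(0, rows - k + 1, k):
        row_out = ([], [], [])
        for c in range(0, cols - k + 1, k):
            window = [feature_map[i][j] for i in range(r, r + k) for j in range(c, c + k)]
            row_out[0].append(max(window))
            row_out[1].append(sum(window) / len(window))
            row_out[2].append(min(window))
        for acc, row in zip(out, row_out):
            acc.append(row)
    return out


def multi_kernel_pool(feature_map, kernels=(2, 4)):
    """Hypothetical multi-kernel variant: one (max, avg, min) triple per kernel size."""
    return {k: max_avg_min_pool(feature_map, k) for k in kernels}


def range_map(feature_map, k):
    """Hypothetical local-gradient proxy: max minus min inside each k x k window."""
    max_map, _, min_map = max_avg_min_pool(feature_map, k)
    return [[a - b for a, b in zip(mr, nr)] for mr, nr in zip(max_map, min_map)]
```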

    Frontier and comprehensive applications
    Formation obstacle-avoidance and reconfiguration method for multiple UAVs
    Lingxia MU, Zhengjun ZHOU, Ban WANG, Youmin ZHANG, Xianghong XUE, Kaikai NING
    2024, 44(9):  2938-2946.  DOI: 10.11772/j.issn.1001-9081.2023091342

    To handle obstacles and failures in multi-UAV (Unmanned Aerial Vehicle) formation flight, a dynamic formation switching and reconfiguration method was proposed. For each UAV in the formation, obstacles and the other UAVs in the formation were treated as dynamic threats. Adaptively adjusting the scoring weights for different flight scenarios improved the ability of the UAV formation to avoid obstacles in a dynamic environment. When a UAV in the formation failed, the remaining UAVs were reconfigured into a new formation: changing the positions of the followers relative to the leader in the objective function of the dynamic window approach produced a new formation without the faulty UAV, yielding fault-tolerant formation flight. Simulation results show that the proposed formation obstacle avoidance and reconfiguration algorithm achieves dynamic obstacle avoidance and fault-tolerant formation flight when one UAV fails or runs low on power. Moreover, compared with the traditional method, the distance error between UAVs in the formation is lower.
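
    The sketch below illustrates how a follower can use the dynamic window approach (DWA) to track its formation slot. It makes several simplifying assumptions: a planar unicycle model, a fixed grid of candidate (v, ω) pairs in place of acceleration-limited windows, and hypothetical scoring weights and safety radius. Reconfiguration after a failure then amounts to assigning new `offset` values to the remaining UAVs, which moves their target slots.

```python
import math


def formation_slot(leader_pose, offset):
    """World position of a follower's slot: offset (dx, dy) rotated into the leader's heading."""
    x, y, heading = leader_pose
    dx, dy = offset
    return (x + dx * math.cos(heading) - dy * math.sin(heading),
            y + dx * math.sin(heading) + dy * math.cos(heading))


def simulate(pose, v, w, dt=0.1, steps=10):
    """Roll a unicycle model forward under constant (v, w); return the visited points."""
    x, y, heading = pose
    path = []
    for _ in range(steps):
        heading += w * dt
        x += v * math.cos(heading) * dt
        y += v * math.sin(heading) * dt
        path.append((x, y))
    return path


def dwa_choose(pose, target, threats, v_options, w_options, weights, safe_radius=0.3):
    """Return the best admissible (v, w).

    threats: obstacles and other UAVs, all treated as points to keep clear of.
    weights: (goal, clearance, speed) scoring weights, to be adapted per scenario.
    """
    w_goal, w_clear, w_speed = weights
    best, best_score = None, -math.inf
    for v in v_options:
        for w in w_options:
            path = simulate(pose, v, w)
            clearance = min((math.dist(p, t) for p in path for t in threats), default=math.inf)
            if clearance < safe_radius:
                continue  # trajectory would come too close to a threat
            score = (-w_goal * math.dist(path[-1], target)
                     + w_clear * min(clearance, 2.0)
                     + w_speed * v)
            if score > best_score:
                best, best_score = (v, w), score
    return best
```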

    Short-term traffic flow prediction of urban highway based on variant residual model and Transformer
    Xin YANG, Xueni CHEN, Chunjiang WU, Shijie ZHOU
    2024, 44(9):  2947-2951.  DOI: 10.11772/j.issn.1001-9081.2023091262

    The prediction of urban highway traffic flow is influenced by historical traffic flow and by traffic flow in neighboring lanes, and therefore involves complex spatio-temporal features. The traditional Convolutional Long Short-Term Memory (ConvLSTM) network for traffic flow prediction does not separate spatial and temporal features, which leads to insufficient feature extraction, feature confusion, and loss of feature information. To address these issues, the ConvLSTM model was improved. Firstly, the short-term temporal features and spatial features of the traffic flow data at each sampling moment were extracted, and the short-term spatio-temporal features were fused along specific dimensions. Secondly, residual mapping was applied. Finally, the mapped short-term spatio-temporal features were fed into a Transformer model to capture the long-term spatio-temporal features of the traffic flow data, from which the traffic flow at each sampling point at future moments was predicted. On California urban freeway data, with Mean Absolute Error (MAE) as the evaluation metric, the proposed model improves prediction accuracy by 18% compared with the Conv-Transformer model, validating its effectiveness.
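Two generic operations in this pipeline, residual mapping (y = x + F(x)) and MAE, can be illustrated in plain Python. This is a minimal sketch: concatenation as the fusion step and the toy `F` are assumptions, since the abstract does not state the fusion dimensions or the layers inside the residual branch. Reading the reported 18% as a relative MAE reduction is likewise an interpretation, captured by the hypothetical helper `relative_reduction`.

```python
def fuse(temporal, spatial):
    """Fuse short-term temporal and spatial feature vectors (assumed: concatenation)."""
    return list(temporal) + list(spatial)


def residual_map(x, f):
    """Residual mapping y = x + F(x); F must preserve the feature dimension."""
    fx = f(x)
    if len(fx) != len(x):
        raise ValueError("F(x) must have the same length as x")
    return [xi + fi for xi, fi in zip(x, fx)]


def mae(y_true, y_pred):
    """Mean Absolute Error over paired observations."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline, proposed):
    """Relative error reduction, e.g. 0.18 for an 18% lower MAE."""
    return (baseline - proposed) / baseline


fused = fuse([2.0, 4.0], [1.0])
mapped = residual_map(fused, lambda v: [0.5 * t for t in v])  # toy F: halve each feature
```

The residual branch lets the Transformer receive the original fused features plus a learned correction, which is the usual motivation for residual connections.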

    Port traffic flow prediction based on knowledge graph and spatio-temporal diffusion graph convolutional network
    Guixiang XUE, Hui WANG, Weifeng ZHOU, Yu LIU, Yan LI
    2024, 44(9):  2952-2957.  DOI: 10.11772/j.issn.1001-9081.2023081100

    Accurate prediction of port traffic flow is challenging because of its stochastic uncertainty and temporal non-stationarity. To improve prediction accuracy, a port traffic flow prediction model based on knowledge graph and spatio-temporal diffusion graph convolutional network, named KG-DGCN-GRU, was proposed, which takes into account external disturbances such as meteorological conditions and the opening and closing status of the highways adjacent to the port. Factors related to the port traffic network were represented by a knowledge graph, and the semantic information of various external factors was learned from this port knowledge graph by a knowledge representation method. A Diffusion Graph Convolutional Network (DGCN) and a Gated Recurrent Unit (GRU) were then used to extract the spatio-temporal dependency features of the port traffic flow. Experimental results on the Tianjin Port traffic dataset show that KG-DGCN-GRU effectively improves prediction accuracy through the knowledge graph and the diffusion graph convolutional network. Under single-step prediction (15 min), compared with the Temporal Graph Convolutional Network (T-GCN) and the Diffusion Convolutional Recurrent Neural Network (DCRNN), its Root Mean Squared Error (RMSE) is reduced by 4.85% and 7.04% respectively, and its Mean Absolute Error (MAE) is reduced by 5.80% and 8.17% respectively.
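The abstract does not spell out the DGCN layer. As an assumption, the sketch below uses the bidirectional diffusion convolution popularized by DCRNN (one of the baselines above): the output is the sum over hops k of θ_fwd[k]·(D_O⁻¹W)^k·x plus θ_bwd[k]·(D_I⁻¹Wᵀ)^k·x, where D_O⁻¹W and D_I⁻¹Wᵀ are the forward and backward random-walk transition matrices. It uses one feature per node and scalar weights per hop, in pure Python; the port model's actual layer may differ.

```python
def matvec(m, x):
    """Multiply matrix m (list of rows) by vector x."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def transition(w):
    """Row-normalise W, i.e. D^{-1} W; rows with zero degree stay zero."""
    out = []
    for row in w:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out


def diffusion_conv(w, x, theta_fwd, theta_bwd):
    """Sum over hops k of theta_fwd[k]*P_f^k x + theta_bwd[k]*P_b^k x."""
    p_fwd = transition(w)             # D_O^{-1} W
    p_bwd = transition(transpose(w))  # D_I^{-1} W^T
    out = [0.0] * len(x)
    xf, xb = list(x), list(x)         # hop-0 terms are x itself
    for a, b in zip(theta_fwd, theta_bwd):
        out = [o + a * f + b * g for o, f, g in zip(out, xf, xb)]
        xf = matvec(p_fwd, xf)        # diffuse one more hop
        xb = matvec(p_bwd, xb)
    return out


# Directed chain 0 -> 1 -> 2 (e.g. three road segments toward a port gate).
W = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
h = diffusion_conv(W, [1.0, 2.0, 3.0], theta_fwd=[1.0, 1.0], theta_bwd=[0.0, 1.0])
```

In a full model, `x` would be a node-feature matrix (traffic flow plus the knowledge-graph embeddings), θ would be learned weight matrices, and this operation would replace the matrix multiplications inside the GRU gates.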

    Safe reinforcement learning method for decision making of autonomous lane changing based on trajectory prediction
    Hailin XIAO, Tianyi HUANG, Qiuxiang DAI, Yuejun ZHANG, Zhongshan ZHANG
    2024, 44(9):  2958-2963.  DOI: 10.11772/j.issn.1001-9081.2023091266

    In the decision-making problem of autonomous lane changing, deep reinforcement learning easily takes unsafe actions during training because it learns by trial and error. Therefore, a safe reinforcement learning method for autonomous lane-changing decisions based on trajectory prediction was proposed. Firstly, the future trajectories of the vehicles were predicted through probabilistic modeling with maximum likelihood estimation. Secondly, driving risk was assessed using the predicted trajectories and the safety distance. Actions were then constrained according to the risk assessment results: the action space was pruned to a safe action space, guiding the intelligent vehicle to avoid dangerous actions. The proposed method was tested in the freeway scenario of a simulation platform and compared with Deep Q-Network (DQN) and its improved variants. Experimental results show that the proposed method reduces the number of collisions by 47%-57% compared with the other methods while maintaining fast convergence during training, thereby effectively improving safety during the training process.
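The pruning step can be sketched as action masking in front of a DQN's greedy choice. This is a minimal sketch built on simplifying assumptions. A Gaussian maximum-likelihood fit of the lead vehicle's speed (sample mean and 1/n standard deviation) stands in for the paper's probabilistic trajectory model. The gap is predicted at constant speed over a fixed horizon, and the safety distance is taken as headway × ego speed plus a standstill margin. The action names, parameter values and the fallback action are all hypothetical; in practice the Q-values would come from the trained network.

```python
from statistics import fmean, pstdev


def fit_speed_mle(speeds):
    """Gaussian MLE of a vehicle's speed: sample mean and 1/n standard deviation."""
    return fmean(speeds), pstdev(speeds)


def safe_action_set(candidates, ego_speed, horizon=2.0, headway=1.5, d_min=5.0, z=1.0):
    """Keep actions whose pessimistic predicted gap exceeds the safety distance.

    candidates maps action -> (current gap to the lead vehicle in the resulting
    lane [m], that vehicle's recent speed history [m/s], ego speed change [m/s]).
    """
    safe = []
    for action, (gap, lead_speeds, dv) in candidates.items():
        mu, sigma = fit_speed_mle(lead_speeds)
        v_ego = ego_speed + dv
        lead_v = mu - z * sigma                   # pessimistic lead-vehicle speed
        predicted_gap = gap + (lead_v - v_ego) * horizon
        safety_distance = v_ego * headway + d_min
        if predicted_gap >= safety_distance:
            safe.append(action)
    return safe


def masked_greedy(q_values, safe, fallback="decelerate"):
    """Greedy (argmax-Q) action restricted to the safe action space."""
    if not safe:
        return fallback  # hypothetical fallback when no action is judged safe
    return max(safe, key=lambda a: q_values[a])


candidates = {
    "keep_lane": (15.0, [18.0, 18.0, 18.0], 0.0),
    "change_left": (60.0, [25.0, 25.0, 25.0], 0.0),
    "decelerate": (15.0, [18.0, 18.0, 18.0], -4.0),
}
q = {"keep_lane": 5.0, "change_left": 3.0, "decelerate": 1.0}
safe = safe_action_set(candidates, ego_speed=20.0)
action = masked_greedy(q, safe)
```

Here the unmasked argmax would pick `keep_lane` and close in on a slower vehicle; masking removes it before the Q-values are compared, which is what keeps exploration away from unsafe actions during training.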

    Molecular toxicity prediction based on meta graph isomorphism network
    Yunchuan HUANG, Yongquan JIANG, Juntao HUANG, Yan YANG
    2024, 44(9):  2964-2969.  DOI: 10.11772/j.issn.1001-9081.2023091286

    To obtain more accurate molecular toxicity prediction results, a molecular toxicity prediction model based on meta Graph Isomorphism Network (GIN), named Meta-MTP, was proposed. Firstly, a GIN was used to obtain molecular representations, with atoms as nodes, bonds as edges, and molecules as graphs. A pre-trained model was used to initialize the GIN to obtain better parameters. A feedforward Transformer incorporating layer-wise attention and local enhancement was introduced. Atom type prediction and bond prediction were used as auxiliary tasks to extract more internal molecular information. Finally, the model was trained on the Tox21 and SIDER datasets with a meta-learning bi-level optimization strategy. Experimental results on these two datasets show that Meta-MTP has good molecular toxicity prediction ability. When the number of samples is 10, compared with the FSGNNTR (Few-Shot Graph Neural Network-TRansformer) model across all tasks, the Area Under the ROC Curve (AUC) of Meta-MTP is improved by 1.4% and 5.4% on Tox21 and SIDER respectively. Compared with three traditional graph neural network models, GIN, Graph Convolutional Network (GCN), and Graph Sample and AGgrEgate (GraphSAGE), the AUC of Meta-MTP is improved by 18.3%-23.7% and 7.3%-22.2% on Tox21 and SIDER respectively.
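The standard GIN node update is h_v ← MLP((1 + ε)·h_v + Σ_{u∈N(v)} h_u), followed by a sum readout for a graph-level molecular embedding. The sketch below shows this on a toy molecule, with heavy atoms C-C-O, one-hot atom types, and a one-layer ReLU map standing in for the MLP. The pre-training, Transformer, auxiliary tasks and meta-learning of Meta-MTP are not shown, and the helper names are illustrative.

```python
def relu_linear(weight, bias):
    """Build a one-layer ReLU map x -> max(0, Wx + b) as a stand-in for GIN's MLP."""
    def mlp(x):
        return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(weight, bias)]
    return mlp


def gin_layer(h, adj, mlp, eps=0.0):
    """One GIN update: h_v <- MLP((1 + eps) * h_v + sum of neighbour features)."""
    out = {}
    for v, hv in h.items():
        agg = [(1.0 + eps) * x for x in hv]
        for u in adj[v]:
            agg = [a + b for a, b in zip(agg, h[u])]
        out[v] = mlp(agg)
    return out


def sum_readout(h):
    """Graph-level embedding as the sum of node embeddings."""
    dim = len(next(iter(h.values())))
    return [sum(vec[i] for vec in h.values()) for i in range(dim)]


# Atoms as nodes (one-hot: C = [1, 0], O = [0, 1]); bonds as edges.
h0 = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
adj = {0: [1], 1: [0, 2], 2: [1]}
identity = relu_linear([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
h1 = gin_layer(h0, adj, identity)
graph_vec = sum_readout(h1)
```

Sum aggregation (rather than mean or max) is what gives GIN its injectivity over neighbour multisets, so two atoms with different neighbourhoods, like the two carbons here, receive different embeddings.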

    Diagnosis of major depressive disorder based on probabilistic sparse self-attention neural network
    Jing QIN, Zhiguang QIN, Fali LI, Yueheng PENG
    2024, 44(9):  2970-2974.  DOI: 10.11772/j.issn.1001-9081.2023091371

    The diagnosis of major depressive disorder relies predominantly on subjective methods, including physician consultations and scale assessments, which may lead to misdiagnosis. EEG (ElectroEncephaloGraphy) offers high temporal resolution, low cost, easy setup, and non-invasiveness, making it a potential quantitative measurement tool for psychiatric disorders, including depressive disorder. Recently, deep learning algorithms have been widely applied to EEG signals, notably in the diagnosis and classification of depressive disorder. Because processing EEG signals with a self-attention mechanism involves significant redundancy, a convolutional neural network with a Probabilistic sparse Self-Attention mechanism (PSANet) was proposed. Firstly, a limited number of key attention points were selected in the self-attention mechanism according to a sampling factor, which reduces the high computational cost and makes the mechanism applicable to long EEG data sequences; meanwhile, the EEG data were combined with the patients' physiological scales for a comprehensive diagnosis. Experiments were conducted on a dataset containing both patients with depressive disorder and a healthy control group. Experimental results show that PSANet achieves higher classification accuracy with fewer parameters than alternative methods such as EEGNet.

    Self-optimized dual-modal multi-channel non-deep vestibular schwannoma recognition model
    Rui ZHANG, Pengyun ZHANG, Meirong GAO
    2024, 44(9):  2975-2982.  DOI: 10.11772/j.issn.1001-9081.2023091273

    To address the problems that corresponding features of different modalities are easily mis-fused and misaligned, that the parameters of recognition models are tuned subjectively according to expert experience, and that the computational cost is high, a self-optimized dual-modal (“contrast-enhanced T1-weighted” and “high-resolution enhanced T2-weighted”) multi-channel non-deep vestibular schwannoma recognition model was proposed. Firstly, a vestibular schwannoma recognition model was constructed to further explore the multi-modal image features of vestibular schwannoma and the complex nonlinear complementary information among the modalities. Then, a model optimization strategy using a global parallel sparrow search algorithm based on game theory was designed to optimize the key hyperparameters of the model adaptively, giving the model better recognition performance. Experimental results show that, compared with the deep learning-based model, the proposed model reduces the number of parameters by 27.9% and improves recognition accuracy by 4.19 percentage points, which verifies its effectiveness and adaptability.
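For readers unfamiliar with the sparrow search algorithm, the sketch below implements a basic version of its three roles: producers that search widely, scroungers that follow the best producer or fly elsewhere when starving, and a few danger-aware scouts. It minimises a box-constrained objective. The game-theoretic and global-parallel extensions described in the abstract are not reproduced. The population size, role fractions, alarm threshold `st`, and the sphere test function are illustrative assumptions. For hyperparameter tuning, `f` would map a hyperparameter vector to a validation error.

```python
import math
import random


def ssa_minimize(f, dim, lb, ub, n=20, iters=100, pd=0.2, sd=0.1, st=0.8, seed=0):
    """Basic sparrow search algorithm (SSA) minimising f over the box [lb, ub]^dim."""
    rng = random.Random(seed)

    def clip(x):
        return [min(ub, max(lb, v)) for v in x]

    pop = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n)]
    fit = [f(x) for x in pop]
    best_f = min(fit)
    best_x = list(pop[fit.index(best_f)])
    n_prod = max(1, int(pd * n))   # producers (discoverers)
    n_scout = max(1, int(sd * n))  # sparrows aware of danger
    for _ in range(iters):
        order = sorted(range(n), key=lambda i: fit[i])
        pop = [pop[i] for i in order]
        fit = [fit[i] for i in order]
        x_best, f_best = pop[0], fit[0]
        x_worst, f_worst = pop[-1], fit[-1]
        r2 = rng.random()  # alarm value
        new = [list(x) for x in pop]
        for i in range(n_prod):  # producers
            if r2 < st:
                alpha = rng.random() + 1e-12
                new[i] = [v * math.exp(-(i + 1) / (alpha * iters)) for v in pop[i]]
            else:
                q = rng.gauss(0.0, 1.0)
                new[i] = [v + q for v in pop[i]]
        x_p = min((clip(x) for x in new[:n_prod]), key=f)  # best producer position
        for i in range(n_prod, n):  # scroungers
            if i + 1 > n / 2:  # starving: fly elsewhere
                q = rng.gauss(0.0, 1.0)
                new[i] = [q * math.exp((w - v) / (i + 1) ** 2)
                          for v, w in zip(pop[i], x_worst)]
            else:  # move next to the best producer
                a = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
                step = sum(abs(v - p) * aj for v, p, aj in zip(pop[i], x_p, a)) / dim
                new[i] = [p + step for p in x_p]
        for i in rng.sample(range(n), n_scout):  # scouts react to danger
            if fit[i] > f_best:
                beta = rng.gauss(0.0, 1.0)
                new[i] = [b + beta * abs(v - b) for v, b in zip(pop[i], x_best)]
            else:
                k = rng.uniform(-1.0, 1.0)
                denom = (fit[i] - f_worst) + 1e-12
                new[i] = [v + k * abs(v - w) / denom for v, w in zip(pop[i], x_worst)]
        pop = [clip(x) for x in new]
        fit = [f(x) for x in pop]
        for x, fx in zip(pop, fit):
            if fx < best_f:
                best_x, best_f = list(x), fx
    return best_x, best_f


def sphere(x):
    return sum(v * v for v in x)


bx, bf = ssa_minimize(sphere, dim=2, lb=-5.0, ub=5.0)
```

Because the best position found so far is kept outside the population, the returned fitness never gets worse across iterations, even though individual sparrows can move away from good regions.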

Honorary Editor-in-Chief: ZHANG Jingzhong
Editor-in-Chief: XU Zongben
Associate Editor: SHEN Hengtao XIA Zhaohui
Domestic Post Distribution Code: 62-110
Foreign Distribution Code: M4616
Address:
No. 9, 4th Section of South Renmin Road, Chengdu 610041, China
Tel: 028-85224283-803
  028-85222239-803
Website: www.joca.cn
E-mail: bjb@joca.cn