
Table of Contents

    10 April 2023, Volume 43 Issue 4
    Artificial intelligence
    Survey of multimodal pre-training models
    Huiru WANG, Xiuhong LI, Zhe LI, Chunming MA, Zeyu REN, Dan YANG
    2023, 43(4):  991-1004.  DOI: 10.11772/j.issn.1001-9081.2022020296

    By using complex pre-training targets and a large number of model parameters, a Pre-Training Model (PTM) can effectively obtain rich knowledge from unlabeled data. However, the development of multimodal PTMs is still in its infancy. According to the differences between modalities, most current multimodal PTMs can be divided into image-text PTMs and video-text PTMs; according to their data fusion methods, they can be divided into single-stream models and two-stream models. Firstly, common pre-training tasks and the downstream tasks used in validation experiments were summarized. Secondly, the common models in the area of multimodal pre-training were sorted out, and the downstream tasks of each model as well as the performance and experimental data of the models were listed in tables for comparison. Thirdly, the application scenarios of the M6 (Multi-Modality to Multi-Modality Multitask Mega-transformer) model, the Cross-modal Prompt Tuning (CPT) model, the VideoBERT (Video Bidirectional Encoder Representations from Transformers) model, and the AliceMind (Alibaba’s collection of encoder-decoders from Mind) model in specific downstream tasks were introduced. Finally, the challenges faced by multimodal PTM research and its future directions were summarized.

    Deep graph matching model based on self-attention network
    Zhoubo XU, Puqing CHEN, Huadong LIU, Xin YANG
    2023, 43(4):  1005-1012.  DOI: 10.11772/j.issn.1001-9081.2022030345

    In the node feature extraction stage of deep graph matching models, node feature representations are usually learned by a Graph Convolutional Network (GCN). However, the limited representation-learning ability of GCN weakens the distinguishability of node features, leading to poor measurement of node similarity and a loss of matching accuracy. To solve this problem, a deep graph matching model based on a self-attention network was proposed. In the node feature extraction stage, a new self-attention network was used to learn node features: a spatial encoder was utilized to learn the spatial structures of nodes, and the self-attention mechanism was used to learn the relations among all nodes, thereby improving the feature description of nodes. In addition, in order to reduce the accuracy loss caused by relaxing the graph matching problem, the graph matching problem was modelled as an integer linear programming problem. At the same time, structural matching constraints were added to the graph matching problem on the basis of node matching, and an efficient combinatorial optimization solver was introduced to calculate a locally optimal solution of the graph matching problem. Experimental results show that compared with Permutation loss and Cross-graph Affinity based Graph Matching (PCA-GM), the proposed model improves the average matching precision on 20 classes of images by 14.8 percentage points on PASCAL VOC dataset and the average matching precision on 5 classes of images by 7.3 percentage points on Willow Object dataset, and achieves the best results on object matching tasks such as bicycles and plants.

    Feature extraction model based on neighbor supervised locally invariant robust principal component analysis
    Mengting GE, Minghua WAN
    2023, 43(4):  1013-1020.  DOI: 10.11772/j.issn.1001-9081.2022030329

    Focusing on the issue that the unsupervised Locally Invariant Robust Principal Component Analysis (LIRPCA) algorithm does not consider the category relationships between samples, a feature extraction model based on Neighbor Supervised LIRPCA (NSLIRPCA) was proposed. The category information between samples was considered by the proposed model, and a relationship matrix was constructed based on this information. The formulas of the model were solved and their convergence was proved. At the same time, the proposed model was applied to various occlusion datasets. Experimental results show that compared with Principal Component Analysis (PCA), PCA based on L1-norm (PCA-L1), Non-negative Matrix Factorization (NMF), Locality Preserving Projection (LPP) and LIRPCA algorithms on ORL, Yale, COIL-Processed and PolyU datasets, the proposed model improves the recognition rate by up to 8.80%, 7.76%, 20.37%, 4.72% and 4.61% respectively on the original image datasets, and by up to 30.79%, 30.73%, 36.02%, 19.65% and 17.31% respectively on the occluded image datasets. It can be seen that the proposed model improves recognition performance while reducing model complexity, and it clearly outperforms the comparison algorithms.

    User granularity-level personalized social text generation model
    Yongbing GAO, Juntian GAO, Rong MA, Lidong YANG
    2023, 43(4):  1021-1028.  DOI: 10.11772/j.issn.1001-9081.2022030460

    In the field of open social text, generated text content often lacks personalized features. To solve this problem, a user-level fine-grained control generation model, PTG-GPT2-Chinese (Personalized Text Generation Generative Pre-trained Transformer 2-Chinese), was proposed. In the proposed model, an Encoder-Decoder framework was designed on the basis of the GPT2 (Generative Pre-trained Transformer 2.0) structure. First, the static personalized information of a user was modeled and encoded on the Encoder side, a bidirectional independent attention module was added on the Decoder side to receive the static personalized feature vector, and the original attention module in the GPT2 structure was used to capture the dynamic personalized features in the user’s text. Then, the scores of the different attention modules were dynamically weighted, fused, and used in subsequent decoding, so that social text constrained by the user’s personalized feature attributes was generated automatically. However, the semantic sparsity of the user’s basic information may cause conflicts between the generated text and some personalized features. Aiming at this problem, the BERT (Bidirectional Encoder Representations from Transformers) model was used to perform a secondary enhanced generation that enforces consistency between the Decoder-side output and the user’s personalized features, finally realizing personalized social text generation. Experimental results show that compared with the GPT2 model, the proposed model improves fluency by 0.36% to 0.72%, and, without loss of language fluency, the secondary generation increases the two evaluation indicators of personalization and consistency by 10.27% and 13.24% respectively. This proves that the proposed model can effectively assist users’ creation and generate social text that is fluent and personalized.

    Document-level relation extraction method based on path labels
    Quan YUAN, Yunpeng XU, Chengliang TANG
    2023, 43(4):  1029-1035.  DOI: 10.11772/j.issn.1001-9081.2022030327

    Due to the high complexity of text processing in document-level relation extraction, it is difficult to extract entity relations efficiently. Therefore, a path label based document-level relation extraction method for selecting key evidence sentences was proposed. Firstly, path labels were introduced in data preprocessing to replace entity sentences as the processed text dataset. Then, combined with the U-Net model from semantic segmentation, the encoding module at the input end was used to capture the context information of document entities, and the U-Net semantic segmentation module was used to capture, in an image-style manner, the global dependencies among entity triples. Finally, a Softmax function was introduced to reduce the noise of text extraction. Theoretical analysis and simulation results show that compared with the graph neural network based RoBERTa (Robustly optimized Bidirectional Encoder Representations from Transformers) relation extraction algorithm RoBERTa-ATLOP, Path+U-Net increases the F1-score on the development and test sets of the Document-level Relation Extraction Dataset (DocRED) by 1.31 and 0.54 percentage points respectively, and improves the F1-score on the development and test sets of the Chemical Disease Response (CDR) dataset by 1.32 and 1.19 percentage points respectively. At the same time, Path+U-Net has lower extraction cost on the datasets and higher text extraction accuracy, while the correlations between entities remain consistent with those in the original dataset. Experimental results show that the proposed path label based extraction method can effectively improve the extraction efficiency of long texts.

    Pseudo relevance feedback method for dense retrieval
    Wenhao HU, Jing LUO, Xinhui TU
    2023, 43(4):  1036-1042.  DOI: 10.11772/j.issn.1001-9081.2022030480

    The Pseudo Relevance Feedback (PRF) mechanism is an automated Query Expansion (QE) technology that builds more accurate queries from the original query and the information contained in its top-N initially retrieved documents, and it can further improve the performance of retrieval systems. However, the existing PRF methods for dense retrieval have two problems: lack of semantic information due to text truncation, and high time complexity in the retrieval stage. Aiming at these problems, a paragraph-level PRF method for dense retrieval of long texts, namely Dense-PRF, was proposed. Firstly, the embeddings of relevant paragraphs from the top-N documents of the initial retrieval were obtained by semantic distance calculation. Secondly, the QE term embedding was obtained by average pooling of the relevant paragraph embeddings. Thirdly, a new query embedding was constructed by combining the original query embedding and the QE term embedding according to their weights. Finally, the final retrieval results were obtained according to the new query embedding. In experiments comparing Dense-PRF with baseline models on two classic long-text test datasets, Robust04 and WT2G, Dense-PRF has the accuracy and Normalized Discounted Cumulative Gain (NDCG) of the top 20 documents improved over the RepBERT+BM25 model by 1.66 and 1.32 percentage points, and by 2.30 and 1.91 percentage points, respectively. Experimental results demonstrate that Dense-PRF can effectively alleviate the mismatch between query and document vocabularies and improve retrieval accuracy.
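
    As a rough illustration of the query-rebuilding steps above, the following sketch combines the original query embedding with an average-pooled expansion embedding; the cosine ranking, top_k and the weight alpha are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np

def dense_prf_query(query_emb, paragraph_embs, top_k=5, alpha=0.7):
    """Sketch of Dense-PRF-style query expansion.

    query_emb:      (d,) embedding of the original query
    paragraph_embs: (n, d) embeddings of paragraphs from the top-N
                    documents of the initial retrieval
    top_k, alpha:   illustrative hyperparameters (not from the paper)
    """
    # Step 1: rank paragraphs by semantic distance (cosine similarity here).
    sims = paragraph_embs @ query_emb / (
        np.linalg.norm(paragraph_embs, axis=1) * np.linalg.norm(query_emb))
    relevant = paragraph_embs[np.argsort(-sims)[:top_k]]

    # Step 2: average-pool the relevant paragraph embeddings
    # to obtain the expansion-term embedding.
    expansion_emb = relevant.mean(axis=0)

    # Step 3: weighted combination of original and expansion embeddings.
    return alpha * query_emb + (1 - alpha) * expansion_emb
```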

    Session-based recommendation model based on enhanced capsule network
    Hao SUN, Jian CAO, Haisheng LI, Dianhui MAO
    2023, 43(4):  1043-1049.  DOI: 10.11772/j.issn.1001-9081.2022040481

    Since dependencies between items are difficult for existing session-based recommendation models to capture from short sessions, and considering complex item interactions and dynamic changes in user interest, a Session-based Recommendation model based on an Enhanced Capsule Network (SR-ECN) was proposed. First, session sequence data was processed by a Graph Neural Network (GNN) to obtain the embedding vector of each item. Then, the dynamic routing mechanism of the capsule network was used to aggregate high-level user preferences from the interaction history. In addition, a self-attention network was introduced to further exploit potential information about users and items, thereby recommending more suitable items. Experimental results show that on Yoochoose dataset the proposed model outperforms all comparison models, such as Session-based Recommendation with GNN (SR-GNN) and Target Attentive GNN (TAGNN), and improves recommendation recall and Mean Reciprocal Rank (MRR) by 0.92 and 0.45 percentage points respectively compared with the Lossless Edge-order preserving aggregation and Shortcut graph attention for Session-based Recommendation (LESSR) model.
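
    The dynamic routing step that aggregates item embeddings into high-level preference capsules follows the standard capsule-network routing procedure; a minimal numpy sketch (dimensions and iteration count are illustrative assumptions):

```python
import numpy as np

def squash(s, eps=1e-8):
    """Capsule non-linearity: keeps direction, squashes norm into [0, 1)."""
    norm2 = np.sum(s ** 2, axis=-1, keepdims=True)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + eps)

def dynamic_routing(u_hat, n_iter=3):
    """Route n_in low-level capsules (item embeddings from the GNN) to
    n_out high-level user-preference capsules.

    u_hat: (n_in, n_out, d) prediction vectors.
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                      # routing logits
    for _ in range(n_iter):
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)         # coupling coefficients
        s = (c[..., None] * u_hat).sum(axis=0)       # weighted sum, (n_out, d)
        v = squash(s)                                # high-level capsules
        b = b + np.einsum('iod,od->io', u_hat, v)    # agreement update
    return v
```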

    Pre-hospital emergency text classification model based on label confusion
    Xu ZHANG, Long SHENG, Haifang ZHANG, Feng TIAN, Wei WANG
    2023, 43(4):  1050-1055.  DOI: 10.11772/j.issn.1001-9081.2022020317

    Aiming at the problems of abundant specialized vocabulary, sparse features, and a large degree of label confusion in pre-hospital emergency text, a text classification model based on the Label Confusion Model (LCM) was proposed. Firstly, Bidirectional Encoder Representations from Transformers (BERT) was used to obtain dynamic word vectors and fully exploit the semantic information of specialized vocabulary. Then, the text representation vector was generated by fusing a Bidirectional Long Short-Term Memory (BiLSTM) network, weighted convolution, and an attention mechanism to improve the feature extraction capability of the model. Finally, LCM was used to capture the semantic associations between text and labels as well as the dependencies between labels, solving the problem of a large degree of label confusion. In experiments conducted on a pre-hospital emergency text dataset and a public news text dataset, the F1 scores of the proposed model reached 93.46% and 97.08% respectively, which were 0.95% to 7.01% and 0.38% to 2.00% higher than those of models such as Text Convolutional Neural Network (TextCNN), BiLSTM, and BiLSTM-Attention. Experimental results show that the proposed model can capture the semantic information of specialized vocabulary, extract text features more accurately, and effectively alleviate label confusion, and that it also has a certain generalization ability.
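
    The label-confusion step can be read as replacing the one-hot training target with a simulated label distribution; the sketch below follows the general LCM idea, with the mixing constant alpha as an illustrative assumption.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def label_confusion_target(text_repr, label_embs, true_label, alpha=4.0):
    """Build a simulated label distribution for training.

    text_repr:  (d,) text representation from the encoder
    label_embs: (c, d) learnable label embeddings
    true_label: index of the gold label
    alpha:      controls how much the one-hot target dominates
    """
    # Similarity between the text and every label captures label confusion.
    confusion = softmax(label_embs @ text_repr)

    # Mix the one-hot target with the confusion distribution and renormalize,
    # so semantically close (easily confused) labels keep some probability mass.
    one_hot = np.eye(len(label_embs))[true_label]
    return softmax(alpha * one_hot + confusion)
```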

    Fine-grained emotion classification of Chinese microblog based on syntactic dependency graph
    Cheng FANG, Bei LI, Ping HAN, Qiong WU
    2023, 43(4):  1056-1061.  DOI: 10.11772/j.issn.1001-9081.2022030469

    Emotion analysis can quickly and accurately mine users’ emotional tendencies and has a huge application market. Aiming at the complexity and diversity of the syntactic structures of microblog language, a Syntax Graph Convolution Network (SGCN) model was proposed for fine-grained emotion classification of Chinese microblogs; the model is rich in both structural and semantic expression. In the model, a text graph was constructed on the basis of the dependencies between words, and the degree of correlation between words was quantified by Pointwise Mutual Information (PMI), which was then used as the weight of the corresponding edge to represent the structural information of the sentence. Semantic features fused with location information were taken as the initial features of nodes to enrich the semantic features of nodes in the text graph. Experimental results on the microblog emotion classification dataset of Social Media Processing 2020 (SMP2020) show that for two sets of microblog data containing six categories of emotions (happiness, sadness, anger, fear, surprise, and emotionlessness), the average F1-score of the proposed model reaches 72.64%, which is 2.75 and 3.87 percentage points higher than those of the BERT (Bidirectional Encoder Representations from Transformers) Graph Convolutional Network (BGCN) model and the Text Level Graph Neural Network (Text-Level-GNN) model respectively, verifying that the proposed model can use the structural information of sentences more effectively than other deep learning models to improve classification performance.
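
    The PMI weight on an edge quantifies how strongly two words co-occur relative to chance. A sketch of the computation over sliding-window co-occurrence counts (the window size is an illustrative choice, and only positive PMI values are kept as edge weights):

```python
import math
from collections import Counter
from itertools import combinations

def pmi_weights(corpus, window=5):
    """corpus: list of tokenized sentences.
    Returns {(w1, w2): PMI} for co-occurring word pairs."""
    word_count, pair_count, n_windows = Counter(), Counter(), 0
    for sent in corpus:
        for i in range(max(1, len(sent) - window + 1)):
            win = set(sent[i:i + window])
            n_windows += 1
            word_count.update(win)
            pair_count.update(frozenset(p) for p in combinations(win, 2))
    weights = {}
    for pair, n_ij in pair_count.items():
        w1, w2 = tuple(pair)
        # PMI = log( p(i,j) / (p(i) * p(j)) ), with window-based probabilities.
        pmi = math.log(n_ij * n_windows / (word_count[w1] * word_count[w2]))
        if pmi > 0:                        # non-positive PMI edges are dropped
            weights[(w1, w2)] = pmi
    return weights
```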

    Instance segmentation algorithm based on Fastformer and self-supervised contrastive learning
    Rong GAO, Jiawei SHEN, Xiongkai SHAO, Xinyun WU
    2023, 43(4):  1062-1070.  DOI: 10.11772/j.issn.1001-9081.2022020270

    To address the low detection precision, coarse masks and weak generalization ability of existing instance segmentation algorithms on occluded and blurred instances, an instance segmentation algorithm based on Fastformer and self-supervised contrastive learning was proposed. Firstly, in order to enhance the algorithm’s ability to extract global information from feature maps, a Fastformer module based on additive attention was added after the feature extraction network to deeply model the interrelationships between pixels in each layer of the feature map. Secondly, inspired by self-supervised learning, a self-supervised contrastive learning module was added to perform contrastive learning on instances in images, enhancing the algorithm’s ability to understand images and thereby improving segmentation results in environments with heavy noise interference. Experimental results show that the proposed algorithm improves the mean Average Precision (mAP) by 3.1 and 2.5 percentage points on Cityscapes dataset and COCO2017 dataset respectively, compared with the recent classical instance segmentation algorithm SOLOv2 (Segmenting Objects by LOcations v2). The proposed algorithm also achieves a good balance between real-time performance and precision, showing good robustness in instance segmentation of complex scenes.
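
    Fastformer's additive attention compresses the whole sequence of pixel features into global query and key vectors in linear time; a condensed single-head numpy sketch of that mechanism (projection details omitted, dimensions illustrative):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(Q, K, V, w_q, w_k):
    """Linear-complexity additive attention in the spirit of Fastformer.

    Q, K, V:  (n, d) projected token/pixel features
    w_q, w_k: (d,) learnable scoring vectors
    """
    d = Q.shape[1]
    alpha = softmax(Q @ w_q / np.sqrt(d))   # (n,) attention over queries
    g = alpha @ Q                           # global query vector, (d,)
    P = g * K                               # element-wise query-key interaction
    beta = softmax(P @ w_k / np.sqrt(d))    # (n,) attention over keys
    k = beta @ P                            # global key vector, (d,)
    return k * V                            # (n, d) key-value interaction
```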

    Aerial target identification method based on switching reasoning evidential network under incomplete information
    Yu WANG, Zilin FAN, Tianjun REN, Xiaofei JI
    2023, 43(4):  1071-1078.  DOI: 10.11772/j.issn.1001-9081.2022020287

    Existing evidential reasoning methods have a fixed model structure, a single information processing mode and a single reasoning mechanism, making them difficult to apply to target identification in environments with various kinds of incomplete information, such as uncertain, erroneous and missing information. To address this problem, a Switching Reasoning Evidential Network (SR-EN) method was proposed. Firstly, a multi-template network model was constructed with consideration of evidence-node deletion and other situations. Then, the conditional correlation between each evidence variable and the target type was analyzed to establish a reasoning rule base for incomplete information. Finally, an intelligent spatio-temporal fusion reasoning method based on three evidence input and correction methods was proposed. Compared with the traditional Evidential Network (EN) and combinations of EN with information correction methods such as the Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS), SR-EN can achieve continuous and accurate identification of aerial targets under multiple types of random incomplete information while ensuring reasoning timeliness. Experimental results show that SR-EN can realize adaptive switching of evidence processing methods, network structures and inter-node fusion rules in the continuous reasoning process through effective identification of various types of incomplete information.

    Data science and technology
    Inner product reduction in formal context
    Qing WANG, Xiuwei GAO, Yehai XIE, Guilong LIU
    2023, 43(4):  1079-1085.  DOI: 10.11772/j.issn.1001-9081.2022030328

    Formal concept analysis is an important tool for knowledge representation and knowledge mining, and the formal context is one of its basic notions. A new attribute reduction, inner product reduction, was proposed to determine whether the objects in a formal context share the same attributes within a given attribute set, and to eliminate irrelevant attributes from the calculation. Firstly, the concept of the inner product in a formal context was given. Then, the reduction theory and methods of relation systems were used to define inner product reduction, an inner product reduction algorithm based on the discernibility matrix was proposed to obtain all reductions of a formal context, and the reduction core was obtained through an intersection operation on these results. In addition, an incremental inner product reduction algorithm was designed for the case of increasing attributes. Finally, the application of inner product reduction was explored on an infectious disease network; in the simulated case, 6 attributes were reduced to 2. Simulation results demonstrate that the inner product reduction method is feasible and interpretable, and achieves the goal of knowledge reduction.
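
    In discernibility-matrix based reduction, the matrix records which attributes tell each pair of objects apart, and any attribute that is the sole discriminator of some pair belongs to the core. A generic sketch of these two steps (the paper's inner-product-specific discernibility condition is not reproduced here):

```python
def discernibility_matrix(objects, attributes):
    """objects: list of dicts mapping attribute -> value (formal context rows).
    Returns, for every object pair, the set of attributes on which they differ."""
    matrix = {}
    for i in range(len(objects)):
        for j in range(i + 1, len(objects)):
            diff = {a for a in attributes if objects[i][a] != objects[j][a]}
            if diff:
                matrix[(i, j)] = diff
    return matrix

def core_attributes(matrix):
    """Attributes that alone distinguish some pair belong to every reduct."""
    return {next(iter(d)) for d in matrix.values() if len(d) == 1}
```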

    Imbalanced data classification method based on Lasso and constructive covering algorithm
    Yi JIANG, Shuping WU, Kun HU, Linbo LONG
    2023, 43(4):  1086-1093.  DOI: 10.11772/j.issn.1001-9081.2022040490

    Aiming at the insufficient ability of machine learning classification algorithms to identify minority samples in imbalanced data classification problems, an imbalanced data classification method, L-CCSmote (Least absolute shrinkage and selection operator Constructive Covering Synthetic minority oversampling technique), was proposed, taking the telecom customer churn scenario as an example. Firstly, churn-related features were extracted through Lasso (Least absolute shrinkage and selection operator) to optimize the model input. Then, a neural network was built through the Constructive Covering Algorithm (CCA) to generate coverages conforming to the overall sample distribution. Finally, a single-sample coverage strategy, a sample diversity strategy and a sample density peak strategy were further proposed to perform hybrid sampling to balance the data. A total of 13 imbalanced datasets from the KEEL database and 2 desensitized telecom customer datasets were selected, and the proposed method was verified with the Logistic Regression (LR) and Support Vector Machine (SVM) classification algorithms respectively. With the LR classification algorithm, the proposed method increases the average Geometric MEAN (G-MEAN) by 2.32% compared with the Synthetic Minority Oversampling TEchnique with Edited nearest neighbor (SMOTE-Enn). With the SVM classification algorithm, it increases the average G-MEAN by 2.44% compared with Borderline-SMOTE (Borderline Synthetic Minority Oversampling Technique). Experimental results show that the proposed method can mitigate the influence of skewed class distribution on classification, and its ability to recognize rare classes is better than that of classical balanced data classification methods.
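
    The first stage, extracting churn-related features with Lasso, amounts to keeping the columns whose L1-regularized coefficients remain non-zero; a sketch with scikit-learn, where the regularization strength alpha is an illustrative choice:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

def lasso_select(X, y, alpha=0.01):
    """Return the indices of features kept by Lasso.

    X: (n_samples, n_features) customer feature matrix
    y: (n_samples,) churn labels (0/1), treated as a regression target here
    """
    X_std = StandardScaler().fit_transform(X)   # Lasso is scale-sensitive
    model = Lasso(alpha=alpha).fit(X_std, y)
    return np.flatnonzero(model.coef_)          # non-zero coefficients survive
```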

    Generative adversarial network based data uncertainty quantification method
    Hao WANG, Zicheng WANG, Chao ZHANG, Yunsheng MA
    2023, 43(4):  1094-1101.  DOI: 10.11772/j.issn.1001-9081.2022030383

    To solve the problem that directly using high-dimensional, high-frequency, noise-containing real-world data for data processing leads to unreliable estimators, a data uncertainty quantification method based on Generative Adversarial Network (GAN) was proposed. Firstly, the original data distribution was reconstructed by GAN, constructing a mapping from the noise space to the space of the original data. Secondly, samples following the original data distribution were drawn by the Markov Chain Monte Carlo (MCMC) method. Thirdly, confidence intervals for the uncertainty of the samples were defined based on specified functions. Finally, the confidence intervals were used to estimate the uncertainty of the original data, and the data within the confidence intervals was selected as the data used by the estimator. Experimental results show that, compared with using the original data, using the data within the confidence intervals requires 50% fewer samples to train the estimator to its performance ceiling, and 30% fewer samples on average to achieve the same test accuracy.
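
    Once samples have been drawn from the GAN-reconstructed distribution, the per-feature confidence interval and the resulting data filter can be computed directly; a simplified sketch using percentile intervals (the paper's interval-defining functions are not specified here):

```python
import numpy as np

def confidence_filter(samples, data, level=0.95):
    """Keep only the original data points that fall inside the per-feature
    confidence interval estimated from generated samples.

    samples: (m, d) samples drawn from the GAN-reconstructed distribution
    data:    (n, d) original noisy data
    """
    lo_q, hi_q = (1 - level) / 2, 1 - (1 - level) / 2
    lo = np.quantile(samples, lo_q, axis=0)     # per-feature lower bound
    hi = np.quantile(samples, hi_q, axis=0)     # per-feature upper bound
    inside = np.all((data >= lo) & (data <= hi), axis=1)
    return data[inside]
```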

    Error replica recovery mechanism for cloud storage based on auditable multiple replicas
    Zhenjie XIE, Wei FU
    2023, 43(4):  1102-1108.  DOI: 10.11772/j.issn.1001-9081.2022030477

    Concerning the error replica recovery problem of cloud storage systems with auditable multiple replicas, based on the multi-replica cloud storage integrity audit scheme, the error replica recovery mechanism was expounded from five aspects: overall process, influencing factors, recovery strategy, fault location and computation model. The error replica recovery strategies were summarized into four types: full-replica download and upload, full-replica difference upload, fault-block upload and fault-segment upload; the factors affecting recovery efficiency were quantified; and computation models for communication overhead, computation overhead and total overhead were proposed. For a specific multi-replica cloud storage integrity audit scheme, the overhead of correcting random errors in one data block under different strategies and parameters was analyzed quantitatively. Experimental results show that when the bandwidth is 1 Mb/s, 10 Mb/s, 100 Mb/s and 1 Gb/s, the time cost of the optimal strategy in the experiment is only 0.34%, 2.44%, 15.27% and 46.93% respectively of that of the full-replica difference upload strategy. It can be seen that the proposed models can be used to select appropriate strategies and parameters for auditable multi-replica cloud storage systems to improve the efficiency of recovering error replicas, especially when network bandwidth is limited.

    Key node mining in complex network based on improved local structural entropy
    Peng LI, Shilin WANG, Guangwu CHEN, Guanghui YAN
    2023, 43(4):  1109-1114.  DOI: 10.11772/j.issn.1001-9081.2022040562

    The identification of key nodes in complex networks plays an important role in optimizing network structure and propagating information effectively. Local structural Entropy (LE) can identify key nodes by using the influence of the local network on the whole network instead of the influence of individual nodes. However, LE does not consider highly aggregative networks or nodes that form loops with their neighbors, which leads to some limitations. To address these limitations, firstly, an improved LE based node importance evaluation method, PLE (Penalized Local structural Entropy), was proposed, in which the Clustering Coefficient (CC) was introduced on top of LE as a penalty term to appropriately penalize highly aggregative nodes in the network. Secondly, because PLE penalizes nodes in triadic closure structures too heavily, an improved version of PLE, PLEA (Penalized Local structural Entropy Advancement), was proposed, in which a control coefficient was introduced in front of the penalty term to control the penalty strength. Selective attack experiments on five real networks of different sizes were conducted. Experimental results show that on the western US power grid and the US airline network, the identification accuracy of PLEA is 26.3% and 3.2% higher than that of LE respectively, 380% and 5.43% higher than that of the K-Shell (KS) method respectively, and 14.4% and 24% higher than that of the DCL (Degree and Clustering coefficient and Location) method respectively. The key nodes identified by PLEA cause more damage to the network, verifying the rationality of introducing the CC as a penalty term as well as the effectiveness and superiority of PLEA. Integrating the number of neighbors and the local network structure of nodes while remaining simple to compute, PLEA is effective in describing the reliability and invulnerability of large-scale networks.
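
    As a hedged sketch of the idea, local structural entropy can be computed over the degrees in a node's first-order neighborhood, with the clustering coefficient entering as a penalty scaled by a control coefficient mu; the exact functional form in the paper may differ.

```python
import math
import networkx as nx

def plea(G, node, mu=0.5):
    """Illustrative penalized local structural entropy of one node.

    G:  undirected networkx graph
    mu: control coefficient on the clustering-coefficient penalty
    """
    local = list(G.neighbors(node)) + [node]     # first-order local network
    degrees = [G.degree(v) for v in local]
    total = sum(degrees)
    # Local structural entropy over the degree distribution of the local network.
    le = -sum((d / total) * math.log(d / total) for d in degrees)
    # Penalize highly aggregative (triangle-rich) nodes, with strength mu.
    return le * (1.0 - mu * nx.clustering(G, node))

# Rank the nodes of a small test graph by the PLEA-style score.
G = nx.karate_club_graph()
print(sorted(G.nodes, key=lambda v: plea(G, v), reverse=True)[:5])
```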

    Extended belief network recommendation model based on user dynamic interaction behavior
    Caiqian BAO, Jianmin XU, Guofang ZHANG
    2023, 43(4):  1115-1121.  DOI: 10.11772/j.issn.1001-9081.2022020279

    An Extended Belief Network Recommendation model based on User Dynamic Interaction Behavior (EBNR_UDIB) was proposed to solve the problem that current recommendation methods fail to consider accuracy and diversity simultaneously because of their unitary way of combining evidence. Firstly, a three-layer basic Belief Network Recommendation (BNR) model was constructed to provide an effective and flexible framework for introducing evidence. Secondly, by analyzing the direct and coupled interaction relationships among users, the interaction strength was calculated and further adjusted by a dynamic time decay factor. Finally, taking user interest weighted by this strength as new evidence, EBNR_UDIB was obtained by using two evidence combination ways: conjunction and disjunction. Experimental results show that compared with the Content-Based Recommendation Model (CBRM) and the Social relationship-Based Recommendation Model (SBRM), the proposed model increases accuracy, recall, and F1-measure by at least 7, 4, and 5 percentage points respectively under the conjunctive combination, and by at least 2, 8, and 6 percentage points respectively under the disjunctive combination; on the diversity and novelty metrics, the proposed model under the disjunctive combination improves on the above two models by at least 15 and 6 percentage points respectively, while the proposed model under the conjunctive combination also outperforms the comparison models.

    Long-tail recommendation model based on adaptive group reranking
    Canghong JIN, Yuhua SHAO, Qinfang HE
    2023, 43(4):  1122-1128.  DOI: 10.11772/j.issn.1001-9081.2022030455

    Traditional recommendation algorithms pay too much attention to recommendation precision, which results in a high recommendation rate for popular items while unpopular items receive no attention for a long time; this is the classic long-tail problem. In response to this problem, a Multi-objective Dimension Optimization recommendation Model (MDOM), named the Adaptive Group Reranking recommendation Model (AGRM), was proposed, built on a two-dimensional weighted similarity based on Euclidean distance and incorporating adaptive group reranking. Firstly, a two-dimensional weighted similarity measure was constructed using Euclidean distance, the replacement ratio was set dynamically according to each individual’s historical behavior records, and the long-tail recommendation problem was solved by using a multi-objective optimization algorithm integrated with grouping. Secondly, two concise objective functions were designed, taking popularity and long-tail attention into account to reduce the complexity of the objective functions. Thirdly, based on the two-dimensional weighted similarity measure, a subset of users was selected as the "best recommended user group", and the Pareto optimal solution was calculated. Experimental results on MovieLens 1M and Yahoo datasets show that the coverage of AGRM is the best, with average increases of 4.11 and 25.38 percentage points respectively compared with the Item-based Collaborative Filtering (ItemCF) algorithm, and average increases of 8.38 and 33.19 percentage points respectively compared with the Deep Variational Autoencoder with Shallow Parallel Path for Top-N Recommendation (VASP) model. On Yahoo dataset, the average popularity of AGRM recommendations is the lowest, indicating that AGRM can recommend more long-tail items.

    Traffic flow prediction model based on time series decomposition
    Jin XIA, Zhengqun WANG, Shiming ZHU
    2023, 43(4):  1129-1135.  DOI: 10.11772/j.issn.1001-9081.2022030473

    Short-term traffic flow prediction depends not only on historical data but also on the traffic of adjacent areas. Since the trend and spatial correlation of traffic flow are ignored by traditional Time Series Decomposition (TSD) models, a time series processing model combining Time Series Decomposition and Spatio-Temporal features (TSD-ST) was proposed. Firstly, the trend component and the periodic component were obtained by using Empirical Mode Decomposition (EMD) and Discrete Fourier Transform (DFT), the spatio-temporal correlation of the fluctuation component was mined by the Mutual Information (MI) algorithm, and the state vector was reconstructed on this basis. Then, the fluctuation component was predicted from the state vector by a Long Short-Term Memory (LSTM) network. Finally, the final predicted value was obtained by recombining the prediction results of the three components of the sequence. The validity of the model was verified on real data from Interstate I090 in Washington State, USA. Experimental results show that the Root Mean Square Error (RMSE) of the proposed TSD-ST-LSTM model is 16.5%, 34.0%, and 36.6% lower than that of Support Vector Regression (SVR), Gradient Boosting Regression Tree (GBRT) and LSTM respectively. It can be seen that the proposed model is very effective in improving prediction accuracy.
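
    The periodic component can be isolated by keeping only the dominant frequencies of the DFT spectrum and inverting; a numpy sketch of that sub-step (the number of retained frequencies is an illustrative parameter, and the EMD trend extraction is omitted):

```python
import numpy as np

def periodic_component(x, n_freq=3):
    """Extract a periodic component from a traffic-flow series by keeping
    the n_freq strongest non-DC frequencies of its DFT and inverting.

    x: 1-D array of (detrended) traffic-flow observations
    """
    spec = np.fft.rfft(x)
    mags = np.abs(spec)
    mags[0] = 0.0                                # ignore the DC term
    keep = np.argsort(mags)[-n_freq:]            # dominant frequencies
    filtered = np.zeros_like(spec)
    filtered[keep] = spec[keep]
    periodic = np.fft.irfft(filtered, n=len(x))
    residual = x - periodic                      # fluctuation left for the LSTM
    return periodic, residual
```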

    Data enhancement method for drugs under graph-structured representation
    Yinjiang CAI, Guangjun XU, Xibo MA
    2023, 43(4):  1136-1141.  DOI: 10.11772/j.issn.1001-9081.2022040489

    Small-sample data can cause over-fitting problems in machine learning models. In the field of drug development, most data tend to be small samples, which greatly limits the application of machine learning techniques in this field. To solve this problem, a drug data enhancement method based on graph structure was proposed, in which samples are perturbed to generate new similar samples that expand the dataset. The proposed method consists of four sub-methods: a node discarding method based on the molecular backbone, an edge discarding method based on the molecular backbone, a multi-sample splicing method and a hybrid strategy method. Specifically, in the backbone-based node and edge discarding methods, drug molecules were perturbed by a small number of deletion operations on their composition and structure; in the multi-sample splicing method, the perturbation was completed by an addition operation that combines different molecules; in the hybrid strategy method, the deletion and addition operations were combined in a certain ratio to improve the diversity of the data enhancement results. The proposed method improves the Area Under the receiver operating characteristic Curve (AUC) of the drug attribute prediction baseline model MG-BERT (Molecular Graph Bidirectional Encoder Representations from Transformer) by 1.94% to 12.49% on the public datasets BACE, BBBP, ToxCast and ClinTox. Experimental results demonstrate the effectiveness of the proposed method for small-sample drug data enhancement.

    Cyber security
    Review of zero trust network and its key technologies
    Qun WANG, Quan YUAN, Fujuan LI, Lingling XIA
    2023, 43(4):  1142-1150.  DOI: 10.11772/j.issn.1001-9081.2022030453

    With network security threats becoming increasingly severe and security defense means increasingly complex, the zero trust network is a re-evaluation of the traditional boundary security architecture. Zero trust emphasizes never trusting anything by default and verifying it continuously. In a zero trust network, identity is not determined by location, all access controls strictly enforce least privilege, and all access processes are tracked in real time and evaluated dynamically. Firstly, the basic definition of the zero trust network was given, the main problems of traditional perimeter security were pointed out, and the zero trust network model was described. Secondly, the key technologies of the zero trust network, such as Software Defined Perimeter (SDP), identity and access management, micro-segmentation and Automated Configuration Management System (ACMS), were analyzed. Finally, the zero trust network was summarized and its future development was prospected.

    Deep explainable method for encrypted traffic classification
    Jian CUI, Kailang MA, Yu SUN, Dou WANG, Junliang ZHOU
    2023, 43(4):  1151-1159.  DOI: 10.11772/j.issn.1001-9081.2022030382

    Current deep learning models have achieved significant performance advantages over traditional machine learning methods in encrypted traffic classification tasks. However, due to the inherent black-box nature of deep learning models, users cannot know the mechanism behind the classification decisions made by the model. In order to enhance the credibility of the deep learning model while ensuring classification accuracy, an explainable method for deep learning based encrypted traffic classification was proposed, comprising a prototype-based traffic-level active explanation and a feature similarity saliency map based packet-level passive explanation. Firstly, the prototype-based Flow Prototype Network (FlowProtoNet) was used to automatically extract typical traffic segments, namely traffic prototypes, during training. Secondly, the similarity between the tested traffic and each prototype was calculated during testing, realizing both classification and traceability explanation back to the training set. Thirdly, in order to further improve the visual explainability, the Gradient Similarity Saliency Map (Grad-SSM) method was proposed, in which regions irrelevant to the classification decision were filtered out by weighting the feature map with gradients, and then the Earth Mover’s Distance (EMD) between the tested traffic and the prototype extracted by FlowProtoNet was calculated to obtain a similarity matrix, further focusing the attention heatmap through the comparison of the tested traffic with this prototype. On ISCX VPN-nonVPN dataset, the accuracy of the proposed method reaches 96.86%, comparable to that of unexplainable methods, while FlowProtoNet additionally provides the classification basis through the similarity to the prototype. At the same time, the proposed method has stronger visual explainability and focuses more on the key packets in the traffic.

    Federated learning algorithm based on personalized differential privacy
    Chunyong YIN, Rui QU
    2023, 43(4):  1160-1168.  DOI: 10.11772/j.issn.1001-9081.2022030337

    Federated Learning (FL) can effectively protect users’ personal data from attackers. Differential Privacy (DP) is applied to enhance the privacy of FL and solve the problem of privacy disclosure caused by model parameters during training. However, existing DP-based FL methods concentrate on a unified privacy protection budget and ignore users’ personalized privacy requirements. To solve this problem, a two-stage Federated Learning with Personalized Differential Privacy (PDP-FL) algorithm was proposed. In the first stage, each user’s privacy was graded according to the user’s privacy preference, and noise meeting that preference was added to achieve personalized privacy protection; at the same time, the privacy level corresponding to the privacy preference was uploaded to the central aggregation server. In the second stage, in order to fully protect the global data, a strategy of simultaneous local and central protection was adopted: according to the privacy levels uploaded by users, noise conforming to the global DP threshold was added to quantify the global privacy protection level. Experimental results show that on MNIST and CIFAR-10 datasets, the classification accuracy of the PDP-FL algorithm reaches 93.8% to 94.5% and 43.4% to 45.2% respectively, better than that of Federated learning with Local Differential Privacy (LDP-Fed) and Federated Learning with Global Differential Privacy (GDP-FL), while PDP-FL also meets the needs of personalized privacy protection.
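
    The first-stage personalization amounts to each client calibrating the noise it adds to its update according to its own privacy budget; a minimal Gaussian-mechanism sketch (the budget-to-noise mapping and the clipping norm are illustrative assumptions, not the paper's exact mechanism):

```python
import numpy as np

def personalized_dp_update(grad, epsilon, delta=1e-5, clip=1.0):
    """Clip a client's update and add Gaussian noise calibrated to that
    client's own (epsilon, delta) preference before uploading.

    grad:    flat numpy array holding the local model update
    epsilon: this client's privacy budget (smaller = stricter = more noise)
    """
    # Clip to bound the sensitivity of the update.
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip / (norm + 1e-12))
    # Standard Gaussian-mechanism noise scale for (epsilon, delta)-DP.
    sigma = clip * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return clipped + np.random.normal(0.0, sigma, size=grad.shape)
```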

    Network intrusion detection model based on efficient federated learning algorithm
    Shaochen HAO, Zizuan WEI, Yao MA, Dan YU, Yongle CHEN
    2023, 43(4):  1169-1175.  DOI: 10.11772/j.issn.1001-9081.2022020305

    After federated learning is introduced into intrusion detection scenarios, the traffic data between nodes is non-independent and identically distributed (non-iid), which makes it difficult for models to aggregate and obtain a high recognition rate. To solve this problem, an efficient federated learning algorithm named H-E-Fed was constructed, and a network intrusion detection model based on this algorithm was proposed. Firstly, a global model for traffic data was designed by the coordinator and sent to the intrusion detection nodes for model training. Then, the coordinator collected the local models and evaluated the skewness of the covariance matrices of the local models between nodes so as to measure the inter-node model correlation, thereby reassigning the model aggregation parameters and generating a new global model. Finally, multiple rounds of interaction between the coordinator and the nodes were carried out until the global model converged. Experimental results show that, compared with models based on the FedAvg (Federated Averaging) and FedProx algorithms, the proposed model has relatively low communication consumption under inter-node data non-iid conditions, and on KDDCup99 dataset and CICIDS2017 dataset its accuracy is improved by 10.39% and 8.14%, and by 4.40% and 5.98%, respectively, compared with the baseline models.

    Virus propagation model and stability analysis of heterogeneous backup network
    Yingqi LI, Weifeng JI, Jiang WENG, Xuan WU, Xiuyu SHEN, Yan SUN
    2023, 43(4):  1176-1182.  DOI: 10.11772/j.issn.1001-9081.2022030409

    Concerning the secondary attack problem of viruses in virtual network based environments such as cloud computing and data centers, virus propagation and immune mechanisms under dynamic platform defense were studied, and a heterogeneous backup based network virus defense method was proposed. Firstly, the secondary attack process on redundant backups was analyzed and the laws of virus action were summarized. Combined with the idea of dynamic platform defense, a heterogeneous platform state node was introduced, and a Susceptible-Escaped-Infected-Removed-Heterogeneous-Susceptible (SEIRHS) virus propagation model was proposed. Secondly, the local stability at the equilibrium point of the model was proved by using the Routh-Hurwitz stability criterion, and the basic reproduction number was derived. Finally, the proposed model was compared with the traditional Susceptible-Infected-Removed (SIR) and Susceptible-Escaped-Infected-Removed (SEIR) models through simulation analysis, the stability of the model was verified, and the effect of influencing factors on the scale of virus spread was discussed. The simulation results show that the proposed model can objectively reflect the propagation law of viruses in the network, and can effectively improve the network’s defense against viruses by reducing the node degree, increasing the Infected-Heterogeneous (I-H) state transition probability, and reducing the probability of the virus staying hidden during backup.
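
    The SEIRHS dynamics can be simulated as a system of ordinary differential equations; the sketch below uses an assumed transition structure and illustrative rates, since the paper's exact equations are not reproduced here.

```python
import numpy as np
from scipy.integrate import odeint

def seirhs(y, t, beta, p, sigma, gamma, eta, xi):
    """Assumed SEIRHS flows (illustrative, not the paper's exact equations):
    infection S->I at rate beta, a fraction p of new infections escape
    detection (S->E), escaped nodes activate (E->I) at rate sigma, infected
    nodes are removed (I->R) at rate gamma or switch to a heterogeneous
    platform (I->H) at rate eta, and heterogeneous nodes rejoin the
    susceptible pool (H->S) at rate xi."""
    S, E, I, R, H = y
    new_inf = beta * S * I
    dS = -new_inf + xi * H
    dE = p * new_inf - sigma * E
    dI = (1 - p) * new_inf + sigma * E - (gamma + eta) * I
    dR = gamma * I
    dH = eta * I - xi * H
    return [dS, dE, dI, dR, dH]

t = np.linspace(0, 100, 1000)
y0 = [0.99, 0.0, 0.01, 0.0, 0.0]          # initial fractions of nodes
sol = odeint(seirhs, y0, t, args=(0.5, 0.2, 0.1, 0.1, 0.05, 0.02))
print("peak infected fraction: %.3f" % sol[:, 2].max())
```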

    CFL-based authentication and communication scheme for industrial control system
    Songbai LAN, Fangxiao LI, Leyi SHI
    2023, 43(4):  1183-1190.  DOI: 10.11772/j.issn.1001-9081.2022030451

    Aiming at the problems of key leakage, single point of failure and high communication overhead in the centralized authentication schemes widely used in Industrial Control Systems (ICSs), the Cryptography Fundamental Logics (CFL) authentication technology, which holds independent domestic intellectual property rights, was introduced into the authentication and communication process of ICSs, and a CFL-based authentication and communication scheme for ICSs was proposed. Firstly, the two communicating parties exchanged and verified dynamic certificates carrying rights, generated from each other’s identity labels and authority information, thereby realizing decentralized authentication of both parties’ identities and negotiation of the session key. Secondly, the session key, CFL dynamic signatures and access control rules were used to ensure secure communication between the two parties. Finally, detailed logs of the control process were encrypted and stored to make the process traceable. Theoretical analysis and experimental results show that the remote authentication center is no longer needed in the authentication stage, and local, efficient authentication among industrial control devices is realized. When facing a large number of authentication requests, the system throughput of the proposed scheme is at least 92.53% higher than that of the Public Key Infrastructure (PKI) scheme and at least 141.37% higher than that of the Identity-Based Encryption (IBE) scheme, which means the proposed scheme can better meet the requirements of large-scale authentication and millisecond-level secure communication in ICSs.

    Efficient robust zero-watermarking algorithm for 3D medical images based on ray-casting sampling and quaternion orthogonal moment
    Jian GAO, Zhi LI, Bin FAN, Chuanxian JIANG
    2023, 43(4):  1191-1197.  DOI: 10.11772/j.issn.1001-9081.2021050746

    Aiming at the copyright protection of 3D medical images and the expansion of watermark storage capacity caused by the increasing number of images to be protected, a robust zero-watermarking algorithm based on ray-casting sampling and the polar complex exponential moment was proposed. Firstly, a ray-casting based sampling algorithm was proposed to sample the features of a 3D medical image composed of multiple sequential 2D medical images and to describe these features in 2D image space. Secondly, a robust zero-watermarking algorithm for 3D medical images was proposed: three 2D feature images of the coronal plane, sagittal plane and cross section of the 3D medical image were obtained by ray-casting sampling, and the three 2D feature images were transformed by the polar complex exponential transform to obtain the quaternion orthogonal moment. Finally, the zero-watermarking information was constructed by using the quaternion orthogonal moment and Logistic chaotic encryption. Simulation results show that the proposed algorithm keeps the bit correctness rate of zero-watermark extraction above 0.920 0 under various common image processing attacks and geometric attacks; the watermark storage capacity of the proposed algorithm grows with the volume of 3D medical image data, and its storage capacity is improved by at least 93.75% compared with the comparison 2D medical image zero-watermarking algorithms.

    Advanced computing
    Reliability of k-ary (n-m)-cube subnetworks under probabilistic fault condition
    Kai FENG, Tong LIU
    2023, 43(4):  1198-1205.  DOI: 10.11772/j.issn.1001-9081.2022030414

    The k-ary n-cube has many good properties and has become one of the most commonly used interconnection network topologies in multiprocessor systems. The ability to maintain system subnetworks plays an important role in practical applications when failures occur in the interconnection network. In order to accurately measure the fault tolerance of subnetworks of arbitrary size in a k-ary n-cube, the reliability of k-ary (n-m)-cube subnetworks in a k-ary n-cube in the presence of failures was studied. For odd k greater than 2, upper and lower bounds on the probability that at least one fault-free k-ary (n-m)-cube subnetwork exists in a k-ary n-cube were obtained under the probabilistic fault condition, and an approximate method for evaluating this reliability was proposed. Experimental results show that the upper and lower bounds on the k-ary (n-m)-cube subnetwork reliability gradually converge as the vertex reliability decreases, and the evaluation result obtained by the approximate method is relatively accurate when the vertex reliability is large.

    Computer software technology
    Feature selection method based on self-adaptive hybrid particle swarm optimization for software defect prediction
    Zhenhua YU, Zhengqi LIU, Ying LIU, Cheng GUO
    2023, 43(4):  1206-1213.  DOI: 10.11772/j.issn.1001-9081.2022030444

    Feature selection is a key step in data preprocessing for software defect prediction. Aiming at the problems of existing feature selection methods, such as insignificant dimension reduction and low classification accuracy of the selected optimal feature subset, a feature selection method for software defect prediction based on Self-adaptive Hybrid Particle Swarm Optimization (SHPSO) was proposed. Firstly, combined with population partitioning, a self-adaptive weight update strategy based on Q-learning was designed, in which Q-learning was introduced to adaptively adjust the inertia weight according to the states of the particles. Secondly, to balance the global search ability in the early stage of the algorithm and the convergence speed in the later stage, time-varying learning factors based on curve adaptivity were proposed. Finally, a hybrid position update strategy was adopted to help particles escape local optima as early as possible and to increase particle diversity. Experiments were carried out on 12 public software defect datasets. The results show that the proposed method effectively improves the classification accuracy of the software defect prediction model and reduces the dimension of the feature space compared with the method using all features, commonly used traditional feature selection methods, and mainstream feature selection methods based on intelligent optimization algorithms. Compared with the Improved Salp Swarm Algorithm (ISSA), the proposed method increases classification accuracy by about 1.60% on average and reduces the feature subset size by about 63.79% on average, showing that it can select feature subsets that are both small and highly accurate.
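
    The core of SHPSO is a standard PSO velocity update whose inertia weight is chosen per particle by a Q-learning policy and whose learning factors vary over time along a curve; a condensed sketch of one update step (the candidate weights, state discretization and curve shape are illustrative assumptions):

```python
import numpy as np

def shpso_step(x, v, pbest, gbest, t, t_max, q_table, state,
               c_start=(2.5, 0.5), c_end=(0.5, 2.5)):
    """One PSO update with an adaptively chosen inertia weight and
    time-varying learning factors.

    x, v:          (n_particles, dim) positions and velocities
    pbest, gbest:  personal bests (n_particles, dim), global best (dim,)
    q_table:       (n_states, n_weights) Q-values over candidate weights
    state:         (n_particles,) discretized particle states (int indices)
    """
    weights = np.array([0.4, 0.6, 0.9])           # candidate inertia weights
    # Q-learning policy: each particle greedily picks the weight with the
    # highest Q-value for its current state (exploration omitted here).
    w = weights[np.argmax(q_table[state], axis=1)][:, None]
    # Curve-adaptive, time-varying learning factors: c1 decays, c2 grows.
    frac = (t / t_max) ** 2
    c1 = c_start[0] + (c_end[0] - c_start[0]) * frac
    c2 = c_start[1] + (c_end[1] - c_start[1]) * frac
    r1, r2 = np.random.rand(*x.shape), np.random.rand(*x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v, v
```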

    Multifactorial backtracking search optimization algorithm for solving automated test case generation problem
    Zhongbo HU, Xupeng WANG
    2023, 43(4):  1214-1219.  DOI: 10.11772/j.issn.1001-9081.2022030393

    Automated Test Case Generation for Path Coverage (ATCG-PC) is a hot topic in the field of automated software testing. The fitness functions commonly used by swarm intelligence evolutionary algorithms for the ATCG-PC problem are highly similar to each other, yet existing swarm intelligence evolutionary algorithms for solving it do not exploit this similarity. Inspired by this feature, two similar fitness functions were treated as two tasks, transforming the ATCG-PC problem into a multi-task ATCG-PC problem, and a new swarm intelligence evolutionary algorithm, the Multifactorial Backtracking Search optimization Algorithm (MFBSA), was proposed to solve it. In the proposed algorithm, the memory population function of multifactorial selection I was used to improve the global search ability, and similar tasks were able to improve each other’s optimization efficiency through knowledge transfer via assortative memory mating. The performance of MFBSA was evaluated on six fog computing test programs and six natural language processing test programs. Compared with the Backtracking Search optimization Algorithm (BSA), the Immune Genetic Algorithm (IGA), the Particle Swarm Optimization with Convergence Speed Controller (PSO-CSC) algorithm, the Adaptive Particle Swarm Optimization (APSO) algorithm and the Differential Evolution with Hypercube-based learning strategies (DE-H) algorithm, MFBSA reduces the total number of test cases needed to cover the paths of the 12 test programs by 64.46%, 66.64%, 67.99%, 74.15%, and 61.97% respectively. Experimental results show that the proposed algorithm can effectively reduce testing cost.

    Multimedia computing and computer simulation
    Robust RGB-D SLAM system incorporating instance segmentation and clustering in dynamic environment
    Tianzouzi XIAO, Xiaobo ZHOU, Xin LUO, Qipeng TANG
    2023, 43(4):  1220-1225.  DOI: 10.11772/j.issn.1001-9081.2022020261

    Visual Simultaneous Localization And Mapping (VSLAM) technology is commonly used for indoor robot navigation and perception. However, the pose estimation methods of VSLAM assume a static environment and may fail at localization and mapping when moving objects exist in the scene. To solve this problem, an Instance Segmentation and Clustering SLAM (ISC-SLAM) system was proposed. In this system, an instance segmentation network was used to generate possibility masks of dynamic objects in the scene, and the dynamic points in the scene were detected by the multi-view geometry method during segmentation. By matching the obtained possibility masks with the detected dynamic points, accurate dynamic masks of moving objects were determined. The feature points on dynamic objects were deleted by using the dynamic masks, and the camera pose was then estimated accurately from the remaining static feature points. To solve the under-segmentation problem of the instance segmentation network, a depth filling algorithm and a clustering algorithm were applied to ensure the complete elimination of dynamic feature points. Finally, the background occluded by moving objects was reconstructed, and a static point cloud map was built with the correct camera poses. Experimental results on the Technical University of Munich (TUM) dataset demonstrate that the proposed system can achieve robust localization and mapping in dynamic environments while ensuring real-time performance.

    Lightweight algorithm of 3D mesh model for preserving detailed geometric features
    Yun ZHANG, Shuying WANG, Qing ZHENG, Haizhu ZHANG
    2023, 43(4):  1226-1232.  DOI: 10.11772/j.issn.1001-9081.2022030434
    Abstract ( )   HTML ( )   PDF (3119KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    An important strategy for lightweighting a 3D model is to use a mesh simplification algorithm to reduce the number of triangular meshes on the model surface. The widely used edge collapse algorithm is more efficient and gives better simplification results than other mesh simplification algorithms, but some detailed geometric features may be damaged or lost during its simplification process. Therefore, the approximate curvature of the curve through the edge to be collapsed and the average area of the first-order neighborhood triangles of that edge were added as penalty factors to optimize the edge collapse cost of the original algorithm. First, according to the definition of curve curvature in geometry, a formula for the approximate curvature of the curve was proposed. Then, in the calculation of the vertex normal vector, two stages, area weighting and interior angle weighting, were used to modify the initial normal vector, thereby taking richer geometric information of the model into account. The performance of the optimized algorithm was verified by experiments. Compared with the classical Quadratic Error Metric (QEM) algorithm and a mesh simplification algorithm considering the angle error, the optimized algorithm reduces the maximum error by at least 73.96% and 49.77%, respectively. Compared with the QEM algorithm, the optimized algorithm reduces the Hausdorff distance by at least 17.69%. Therefore, in the process of model lightweighting, the optimized algorithm reduces the deformation of the model and better preserves its detailed geometric features.
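
    The penalised collapse cost can be sketched as the classic QEM term plus two penalty factors. The curvature approximation below (turning angle of the vertex normals per unit edge length) is a hypothetical stand-in, since the paper derives its own formula; the weights w_curv and w_area are likewise illustrative.

        import numpy as np

        def quadric_error(Q, v):
            # Classic QEM term: v^T Q v with v in homogeneous coordinates.
            vh = np.append(v, 1.0)
            return float(vh @ Q @ vh)

        def approx_curvature(n1, n2, edge_len):
            # Illustrative approximation: turning angle of the two vertex
            # normals divided by the edge length.
            cosang = np.clip(np.dot(n1, n2), -1.0, 1.0)
            return np.arccos(cosang) / max(edge_len, 1e-12)

        def collapse_cost(Q1, Q2, v_new, n1, n2, edge_len, ring_areas,
                          w_curv=1.0, w_area=1.0):
            # Penalised edge-collapse cost: QEM error of the merged vertex
            # plus curvature and neighborhood-area penalty factors.
            qem = quadric_error(Q1 + Q2, v_new)
            curv = approx_curvature(n1, n2, edge_len)
            area = float(np.mean(ring_areas))   # avg first-order triangle area
            return qem + w_curv * curv + w_area * area

        # Toy usage with identity quadrics and perpendicular normals.
        Q = np.eye(4)
        print(collapse_cost(Q, Q, np.zeros(3), np.array([0.0, 0.0, 1.0]),
                            np.array([0.0, 1.0, 0.0]), 0.5, [0.1, 0.2]))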

    Oriented line integral convolution algorithm for flow field based on information entropy
    Mengyi LI, Xia FANG, Hongbo ZHENG, Xujia QIN
    2023, 43(4):  1233-1239.  DOI: 10.11772/j.issn.1001-9081.2022030391
    Abstract ( )   HTML ( )   PDF (4496KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    Flow field visualization is a visualization technology for intuitive analysis of flow field data, and the Oriented Line Integral Convolution (OLIC) algorithm, a classic texture-based visualization method, makes it possible to clearly observe the evolution of the flow along the flow direction. To improve the visualization effect, an OLIC algorithm based on information entropy was proposed. Firstly, sparse noise based on information entropy was generated from the flow field vector data. Then, a slope convolution kernel function was used to convolve the input texture. Finally, the output OLIC texture image was obtained by calculating the gray value of each pixel. In the proposed algorithm, streamlines were generated adaptively in critical-point regions and non-critical-point regions according to the entropy values: since the critical-point regions contain important information about the flow field, dense drawing was selected for them, while sparse drawing was selected for the non-critical-point regions. By drawing streamlines with different densities in different regions, the algorithm saves computational cost, and its drawing speed is at least 18.6% higher than that of the ordinary OLIC algorithm. In terms of visualization effect, the proposed algorithm is superior to ordinary global drawing, and allows feature regions to be observed more carefully.
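
    A minimal OLIC sketch, under simplifying assumptions: Euler streamline tracing with unit steps, periodic boundaries via modulo indexing, and an asymmetric ramp as the slope kernel so that intensity grows along the flow direction. The function name olic and all parameters are illustrative; the paper's entropy-driven variant additionally adapts the noise density to local information entropy.

        import numpy as np

        def olic(vx, vy, noise, L=15):
            # Convolve sparse noise along streamlines with a ramp kernel.
            H, W = noise.shape
            out = np.zeros_like(noise, dtype=float)
            ramp = np.linspace(0.1, 1.0, 2 * L + 1)   # asymmetric => orientation
            ramp /= ramp.sum()
            for y in range(H):
                for x in range(W):
                    px, py, acc = float(x), float(y), 0.0
                    # trace backwards L steps ...
                    for _ in range(L):
                        u = vx[int(py) % H, int(px) % W]
                        v = vy[int(py) % H, int(px) % W]
                        n = np.hypot(u, v) + 1e-9
                        px, py = px - u / n, py - v / n
                    # ... then sample 2L+1 points moving forwards
                    for k in range(2 * L + 1):
                        acc += ramp[k] * noise[int(py) % H, int(px) % W]
                        u = vx[int(py) % H, int(px) % W]
                        v = vy[int(py) % H, int(px) % W]
                        n = np.hypot(u, v) + 1e-9
                        px, py = px + u / n, py + v / n
                    out[y, x] = acc
            return out

        # Toy circular flow with sparse noise input.
        H = W = 64
        yy, xx = np.mgrid[0:H, 0:W].astype(float)
        vx, vy = -(yy - H / 2), (xx - W / 2)
        noise = (np.random.default_rng(1).random((H, W)) > 0.9).astype(float)
        img = olic(vx, vy, noise)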

    Dual-branch residual low-light image enhancement combined with attention
    Jiazhen ZU, Yongxia ZHOU, Le CHEN
    2023, 43(4):  1240-1247.  DOI: 10.11772/j.issn.1001-9081.2022030479
    Abstract ( )   HTML ( )   PDF (4669KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    Photos taken in low-light conditions suffer from a series of visual problems due to underexposure, such as low brightness, loss of information, noise and color distortion. In order to solve the above problems, a dual-branch residual low-light image enhancement network combined with attention was proposed. Firstly, an improved InceptionV2 was used to extract shallow features. Secondly, a Residual Feature extraction Block (RFB) and a Dense RFB (DRFB) were used to extract deep features. Thirdly, the shallow and deep features were fused, the fusion result was input into the Brightness Adjustment Module (BAM) for brightness adjustment, and the enhanced image was finally obtained. At the same time, a Feature Fusion Module (FFM) was designed in combination with an attention mechanism to capture important feature information, which helps to restore the dark areas of low-light images. In addition, a joint loss function was introduced to measure the network training loss from multiple aspects. Experimental results show that, compared with RRM (Robust Retinex Model), Zero-DCE (Zero-reference Deep Curve Estimation) and EnlightenGAN (Enlighten Generative Adversarial Network), the proposed network increases the Peak Signal-to-Noise Ratio (PSNR) by 49.9%, 40.0% and 18.5% respectively on the LOL (LOw-Light) dataset, and increases the Structural Similarity Index Measure (SSIM) by 20.3%, 50.0% and 34.5% respectively on the LOL-V2 dataset. The proposed network improves the brightness of low-light images while reducing noise, color distortion and artifacts, resulting in sharper and more natural enhanced images.
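
    A hedged sketch of an attention-guided feature fusion module of the kind described: shallow and deep features are concatenated and a squeeze-and-excitation style channel attention re-weights the fused maps. The class name FFM matches the abstract's terminology, but the layer sizes and the exact attention design are assumptions, not the paper's configuration.

        import torch
        import torch.nn as nn

        class FFM(nn.Module):
            # Fuse shallow and deep features, then re-weight channels with
            # a global-pooling attention vector.
            def __init__(self, ch):
                super().__init__()
                self.fuse = nn.Conv2d(2 * ch, ch, kernel_size=1)
                self.att = nn.Sequential(
                    nn.AdaptiveAvgPool2d(1),
                    nn.Conv2d(ch, ch // 4, 1), nn.ReLU(inplace=True),
                    nn.Conv2d(ch // 4, ch, 1), nn.Sigmoid(),
                )

            def forward(self, shallow, deep):
                x = self.fuse(torch.cat([shallow, deep], dim=1))
                return x * self.att(x)   # channel-wise re-weighting

        x1, x2 = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
        print(FFM(64)(x1, x2).shape)     # torch.Size([1, 64, 32, 32])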

    Tracking appearance features based on attention self-correlation mechanism
    Guangyi DOU, Fanan WEI, Chuangyi QIU, Jianshu CHAO
    2023, 43(4):  1248-1254.  DOI: 10.11772/j.issn.1001-9081.2022030426
    Abstract ( )   HTML ( )   PDF (2258KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    In order to solve Multi-Object Tracking (MOT) problems such as the ID Switches (IDS) caused by ambiguous pedestrian features, and to verify the importance of pedestrian appearance in the tracking process, an Attention Self-Correlation Network (ASCN) based on a center-point detection model was proposed. Firstly, channel and spatial attention networks were applied to learn two different feature maps from the original image, decoupling the deep information. Then, more accurate pedestrian appearance features and pedestrian orientation information were obtained through self-correlation learning between the two feature maps, and this information was used in the tracking association process. In addition, a tracking dataset of low-frame-rate videos was produced to verify the performance of the improved algorithm. Under non-ideal frame-rate conditions, the improved algorithm obtained pedestrian appearance information through ASCN and achieved better accuracy and robustness than algorithms using only pedestrian orientation information. Finally, the improved algorithm was tested on the MOT17 dataset of the MOT Challenge. Experimental results show that, compared with FairMOT (Fairness in MOT) without ASCN, the improved algorithm increases the Multiple Object Tracking Accuracy (MOTA) and Identification F-Score (IDF1) by 0.5 percentage points and 1.1 percentage points respectively, decreases the number of IDS by 32.2%, and runs at 21.2 frames per second on a single NVIDIA Tesla V100 card. This proves that the improved algorithm not only reduces errors in the tracking process but also improves overall tracking performance, and can meet real-time requirements.
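
    One plausible reading of the attention self-correlation idea is sketched below: a channel-attention branch and a spatial-attention branch each refine the backbone features, and the position-by-position correlation between the two refined maps is used to aggregate appearance information. The module structure, shapes, and residual fusion are all illustrative assumptions rather than the paper's ASCN.

        import torch
        import torch.nn as nn

        class SelfCorrelation(nn.Module):
            def __init__(self, ch):
                super().__init__()
                self.channel = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                             nn.Conv2d(ch, ch, 1), nn.Sigmoid())
                self.spatial = nn.Sequential(nn.Conv2d(ch, 1, 7, padding=3),
                                             nn.Sigmoid())

            def forward(self, f):
                a = f * self.channel(f)                    # channel-attended map
                b = f * self.spatial(f)                    # spatially-attended map
                B, C, H, W = f.shape
                qa, qb = a.flatten(2), b.flatten(2)        # B x C x HW
                corr = torch.bmm(qa.transpose(1, 2), qb)   # B x HW x HW correlation
                corr = torch.softmax(corr, dim=-1)
                out = torch.bmm(qa, corr)                  # aggregate by correlation
                return out.view(B, C, H, W) + f            # residual connection

        print(SelfCorrelation(32)(torch.randn(1, 32, 16, 16)).shape)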

    Real-time reconstruction method of visual information for manipulator operation
    Qingyu JIA, Liang CHANG, Xianyi YANG, Baohua QIANG, Shihao ZHANG, Wu XIE, Minghao YANG
    2023, 43(4):  1255-1260.  DOI: 10.11772/j.issn.1001-9081.2022020262
    Abstract ( )   HTML ( )   PDF (2136KB) ( )   PDF(mobile) (1418KB) ( 1 )
    Figures and Tables | References | Related Articles | Metrics

    Current skill teaching methods for manipulators mainly construct a virtual space through 3D reconstruction technology for the manipulator to simulate and train in. However, due to the different visual angles of humans and manipulators, traditional visual information reconstruction methods suffer from large reconstruction errors and long reconstruction time, and require a harsh experimental environment and many sensors, so the skills learned by the manipulator in the virtual space cannot be transferred well to the real environment. To solve the above problems, a real-time visual information reconstruction method for manipulator operation was proposed. Firstly, information was extracted from real-time RGB images through the Mask Region-based Convolutional Neural Network (Mask R-CNN). Then, the extracted RGB images and other visual information were jointly encoded, and the visual information was mapped to the three-dimensional position information of the manipulator operation space through the 18-layer Residual Neural Network (ResNet-18). Finally, an outlier adjustment method constrained by Cluster Center DIStance (CC-DIS) was proposed to reduce the reconstruction error, and the adjusted position information was visualized by the Open Graphics Library (OpenGL). In this way, three-dimensional real-time reconstruction of the manipulator operation space was completed. Experimental results show that the proposed method has high reconstruction speed and accuracy: it takes only 62.92 milliseconds to complete a three-dimensional reconstruction, corresponding to a reconstruction speed of up to 16 frames per second, with a relative reconstruction error of about 5.23%. Therefore, it can be effectively applied to manipulator skill teaching tasks.
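
    A cluster-center-distance constraint on outliers can be sketched as follows: reconstructed 3D points are clustered, and any point farther than a threshold from its cluster center is pulled back onto the sphere of that radius around the center. The function name, the threshold, and the use of k-means are illustrative assumptions, not the paper's CC-DIS formulation.

        import numpy as np
        from sklearn.cluster import KMeans

        def adjust_outliers(points, n_clusters=3, max_dist=0.5):
            # Cluster the points, then clamp each point to within max_dist
            # of its assigned cluster center.
            km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(points)
            centers = km.cluster_centers_[km.labels_]
            offset = points - centers
            dist = np.linalg.norm(offset, axis=1, keepdims=True)
            scale = np.minimum(1.0, max_dist / np.maximum(dist, 1e-12))
            return centers + offset * scale

        # Toy usage: a tight blob plus one gross outlier, single cluster.
        rng = np.random.default_rng(0)
        pts = np.vstack([rng.normal(0, 0.1, (50, 3)), [[3.0, 0.0, 0.0]]])
        adj = adjust_outliers(pts, n_clusters=1, max_dist=0.5)
        print(np.linalg.norm(adj, axis=1).max())   # outlier clamped near the center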

    Reconstruction algorithm for highly undersampled magnetic resonance images based on residual graph convolutional neural network
    Xiaoyu FAN, Suzhen LIN, Yanbo WANG, Feng LIU, Dawei LI
    2023, 43(4):  1261-1268.  DOI: 10.11772/j.issn.1001-9081.2022020309
    Abstract ( )   HTML ( )   PDF (2569KB) ( )   PDF(mobile) (2309KB) ( 3 )
    Figures and Tables | References | Related Articles | Metrics

    Magnetic Resonance Imaging (MRI) is widely used in the diagnosis of complex diseases because of its non-invasiveness and good soft tissue contrast. Since MRI is slow, most acceleration is currently achieved by highly undersampling the Magnetic Resonance (MR) signals in k-space. However, representative algorithms often produce blurred details when reconstructing highly undersampled MR images. Therefore, a highly undersampled MR image reconstruction algorithm based on a Residual Graph Convolutional neural NETwork (RGCNET) was proposed. Firstly, auto-encoding technology and a Graph Convolutional neural Network (GCN) were used to build the generator. Secondly, the undersampled image was input into the feature extraction (encoder) network to extract bottom-layer features. Thirdly, the high-level features of the MR image were extracted by the GCN block. Fourthly, an initial reconstructed image was generated through the decoder network. Finally, the final high-resolution reconstructed image was obtained through a dynamic game between the generator and the discriminator. Test results on the FastMRI dataset show that, at 10%, 20%, 30%, 40% and 50% sampling rates, compared with the spatial orthogonal attention based reconstruction algorithm SOGAN (Spatial Orthogonal attention Generative Adversarial Network), the proposed algorithm decreases the Normalized Root Mean Square Error (NRMSE) by 3.5%, 26.6%, 23.9%, 13.3% and 14.3%, increases the Peak Signal-to-Noise Ratio (PSNR) by 1.2%, 8.7%, 6.9%, 2.9% and 3.2%, and increases the Structural SIMilarity (SSIM) by 0.8%, 2.9%, 1.5%, 0.5% and 0.5%, respectively. Subjective observation also confirms that the proposed algorithm preserves more details and produces more realistic visual effects.
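
    A minimal sketch of one residual graph-convolution block of the kind such a generator might stack: H' = H + ReLU(A_hat H W), where A_hat is the symmetrically normalized adjacency with self-loops. Treating image patches as graph nodes, and the layer sizes, are assumptions for illustration, not the paper's RGCNET.

        import torch
        import torch.nn as nn

        class ResGCNLayer(nn.Module):
            def __init__(self, dim):
                super().__init__()
                self.lin = nn.Linear(dim, dim)
                self.act = nn.ReLU(inplace=True)

            def forward(self, h, a_hat):
                return h + self.act(a_hat @ self.lin(h))   # residual connection

        def normalize_adj(a):
            a = a + torch.eye(a.size(0))                   # add self-loops
            d_inv = torch.diag(a.sum(1).pow(-0.5))
            return d_inv @ a @ d_inv                       # D^-1/2 (A+I) D^-1/2

        # Toy usage: a 3-node path graph with 16-dimensional node features.
        a = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
        h = torch.randn(3, 16)
        print(ResGCNLayer(16)(h, normalize_adj(a)).shape)  # torch.Size([3, 16])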

    Multi-channel pathological image segmentation with gated axial self-attention
    Zhi CHEN, Xin LI, Liyan LIN, Jing ZHONG, Peng SHI
    2023, 43(4):  1269-1277.  DOI: 10.11772/j.issn.1001-9081.2022030333
    Abstract ( )   HTML ( )   PDF (4014KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    In Hematoxylin-Eosin (HE)-stained pathological images, the uneven distribution of cell staining and the diversity of tissue morphologies pose great challenges to automated segmentation. Traditional convolutions cannot capture the correlations between pixels in a large neighborhood, making it difficult to further improve segmentation performance. Therefore, a Multi-Channel Segmentation Network with gated axial self-attention (MCSegNet) model was proposed to achieve accurate segmentation of nuclei in pathological images. In the proposed model, a dual-encoder and decoder structure was adopted, in which the gated axial self-attention encoding channel was used to capture global features, while the convolutional encoding channel based on residual structure was used to obtain fine local features. The feature representation was enhanced by feature fusion at the end of the encoding channels, providing a good information basis for the decoder, where the segmentation results were gradually generated by cascading multiple upsampling modules. In addition, an improved hybrid loss function was used to effectively alleviate the common problem of sample imbalance in pathological images. Experimental results on the MoNuSeg2020 public dataset show that the improved segmentation method outperforms U-Net by 2.66 percentage points in F1-score and 2.77 percentage points in Intersection over Union (IoU), effectively improving the pathological image segmentation effect and the reliability of clinical diagnosis.
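
    Hybrid losses against foreground/background imbalance commonly combine a pixel-wise term with a region-overlap term; the sketch below pairs binary cross-entropy with a soft Dice term. The weighting and the exact composition are assumptions, as the paper's improved hybrid loss may differ.

        import torch
        import torch.nn as nn

        class HybridLoss(nn.Module):
            # BCE + soft Dice: the Dice term softens the dominance of the
            # (much larger) background class in nucleus masks.
            def __init__(self, w_dice=1.0, eps=1e-6):
                super().__init__()
                self.bce = nn.BCEWithLogitsLoss()
                self.w, self.eps = w_dice, eps

            def forward(self, logits, target):
                p = torch.sigmoid(logits)
                inter = (p * target).sum()
                dice = (2 * inter + self.eps) / (p.sum() + target.sum() + self.eps)
                return self.bce(logits, target) + self.w * (1 - dice)

        loss = HybridLoss()(torch.randn(2, 1, 64, 64),
                            torch.randint(0, 2, (2, 1, 64, 64)).float())
        print(loss.item())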

    Highlight removal algorithm for medical endoscopic images
    Yue CHI, Zhengping LI, Chao XU, Bo FENG
    2023, 43(4):  1278-1283.  DOI: 10.11772/j.issn.1001-9081.2022030478
    Abstract ( )   HTML ( )   PDF (2907KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    The existing endoscopic image highlight removal algorithms often suffer from problems such as unreasonable removal structure and color distortion, which lead to erroneous results from lesion recognition algorithms and image enhancement algorithms. In order to solve the above problems, for highlight localization, a method combining dark-region growing and Scharr filtering was proposed to locate relative highlights; for highlight filling, an improved Criminisi algorithm was proposed. Firstly, based on statistics over a large amount of data, the search scope was restricted to increase the filling efficiency. Secondly, the statistical scope of the priority was improved to avoid repeated meaningless calculations. Thirdly, textures were reasonably reconstructed according to adaptive templates for different regions. Experiments were carried out on endoscopic image datasets of different human tissues: compared with the dichromatic reflection model based method, the Robust Principal Component Analysis (RPCA) method, the thermal diffusion method and the original Criminisi algorithm, the proposed algorithm achieved the lowest Natural Image Quality Evaluator (NIQE) value; compared with the RPCA method, the thermal diffusion method and the original Criminisi algorithm, the proposed algorithm achieved the shortest running time. Experimental results show that the proposed algorithm not only has better objective image indicators than the other algorithms, but also improves efficiency nearly 100-fold compared to the original Criminisi algorithm.
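
    A hedged sketch of relative-highlight localization in this spirit: pixels markedly brighter than their local neighborhood are combined with strong Scharr gradient responses, which tend to ring specular spots. The brightness factor k, the window sizes and the thresholds are illustrative assumptions, not the paper's dark-region growing procedure.

        import cv2
        import numpy as np

        def locate_highlights(bgr, k=1.8):
            gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY).astype(np.float64)
            local_mean = cv2.blur(gray, (31, 31))
            bright = gray > k * np.maximum(local_mean, 1.0)    # relatively bright
            gx = cv2.Scharr(gray, cv2.CV_64F, 1, 0)
            gy = cv2.Scharr(gray, cv2.CV_64F, 0, 1)
            grad = np.hypot(gx, gy)
            rim = grad > grad.mean() + 2.0 * grad.std()        # specular rims
            mask = (bright | (rim & (gray > local_mean))).astype(np.uint8) * 255
            return cv2.dilate(mask, np.ones((5, 5), np.uint8)) # grow to cover rims

        # Toy usage: a synthetic specular spot on a dark background.
        img = np.full((100, 100, 3), 60, np.uint8)
        cv2.circle(img, (50, 50), 6, (250, 250, 250), -1)
        print(int(locate_highlights(img).sum() > 0))           # 1: highlight found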

    YOLOv4 highway pavement crack detection method using Ghost module and ECA
    Juming HAO, Jingyu YANG, Shumei HAN, Yangping WANG
    2023, 43(4):  1284-1290.  DOI: 10.11772/j.issn.1001-9081.2022030410
    Abstract ( )   HTML ( )   PDF (5654KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    Aiming at the difficulty of pavement disease detection caused by the wide variety of types and scales of road pavement cracks, a lightweight unmanned aerial vehicle image crack detection method based on GhostNet was proposed for detecting different types of pavement cracks. First, the Ghost module from the lightweight GhostNet was introduced to optimize the YOLOv4 backbone feature extraction network, yielding the lightweight model YOLOv4-Light, thereby reducing model complexity and improving crack detection speed. Then, the Efficient Channel Attention (ECA) mechanism was integrated into the model's prediction output to further enhance the crack feature extraction ability and improve the precision of crack detection. Simulation results show that, compared with the existing YOLOv4, the proposed method reduces the model size by 82.31% and the number of model parameters by 82.56%, while improving the crack detection efficiency. The method can meet the needs of detecting different types of cracks in road transportation scenarios.
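
    The ECA mechanism itself is compact enough to sketch directly: global average pooling yields a channel descriptor, a 1D convolution models local cross-channel interaction, and a sigmoid gates the channels. Kernel size 3 is a typical choice; the exact placement inside YOLOv4's prediction head is not reproduced here.

        import torch
        import torch.nn as nn

        class ECA(nn.Module):
            def __init__(self, k=3):
                super().__init__()
                self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2,
                                      bias=False)

            def forward(self, x):
                y = x.mean(dim=(2, 3))                    # B x C, global avg pool
                y = self.conv(y.unsqueeze(1)).squeeze(1)  # 1D conv over channels
                return x * torch.sigmoid(y)[..., None, None]

        print(ECA()(torch.randn(2, 64, 20, 20)).shape)    # torch.Size([2, 64, 20, 20])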

    Automatic detection and recognition of electric vehicle helmet based on improved YOLOv5s
    Zhouhua ZHU, Qi QI
    2023, 43(4):  1291-1296.  DOI: 10.11772/j.issn.1001-9081.2022020313
    Abstract ( )   HTML ( )   PDF (2941KB) ( )   PDF(mobile) (3142KB) ( 45 )
    Figures and Tables | References | Related Articles | Metrics

    Aiming at the problems of low detection precision, poor robustness and imperfect supporting systems in the current small-object detection of electric vehicle helmets, an electric vehicle helmet detection model based on an improved YOLOv5s algorithm was proposed. In the proposed model, the Convolutional Block Attention Module (CBAM) and the Coordinate Attention (CA) module were introduced, and the standard Non-Maximum Suppression (NMS) was replaced with Distance Intersection over Union based NMS (DIoU-NMS). At the same time, multi-scale feature fusion detection was added and combined with a densely connected network to improve the feature extraction effect. Finally, a helmet detection system for electric vehicle drivers was established. On the self-built electric vehicle helmet wearing dataset, the improved YOLOv5s algorithm increases the mean Average Precision (mAP) at an Intersection over Union (IoU) threshold of 0.5 by 7.1 percentage points and the recall by 1.6 percentage points compared with the original YOLOv5s. Experimental results show that the improved YOLOv5s algorithm can better meet the precision requirements for detecting electric vehicles and their drivers' helmets in practical situations, and can reduce the incidence of electric vehicle traffic accidents to a certain extent.
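
    DIoU-NMS suppresses a box only when IoU minus the normalized center distance exceeds the threshold, so nearby but distinct riders and helmets are more likely to be kept. A minimal numpy sketch follows; box layout (x1, y1, x2, y2) and the threshold are conventional choices, not values from the paper.

        import numpy as np

        def _area(b):
            return (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])

        def diou_nms(boxes, scores, iou_thr=0.5):
            # Suppress box j w.r.t. kept box i when IoU - d^2/c^2 > iou_thr,
            # d = center distance, c = diagonal of the enclosing box.
            order = scores.argsort()[::-1]
            cx = (boxes[:, 0] + boxes[:, 2]) / 2
            cy = (boxes[:, 1] + boxes[:, 3]) / 2
            keep = []
            while order.size > 0:
                i, rest = order[0], order[1:]
                keep.append(int(i))
                xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
                yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
                xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
                yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
                inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
                iou = inter / (_area(boxes[i]) + _area(boxes[rest]) - inter + 1e-12)
                d2 = (cx[i] - cx[rest]) ** 2 + (cy[i] - cy[rest]) ** 2
                ex1 = np.minimum(boxes[i, 0], boxes[rest, 0])
                ey1 = np.minimum(boxes[i, 1], boxes[rest, 1])
                ex2 = np.maximum(boxes[i, 2], boxes[rest, 2])
                ey2 = np.maximum(boxes[i, 3], boxes[rest, 3])
                c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-12
                order = rest[iou - d2 / c2 <= iou_thr]
            return keep

        boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float)
        print(diou_nms(boxes, np.array([0.9, 0.8, 0.7])))   # -> [0, 2]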

    Handwritten mathematical expression recognition model based on attention mechanism and encoder-decoder
    Lu CHEN, Daoxi CHEN, Yiming LU, Weizhong LU
    2023, 43(4):  1297-1302.  DOI: 10.11772/j.issn.1001-9081.2022020278
    Abstract ( )   HTML ( )   PDF (1695KB) ( )   PDF(mobile) (993KB) ( 14 )
    Figures and Tables | References | Related Articles | Metrics

    Aiming at the problem that existing Handwritten Mathematical Expression Recognition (HMER) methods reduce image resolution and lose feature information after multiple pooling operations in the Convolutional Neural Network (CNN), which leads to parsing errors, an attention-based encoder-decoder model for HMER was proposed. Firstly, a Densely connected convolutional Network (DenseNet) was used as the encoder, so that the dense connections enhanced feature extraction, promoted gradient propagation, and alleviated gradient vanishing. Secondly, a Gated Recurrent Unit (GRU) was used as the decoder, and an attention mechanism was introduced so that attention could be allocated to different regions of the image to realize accurate symbol recognition and structural analysis. Finally, the handwritten mathematical expression images were encoded, and the encoding results were decoded into LaTeX sequences. Experimental results on the Competition on Recognition of Online Handwritten Mathematical Expressions (CROHME) dataset show that the proposed model achieves a recognition rate of 40.39%, and within error tolerances of at most one, two and three symbols, recognition rates of 52.74%, 58.82% and 62.98%, respectively. Compared with the Bidirectional Long Short-Term Memory (BLSTM) network model, the proposed model increases the recognition rate by 3.17 percentage points, and by 8.52, 11.56 and 12.78 percentage points under the three error tolerances, respectively. It can be seen that the proposed model can accurately parse handwritten mathematical expression images, generate LaTeX sequences, and improve the recognition rate.
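
    A single decoding step of such a model can be sketched as additive (Bahdanau-style) attention over the encoder's flattened feature grid feeding a GRU cell. The class name AttnGRUDecoderStep and all dimensions are illustrative assumptions; the paper's coverage or gating details are not reproduced.

        import torch
        import torch.nn as nn

        class AttnGRUDecoderStep(nn.Module):
            def __init__(self, feat_dim, emb_dim, hid_dim, vocab):
                super().__init__()
                self.emb = nn.Embedding(vocab, emb_dim)
                self.Wf = nn.Linear(feat_dim, hid_dim)
                self.Wh = nn.Linear(hid_dim, hid_dim)
                self.v = nn.Linear(hid_dim, 1)
                self.gru = nn.GRUCell(emb_dim + feat_dim, hid_dim)
                self.out = nn.Linear(hid_dim, vocab)

            def forward(self, feats, h, prev_tok):
                # feats: B x L x feat_dim (flattened grid), h: B x hid_dim
                score = self.v(torch.tanh(self.Wf(feats) + self.Wh(h).unsqueeze(1)))
                alpha = torch.softmax(score, dim=1)        # B x L x 1 attention
                ctx = (alpha * feats).sum(dim=1)           # B x feat_dim context
                h = self.gru(torch.cat([self.emb(prev_tok), ctx], dim=1), h)
                return self.out(h), h, alpha               # next-token logits

        step = AttnGRUDecoderStep(feat_dim=128, emb_dim=32, hid_dim=64, vocab=110)
        logits, h, alpha = step(torch.randn(2, 49, 128), torch.zeros(2, 64),
                                torch.tensor([3, 7]))
        print(logits.shape)                                # torch.Size([2, 110])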

    Progressive ratio mask-based adaptive noise estimation method
    Jianqing GAO, Yanhui TU, Feng MA, Zhonghua FU
    2023, 43(4):  1303-1308.  DOI: 10.11772/j.issn.1001-9081.2022030384
    Abstract ( )   HTML ( )   PDF (1425KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    Deep learning based speech enhancement algorithms typically perform better than traditional noise suppression based speech enhancement algorithms. However, deep learning based speech enhancement algorithms usually do not work well when there is a mismatch between the training data and the test data. Aiming at the above problem, a novel Progressive Ratio Mask (PRM) based Adaptive Noise Estimation (PRM-ANE) method was proposed and used for the preprocessing of a speech recognition system. The method combines the Improved Minima Controlled Recursive Averaging (IMCRA) algorithm, which provides frame-level noise tracking, with an utterance-level deep progressive learning algorithm that models the nonlinear interactions between speech and noise. Firstly, a two-Dimensional Convolutional Neural Network (2D-CNN) was adopted to learn PRMs, whose values increase as the Signal-to-Noise Ratio (SNR) increases. Then, the utterance-level PRMs were combined with the conventional frame-level speech enhancement algorithm to perform speech enhancement. Finally, the speech enhanced through this multi-level information fusion was fed directly into the speech recognition system to improve its performance. Experimental results on the CHiME-4 real test set show that the proposed method achieves a Word Error Rate (WER) of 7.42%, which is 51.41% lower in relative terms than that of the IMCRA speech enhancement method. Experimental results show that the proposed enhancement method can effectively improve the performance of downstream recognition tasks.
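
    One plausible formulation of progressive ratio masks is sketched below: the stage-k target keeps the speech plus a shrinking fraction a_k of the noise, giving PRM_k = (S + a_k N) / (S + N) on magnitude spectrograms, so within any stage the mask value grows with the local SNR. The attenuation schedule and the magnitude-domain formulation are assumptions, not necessarily the paper's exact definition.

        import numpy as np

        def progressive_ratio_masks(speech_mag, noise_mag, attens=(0.5, 0.25, 0.1)):
            # Stage-k mask maps the noisy magnitude to speech plus a_k noise:
            # early stages denoise mildly, later stages approach the
            # conventional ratio mask.
            total = speech_mag + noise_mag + 1e-12
            return [(speech_mag + a * noise_mag) / total for a in attens]

        s = np.abs(np.random.default_rng(0).normal(size=(257, 100)))
        n = np.abs(np.random.default_rng(1).normal(size=(257, 100)))
        m1, m2, m3 = progressive_ratio_masks(s, n)
        print(np.all(m1 >= m2) and np.all(m2 >= m3))   # progressively stronger suppression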

    Frontier and comprehensive applications
    Anti-fraud and anti-tampering online trading mechanism for bulk stock
    Yihan WANG, Chen TANG, Lan ZHANG
    2023, 43(4):  1309-1317.  DOI: 10.11772/j.issn.1001-9081.2022040546
    Abstract ( )   HTML ( )   PDF (2395KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    In view of the huge risks brought by transaction fraud, handover irregularities and other issues in bulk stock online trading, a long-term traceable online trading mechanism was proposed to achieve more reliable bulk stock trading, realizing the authenticity and tamper-resistance of information as well as the credibility and fraud-resistance of the process. Firstly, combined with blockchain, an online trading framework based on the idea of "application-verification-record" was proposed, with smart contracts used for multi-party supervision and detailed recording of each stage of the trading process. Secondly, to guarantee the authenticity of commodity information, for bulk stock whose appearance carries texture features, commodity fingerprints were extracted and verified based on the Local Binary Pattern (LBP) algorithm. Finally, to ensure the credibility of the handover process, a standardized commodity handover method based on environmental fingerprints was proposed. The trading framework, the commodity appearance fingerprint extraction and verification algorithm, and the standardized commodity handover method together constitute the online trading mechanism. The analysis results show that the proposed trading framework can avoid most frauds from the perspectives of user selection and process specification, and can identify single-party and two-party frauds occurring in transactions. Experimental results based on real log-image data show that the proposed commodity appearance fingerprint extraction and verification algorithm judges different images of the same commodity with 94.00% accuracy and distinguishes images of different commodities with 78.30% accuracy. The system performance test shows that the delay of each stage of the proposed trading mechanism is within an acceptable range and meets the requirements of online trading.
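
    An LBP-based appearance fingerprint can be sketched as the histogram of uniform LBP codes over a grayscale photo (e.g. the grain of a log end face), compared with a chi-square distance. The parameters P, R and the decision threshold are illustrative assumptions, not the paper's tuned values.

        import numpy as np
        from skimage.feature import local_binary_pattern

        def lbp_fingerprint(gray, P=8, R=1.0):
            # Histogram of uniform LBP codes summarising the surface texture.
            codes = local_binary_pattern(gray, P, R, method="uniform")
            hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2),
                                   density=True)
            return hist

        def same_commodity(h1, h2, thr=0.25):
            # Chi-square distance between two fingerprints, thresholded.
            chi2 = 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + 1e-12))
            return chi2 < thr

        # Toy usage: two views of the same random texture should match.
        img = np.random.default_rng(0).integers(0, 256, (128, 128)).astype(np.uint8)
        f1, f2 = lbp_fingerprint(img), lbp_fingerprint(img.T)
        print(same_commodity(f1, f2))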

    Signal modulation recognition method based on convolutional long short-term deep neural network
    Haiyu YANG, Wenpu GUO, Kai KANG
    2023, 43(4):  1318-1322.  DOI: 10.11772/j.issn.1001-9081.2022030425
    Abstract ( )   HTML ( )   PDF (2318KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    Focused on the high computational complexity, the low recognition rate under low Signal-to-Noise Ratio (SNR) conditions, and the relatively simple network structures of existing methods, a signal modulation recognition method based on Convolutional Long short-term Deep Neural Network (CLDNN) was proposed. Firstly, the open-source benchmark dataset RadioML2016.10a was adopted, its In-phase/Quadrature (I/Q) data were converted, and the result was used as the network input. Secondly, the CLDNN model was constructed with three parts: a three-layer Convolutional Neural Network (CNN), a two-layer Long Short-Term Memory (LSTM) network, and a two-layer Fully Connected Network (FCN). Finally, the proposed model was trained and tested to obtain the classification results. Experimental results show that the recognition accuracy of the CLDNN model increases as the SNR improves and reaches 92% when the SNR is greater than 4 dB, which is higher than those of existing single-network-structure models such as the Residual Neural Network (RES) model, the CNN model and the RESidual Generative Adversarial Network (RES-GAN) model, in the modulation recognition of 11 kinds of signals at different SNRs.
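
    The described layout (three convolutions, two LSTM layers, two fully connected layers over 2 x 128 I/Q samples, 11 classes) translates directly into a short sketch. Channel widths and kernel sizes below are illustrative assumptions, not the paper's exact configuration.

        import torch
        import torch.nn as nn

        class CLDNN(nn.Module):
            def __init__(self, n_classes=11):
                super().__init__()
                self.cnn = nn.Sequential(
                    nn.Conv1d(2, 64, 3, padding=1), nn.ReLU(),
                    nn.Conv1d(64, 64, 3, padding=1), nn.ReLU(),
                    nn.Conv1d(64, 64, 3, padding=1), nn.ReLU(),
                )
                self.lstm = nn.LSTM(64, 64, num_layers=2, batch_first=True)
                self.fc = nn.Sequential(nn.Linear(64, 128), nn.ReLU(),
                                        nn.Linear(128, n_classes))

            def forward(self, x):                  # x: B x 2 x 128 (I/Q samples)
                h = self.cnn(x).transpose(1, 2)    # B x 128 x 64 for the LSTM
                _, (hn, _) = self.lstm(h)
                return self.fc(hn[-1])             # logits over 11 classes

        print(CLDNN()(torch.randn(4, 2, 128)).shape)   # torch.Size([4, 11])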
