Table of Contents

    10 March 2022, Volume 42 Issue 3
    2021 CCF Conference on Artificial Intelligence (CCFAI 2021)
    Network embedding method based on multi-granularity community information
    Jun HU, Zhengkang XU, Li LIU, Fujin ZHONG
    2022, 42(3):  663-670.  DOI: 10.11772/j.issn.1001-9081.2021040790

    Most existing network embedding methods preserve only the local structure information of the network and ignore other potential information in it. In order to preserve the community information of the network and reflect the multi-granularity characteristics of the network community structure, a network Embedding method based on Multi-Granularity Community information (EMGC) was proposed. Firstly, the network’s multi-granularity community structure was obtained, and the node embedding and the community embedding were initialized. Then, according to the node embedding at the previous level of granularity and the community structure at the current level of granularity, the community embedding was updated, and the corresponding node embedding was adjusted. Finally, the node embeddings under different community granularities were spliced to obtain the network embedding that fused the community information of different granularities. Experiments on four real network datasets were carried out. Compared with the methods that do not consider community information (DeepWalk, node2vec) and the methods that consider single-granularity community information (ComE, GEMSEC), EMGC’s AUC value on link prediction and F1 score on node classification are generally better. The experimental results show that EMGC can effectively improve the accuracy of subsequent link prediction and node classification.
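
The final splicing step of EMGC can be illustrated with a minimal sketch. The abstract does not state how a community embedding is updated, so `community_mean` below assumes a simple mean over member nodes; both function names and the toy vectors are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

Vector = List[float]


def community_mean(members: List[str], node_vecs: Dict[str, Vector]) -> Vector:
    """Assumed update rule: a community vector is the mean of its members' vectors."""
    dim = len(node_vecs[members[0]])
    return [sum(node_vecs[m][d] for m in members) / len(members) for d in range(dim)]


def splice_embeddings(levels: List[Dict[str, Vector]]) -> Dict[str, Vector]:
    """Concatenate each node's vectors across granularity levels, in the order given."""
    return {node: [x for level in levels for x in level[node]] for node in levels[0]}


# Toy node embeddings at two community granularities (illustrative values only).
coarse = {"a": [1.0, 3.0], "b": [3.0, 5.0]}
fine = {"a": [0.5], "b": [0.25]}

fused = splice_embeddings([coarse, fine])
center = community_mean(["a", "b"], coarse)
```

The spliced vector has one block per granularity level, so a downstream classifier or link predictor sees community information at every level at once.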

    Partially explainable non-negative matrix tri-factorization algorithm based on prior knowledge
    Lu CHEN, Xiaoxia ZHANG, Hong YU
    2022, 42(3):  671-675.  DOI: 10.11772/j.issn.1001-9081.2021040927

    Non-negative Matrix Tri-Factorization (NMTF) is an important part of the latent factor model. Because this algorithm decomposes the original data matrix into three mutually constrained latent factor matrices, it has been widely used in research fields such as recommender systems and transfer learning. However, the interpretability of non-negative matrix tri-factorization has not been studied. To address this, by regarding the user comment text information as prior knowledge, a Partially Explainable Non-negative Matrix Tri-Factorization (PE-NMTF) algorithm was designed based on prior knowledge. Firstly, sentiment analysis technology was used to extract the emotional polarity preferences from the user comment text information. Then, the objective function and updating formula of the non-negative matrix tri-factorization algorithm were modified to embed the prior knowledge into the algorithm. Finally, a large number of experiments were carried out on the Yelp and Amazon datasets for the cold start task of the recommender system and on the AwA and CUB datasets for the image zero-shot task to compare the proposed algorithm with the non-negative matrix factorization and non-negative matrix tri-factorization algorithms. The experimental results show that the proposed algorithm performs well on RMSE (Root Mean Square Error), NDCG (Normalized Discounted Cumulative Gain), NMI (Normalized Mutual Information), and ACC (ACCuracy), and the feasibility and effectiveness of non-negative matrix tri-factorization using prior knowledge were verified.

    Online kernel regression based on random sketching method
    Qinghua LIU, Shizhong LIAO
    2022, 42(3):  676-682.  DOI: 10.11772/j.issn.1001-9081.2021040869

    In online kernel regression learning, the inverse of the kernel matrix needs to be calculated when a new sample arrives, and the computational complexity is at least quadratic in the number of rounds. The idea of applying the sketching method to hypothesis updating was introduced, and a more efficient online kernel regression algorithm via the sketching method was proposed. Firstly, with the loss function set as the square loss, a new gradient descent algorithm called FTL-Online Kernel Regression (F-OKR) was proposed, which uses the Nyström approximation method to approximate the kernel and applies the idea of Follow-The-Leader (FTL). Then, the sketching method was used to accelerate F-OKR so that its computational complexity was reduced to linear in the number of rounds and the sketch scale, and quadratic in the data dimension. Finally, an efficient online kernel regression algorithm called Sketched Online Kernel Regression (SOKR) was designed. Compared to F-OKR, SOKR had no change in accuracy and reduced the runtime by about 16.7% on some datasets. The sub-linear regret bounds of these two algorithms were proved, and experimental results on standard regression datasets also verify that the algorithms perform better than the NOGD (Nyström Online Gradient Descent) algorithm, with the average loss over all the datasets reduced by about 64%.

    Doubly feature-weighted fuzzy support vector machine
    Yunzhi QIU, Tinghua WANG, Xiaolu DAI
    2022, 42(3):  683-687.  DOI: 10.11772/j.issn.1001-9081.2021040760

    Current feature-weighted Fuzzy Support Vector Machines (FSVM) only consider the influence of feature weights on the membership functions but ignore the application of feature weights to the kernel function calculation during sample training. To address this shortcoming, a new FSVM algorithm that considers the influence of feature weights on the membership function and the kernel function calculation simultaneously was proposed, namely Doubly Feature-Weighted FSVM (DFW-FSVM). Firstly, the relative weight of each feature was calculated by using Information Gain (IG). Secondly, the weighted Euclidean distance between the sample and the class center was calculated in the original space based on the feature weights, and then the membership function was constructed by applying the weighted Euclidean distance; at the same time, the feature weights were applied to the calculation of the kernel function in the sample training process. Finally, the DFW-FSVM algorithm was constructed according to the weighted membership functions and kernel functions. In this way, DFW-FSVM is able to avoid being dominated by weakly relevant or irrelevant features. The comparative experiments were carried out on eight UCI datasets, and the results show that compared with the best results of SVM, FSVM, Feature-Weighted SVM (FWSVM), Feature-Weighted FSVM (FWFSVM) and FSVM based on Centered Kernel Alignment (CKA-FSVM), the accuracy and F1 value of the DFW-FSVM algorithm increase by 2.33 and 5.07 percentage points, respectively, indicating that the proposed DFW-FSVM has good classification performance.
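
The two places where feature weights enter DFW-FSVM can be sketched as follows. This is an illustration under stated assumptions, not the paper's formulation: the weights are taken as given (the paper derives them from information gain), the membership uses a linear decay `1 - d / (r + delta)` that is common in the FSVM literature, and the kernel is assumed to be a weighted RBF. All function and parameter names are hypothetical.

```python
import math
from typing import Sequence


def weighted_sq_dist(x: Sequence[float], z: Sequence[float], w: Sequence[float]) -> float:
    """Squared Euclidean distance with one non-negative weight per feature."""
    return sum(wi * (a - b) ** 2 for a, b, wi in zip(x, z, w))


def weighted_distance(x: Sequence[float], center: Sequence[float], w: Sequence[float]) -> float:
    """Weighted Euclidean distance between a sample and a class center."""
    return math.sqrt(weighted_sq_dist(x, center, w))


def membership(x, center, w, radius: float, delta: float = 1e-6) -> float:
    """Assumed fuzzy membership: decays linearly with weighted distance to the center."""
    return 1.0 - weighted_distance(x, center, w) / (radius + delta)


def weighted_rbf(x, z, w, gamma: float = 1.0) -> float:
    """Assumed kernel: RBF computed on the feature-weighted squared distance."""
    return math.exp(-gamma * weighted_sq_dist(x, z, w))


weights = [1.0, 0.0]  # second feature treated as irrelevant (illustrative values)
d_full = weighted_distance([0.0, 0.0], [3.0, 4.0], [1.0, 1.0])
d_masked = weighted_distance([0.0, 0.0], [3.0, 4.0], weights)
k_same = weighted_rbf([1.0, 2.0], [1.0, 9.0], weights)
```

With a zero weight on the second feature, two samples that differ only in that feature are treated as identical by both the distance and the kernel, which is the effect the abstract describes as avoiding domination by irrelevant features.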

    Hierarchical classification online streaming feature selection algorithm based on ReliefF algorithm
    Xiaoqing ZHANG, Chenxi WANG, Yan LYU, Yaojin LIN
    2022, 42(3):  688-694.  DOI: 10.11772/j.issn.1001-9081.2021040789

    In practical classification tasks such as image annotation and disease diagnosis, there is usually a hierarchical structural relationship between the classes in the label space of data with high-dimensional features. Many hierarchical feature selection algorithms have been proposed for different practical tasks, but they ignore the unknown and uncertain nature of the feature space. In order to solve this problem, an online streaming feature selection algorithm OH_ReliefF based on ReliefF for hierarchical classification learning was presented. Firstly, the hierarchical relationship between classes was incorporated into the ReliefF algorithm to define a new method HF_ReliefF for calculating feature weights for hierarchical data. Then, important features were dynamically selected based on the ability of features to classify decision attributes. Finally, the dynamic redundancy analysis of features was performed based on the independence between features. Experimental results show that, compared with five advanced online streaming feature selection algorithms, the proposed algorithm achieves better results in all evaluation metrics of the K-Nearest Neighbor (KNN) classifier and the Lagrangian Support Vector Machine (LSVM) classifier, with an accuracy improvement of at least 7 percentage points.

    Hybrid ITÖ algorithm for multi-scale colored traveling salesman problem
    Shuning HAN, Min XU, Xueshi DONG, Qing LIN, Fanfan SHEN
    2022, 42(3):  695-700.  DOI: 10.11772/j.issn.1001-9081.2021040776

    Colored Traveling Salesman Problem (CTSP) is a variant of Multiple Traveling Salesmen Problem (MTSP) and Traveling Salesman Problem (TSP), which can be applied to engineering problems such as Multi-machine Engineering System (MES) with overlapping workspace. CTSP is an NP-complete problem. Although related studies have attempted to solve it with Genetic Algorithm (GA), Simulated Annealing (SA) algorithm and some other methods, they solve the problem only at a limited scale and with unsatisfactory speed and solution quality. Therefore, a hybrid ITÖ algorithm combining Uniform Design (UD), Ant Colony Optimization (ACO) and the ITÖ algorithm was proposed to solve this problem, namely UDHITÖ. UD was applied to choose the appropriate combination of parameters of the UDHITÖ algorithm, the probabilistic graphic model of ACO was used to generate feasible solutions, and the drift operator and volatility operator of ITÖ were used to optimize the solutions. Experimental results show that the UDHITÖ algorithm improves on the traditional GA, ACO and ITÖ algorithms for multi-scale CTSP problems in terms of best solution and average solution.

    Adaptive artificial fish swarm algorithm utilizing gene exchange
    Zongzheng LI, Kaiqing ZHOU, Yun OU, Lei DING
    2022, 42(3):  701-707.  DOI: 10.11772/j.issn.1001-9081.2021040775

    Focusing on the imbalance between local optimization and global optimization and the inability to jump out of the local optimum of Artificial Fish Swarm Algorithm (AFSA), an Adaptive AFSA utilizing Gene Exchange (AAFSA-GE) was proposed. Firstly, an adaptive mechanism of view and step was utilized to enhance the search speed and accuracy. Then, chaotic behavior and gene exchange behavior were employed to improve the ability of jumping out of the local optimum and the search efficiency. Ten classic test functions were selected to prove the feasibility and robustness of the proposed algorithm by comparing it with three other modified AFSAs, which are Normative Fish Swarm Algorithm (NFSA), FSA optimized by PSO algorithm with Extended Memory (PSOEM-FSA), and Comprehensive Improvement of Artificial Fish Swarm Algorithm (CIAFSA). Experimental results show that AAFSA-GE achieves better results in local and global search ability than those of PSOEM-FSA and CIAFSA, and better search efficiency and better global search ability than those of NFSA.

    Model agnostic meta learning algorithm based on Bayesian weight function
    Renjie XU, Baodi LIU, Kai ZHANG, Weifeng LIU
    2022, 42(3):  708-712.  DOI: 10.11772/j.issn.1001-9081.2021040758

    As a multi-task meta learning algorithm, Model Agnostic Meta Learning (MAML) can use different models and adapt quickly to different tasks, but it still needs to be improved in terms of training speed and accuracy. The principle of MAML was analyzed from the perspective of Gaussian stochastic process, and a new Model Agnostic Meta Learning algorithm based on Bayesian Weight function (BW-MAML) was proposed, in which the weight was assigned by Bayesian analysis. In the training process of BW-MAML, each sampled task was regarded as following a Gaussian distribution, the importance of the task was determined according to the probability of the task in the distribution, and then the weight was assigned according to the importance, thus improving the utilization of information in each gradient descent. The few-shot image learning experimental results on Omniglot and Mini-ImageNet datasets show that with the Bayesian weight function added, after 2500 training steps with 6 tasks, the accuracy of BW-MAML is at most 1.9 percentage points higher than that of MAML, and the final accuracy is 0.907 percentage points higher than that of MAML on Mini-ImageNet on average; the accuracy of BW-MAML on Omniglot is also improved by up to 0.199 percentage points on average.
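
The idea of weighting tasks by their probability under a Gaussian can be sketched as below. The abstract does not say which task statistic is modeled, so this sketch assumes the scalar loss of each sampled task, fits a Gaussian to the batch of losses, and normalizes the densities into weights. The function names and this choice of statistic are assumptions, not the paper's definition.

```python
import math
from typing import List


def gaussian_task_weights(task_losses: List[float]) -> List[float]:
    """Weight each task by its Gaussian density under the batch mean and variance."""
    n = len(task_losses)
    mean = sum(task_losses) / n
    var = sum((loss - mean) ** 2 for loss in task_losses) / n
    std = math.sqrt(var) or 1.0  # fall back to 1.0 when all losses are equal
    densities = [
        math.exp(-0.5 * ((loss - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))
        for loss in task_losses
    ]
    total = sum(densities)
    return [d / total for d in densities]


def weighted_meta_gradient(task_grads: List[List[float]], weights: List[float]) -> List[float]:
    """Combine per-task gradients into one meta-gradient using the task weights."""
    dim = len(task_grads[0])
    return [sum(w * g[d] for g, w in zip(task_grads, weights)) for d in range(dim)]


weights = gaussian_task_weights([1.0, 2.0, 3.0])
meta_grad = weighted_meta_gradient([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], weights)
```

Plain MAML corresponds to uniform weights; in this sketch, tasks whose statistic lies closer to the batch mean contribute more to each meta-update.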

    Clustering based on discrete hashing
    Shuting XUAN, Jinglei LIU
    2022, 42(3):  713-723.  DOI: 10.11772/j.issn.1001-9081.2021040911

    The traditional clustering methods are carried out in the data space, and clustered data is high-dimensional. In order to solve these two problems, a new binary image clustering method, Clustering based on Discrete Hashing (CDH), was proposed. To reduce the dimension of data, the L21-norm was used in this framework to realize adaptive feature selection. At the same time, the data was mapped into binary Hamming space by the hashing method. Then, the sparse binary matrix was decomposed into a low-rank matrix in the Hamming space to complete fast image clustering. Finally, an optimization scheme that could converge quickly was used to solve the objective function. Experimental results on image datasets (Caltech101, Yale, COIL20, ORL) show that this method can effectively improve the efficiency of clustering. Compared with the traditional clustering methods, such as K-means and Spectral Clustering (SC), the time efficiency of CDH was improved by 87 and 98 percentage points respectively in the Gabor view of the Caltech101 dataset when processing high-dimensional data.

    Data stream preference query based on extraction sequence according to temporal condition
    Runze LI, Xuejiao SUN
    2022, 42(3):  724-730.  DOI: 10.11772/j.issn.1001-9081.2021040786

    Traditional research on preference reasoning and preference query mainly focuses on the preference of a single object represented by a relational tuple. However, it is a challenge to extend the method of temporal conditional preference query to sequences extracted from a data stream. The problems encountered mainly include the extraction of sequences from the data stream and the rapid processing needed to obtain the dominant sequences and dominant objects. For preference data streams, firstly, the Continuous Query Language (CQL) was extended and a special query language named StreamSeq was proposed to deal with the temporal conditional preference on the data stream effectively, which allows the temporal conditional preference specification and reasoning of the sequences extracted from the data stream. Then, an algorithm for extracting object sequences according to temporal index from the data stream and an algorithm for performing dominance comparison between sequences were designed, and the dominant sequences satisfying the preference condition were returned according to the input data stream. Finally, two sets of data were used for experimental verification. On the synthetic data set, when the number of generated attributes, sequence number, time range and time sliding interval were 10, 8, 20 s and 1 s, the running time acceleration ratio of the sequence extraction algorithm to the equivalent CQL algorithm was 13.33; on the real data set, when the time range and time sliding interval were 40 s and 1 s, the running time acceleration ratios of the dominance comparison algorithm to mintopK, partition, and incpartition were 10.77, 6.46 and 5.69. Experimental results show that compared with other preference query algorithms, the proposed method needs less running time and is more efficient in getting results.

    Audio visual joint action recognition based on key frame selection network
    Tingxiu CHEN, Jianqin YIN
    2022, 42(3):  731-735.  DOI: 10.11772/j.issn.1001-9081.2021060995

    In recent years, action recognition based on audio-visual joint learning has received some attention. Whether in video (visual modality) or audio (auditory modality), the occurrence of an action is instantaneous, and only the information in the time period of the action can significantly express the action category. How to make better use of the significant expression information carried by the key frames of the audio-visual modalities is one of the problems to be solved in audio-visual action recognition. To address this problem, a key frame screening network KFIA-S was proposed. Through the linear temporal attention mechanism based on the fully connected layer, different weights were given to the audio-visual information at different times, so as to screen the audio-visual features beneficial to video classification, reduce redundant information, suppress background interference information, and improve the accuracy of action recognition. The effect of different intensities of temporal attention on action recognition was studied. The experiments on the ActivityNet dataset show that the KFIA-S network achieves the SOTA (State-Of-The-Art) recognition accuracy, which proves the effectiveness of the proposed method.

    Multi-person classroom action recognition in classroom teaching videos based on deep spatiotemporal residual convolution neural network
    Yongkang HUANG, Meiyu LIANG, Xiaoxiao WANG, Zheng CHEN, Xiaowen CAO
    2022, 42(3):  736-742.  DOI: 10.11772/j.issn.1001-9081.2021040845

    Classroom teaching scenes are seriously occluded and contain numerous students, current video action recognition algorithms are not suitable for such scenes, and there is no public dataset of student classroom actions. In view of these problems, a classroom teaching video library and a student classroom action library were constructed, and a real-time multi-person student classroom action recognition algorithm based on deep spatiotemporal residual convolution neural network was proposed. Firstly, real-time object detection and tracking were combined to get the real-time picture stream of each student, and then the deep spatiotemporal residual convolution neural network was used to learn the spatiotemporal characteristics of each student’s action, so as to realize the real-time recognition of classroom behavior for multiple students in classroom teaching scenes. In addition, an intelligent teaching evaluation model was constructed, and an intelligent teaching evaluation system based on the recognition of students’ classroom actions was designed and implemented, which can help improve the teaching quality and realize intelligent education. Experimental comparison and analysis on the classroom teaching video dataset verify that the proposed real-time classroom action recognition model for multiple students in classroom teaching video can achieve a high accuracy of 88.5%, and the intelligent teaching evaluation system based on classroom action recognition has also achieved good results on the classroom teaching video dataset.

    Student expression recognition and intelligent teaching evaluation in classroom teaching videos based on deep attention network
    Wanying YU, Meiyu LIANG, Xiaoxiao WANG, Zheng CHEN, Xiaowen CAO
    2022, 42(3):  743-749.  DOI: 10.11772/j.issn.1001-9081.2021040846

    In order to solve the occlusion problem of student expression recognition in complex classroom scenes, and give full play to the advantages of deep learning in the application of intelligent teaching evaluation, a student expression recognition model and an intelligent teaching evaluation algorithm based on deep attention network in classroom teaching videos were proposed. A video library, an expression library and a behavior library for classroom teaching were constructed, and then multi-channel facial images were generated by cropping and occlusion strategies. A multi-channel deep attention network was built, and a self-attention mechanism was used to assign different weights to the multiple channel networks. The weight distribution of each channel was restricted by a constrained loss function, and then the global feature of the facial image was expressed as the attention-weighted sum of the channel features divided by the sum of the attention weights of all channels. Based on the learned global facial feature, the student expressions in classroom were classified, and student facial expression recognition under occlusion was realized. An intelligent teaching evaluation algorithm that integrates the student facial expressions and behavior states in classroom was proposed, which realized the recognition of student facial expressions and intelligent teaching evaluation in classroom teaching videos. Experimental comparison and analysis on the public dataset FERplus and self-built classroom teaching video datasets verify that the student facial expression recognition model in classroom teaching videos achieves a high accuracy of 87.34%, and the intelligent teaching evaluation algorithm that integrates the student facial expressions and behavior states in classroom achieves excellent performance on the classroom teaching video dataset.
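
The global-feature formula in this abstract is a normalized weighted average: the sum over channels of each channel feature multiplied by its attention weight, divided by the sum of all attention weights. A minimal sketch follows; the function name and toy values are illustrative, and in the paper the features and weights come from the multi-channel network rather than being supplied by hand.

```python
from typing import List


def fuse_channels(features: List[List[float]], attention: List[float]) -> List[float]:
    """Global feature = sum_i(a_i * f_i) / sum_i(a_i), computed per dimension."""
    total = sum(attention)
    dim = len(features[0])
    return [
        sum(a * f[d] for f, a in zip(features, attention)) / total
        for d in range(dim)
    ]


# Two channels (for example, a full face crop and an occluded crop), 2-D features.
channel_features = [[1.0, 0.0], [0.0, 1.0]]
channel_attention = [3.0, 1.0]
global_feature = fuse_channels(channel_features, channel_attention)
```

Dividing by the sum of the weights keeps the fused feature on the same scale as the channel features, whatever the absolute size of the attention weights.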

    EE-GAN: facial expression recognition method based on generative adversarial network and network integration
    Dingkang YANG, Shuai HUANG, Shunli WANG, Peng ZHAI, Yidan LI, Lihua ZHANG
    2022, 42(3):  750-756.  DOI: 10.11772/j.issn.1001-9081.2021040807

    Because real life scenes differ widely, human emotions vary across scenes, which leads to an uneven distribution of labels in emotion datasets. Furthermore, most traditional methods utilize model pre-training and feature engineering to enhance the expression ability of expression-related features, but do not consider the complementarity between different feature representations, which limits the generalization and robustness of the model. To address these issues, EE-GAN, an end-to-end deep learning framework including the network integration model Ens-Net, was proposed. It took the characteristics of different depths and regions into consideration, the fusion of features with different semantics and at different levels was implemented, and network integration was used to improve the learning ability of the model. Besides, facial images with specific expression labels were generated by a generative adversarial network, which aimed to balance the distribution of expression labels in data augmentation. The qualitative and quantitative evaluations on CK+, FER2013 and JAFFE datasets demonstrate the effectiveness of the proposed method. Compared with existing view learning methods, including Locality Preserving Projections (LPP), EE-GAN achieves facial expression recognition accuracies of 82.1%, 84.8% and 91.5% on the three datasets respectively. Compared with traditional CNN models such as AlexNet, VGG, and ResNet, EE-GAN improves the accuracy by at least 9 percentage points.

    Indoor fall detection algorithm based on Res2Net-YOLACT and fusion feature
    Lu ZHANG, Chun FANG, Ming ZHU
    2022, 42(3):  757-763.  DOI: 10.11772/j.issn.1001-9081.2021040857

    In order to strengthen the monitoring of old people and reduce the safety risks caused by falls, a new indoor fall detection algorithm based on Res2Net-YOLACT and fusion feature was proposed. For the video image sequences, firstly, the YOLACT network integrated with the Res2Net module was used to extract the human body contour, and then a two-level judgment method was used to make a fall decision. In the first level, whether an abnormal state occurs was judged roughly through the movement speed feature, and in the second level, the human body posture was determined through the model structure that combines the body shape features and the depth feature. Finally, when a fall posture was detected and its duration was greater than the threshold, a fall alarm was given. Experimental results show that the proposed fall detection algorithm can extract the human body contour well in complex scenes, and has good robustness to illumination as well as a real-time performance of up to 28 fps (frames per second). In addition, the classification performance of the algorithm is better after adding manual features: the classification accuracy is 98.65%, which is 1.03 percentage points higher than that of the algorithm with original CNN (Convolutional Neural Network) features.
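
The two-level decision with a duration threshold can be sketched as a small state machine. Everything specific here is an assumption made for illustration: the per-frame input is a `(speed, features)` pair, the posture model is passed in as a callable `is_fall_posture`, and `suspect_window` (how long a speed spike keeps the second level active) does not appear in the abstract.

```python
from typing import Callable, Iterable, Optional, Tuple


def detect_fall(
    frames: Iterable[Tuple[float, object]],
    speed_threshold: float,
    min_fall_frames: int,
    is_fall_posture: Callable[[object], bool],
    suspect_window: int = 30,
) -> Optional[int]:
    """Return the frame index at which an alarm is raised, or None if no alarm."""
    frames_left = 0  # frames for which the level-1 suspicion stays active
    fall_frames = 0  # consecutive frames classified as a fall posture
    for index, (speed, features) in enumerate(frames):
        # Level 1: rough check on movement speed.
        if speed > speed_threshold:
            frames_left = suspect_window
        # Level 2: posture check, only while level 1 is suspicious.
        if frames_left > 0 and is_fall_posture(features):
            fall_frames += 1
            frames_left = suspect_window  # stay suspicious while the person is down
            if fall_frames > min_fall_frames:
                return index
        else:
            fall_frames = 0
            frames_left = max(0, frames_left - 1)
    return None


# Toy stream: standing, a fast movement, then lying still.
stream = [(0.1, "stand")] * 3 + [(2.0, "stand")] + [(0.0, "lying")] * 5
alarm_at = detect_fall(stream, 1.0, 3, lambda f: f == "lying")
no_alarm = detect_fall([(0.1, "lying")] * 10, 1.0, 3, lambda f: f == "lying")
```

Gating the posture model behind the cheap speed check means someone lying down slowly does not trigger an alarm, and the duration threshold filters out brief misclassifications.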

    One-shot video-based person re-identification with multi-loss learning and joint metric
    Yuchang YIN, Hongyuan WANG, Li CHEN, Zundeng FENG, Yu XIAO
    2022, 42(3):  764-769.  DOI: 10.11772/j.issn.1001-9081.2021040788

    In order to solve the problem of huge labeling cost for person re-identification, a method of one-shot video-based person re-identification with multi-loss learning and joint metric was proposed. Firstly, aiming at the problem that the number of labeled samples is small and the model obtained is not robust enough, a Multi-Loss Learning (MLL) strategy was proposed. In each training process, different loss functions were used for different data to optimize and improve the discriminative ability of the model. Secondly, a Joint Distance Metric (JDM) was proposed for label estimation, which combined the sample distance and the nearest neighbor distance to further improve the accuracy of pseudo label prediction. JDM solved the problems of the low accuracy of label estimation for unlabeled data, and the instability in the training process caused by the unlabeled data not being fully utilized. Experimental results show that when the ratio of pseudo label samples added per iteration is 0.10, the rank-1 accuracy of the proposed method reaches 65.5% and 76.2% on MARS and DukeMTMC-VideoReID datasets, which is 7.6 and 5.2 percentage points higher, respectively, than that of the one-shot progressive learning method PL (Progressive Learning).

    Analysis of complex spam filtering algorithm based on neural network
    Jian ZHANG, Ke YAN, Xiang MA
    2022, 42(3):  770-777.  DOI: 10.11772/j.issn.1001-9081.2021040791

    The recognition of spam is one of the main tasks in natural language processing. The traditional methods are based on text features or word frequency, and their recognition accuracy mainly depends on the presence or absence of specific keywords. When there are no keywords in the spam or the keywords are recognized incorrectly, the traditional methods have poor recognition performance. Neural network-based methods were proposed, and recognition training and testing were conducted on complex spam. The spam samples that cannot be recognized by traditional methods were collected and the same amount of normal information was randomly selected from spam message, advertisement and spam email datasets to form three new datasets without duplicate data. Three models were proposed based on convolutional neural network and recurrent neural network and tested on the three new datasets for spam recognition. The experimental results show that the neural network-based models learned better semantic features from the text and achieved accuracies of more than 98% on all three datasets, which are significantly higher than those of the traditional methods, such as Naive Bayes (NB), Random Forest (RF) and Support Vector Machine (SVM). The experimental results also show that different neural networks are suitable for classifying text of different lengths. The models composed of recurrent neural networks are good at recognizing sentence-length text, the models composed of convolutional neural networks are good at recognizing paragraph-length text, and the models composed of both neural networks are good at recognizing chapter-length text.

    Modeling and optimization method of ride-sharing matching based on E-CARGO model
    Xiaohui LI, Hongbin DONG
    2022, 42(3):  778-782.  DOI: 10.11772/j.issn.1001-9081.2021060983

    Ride-sharing application systems can reduce traffic congestion and alleviate parking space tension by increasing the utilization rate of available car seat capacity, thus improving social and environmental benefits. The effective real-time matching and optimization technology of drivers and passengers is one of the core components of a successful ride-sharing system. Role-Based Collaboration (RBC) is an emerging methodology to facilitate an organizational structure, provide orderly system behavior, and coordinate the activities within the system. In order to reduce the dynamic real-time matching time of passengers and drivers, and improve the matching efficiency, a method combining RBC and the Environment-Class, Agent, Role, Group and Object (E-CARGO) model was proposed to formalize the ride-sharing problem. To improve the utilization rate of available seat capacity, maximize platform revenue, and rationalize resource allocation with constraints of entire resource capacity and given profit, the modeling and simulation experiments for the ride-sharing matching method were conducted. The experimental results show that the proposed formal method based on the E-CARGO model can be applied to the modeling of the ride-sharing matching problem, and the optimal matching matrix and time can be obtained by the Kuhn-Munkres (K-M) algorithm and the ILOG software package in Java. The simulation results show that the average time of the K-M algorithm is at least 21% lower than that of the ILOG software package algorithm. When the agent size is larger than a certain value (more than 600), the time consumption of the proposed algorithm increases sharply.
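
As an illustration of the matching step, the sketch below is a compact O(n^3) implementation of the Kuhn-Munkres (Hungarian) algorithm for a minimum-cost assignment. The paper works in Java within the E-CARGO formalization; this Python version, the function name `kuhn_munkres` and the toy cost matrix are assumptions for illustration only. It requires at least as many columns as rows; to maximize revenue instead of minimizing cost, the entries can be negated.

```python
from typing import List, Tuple


def kuhn_munkres(cost: List[List[float]]) -> Tuple[List[int], float]:
    """Minimum-cost assignment of each row to a distinct column (rows <= columns)."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-based)
    v = [0.0] * (m + 1)   # column potentials (1-based)
    p = [0] * (m + 1)     # p[j] = row currently matched to column j (0 = none)
    way = [0] * (m + 1)   # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so one more reduced cost becomes zero.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path back to the start.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    total = sum(cost[i][assignment[i]] for i in range(n))
    return assignment, total


# Rows are drivers, columns are passengers, entries are hypothetical pickup costs.
pickup_cost = [[4, 1, 3], [2, 0, 5], [3, 2, 2]]
matching, total_cost = kuhn_munkres(pickup_cost)
```

Here `matching[i]` is the column assigned to row `i`, which corresponds to one row of the binary matching matrix described in the abstract.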

    Gene data generation method based on generative adversarial network
    Yimin CAO, Lei CAI, Jingyang GAO
    2022, 42(3):  783-790.  DOI: 10.11772/j.issn.1001-9081.2021040759

    In deep learning, as the depth of Convolutional Neural Network (CNN) increases, more and more data is required for neural network training, but gene structure variation is a small sample event in large-scale genetic data, resulting in a severe shortage of image data of variant genes, which seriously affects the training effect of CNN and causes the problems of poor gene structure variation detection precision and high false positive rate. In order to increase the number of gene structure variation samples and improve the precision of CNN in identifying gene structure variation, a gene image data augmentation method was proposed based on GAN (Generative Adversarial Network), namely GeneGAN. Firstly, initial genetic image data was generated by using the Reads stacking method and it was divided into two datasets including variant gene images and non-variant gene images. Secondly, GeneGAN was used to augment the variant image samples to balance the positive and negative datasets. Finally, CNN was used to detect the datasets before and after augmentation, and precision, recall and F1 score were used as measurement indicators. Experimental results show that compared with the traditional augmentation method, the GAN-based augmentation method and the feature extraction method, the F1 score of GeneGAN is improved by 1.94 to 17.46 percentage points, verifying that the GeneGAN method can improve the precision of CNN in identifying gene structure variation.

    Prediction of NOx emission from fluid catalytic cracking unit based on ensemble empirical mode decomposition and long short-term memory network
    Chong CHEN, Zhu YAN, Jixuan ZHAO, Wei HE, Huaqing LIANG
    2022, 42(3):  791-796.  DOI: 10.11772/j.issn.1001-9081.2021040787

    Nitrogen oxide (NOx) is one of the main pollutants in the regenerated flue gas of Fluid Catalytic Cracking (FCC) unit. Accurate prediction of NOx emission can effectively avoid the occurrence of pollution events in refinery enterprises. Because of the non-stationarity, nonlinearity and long-memory characteristics of pollutant emission data, a new hybrid model incorporating Ensemble Empirical Mode Decomposition (EEMD) and Long Short-Term Memory network (LSTM) was proposed to improve the prediction accuracy of pollutant emission concentration. The NOx emission concentration data was first decomposed into several Intrinsic Mode Functions (IMFs) and a residual by using the EEMD model. According to the correlation analysis between the IMF sub-sequences and the original data, the IMF sub-sequences with low correlation were eliminated, which could effectively reduce the noise in the original data. The remaining IMFs were divided into high-frequency and low-frequency sequences, which were trained in LSTM networks with different depths. The final NOx concentration prediction results were reconstructed from the predicted results of each sub-sequence. Compared with LSTM alone in the NOx emission prediction of the FCC unit, EEMD-LSTM reduced the Mean Square Error (MSE) and Mean Absolute Error (MAE) by 46.7% and 45.9% respectively, and improved the coefficient of determination (R2) by 43%, which means the proposed model achieves higher prediction accuracy.
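
The three error measures reported above have standard definitions. The sketch below computes them for a pair of short, made-up series; the function name `regression_metrics` and the values are illustrative and not taken from the paper.

```python
def regression_metrics(y_true, y_pred):
    """Return (MSE, MAE, R2) for two equal-length numeric sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mse, mae, r2


# Illustrative values only.
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.5]
mse, mae, r2 = regression_metrics(y_true, y_pred)
print(mse, mae, r2)
```

Lower MSE and MAE and an R2 closer to 1 indicate a better fit, which is the sense in which the abstract reports improvement.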

    Stock trend prediction method based on temporal hypergraph convolutional neural network
    Xiaojie LI, Chaoran CUI, Guangle SONG, Yaxi SU, Tianze WU, Chunyun ZHANG
    2022, 42(3):  797-803.  DOI: 10.11772/j.issn.1001-9081.2021050748

    Traditional stock prediction methods are mostly based on time-series models, which ignore the complex relations among stocks, and the relations often exceed pairwise connections, such as stocks in the same industry or multiple stocks held by the same fund. To solve this problem, a stock trend prediction method based on temporal HyperGraph Convolutional neural Network (HGCN) was proposed, and a hypergraph model based on financial investment facts was constructed to fit multiple relations among stocks. The model was composed of two major components: Gated Recurrent Unit (GRU) network and HGCN. GRU network was used for performing time-series modeling on historical data to capture long-term dependencies. HGCN was used to model high-order relations among stocks to learn intrinsic relation attributes, and introduce the multiple relation information among stocks into traditional time-series modeling for end-to-end trend prediction. Experiments on real dataset of China A-share market show that compared with existing stock prediction methods, the proposed model improves prediction performance, e.g. compared with the GRU network, the proposed model achieves the relative increases in ACC and F1_score of 9.74% and 8.13%, respectively, and is more stable. In addition, the simulation back-testing results show that the trading strategy based on the proposed model is more profitable, with an annual return of 11.30%, which is 5 percentage points higher than that of Long Short-Term Memory (LSTM) network.

    Infrared monocular ranging algorithm based on multiscale feature fusion
    Bin LIU, Gangqing LI, Chengquan AN, Shuigen WANG, Jiansheng WANG
    2022, 42(3):  804-809.  DOI: 10.11772/j.issn.1001-9081.2021040912

    Due to the introduction of MonoDepth2, unsupervised monocular ranging has made great progress in the field of visible light. However, visible light is not applicable in some scenes, such as at night and in some low-visibility environments. Infrared thermal imaging can obtain clear target images at night and under low-visibility conditions, so it is necessary to estimate the depth of infrared image. However, due to the different characteristics of visible and infrared images, it is unreasonable to migrate existing monocular depth estimation algorithms directly to infrared images. An infrared monocular ranging algorithm based on multiscale feature fusion after improving the MonoDepth2 algorithm can solve this problem. A new loss function, edge loss function, was designed for the low texture characteristic of infrared image to reduce pixel mismatch during image reprojection. The previous unsupervised monocular ranging simply upsamples the four-scale depth maps to the original image resolution to calculate projection errors, ignoring the correlation between scales and the contribution differences between different scales. A weighted Bi-directional Feature Pyramid Network (BiFPN) was applied to feature fusion of multiscale depth maps so that the blurring of depth map edge was solved. In addition, Residual Network (ResNet) structure was replaced by Cross Stage Partial Network (CSPNet) to reduce network complexity and increase operation speed. The experimental results show that edge loss is more suitable for infrared image ranging, resulting in better depth map quality. After adding BiFPN structure, the edge of depth image is clearer. After replacing ResNet with CSPNet, the inference speed is improved by about 20 percentage points. The proposed algorithm can accurately estimate the depth of the infrared image, solving the problem of depth estimation in night low-light scenes and some low-visibility scenes, and the application of this algorithm can also reduce the cost of assisted driving to a certain extent.

    Logo recognition algorithm for vehicles on traffic road
    Ne LI, Guangzhu XU, Bangjun LEI, Guoliang MA, Yongtao SHI
    2022, 42(3):  810-817.  DOI: 10.11772/j.issn.1001-9081.2021040860

    In order to solve the problems of small targets, large noises, and many types in the logo recognition for vehicles on traffic road, a method combining a target detection algorithm based on deep learning and a template matching algorithm based on morphology was proposed, and a recognition system with high accuracy and capable of dealing with new types of vehicle logo was designed. First, K-Means++ was used to re-cluster the anchor box values and residual network was introduced into YOLOv4 for one-step positioning of the vehicle logo. Secondly, the binary vehicle logo template library was built by preprocessing and segmenting standard vehicle logo images. Then, the positioned vehicle logo was preprocessed by MSRCR (Multi-Scale Retinex with Color Restoration), OTSU binarization, etc. Finally, the Hamming distance was calculated between the processed vehicle logo and the standard vehicle logo in the template library and the best match was found. In the vehicle logo detection experiment, the improved YOLOv4 detection achieves the higher accuracy of 99.04% compared to the original YOLOv4, two-stage positioning method of vehicle logo based on license plate position and the vehicle logo positioning method based on radiator grid background; its speed is slightly lower than that of the original YOLOv4, higher than those of the other two, reaching 50.62 fps (frames per second). In the vehicle logo recognition experiment, the recognition accuracy based on morphological template matching is higher compared to traditional Histogram Of Oriented Gradients (HOG), Local Binary Pattern (LBP) and convolutional neural network, reaching 91.04%. Experimental results show that the vehicle logo detection algorithm based on deep learning has higher accuracy and faster speed. The morphological template matching method can maintain a high recognition accuracy under the conditions of light change and noise pollution.
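
The final recognition step above, comparing a binarized logo against a template library by Hamming distance, can be sketched as follows. This is a simplified illustration under stated assumptions: the 3×3 binary patterns and the template names `cross` and `ring` are invented, and the preprocessing (MSRCR, OTSU binarization) is assumed to have already produced equally sized binary images.

```python
def hamming_distance(a, b):
    """Number of differing pixels between two same-sized binary images."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def best_match(logo, templates):
    """Return the name of the template with the smallest Hamming distance."""
    return min(templates, key=lambda name: hamming_distance(logo, templates[name]))


# Hypothetical binary template library (1 = foreground pixel).
templates = {
    "cross": [[0, 1, 0],
              [1, 1, 1],
              [0, 1, 0]],
    "ring":  [[1, 1, 1],
              [1, 0, 1],
              [1, 1, 1]],
}

# A noisy "cross" with one foreground pixel missing.
logo = [[0, 1, 0],
        [1, 1, 0],
        [0, 1, 0]]

match = best_match(logo, templates)
print(match, hamming_distance(logo, templates[match]))
```

Because adding a new logo type only requires adding one template to the library, this kind of matching needs no retraining, which is consistent with the abstract's claim about handling new logo types.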

    Multiscale residual UNet based on attention mechanism to realize breast cancer lesion segmentation
    Shengqin LUO, Jinyi CHEN, Hongjun LI
    2022, 42(3):  818-824.  DOI: 10.11772/j.issn.1001-9081.2021040948

    Concerning the characteristics of breast cancer in Magnetic Resonance Imaging (MRI), such as different shapes and sizes, and fuzzy boundaries, an algorithm based on multiscale residual U Network (UNet) with attention mechanism was proposed in order to avoid error segmentation and improve segmentation accuracy. Firstly, the multiscale residual units were used to replace two adjacent convolution blocks in the down-sampling process of UNet, so that the network could pay more attention to the difference of shape and size. Then, in the up-sampling stage, layer-crossed attention was used to guide the network to focus on the key regions, avoiding the error segmentation of healthy tissues. Finally, in order to enhance the ability of representing the lesions, the atrous spatial pyramid pooling was introduced as a bridging module to the network. Compared with UNet, the proposed algorithm improved the Dice coefficient, Intersection over Union (IoU), SPecificity (SP) and ACCuracy (ACC) by 2.26, 2.11, 4.16 and 0.05 percentage points, respectively. The experimental results show that the algorithm can improve the segmentation accuracy of lesions and effectively reduce the false positive rate of imaging diagnosis.
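
Two of the overlap measures reported above, the Dice coefficient and IoU, can be computed from binary masks as in the sketch below. The masks are tiny made-up examples, not MRI data, and the function name `dice_and_iou` is illustrative.

```python
def dice_and_iou(pred, truth):
    """Dice coefficient and IoU for two same-sized binary masks."""
    p = [v for row in pred for v in row]
    t = [v for row in truth for v in row]
    intersection = sum(1 for a, b in zip(p, t) if a == 1 and b == 1)
    size_sum = sum(p) + sum(t)
    union = size_sum - intersection
    dice = 2 * intersection / size_sum if size_sum else 1.0
    iou = intersection / union if union else 1.0
    return dice, iou


# Illustrative masks only (1 = lesion pixel).
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
dice, iou = dice_and_iou(pred, truth)
print(dice, iou)
```

The two measures are monotonically related (Dice = 2·IoU / (1 + IoU)), so they rank segmentations identically but report different magnitudes.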

    Fundus vessel segmentation method based on U-Net and pulse coupled neural network with adaptive threshold
    Guangzhu XU, Wenjie LIN, Sha CHEN, Wan KUANG, Bangjun LEI, Jun ZHOU
    2022, 42(3):  825-832.  DOI: 10.11772/j.issn.1001-9081.2021040856

    Due to the complex and variable structure of fundus vessels, and the low contrast between the fundus vessel and the background, there are huge difficulties in segmentation of fundus vessels, especially small fundus vessels. U-Net based on deep fully convolutional neural network can effectively extract the global and local information of fundus vessel images, but its output is a grayscale image binarized by a hard threshold, which causes loss of vessel area, overly thin vessels and other problems. To solve these problems, U-Net and Pulse Coupled Neural Network (PCNN) were combined to exploit their respective advantages in a fundus vessel segmentation method. First, the iterative U-Net model was used to highlight the vessels: the fusion results of the features extracted by the U-Net model and the original image were input again into the improved U-Net model to enhance the vessel image. Then, the U-Net output result was viewed as a gray image, and the PCNN with adaptive threshold was utilized to perform accurate vessel segmentation. The experimental results show that the AUC (Area Under the Curve) of the proposed method was 0.9796, 0.9809 and 0.9827 on the DRIVE, STARE and CHASE_DB1 datasets, respectively. The method can extract more vessel details, and has strong generalization ability and good application prospects.

    Artificial intelligence
    Object tracking algorithm with hierarchical features and hybrid attention
    Wenqiu ZHU, Guang ZOU, Zhigao ZENG
    2022, 42(3):  833-843.  DOI: 10.11772/j.issn.1001-9081.2021030432

    In object tracking tasks, Fully-Convolutional Siamese network for object tracking (SiamFC) algorithm has problems such as poor robustness and loss of tracking objects under the scenes of object occlusion and illumination variation. Therefore, an object tracking algorithm combining attention mechanism and feature fusion was proposed. Firstly, ResNet50 (Deep Residual Network) was used as the backbone network to extract more adequate object features. Secondly, attention mechanism was used to filter features. After low-level template features and high-level template features were correlated with the corresponding search features, the adaptive weighted fusion was carried out to improve the discrimination of positive and negative samples. Tested on the OTB100 (Object Tracking Benchmark) dataset, the proposed algorithm had the precision and success rate of 81.25% and 64.06%. Tested on the LaSOT (high-quality benchmark for Large-scale Single Object Tracking) dataset, the proposed algorithm had the precision and success rate of 49.4% and 50.1%. Experimental results show that the object tracking performance of the proposed algorithm is better than that of the fully convolutional Siamese network algorithm, and it has better robustness when dealing with complex scenes.

    Semantic segmentation of RGB-D indoor scenes based on attention mechanism and pyramid fusion
    Na YU, Yan LIU, Xiongju WEI, Yuan WAN
    2022, 42(3):  844-853.  DOI: 10.11772/j.issn.1001-9081.2021030392

    Aiming at the issue of ineffective fusion of multi-modal features in indoor scene semantic segmentation using RGB-D, a network named APFNet (Attention mechanism and Pyramid Fusion Network) was proposed, in which an attention mechanism fusion module and a pyramid fusion module were designed. To fully use the complementarity of the RGB features and the Depth features, the attention allocation weights of these two kinds of features were respectively extracted by the attention mechanism fusion module, making the network focus more on the multi-modal feature domain with more information content. Local and global information were fused by the pyramid fusion module with four different scales of pyramid features, thus scene context was extracted and segmentation accuracies of object edges and small-scale objects were improved. By integrating these two fusion modules into a three-branch “encoder-decoder” network, an “end-to-end” output was realized. Comparative experiments were conducted against state-of-the-art methods, such as multi-level RGB-D residual feature Fusion network (RDF-152), Attention Complementary features Network (ACNet) and Spatial information Guided convolution Network (SGNet), on the SUN RGB-D and NYU Depth v2 datasets. Compared with the best-performing method RDF-152, when the layer number of the encoder network was reduced from 152 to 50, the Pixel Accuracy (PA), Mean Pixel Accuracy (MPA), and Mean Intersection over Union (MIoU) of APFNet were respectively increased by 0.4, 1.1 and 3.2 percentage points. The semantic segmentation accuracies for small-scale objects such as pillows and photos, and large-scale objects such as boards and ceilings were increased by 0.9 to 3.4 and 12.4 to 18 percentage points respectively. The results show that the proposed APFNet has some advantages in dealing with the semantic segmentation of indoor scenes.

    Cross-modal chiastopic-fusion attention network for visual question answering
    Mao WANG, Yaxiong PENG, Anjiang LU
    2022, 42(3):  854-859.  DOI: 10.11772/j.issn.1001-9081.2021030470

    In order to improve the accuracy of Visual Question Answering (VQA) model in answering complex image questions, a Cross-modal Chiastopic-fusion Attention Network (CCAN) for VQA was proposed. Firstly, an improved residual channel self-attention method was proposed to attend to the image and find important areas according to the overall information of the image, thereby introducing a new joint attention mechanism that combined word attention and image area attention. Secondly, a “cross-modal chiastopic-fusion” network was proposed to generate multiple features that integrate the two dynamic information flows, and an effective attention flow was generated in each modality. Among them, the element-wise multiplication method was used for joint features. In addition, in order to avoid an increase in computational cost, parameters were shared between networks. Experimental results on VQA v1.0 dataset show that the accuracy of the proposed model reaches 67.57%, which is 2.97 percentage points higher than that of the MLAN (Multi-level Attention Network) model and 1.20 percentage points higher than that of the CAQT (Co-Attention network with Question Type) model. The proposed method effectively improves the accuracy of visual question answering model. The effectiveness and robustness of the method are verified.

    Chinese grammatical error correction model based on bidirectional and auto-regressive transformers noiser
    Qiujie SUN, Jinggui LIANG, Si LI
    2022, 42(3):  860-866.  DOI: 10.11772/j.issn.1001-9081.2021030441

    Methods based on neural machine translation are widely used in Chinese grammatical error correction. These methods require a large amount of annotated data to guarantee performance, and such data is difficult to obtain for Chinese grammatical error correction. Focused on the issue that the limited size of annotated data constrains the performance of Chinese grammatical error correction systems, a Chinese Grammatical Error Correction Model based on Bidirectional and Auto-Regressive Transformers (BART) Noiser (BN-CGECM) was proposed. Firstly, to speed up model convergence, a Chinese pretrained language model based on BERT (Bidirectional Encoder Representation from Transformers) was used to initialize the parameters of BN-CGECM’s encoder. Secondly, a BART noiser was used to introduce text noise to the input samples in the training process to automatically generate diverse noisy data, which was used to alleviate the problem of limited size of annotated data. Experimental results on NLPCC 2018 dataset demonstrate that the proposed model achieves an F0.5 score 7.14 percentage points higher than that of the Chinese grammatical error correction system proposed by YouDao, and 6.48 percentage points higher than that of the Chinese grammatical error correction ensemble system proposed by Beijing Language and Culture University (BLCU_ensemble). Meanwhile, the proposed model enhances the diversity of the original data and converges faster without increasing the amount of training data.

    Method of generating rhetorical questions based on deep neural network in intelligent consultation
    Zengzhen DU, Dongxin TANG, Dan XIE
    2022, 42(3):  867-873.  DOI: 10.11772/j.issn.1001-9081.2021030375

    In order to improve the efficiency of doctor-patient dialogue by enabling doctors to quickly propose reasonable rhetorical questions in intelligent consultation, a method of rhetorical question generation based on deep neural network was proposed. Firstly, a large number of doctor-patient dialogue texts were obtained and labeled. Then, two classification models, Text Recurrent Neural Network (TextRNN) and Text Convolutional Neural Network (TextCNN), were used to classify doctor’s statements respectively. Then, Text Recurrent Neural Network-Bidirectional Long Short-Term Memory (TextRNN-B) and Bidirectional Encoder Representations from Transformers (BERT) classification models were used to trigger questions. Six different Q&A selection methods were designed to simulate the situations in the field of medical consultation. Then, Open-Source Neural Machine Translation (OpenNMT) model was used to generate rhetorical questions. Finally, the generated rhetorical questions were evaluated comprehensively. Experimental results show that TextRNN is better than TextCNN in classification, and BERT model is better than TextRNN-B in question triggering; when OpenNMT model is used to realize rhetorical question generation in Window-top mode, the best results are obtained by using two evaluation indexes: Bilingual Evaluation Understudy (BLEU) and Perplexity (PPL). The proposed method verifies the effectiveness of deep neural network technology in the generation of rhetorical questions, which can effectively solve the problem of doctor-patient question generation.

    Teaching and learning information interactive particle swarm optimization algorithm
    Fangxin NIE, Yujia WANG, Xin JIA
    2022, 42(3):  874-882.  DOI: 10.11772/j.issn.1001-9081.2021030395

    An information interactive Particle Swarm Optimization (PSO) algorithm for teaching and learning was proposed to solve high dimensional problems of low convergence rate and lack of diversity in a single population. The population was divided into two subpopulations dynamically according to evolutionary process, and processed by PSO algorithm and teaching and learning based optimization algorithm respectively. At the same time, learner stage was used by the particles to carry out information interaction between subpopulations, and by evaluating convergence and diversity indexes, the convergence ability and diversity of particles were balanced in evolutionary process. Compared with PSO algorithm, hybrid PSO and Grey Wolf Optimizer (GWO) algorithm, and improved GWO algorithm using nonlinear convergence factor and elite re-election strategy and other evolutionary algorithms in different dimensions of 15 standard test functions, the proposed algorithm can converge to the theoretical optimal value on multiple test functions, which is 1 to 6 times faster than other algorithms. Experimental results show that the proposed algorithm has good convergence accuracy and speed.
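
For orientation, the plain PSO baseline that the proposed algorithm builds on can be sketched as below. This is not the teaching-and-learning hybrid described above: it is only the standard global-best PSO update, and the parameter values (`w=0.7`, `c1=c2=1.5`), the search bound, the fixed seed and the sphere test function are assumptions chosen for illustration.

```python
import random


def sphere(x):
    """Simple test function with its minimum 0 at the origin."""
    return sum(v * v for v in x)


def pso(objective, dim, n_particles=20, iters=200,
        w=0.7, c1=1.5, c2=1.5, bound=5.0, seed=0):
    """Minimal global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best so far

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


best_pos, best_val = pso(sphere, dim=2)
print(best_pos, best_val)
```

With a single swarm all particles are drawn toward the same global best, which is the loss of diversity that the abstract's two-subpopulation design is meant to counter.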

    Road vehicle detection and recognition algorithm based on densely connected convolutional neural network
    Tianmin DENG, Guotao MAO, Zhenhao ZHOU, Zhijian DUAN
    2022, 42(3):  883-889.  DOI: 10.11772/j.issn.1001-9081.2021030384

    To address the problems of low detection accuracy, poor real-time performance, and missed detection of small target vehicles in existing road vehicle detection and recognition algorithms, a road vehicle detection and recognition algorithm based on densely connected convolutional neural networks was proposed. Firstly, based on the YOLOv4 (You Only Look Once version 4) network framework, a densely connected deep residual network structure was adopted to strengthen feature reuse in the feature extraction stage, making use of the lower-complexity features of shallow layers. Then, a skip connection structure was integrated into the multi-scale feature fusion network to strengthen the feature information fusion and expression capability of the network, which reduced the missed detection rate of vehicles. Finally, the dimensional clustering algorithm was used to recalculate the anchor sizes, which were allocated to different detection scales according to a reasonable strategy. Experimental results show that the proposed algorithm achieves the detection accuracy of 98.21% and the detection speed of 48.05 frame/s on KITTI dataset, and it also has a good detection effect for vehicles in the complex and harsh environment of Berkeley DeepDrive (BDD100K) dataset, ensuring required real-time performance and effective accuracy improvement.

    Data science and technology
    Route discovery method based on trajectory point clustering
    Haiyang LIU, Linghang MENG, Zhonghang LIN, Yuantao GU
    2022, 42(3):  890-894.  DOI: 10.11772/j.issn.1001-9081.2021030425

    To strengthen the control and management of local airspace routes, a route discovery method based on trajectory point clustering was proposed. Firstly, for the simulation data generated according to the distribution characteristics of the real data, the pre-processing module was used to weaken and remove the noise of the trajectory data. Secondly, a route discovery method including outlier elimination, trajectory resampling, trajectory point clustering, clustering center correction, and connecting clustering centers was proposed to extract the routes. Finally, the result of route extraction was visualized and the proposed method was validated using civil aviation data. The experimental results on the simulated data show that the node coverage and the length coverage of the proposed method are 99% and 94% respectively, under a noise intensity of 0.1° and a buffer area of 30 km. Compared with the rasterization method, the proposed method has higher accuracy and can extract the routes more effectively, achieving the purpose of extracting the common routes of aircraft.

    Influence maximization algorithm based on directed acyclic graph in heterogeneous information networks
    Qingqing WU, Lihua ZHOU, Xuanyi CUN, Guowang DU, Yiting JIANG
    2022, 42(3):  895-903.  DOI: 10.11772/j.issn.1001-9081.2021020369

    Aiming at the problem of Influence Maximization (IM) in heterogeneous information networks, an Influence Maximization algorithm (DAGIM) based on Directed Acyclic Graph (DAG) was proposed. Firstly, the influence of nodes was measured based on the DAG structure, and then the marginal gain strategy was used to select the nodes with the most influence. The DAG structure has strong expressive power, which not only describes the explicit relationship between different types of nodes, but also depicts the implicit relationship between nodes, and more completely retains the heterogeneous information of the network. Experimental results on three real datasets verify that the performance of the proposed DAGIM algorithm is better than those of Degree, PageRank, Local Directed Acyclic Graph (LDAG) and Meta-Path-based Information Entropy (MPIE) algorithms.

    Power data analysis based on financial technical indicators
    An YANG, Qun JIANG, Gang SUN, Jie YIN, Ying LIU
    2022, 42(3):  904-910.  DOI: 10.11772/j.issn.1001-9081.2021030447

    Considering the lack of effective trend feature descriptors in existing methods, financial technical indicators such as Vertical Horizontal Filter (VHF) and Moving Average Convergence/Divergence (MACD) were introduced into power data analysis. An anomaly detection algorithm and a load forecasting algorithm using financial technical indicators were proposed. In the proposed anomaly detection algorithm, the thresholds of various financial technical indicators were determined based on statistics, and then the abnormal behaviors of user power consumption were detected using threshold detection. In the proposed load forecasting algorithm, 14-dimensional daily load characteristics related to financial technical indicators were extracted, and a Long Short-Term Memory (LSTM) load forecasting model was built. Experimental results on industrial power data of Hangzhou City show that the proposed load forecasting algorithm reduces the Mean Absolute Percentage Error (MAPE) to 9.272%, which is lower than that of Autoregressive Integrated Moving Average (ARIMA), Prophet and Support Vector Machine (SVM) algorithms by 2.322, 24.175 and 1.310 percentage points, respectively. The results show that financial technical indicators can be effectively applied to power data analysis.
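
To make the two named indicators concrete, the sketch below computes MACD and VHF over a plain list of values (for power data, daily load readings would take the place of closing prices). It assumes the conventional finance definitions, with 12/26/9 spans for MACD and an EMA seeded with the first value; the paper's exact parameters and thresholds are not given in the abstract, so every number here is an assumption.

```python
def ema(values, span):
    """Exponential moving average, seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(values, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line, histogram) as lists."""
    fast_ema, slow_ema = ema(values, fast), ema(values, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    signal_line = ema(macd_line, signal)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram


def vhf(values, n):
    """Vertical Horizontal Filter over the last n changes.

    Ratio of the net range to the summed absolute changes: close to 1
    for a steady trend, close to 0 for choppy, trendless movement.
    """
    window = values[-(n + 1):]
    path = sum(abs(b - a) for a, b in zip(window, window[1:]))
    return (max(window) - min(window)) / path if path else 0.0


# Illustrative series only: a steadily rising load.
rising = [float(v) for v in range(1, 41)]
macd_line, signal_line, histogram = macd(rising)
print(macd_line[-1], vhf(rising, 28))
```

A threshold-based detector in the spirit of the abstract would then flag readings whose indicator values fall outside statistically derived bounds.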

    Cyber security
    Revocable aggregate signature authentication scheme for vehicular ad hoc networks
    Jingwen WU, Xinchun YIN, Jianting NING
    2022, 42(3):  911-920.  DOI: 10.11772/j.issn.1001-9081.2021030428

    In order to address problems concerning communication security and privacy preservation in Vehicular Ad hoc Network (VANET), a revocable aggregate signature authentication scheme for VANETs was proposed. For protecting user privacy and enhancing authentication efficiency, the proposed scheme utilized anonymous authentication, tamper-proof device and aggregate signature technique. For realizing vehicle revocation, the vehicle was required to generate signatures with member secret keys distributed by the Road-Side Unit (RSU). The RSU would check the vehicle identity when the vehicle entered its communication scope, and it would not distribute member secret keys to vehicles in the revoke list. Thus, malicious vehicles could not generate valid signatures. When the input traffic volume came up to 600 vehicles per hour for each entrance lane in the simulated intersection, the proposed scheme saved at least 33.77% of authentication overhead compared to certain schemes of the same kind. The outcome of simulation experiment shows that the proposed scheme is suitable for resource-limited VANET environment.

    Adversarial attack defense model with residual dense block self-attention mechanism and generative adversarial network
    Yuming ZHAO, Shenkai GU
    2022, 42(3):  921-929.  DOI: 10.11772/j.issn.1001-9081.2021030431

    Neural network has outstanding performance on image classification tasks. However, it is vulnerable to adversarial examples generated by adding small perturbations, which makes it output incorrect classification results. The current defense methods have the problems of insufficient image feature extraction ability and less attention to the features of key areas of the image. To address these issues, a Defense model that fuses Residual Dense Block (RDB) Self-Attention mechanism and Generative Adversarial Network (GAN), namely RD-SA-DefGAN, was proposed. GAN was combined with Projected Gradient Descent (PGD) attacking algorithm. The adversarial samples generated by PGD attacking algorithm were input to the training sample set, and the training process of model was stabilized by conditional constraints. The model also introduced RDB and self-attention mechanism, fully extracted features from the image, and enhanced the contribution of features from the key areas of the image. Experimental results on CIFAR10, STL10, and ImageNet20 datasets show that RD-SA-DefGAN can effectively defend from adversarial attacks, and outperforms Adv.Training, Adv-BNN, and Rob-GAN methods on defending PGD adversarial attacks. Compared to the most similar algorithm Rob-GAN, RD-SA-DefGAN improved the defense success rate by 5.0 percentage points to 9.1 percentage points on affected images in CIFAR10 dataset, with the disturbance threshold ranged from 0.015 to 0.070.
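
The PGD attack used above to generate adversarial samples repeats a signed-gradient step and projects the result back into a small L-infinity ball around the original input. The sketch below shows that loop on a toy logistic-regression classifier rather than an image network; the weights, input, `eps`, `alpha` and step count are all hypothetical, and the random starting point that PGD commonly uses is omitted for determinism.

```python
import math


def predict(x, w, b):
    """Probability of class 1 under a toy logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    return (v > 0) - (v < 0)


def pgd_attack(x, y, w, b, eps, alpha, steps):
    """L-infinity PGD: ascend the cross-entropy loss, then project."""
    x_adv = list(x)
    for _ in range(steps):
        p = predict(x_adv, w, b)
        # Gradient of binary cross-entropy w.r.t. the input: (p - y) * w.
        grad = [(p - y) * wi for wi in w]
        x_adv = [xa + alpha * sign(g) for xa, g in zip(x_adv, grad)]
        # Project back into the eps-ball around x, then into [0, 1].
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv


# Hypothetical model and input; true label is 1.
w, b = [2.0, -1.0], 0.0
x, y = [0.6, 0.2], 1
x_adv = pgd_attack(x, y, w, b, eps=0.1, alpha=0.03, steps=10)
print(predict(x, w, b), predict(x_adv, w, b))
```

The `eps` bound plays the role of the disturbance threshold mentioned in the abstract: a larger value permits a stronger perturbation.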

    Multimedia computing and computer simulation
    Compressed sensing image reconstruction method fusing spatial location and structure information
    Leping LIN, Hongmin ZHOU, Ning OUYANG
    2022, 42(3):  930-937.  DOI: 10.11772/j.issn.1001-9081.2021030434

    Aiming at the problem of poor visual effects of block-based compressed sensing reconstructed images at low sampling rates, a compressed sensing image reconstruction method that fused Spatial Location and Structure Information (SLSI) was proposed. Firstly, observations were linearly mapped to obtain initial estimated values of image blocks. Then, based on block grouping reconstruction branch and whole image reconstruction branch, the spatial location information and structure information of the image were extracted, enhanced and fused. Finally, weighted strategy was used to fuse the outputs of the two branches to obtain final reconstructed whole image. In the block grouping reconstruction branch, reconstruction resources were allocated according to the data characteristics of the image blocks. In the whole image reconstruction branch, information exchange between adjacent image block pixels was mainly carried out through bilateral filtering and structural feature interaction module. Experimental results show that compared with compressed sensing reconstruction methods based on non-iterative Reconstruction Network (ReconNet) and Multi-scale Reconstruction neural Network with Non-Local constraint (NL-MRN), due to the combination of the image prior with strong autocorrelation between pixels, when the sampling rate is 0.05, the average Peak Signal-to-Noise Ratio (PSNR) and Structural SIMilarity index (SSIM) of the proposed method on the test image data commonly used in the compressed sensing field increase by 2.6175 dB and 0.1053 respectively, and the visual effects of reconstructed images are better.
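
PSNR, one of the two quality measures reported above, is derived from the mean squared error between the reference and the reconstructed image. A minimal sketch follows, assuming 8-bit grayscale images (peak value 255) and using a tiny made-up 2×2 example:

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two same-sized images."""
    diffs = [(r - c) ** 2
             for row_r, row_c in zip(reference, reconstructed)
             for r, c in zip(row_r, row_c)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Illustrative pixel values only.
reference = [[100, 100],
             [100, 100]]
reconstructed = [[110, 90],
                 [100, 100]]
value = psnr(reference, reconstructed)
print(value)
```

Since PSNR is logarithmic, a gain of about 3 dB corresponds to roughly halving the mean squared error.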

    Medical MRI image super-resolution reconstruction based on multi-receptive field generative adversarial network
    Pengwei LIU, Yuan GAO, Pinle QIN, Zhe YIN, Lifang WANG
    2022, 42(3):  938-945.  DOI: 10.11772/j.issn.1001-9081.2021040629

    To solve the problems of image detail loss and unclear texture caused by interference factors such as noise, imaging technology and imaging principles in the medical Magnetic Resonance Imaging (MRI) process, a multi-receptive field generative adversarial network for medical MRI image super-resolution reconstruction was proposed. Firstly, a multi-receptive field feature extraction block was used to obtain the global feature information of the image under different receptive fields. To avoid losing detailed texture because the receptive field was too small or too large, each set of features was divided into two groups: one group was used to feed back global feature information under receptive fields of different scales, and the other was used to enrich the local detailed texture information of the next set of features. The multi-receptive field feature extraction blocks were then used to construct feature fusion groups, and a spatial attention module was added to each feature fusion group to fully capture the spatial feature information of the image, which reduced the loss of shallow and local features in the network and made the image details more realistic. Secondly, the gradient map of the low-resolution image was converted into the gradient map of the high-resolution image to assist the reconstruction of the super-resolution image. Finally, the restored gradient map was integrated into the super-resolution branch to provide structural prior information for super-resolution reconstruction, which helped to generate high-quality super-resolution images. Experimental results show that, compared with the Structure-Preserving Super-Resolution with gradient guidance (SPSR) algorithm, the proposed algorithm improves the Peak Signal-to-Noise Ratio (PSNR) by 4.8%, 2.7% and 3.5% at the ×2, ×3 and ×4 scales, respectively, and the reconstructed medical MRI images have richer texture details and more realistic visual effects.
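
    As a rough illustration of what a gradient map is, the sketch below computes a gradient magnitude for a small grayscale image. The abstract does not name the gradient operator, so the difference of the two neighbouring pixels with border clamping is an assumption, as is the function name `gradient_map`.

```python
import math


def gradient_map(image):
    """Gradient magnitude of a 2D image given as a list of rows.

    Uses the difference of the two neighbouring pixels in each direction;
    border pixels are clamped. The operator is an assumption for illustration.
    """
    height, width = len(image), len(image[0])

    def pixel(r, c):
        # Clamp indices so that border pixels reuse the nearest valid value.
        return image[min(max(r, 0), height - 1)][min(max(c, 0), width - 1)]

    return [
        [
            math.hypot(pixel(r, c + 1) - pixel(r, c - 1),
                       pixel(r + 1, c) - pixel(r - 1, c))
            for c in range(width)
        ]
        for r in range(height)
    ]
```

    Flat regions map to zero and edges to large values, which is why such a map can act as a structural prior for the super-resolution branch.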

    Video coding optimization algorithm based on rate-distortion characteristic
    Hongwei GUO, Xiangsuo FAN, Shuai LIU, Xiang WEI, Lingli ZHAO
    2022, 42(3):  946-952.  DOI: 10.11772/j.issn.1001-9081.2021030398

    Rate-Distortion (R-D) optimization is a crucial technique in video encoders. However, the widely used independent R-D optimization is far from globally optimal. To further improve the compression performance of High Efficiency Video Coding (HEVC), a two-pass encoding algorithm combining R-D dependency and R-D characteristics was proposed. Firstly, the current frame was encoded with the original HEVC method to obtain the number of bits consumed by the frame and the R-D model parameters of each Coding Tree Unit (CTU). Then, in combination with temporally dependent R-D optimization, the optimal Lagrange multiplier and quantization parameter of each CTU were determined from information including the bit budget of the current frame and the R-D model parameters. Finally, the current frame was re-encoded, with each CTU having its own optimization goal according to its Lagrange multiplier. Experimental results show that the proposed algorithm significantly improves rate-distortion performance. Specifically, compared with the original HEVC encoder, it saves 3.5% and 3.8% of the bitrate at the same coding quality under the low-delay B and low-delay P coding configurations.
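
    The role of the Lagrange multiplier can be illustrated with the standard R-D cost J = D + λR. The sketch below is not the proposed algorithm; the candidate list, its numbers and the function names are hypothetical, and it only shows how a per-CTU λ changes which coding option wins.

```python
def rd_cost(distortion, bits, lam):
    """Standard Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * bits


def best_candidate(candidates, lam):
    """Pick the (qp, distortion, bits) option with the smallest R-D cost."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))


# Hypothetical coding options for one CTU: (QP, distortion, bits).
CANDIDATES = [
    (22, 10.0, 1000),
    (27, 25.0, 600),
    (32, 60.0, 300),
]
```

    With these made-up numbers, `best_candidate(CANDIDATES, 0.01)` selects QP 22, while a larger multiplier such as 0.2 selects QP 32: a larger λ penalizes bits more heavily, so giving each CTU its own λ gives it its own trade-off between rate and distortion.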

    Improved 3D hand pose estimation network based on anchor
    Dejian WEI, Wenming WANG, Quanyu WANG, Haopan REN, Yanyan GAO, Zhi WANG
    2022, 42(3):  953-959.  DOI: 10.11772/j.issn.1001-9081.2021030427

    In recent years, anchor-based 3D hand pose estimation methods have become popular, and Anchor-to-Joint (A2J) is one of the most representative. In A2J, anchor points are densely set on the depth map, and a neural network predicts the offsets between anchor points and key points together with the weights of the anchor points; the predicted offsets and weights are then combined by weighted summation to obtain the key point coordinates, which reduces the noise in the network regression results. A2J is simple and effective, but its accuracy suffers when the network structure is ill-suited, and its loss function makes it prone to inaccurate regression. Therefore, an improved network, HigherA2J, was proposed. Firstly, a single branch was used to jointly predict the XY and Z offsets between anchor points and key points, making better use of the 3D characteristics of the depth map. Secondly, the network branch structure was simplified to reduce the number of network parameters. Finally, a loss function for key point estimation was designed and combined with the offset estimation loss, which effectively improved the overall estimation accuracy. Experimental results show that, compared with the conventional A2J, the average hand pose estimation error is reduced by 0.32 mm, 0.35 mm and 0.10 mm on the NYU, ICVL and HANDS 2017 datasets, respectively.
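
    The weighted summation over anchors can be sketched as follows. This is a simplified illustration, not the HigherA2J network: the softmax normalization of the anchor scores, the data layout (2D anchors with a per-anchor prediction `(dx, dy, z)`) and the function names are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw anchor scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_joint(anchors, predictions, scores):
    """Estimate one key point as a weighted sum of per-anchor votes.

    anchors:     list of (x, y) anchor positions on the depth map
    predictions: list of (dx, dy, z) per anchor (in-plane offsets and depth)
    scores:      raw per-anchor scores, turned into weights by softmax
    """
    weights = softmax(scores)
    x = sum(w * (a[0] + p[0]) for w, a, p in zip(weights, anchors, predictions))
    y = sum(w * (a[1] + p[1]) for w, a, p in zip(weights, anchors, predictions))
    z = sum(w * p[2] for w, p in zip(weights, predictions))
    return (x, y, z)
```

    Because every anchor casts a vote and the votes are averaged with learned weights, a single noisy offset has limited influence on the final coordinate.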

    Weakly supervised action localization method based on attention mechanism
    Cong HU, Gang HUA
    2022, 42(3):  960-967.  DOI: 10.11772/j.issn.1001-9081.2021030372

    To address the problems that weakly supervised action localization methods cannot locate actions directly and have low localization accuracy, a weakly supervised action localization method based on attention mechanism was proposed, and an action localization model based on the pre-frame and post-frame information of the action frame and a distinguishing function was designed and implemented. An attention value generation model based on the Conditional Variational AutoEncoder (CVAE) was used to generate frame-level attention values as pseudo-frame-level labels; the CVAE was improved by adding the pre-frame and post-frame information of the action frame when computing the frame-level attention values; and an attention value optimization model based on the distinguishing function was used to train and optimize the pseudo-frame-level labels repeatedly. Experimental results on the THUMOS14 and ActivityNet1.2 datasets show that the proposed model achieves better action localization quality and accuracy, with the missed detection rate reduced by 11.7% compared with the model without the pre-frame and post-frame information of the action frame. Compared with AutoLoc, the Weakly-supervised Temporal Activity Localization and Classification framework (W-TALC), 3C-Net and other weakly supervised action localization models, when the Intersection over Union (IoU) threshold is set to 0.5, the mean Average Precision (mAP) is improved by more than 10.7% on the THUMOS14 dataset and by more than 8.8% on the ActivityNet1.2 dataset.
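
    For readers unfamiliar with the metric, the temporal IoU behind the "IoU = 0.5" setting can be sketched as follows. Segments are `(start, end)` pairs in seconds; the function names are illustrative and not taken from the paper.

```python
def temporal_iou(pred, truth):
    """Intersection over Union of two temporal segments (start, end)."""
    intersection = max(0.0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - intersection
    return intersection / union if union > 0 else 0.0


def is_correct(pred, truth, threshold=0.5):
    """A prediction counts as a hit when its IoU reaches the threshold."""
    return temporal_iou(pred, truth) >= threshold
```

    For example, a predicted segment (0, 10) against a ground-truth segment (5, 15) overlaps for 5 s out of a 15 s union, so its IoU is 1/3 and it does not count as correct at the 0.5 threshold used for the reported mAP.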

    Quality judgment of 3D face point cloud based on feature fusion
    Gong GAO, Hongyu YANG, Hong LIU
    2022, 42(3):  968-973.  DOI: 10.11772/j.issn.1001-9081.2021030414

    A Feature Fusion Network (FFN) was proposed to judge the quality of 3D face point clouds acquired by a binocular structured light scanner. Firstly, the 3D point cloud was preprocessed to crop out the face area, and the point cloud and the image obtained from its 2D plane projection were used as the inputs. Secondly, Dynamic Graph Convolutional Neural Network (DGCNN) and ShuffleNet were trained for point cloud learning. Then, the middle-layer features of the two network modules were extracted and fused to fine-tune the whole network. Finally, three fully connected layers were used to classify the 3D face point clouds into five classes (excellent, ordinary, stripe, burr, deformation). The proposed FFN achieved a classification accuracy of 83.7%, which was 5.8% higher than that of ShuffleNet and 2.2% higher than that of DGCNN. The experimental results show that the weighted fusion of 2D image features and point cloud features makes the different features complement each other.

    Network and communications
    Ergodic rate analysis of cooperative multiple input multiple output ambient backscatter communication system
    Xin ZHENG, Suyue LI, Anhong WANG, Meiling LI, Sami MUHAIDAT, Aiping NING
    2022, 42(3):  974-979.  DOI: 10.11772/j.issn.1001-9081.2021020312

    To solve the problems of high energy consumption and scarce spectrum resources in the traditional Internet of Things (IoT), a Multiple Input Multiple Output-Ambient Backscatter Communication (MIMO-AmBC) system model consisting of an ambient backscatter device, a Cooperative Receiver (CRx) and an ambient Radio Frequency (RF) source was proposed. Firstly, the system model was analyzed under the Parasitic Symbiotic Radio (PSR) scheme to derive the Signal-to-Noise Ratio (SNR). Secondly, approximate expressions for the ergodic rates of the primary link and the backscatter link were derived, and the maximum expression for the ergodic rate of the backscatter link was obtained. Finally, the proposed system model was compared with the traditional cellular network and the Commensal Symbiotic Radio (CSR) scheme. The experimental results verify the correctness of the theoretical derivation and lead to the following conclusions: 1) the backscatter link rate increases with the logarithm of the number of receiving antennas and is independent of the number of transmitting antennas; 2) when the SNR is 10 dB, the sum rate of the PSR scheme is 36.8% and 29.9% higher than those of the traditional scheme and the CSR scheme, respectively. Although the primary link rate of the PSR scheme is 5.5% lower than that of the CSR scheme, its backscatter link ergodic rate is 7.7 times higher than that of the CSR scheme, which provides a theoretical reference for choosing an AmBC symbiosis scheme in practical applications.
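
    The notion of an ergodic rate, the average of log2(1 + SNR) over random fading, can be illustrated with a small Monte Carlo estimate. This sketch does not reproduce the paper's MIMO-AmBC expressions: it assumes i.i.d. Rayleigh fading and maximum-ratio combining at the receiver, and the function name and parameters are hypothetical.

```python
import math
import random


def ergodic_rate(snr_linear, rx_antennas, trials=20000, seed=0):
    """Monte Carlo estimate of E[log2(1 + SNR * g)] in bit/s/Hz.

    g is the sum of rx_antennas unit-mean exponential channel power gains,
    i.e. i.i.d. Rayleigh fading with maximum-ratio combining (an assumption
    made for illustration only).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        gain = sum(rng.expovariate(1.0) for _ in range(rx_antennas))
        total += math.log2(1.0 + snr_linear * gain)
    return total / trials


SNR_10_DB = 10 ** (10 / 10)  # 10 dB expressed as a linear ratio
```

    Under these assumptions, calling `ergodic_rate(SNR_10_DB, n)` for n = 1, 4 and 16 gives rates that grow roughly with the logarithm of the number of receive antennas, the same qualitative trend as conclusion 1) above.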

    Hybrid beamforming method with high spectral efficiency for unmanned aerial vehicle patrol system
    Xin LING, Minzheng LI
    2022, 42(3):  980-984.  DOI: 10.11772/j.issn.1001-9081.2021030445

    With the development of the smart power grid, Unmanned Aerial Vehicles (UAVs) are increasingly used to inspect transmission lines. To locate transmission line faults and judge their types effectively, UAVs are required to transmit high-resolution videos and images. Under limited bandwidth, the spectral efficiency of the UAV return communication link must be improved as much as possible to meet the transmission rate requirements of high-resolution videos and images. A video and image transmission method based on a mesh network was proposed. By deploying wireless access nodes on towers to build a mesh network, the communication devices carried by UAVs could join the mesh network as network nodes at any time. After capturing a video of a transmission line fault, the UAV could quickly transmit it to the data center. For this purpose, the communication module of the patrol UAV was equipped with a large-scale antenna array, and a heuristic point-to-point directional hybrid beamforming method was adopted in the millimeter wave frequency band to improve the spectral efficiency of the receiving communication link. The simulation results show that the proposed method outperforms the Orthogonal Matching Pursuit (OMP) method and performs closer to the fully digital beamforming method.
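
    To show why a directional beam from a large array raises spectral efficiency, the sketch below computes the gain of simple analog beam steering on a uniform linear array. It is not the heuristic hybrid beamforming method of the paper; the half-wavelength spacing, the single line-of-sight path and all names are assumptions.

```python
import cmath
import math


def steering_vector(n, angle_rad, spacing=0.5):
    """Unit-norm steering vector of an n-element uniform linear array.

    spacing is the element spacing in wavelengths (assumed 0.5).
    """
    return [
        cmath.exp(2j * math.pi * spacing * k * math.sin(angle_rad)) / math.sqrt(n)
        for k in range(n)
    ]


def beam_gain(n, steer_rad, arrival_rad):
    """Power gain of a beam steered to steer_rad for a path at arrival_rad."""
    weights = steering_vector(n, steer_rad)
    response = steering_vector(n, arrival_rad)
    inner = sum(w.conjugate() * a for w, a in zip(weights, response))
    return n * abs(inner) ** 2  # equals n when the beam is perfectly aligned


def spectral_efficiency(snr_linear, gain):
    """Single-stream spectral efficiency in bit/s/Hz."""
    return math.log2(1.0 + snr_linear * gain)
```

    With perfect alignment the gain equals the number of antennas, so at a given SNR a 64-element array adds about log2(64) = 6 bit/s/Hz over a single antenna in the high-SNR regime; a misaligned beam can lose that gain entirely.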

Honorary Editor-in-Chief: ZHANG Jingzhong
Editor-in-Chief: XU Zongben
Associate Editor: SHEN Hengtao XIA Zhaohui
Domestic Post Distribution Code: 62-110
Foreign Distribution Code: M4616
Address:
No. 9, 4th Section of South Renmin Road, Chengdu 610041, China
Tel: 028-85224283-803
  028-85222239-803
Website: www.joca.cn
E-mail: bjb@joca.cn