Table of Contents

    10 February 2021, Volume 41 Issue 2
    Artificial intelligence
    Survey of sentiment analysis based on image and text fusion
    MENG Xiangrui, YANG Wenzhong, WANG Ting
    2021, 41(2):  307-317.  DOI: 10.11772/j.issn.1001-9081.2020060923
    With the continuous development of information technology, the amount of opinion-bearing image-text data on various social platforms is growing rapidly, and sentiment analysis based on image-text fusion has attracted wide attention. Single-modality sentiment analysis methods can no longer meet the demands of multi-modal data. Aiming at the technical problems of extracting and fusing image and text sentiment features, firstly, the widely used image-text sentiment analysis datasets were listed, and the extraction methods for text features and image features were introduced. Then, the current fusion modes of image features and text features were reviewed in detail, and the problems existing in image-text sentiment analysis were briefly described. Finally, future research directions of sentiment analysis were summarized and discussed. To build a deeper understanding of image-text fusion technology, the literature-survey method was adopted to review studies on image-text sentiment analysis, which helps to compare the differences between fusion methods and to identify more promising research schemes.
    Analysis of double-channel Chinese sentiment model integrating grammar rules
    QIU Ningjia, WANG Xiaoxia, WANG Peng, WANG Yanchun
    2021, 41(2):  318-323.  DOI: 10.11772/j.issn.1001-9081.2020050723
    Concerning the problem that ignoring grammar rules reduces classification accuracy in Chinese-text sentiment analysis, a double-channel Chinese sentiment classification model integrating grammar rules was proposed, namely CB_Rule (grammar Rules of CNN and Bi-LSTM). First, grammar rules were designed to extract information with more explicit sentiment tendencies, and semantic features were extracted by using the local perception capability of the Convolutional Neural Network (CNN). Then, considering that rule processing may ignore context, a Bi-directional Long Short-Term Memory (Bi-LSTM) network was used to extract global features containing contextual information, and these were fused with the local features to supplement them, thereby improving the sentiment tendency information of the CNN channel. Finally, the improved features were input into a classifier to judge sentiment tendency, and the Chinese sentiment model was constructed. The proposed model was compared with the R-Bi-LSTM model (Bi-LSTM for Chinese sentiment analysis combined with grammar Rules) and the SCNN model (a travel review sentiment analysis model combining Syntactic rules and CNN) on a Chinese e-commerce review text dataset. Experimental results show that the accuracy of the proposed model is higher by 3.7 percentage points and 0.6 percentage points respectively, indicating that the proposed CB_Rule model has a good classification effect.
    Personalized social event recommendation method integrating user historical behaviors and social relationships
    SUN Heli, XU Tong, HE Liang, JIA Xiaolin
    2021, 41(2):  324-329.  DOI: 10.11772/j.issn.1001-9081.2020050666
    In order to improve the recommendation effect of social events in Event-Based Social Networks (EBSN), a personalized social event recommendation method combining users' historical behaviors and social relationships was proposed. Firstly, deep learning technology was used to build a user model from two aspects: the user's historical behaviors and the potential social relationships between users. Then, when modeling user preferences, the negative vector representation of user preferences was introduced, and an attention weight layer was used to assign different weights to the events in the user's history and to the friends in the user's social circle according to each candidate recommendation event; at the same time, various characteristics of events and groups were considered. Finally, extensive experiments were carried out on real datasets. Experimental results show that this personalized social event recommendation method outperforms the compared Deep User Modeling framework for Event Recommendation (DUMER) and the DIN (Deep Interest Network) model combined with an attention mechanism in terms of the Hits Ratio (HR), Normalized Discounted Cumulative Gain (NDCG) and Mean Reciprocal Rank (MRR) evaluation indicators.
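    The attention weighting described in this abstract can be illustrated with a minimal sketch (dot-product scoring and a single pooling step are assumptions here; the paper's actual network learns its scoring function end to end):

```python
import numpy as np

def attention_pool(history, candidate):
    """Weight each historical-event vector by its relevance to the
    candidate event (dot-product score + softmax), then pool them
    into a single user-preference vector."""
    scores = history @ candidate                 # (n,) relevance scores
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()            # softmax over events
    return weights, weights @ history            # (n,), (d,)

# toy check: 3 past events with 2-d embeddings, one candidate event
history = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
candidate = np.array([1.0, 0.0])
w, user_vec = attention_pool(history, candidate)
```

Events aligned with the candidate receive larger weights, so the pooled vector adapts per candidate, which is the point of the attention weight layer.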
    Multi-graph neural network-based session perception recommendation model
    NAN Ning, YANG Chengyi, WU Zhihao
    2021, 41(2):  330-336.  DOI: 10.11772/j.issn.1001-9081.2020060805
    Session-based recommendation algorithms mainly rely on information from the target session, but fail to fully utilize collaborative information from other sessions. To solve this problem, a Multi-Graph neural network-based Session Perception recommendation (MGSP) model was proposed. Firstly, according to the target session and all sessions in the training set, an Item-Transition Graph (ITG) and a Collaborative Relation Graph (CRG) were constructed. Based on these two graphs, a Graph Neural Network (GNN) was applied to aggregate node information and obtain two types of node representations. Then, a two-layer attention module modeled the two types of node representations to obtain the session-level representation. Finally, an attention mechanism was used to fuse the information into the ultimate session representation, from which the next interaction item was predicted. Comparison experiments were carried out in e-commerce and civil aviation scenarios. Experimental results show that the proposed model is superior to the best benchmark model, with improvements of more than 1 percentage point and 3 percentage points in the indicators on the e-commerce and civil aviation datasets respectively, verifying the effectiveness of the proposed model.
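    The GNN aggregation step over the two session graphs can be illustrated with a minimal mean-aggregation sketch (row-normalised propagation with self-loops is an assumption; the paper's aggregator may differ):

```python
import numpy as np

def gnn_aggregate(adj, feats):
    """One round of mean-neighbourhood aggregation: each node's new
    representation averages its own and its neighbours' features."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    deg = a_hat.sum(axis=1, keepdims=True)      # node degrees
    return (a_hat / deg) @ feats                # row-normalised propagation

# toy item-transition graph: items 0-1 and 1-2 are adjacent
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
feats = np.array([[1., 0.], [0., 1.], [1., 0.]])
out = gnn_aggregate(adj, feats)
```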
    Knowledge reasoning method based on differentiable neural computer and Bayesian network
    SUN Jianqiang, XU Shaohua
    2021, 41(2):  337-342.  DOI: 10.11772/j.issn.1001-9081.2020060843
    Aiming at the problems that Artificial Neural Networks (ANN) have limited memory capability for knowledge reasoning oriented to Knowledge Graphs (KG) and that KGs cannot deal with uncertain knowledge, a reasoning method named DNC-BN was proposed based on the Differentiable Neural Computer (DNC) and the Bayesian Network (BN). Firstly, using a Long Short-Term Memory (LSTM) network as the controller, the output vector and interface vector of the network were obtained by processing the input vector and the read vectors obtained from memory at each time step. Then, read and write heads were used to realize the interaction between the controller and the memory: the read weights were used to compute a weighted average of the memory data to obtain the read vectors, and the write operation combined an erase vector and a write vector with the write weights to modify the memory matrix. Finally, based on the probabilistic inference mechanism, the BN was used to judge the inference relationships between nodes, and the KG was completed. In the experiments, DNC-BN achieves a Mean Rank of 2 615 and Hits@10 of 0.528 on the WN18RR dataset, and a Mean Rank of 202 and Hits@10 of 0.519 on the FB15k-237 dataset. Experimental results show that the proposed method works well for knowledge reasoning oriented to KGs.
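    The read/write mechanics of the DNC memory described above reduce to a few array operations. The sketch below uses hand-picked weights; in the actual model the LSTM controller emits the weights, erase vector and write vector through its interface vector:

```python
import numpy as np

def dnc_write(memory, w, erase, write_vec):
    """DNC-style write: attenuate each memory slot by its write weight
    times the erase vector, then add the weighted write vector."""
    return memory * (1 - np.outer(w, erase)) + np.outer(w, write_vec)

def dnc_read(memory, w_read):
    """Read vector = weighted average of memory rows."""
    return w_read @ memory

M = np.zeros((4, 3))                 # 4 slots, word width 3
w = np.array([0., 1., 0., 0.])       # write entirely to slot 1
M = dnc_write(M, w, erase=np.ones(3), write_vec=np.array([1., 2., 3.]))
r = dnc_read(M, w_read=np.array([0., 1., 0., 0.]))
```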
    Semantic segmentation method based on edge attention model
    SHE Yulong, ZHANG Xiaolong, CHENG Ruoqin, DENG Chunhua
    2021, 41(2):  343-349.  DOI: 10.11772/j.issn.1001-9081.2020050725
    The liver is the main organ of human metabolic function. At present, the main problems of machine learning in the semantic segmentation of liver images are as follows: 1) the inferior vena cava, soft tissue and blood vessels appear in the middle of the liver, along with occasional necrosis or hepatic fissures; 2) the boundary between the liver and some adjacent organs is blurred and difficult to distinguish. To solve these problems, an Edge Attention Model (EAM) and an Edge Attention Net (EANet) were proposed within an encoder-decoder framework. In the encoder, the residual network ResNet34 pre-trained on ImageNet and the EAM were utilized to fully obtain detailed feature information of the liver edge; in the decoder, deconvolution operations and the proposed EAM were used to extract features from the useful information and obtain the semantic segmentation map of the liver image. Finally, smoothing was applied to segmentation images with heavy noise. Comparison experiments with AHCNet were conducted on three datasets, with the following results: on the 3Dircadb dataset, the Volumetric Overlap Error (VOE) and Relative Volume Difference (RVD) of EANet were decreased by 1.95 percentage points and 0.11 percentage points respectively, and the DICE accuracy was increased by 1.58 percentage points; on the Sliver07 dataset, the VOE, Maximum Surface Distance (MSD) and Root Mean Square Surface Distance (RMSD) of EANet were decreased by approximately 1 percentage point, 3.3 mm and 0.2 mm respectively; on a clinical MRI liver image dataset from a hospital, the VOE and RVD of EANet were decreased by 0.88 percentage points and 0.31 percentage points respectively, and the DICE accuracy was increased by 1.48 percentage points. Experimental results indicate that the proposed EANet achieves a good liver image segmentation effect.
    Relation extraction model via attention-based graph convolutional network
    WANG Xiaoxia, QIAN Xuezhong, SONG Wei
    2021, 41(2):  350-356.  DOI: 10.11772/j.issn.1001-9081.2020081310
    Aiming at the problems of low utilization of sentence dependency tree information and poor feature extraction in relation extraction tasks, an Attention-guided Gate perceptual Graph Convolutional Network (Att-Gate-GCN) model was proposed. Firstly, a soft pruning strategy based on the attention mechanism assigned weights to the edges of the dependency tree, mining the effective information in the tree while filtering out useless information. Secondly, a gate-perceptual Graph Convolutional Network (GCN) structure was constructed, which increased feature perception ability through a gating mechanism to obtain more robust relation features, and combined local and non-local dependency features in the dependency tree to further extract key information. Finally, the key information was input into a classifier to obtain the relation category label. Experimental results indicate that, compared with the original graph convolutional relation extraction model, the proposed model improves the F1 score by 2.2 percentage points and 3.8 percentage points on the SemEval2010-Task8 and KBP37 datasets respectively, showing that it makes full use of effective information and improves the relation extraction ability of the model.
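    The gating mechanism that combines local and non-local dependency features can be sketched as follows (a sigmoid gate over the concatenated features is an assumption about the gate's form, and the zero parameters are hypothetical stand-ins for learned weights):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(local_feat, nonlocal_feat, gate_w, gate_b):
    """Gating mechanism: a sigmoid gate decides, per dimension, how
    much of the local vs. non-local dependency feature passes through."""
    g = sigmoid(gate_w @ np.concatenate([local_feat, nonlocal_feat]) + gate_b)
    return g * local_feat + (1 - g) * nonlocal_feat

local = np.array([1.0, 0.0])         # local dependency feature
nonloc = np.array([0.0, 1.0])        # non-local dependency feature
gate_w = np.zeros((2, 4))            # hypothetical (untrained) gate weights
gate_b = np.zeros(2)
fused = gated_fusion(local, nonloc, gate_w, gate_b)  # gate = 0.5 everywhere
```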
    Biomedical named entity recognition with graph network based on syntactic dependency parsing
    XU Li, LI Jianhua
    2021, 41(2):  357-362.  DOI: 10.11772/j.issn.1001-9081.2020050738
    Existing biomedical named entity recognition methods do not use the syntactic information in the corpus, resulting in low precision. To solve this problem, a biomedical named entity recognition model using a graph network based on syntactic dependency parsing was proposed. Firstly, a Convolutional Neural Network (CNN) was used to generate character vectors, which were concatenated with word vectors and fed into a Bidirectional Long Short-Term Memory (BiLSTM) network for training. Secondly, syntactic dependency parsing was performed on the corpus sentence by sentence, and an adjacency matrix was constructed. Finally, the output of the BiLSTM and the adjacency matrix constructed from the dependency parse were fed into a Graph Convolutional Network (GCN) for training, and a graph attention mechanism was introduced to optimize the feature weights of adjacent nodes and obtain the model output. The proposed model reached F1 scores of 76.91% and 87.80% on the JNLPBA dataset and the NCBI-disease dataset respectively, which are 2.62 and 1.66 percentage points higher than those of the baseline model. Experimental results prove that the proposed method can effectively improve model performance in the biomedical named entity recognition task.
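    Constructing the adjacency matrix from a sentence's dependency parse, as described above, can be sketched like this (symmetric edges with self-loops are assumed; the token indices and edges form a made-up example):

```python
import numpy as np

def dependency_adjacency(n_tokens, edges):
    """Build a symmetric adjacency matrix (with self-loops) from
    (head, dependent) dependency-parse edges, one sentence at a time."""
    adj = np.eye(n_tokens)               # self-loops on the diagonal
    for head, dep in edges:
        adj[head, dep] = 1.0
        adj[dep, head] = 1.0             # treat dependencies as undirected
    return adj

# toy parse of "mutations cause cancer": cause -> mutations, cause -> cancer
edges = [(1, 0), (1, 2)]
adj = dependency_adjacency(3, edges)
```

This matrix, paired with the BiLSTM outputs as node features, is exactly the pair of inputs a GCN layer consumes.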
    Online federated incremental learning algorithm for blockchain
    LUO Changyin, CHEN Xuebin, MA Chundi, WANG Junyu
    2021, 41(2):  363-371.  DOI: 10.11772/j.issn.1001-9081.2020050609
    Since the generalization ability of outdated traditional data processing technology is weak and multi-source data security issues are not taken into account, a blockchain-oriented online federated incremental learning algorithm was proposed. Ensemble learning and incremental learning were applied within a federated learning framework: a stacking ensemble algorithm was used to integrate the local models, and the model parameters from the training phase were uploaded to the blockchain with fast synchronization. As a result, the accuracy of the constructed global model fell by only 1%, while security in both the training and storage stages was improved, the costs of data storage and model-parameter transmission were reduced, and the risk of data leakage caused by model gradient updates was lowered. Experimental results show that the accuracy of the model exceeds 91.5% and its variance is below 10^-5; compared with a traditional model trained on centrally integrated data, the proposed model sacrifices a small amount of accuracy but improves the security of both the data and the model.
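    The stacking step that integrates the parties' local models can be sketched as follows (the threshold base models and the majority-vote meta-model are hypothetical stand-ins; the paper trains real classifiers at each party and a learned meta-model):

```python
def stack_predict(base_models, meta_model, x):
    """Stacking ensemble: feed each local model's prediction into a
    meta-model that produces the global prediction."""
    level1 = [m(x) for m in base_models]   # level-1 predictions
    return meta_model(level1)

# hypothetical local models trained at two different parties
base_models = [lambda x: 1 if x > 0.4 else 0,
               lambda x: 1 if x > 0.6 else 0]
# hypothetical meta-model: simple majority vote over base outputs
meta_model = lambda preds: 1 if sum(preds) >= len(preds) / 2 else 0

pred = stack_predict(base_models, meta_model, 0.5)
```

Only the base-model parameters, not the raw data, need to leave each party, which is what allows them to be synchronized through the blockchain.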
    Pedestrian attribute recognition based on two-domain self-attention mechanism
    WU Rui, LIU Yu, FENG Kai
    2021, 41(2):  372-378.  DOI: 10.11772/j.issn.1001-9081.2020060850
    Focusing on the issue that different attributes require different feature granularities and feature dependencies in pedestrian attribute recognition tasks, a pedestrian attribute recognition model based on a two-domain self-attention mechanism, composed of a spatial self-attention mechanism and a channel self-attention mechanism, was proposed. Firstly, ResNet50 was used as the backbone network to extract features with semantic information. Then, the features were fed into a two-branch network to extract self-attention features with spatial dependence and semantic relevance, as well as global features of the overall information. Finally, the features of the two branches were concatenated, and Batch Normalization (BN) and weighted-loss strategies were used to reduce the impact of imbalanced pedestrian attribute samples. Experimental results on the pedestrian attribute datasets PETA and RAP show that the proposed model improves the mean accuracy index by 3.91 percentage points and 4.05 percentage points respectively compared with the benchmark model, and is strongly competitive among existing pedestrian attribute recognition models. The proposed model can be used to produce structured descriptions of pedestrians in surveillance scenarios, improving the accuracy and efficiency of pedestrian analysis and retrieval tasks.
    Path planning of mobile robots based on ion motion-artificial bee colony algorithm
    WEI Bo, YANG Rong, SHU Sihao, WAN Yong, MIAO Jianguo
    2021, 41(2):  379-383.  DOI: 10.11772/j.issn.1001-9081.2020060794
    Aiming at the path planning of mobile robots in warehouse environments, a path planning method based on the Ion Motion-Artificial Bee Colony (IM-ABC) algorithm was proposed. To improve the convergence speed and search ability of the traditional Artificial Bee Colony (ABC) algorithm in path planning, a strategy simulating ion motion was used to update the swarm. Firstly, in the early stage of the algorithm, the anion-cation cross search of the ion motion algorithm was used to update the leading bees and following bees, guiding the direction of population evolution and greatly improving the exploitation ability of the population. Secondly, in the late stage of the algorithm, to avoid the local optima caused by premature convergence in the early stage, the leading bees adopted random search and the following bees used reverse roulette to select honey sources, expanding population diversity. Finally, an adaptive floral fragrance concentration was introduced into the global update mechanism to improve the sampling method, yielding the IM-ABC algorithm. Benchmark function tests and simulation results show that the IM-ABC algorithm not only converges rapidly, but also reduces the number of iterations by 58.3% and improves optimization performance by 12.6% compared with the traditional ABC algorithm, indicating the high planning efficiency of the IM-ABC algorithm.
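    The reverse-roulette selection used by the following bees can be sketched as below (inverting selection pressure via reciprocal fitness is an assumption about how "reverse roulette" works; a minimisation problem would need a different transform):

```python
import random

def roulette(fitness, rng, reverse=False):
    """Roulette-wheel selection; with reverse=True the pressure is
    inverted (worse honey sources become more likely), which widens
    population diversity in the late stage."""
    vals = [1.0 / f if reverse else f for f in fitness]
    total = sum(vals)
    pick = rng.random() * total
    acc = 0.0
    for i, v in enumerate(vals):
        acc += v
        if pick <= acc:
            return i
    return len(vals) - 1

rng = random.Random(42)
fitness = [5.0, 1.0, 1.0]            # source 0 is clearly the fittest
picks = [roulette(fitness, rng) for _ in range(1000)]
share_best = picks.count(0) / 1000   # forward roulette favours source 0
picks_rev = [roulette(fitness, rng, reverse=True) for _ in range(1000)]
share_best_rev = picks_rev.count(0) / 1000
```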
    Obstacle avoidance path planning algorithm of quad-rotor helicopter based on Bayesian estimation and region division traversal
    WANG Jialiang, LI Shuhua, ZHANG Haitao
    2021, 41(2):  384-389.  DOI: 10.11772/j.issn.1001-9081.2020060962
    In order to improve the real-time performance of obstacle avoidance based on image processing for quad-rotor helicopters, an obstacle avoidance path planning algorithm based on Bayesian estimation and region-division traversal was proposed. Firstly, Bayesian estimation was used to preprocess the video images collected by the quad-rotor helicopter. Secondly, obstacle probability analysis was performed to select key frames from the video, maximizing the real-time performance of the helicopter. Finally, background differencing was carried out on the selected frames to identify obstacles, and a pixel traversal algorithm based on region division was applied to improve the accuracy of obstacle identification. Experimental results show that the proposed algorithm improves the real-time performance of obstacle avoidance while preserving obstacle identification ability; the maximum distance between the ideal trajectory and the actual flight trajectory of the quad-rotor helicopter is 25.6 cm, and the minimum distance is 0.2 cm. The proposed obstacle avoidance path planning algorithm provides an efficient solution for quad-rotor helicopters avoiding obstacles using camera-collected video images.
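    The background-difference step used to identify obstacles can be sketched as a thresholded frame difference (the threshold value and the toy frames are illustrative only; the paper's pipeline additionally applies region-division traversal to the flagged pixels):

```python
import numpy as np

def background_difference(frame, background, threshold=30):
    """Flag pixels whose absolute difference from the background model
    exceeds a threshold; connected flagged regions indicate obstacles."""
    diff = np.abs(frame.astype(int) - background.astype(int))
    return diff > threshold

background = np.zeros((4, 4), dtype=np.uint8)   # static background model
frame = background.copy()
frame[1:3, 1:3] = 200                           # a bright obstacle appears
mask = background_difference(frame, background)
```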
    Unmanned aerial vehicle path planning based on improved genetic algorithm
    HUANG Shuzhao, TIAN Junwei, QIAO Lu, WANG Qin, SU Yu
    2021, 41(2):  390-397.  DOI: 10.11772/j.issn.1001-9081.2020060797
    To solve the problems of the traditional genetic algorithm such as slow convergence, easily falling into local optima, unsmooth planned paths and high cost, an Unmanned Aerial Vehicle (UAV) path planning method based on an improved Genetic Algorithm (GA) was proposed. The selection, crossover and mutation operators of the genetic algorithm were improved to plan a smooth and effective flight path. Firstly, an environment model suitable for UAV field information acquisition was established, and a more complex and accurate mathematical model for this scenario was built by considering the objective function and constraints of the UAV. Secondly, a hybrid non-multi-string selection operator, an asymmetric mapping crossover operator and a heuristic multi-mutation operator were proposed to find the optimal path and expand the search range of the population. Finally, a cubic B-spline curve was used to smooth the planned path, obtaining a smooth flight path and reducing the computation time of the algorithm. Experimental results show that, compared with the traditional GA, the cost value of the proposed algorithm is reduced by 68% and the number of convergence iterations by 67%; compared with the Ant Colony Optimization (ACO) algorithm, the cost value is reduced by 55% and the number of convergence iterations by 58%. Extensive comparison experiments show that the algorithm converges best when the crossover rate equals the reciprocal of the chromosome size. Tests in different environments show that the proposed algorithm has good environmental adaptability and is suitable for path planning in complex environments.
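    The cubic B-spline smoothing applied to the planned path can be sketched with the uniform cubic B-spline blending functions (the waypoints and sampling density are illustrative; the paper's custom crossover and mutation operators are not reproduced here):

```python
def cubic_bspline(p0, p1, p2, p3, t):
    """Uniform cubic B-spline blend of four consecutive waypoints,
    evaluated at parameter t in [0, 1)."""
    b0 = (1 - t) ** 3 / 6.0
    b1 = (3 * t**3 - 6 * t**2 + 4) / 6.0
    b2 = (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6.0
    b3 = t**3 / 6.0
    return tuple(b0 * a + b1 * b + b2 * c + b3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))

def smooth_path(waypoints, samples_per_seg=4):
    """Resample a piecewise-linear GA path as a smooth B-spline curve."""
    out = []
    for i in range(len(waypoints) - 3):
        for s in range(samples_per_seg):
            out.append(cubic_bspline(*waypoints[i:i + 4], s / samples_per_seg))
    return out

# a zig-zag path produced by the GA, to be smoothed
path = [(0., 0.), (1., 2.), (2., 0.), (3., 2.), (4., 0.)]
smooth = smooth_path(path)
```

The four blending weights always sum to 1, so the curve stays inside the convex hull of the waypoints, which is why the smoothed path cannot overshoot the planned corridor.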
    Data science and technology
    Point-of-interest recommendation algorithm combining dynamic and static preferences
    YANG Li, WANG Shihui, ZHU Bo
    2021, 41(2):  398-406.  DOI: 10.11772/j.issn.1001-9081.2020050677
    Since most existing Point-Of-Interest (POI) recommendation algorithms ignore the complexity of modeling the fusion of users' dynamic and static preferences, a POI recommendation algorithm called CLSR (Combining Long-Short Recommendation) was proposed that combines complex dynamic user preferences with general static user preferences. Firstly, in modeling complex dynamic preferences, a hybrid neural network was designed based on the user's check-in behaviors and the skip behaviors within them, to model the user's complex dynamic interests. Secondly, in modeling general static preferences, a high-level attention network was used to learn the complex interactions between the user and POIs. Thirdly, a multi-layer neural network was used to further learn and represent these dynamic and static preferences. Finally, a unified POI recommendation framework was used to integrate the preferences. Experimental results on real datasets show that, compared with FPMC-LR (Factorizing Personalized Markov Chain and Localized Region), PRME (Personalized Ranking Metric Embedding), Rank-GeoFM (Ranking based Geographical Factorization Method) and TMCA (Temporal and Multi-level Context Attention), CLSR greatly improves performance; compared with TMCA, the best of the baselines, it increases precision, recall and normalized Discounted Cumulative Gain (nDCG) by 5.8%, 5.1% and 7.2% on the Foursquare dataset, and by 7.3%, 10.2% and 6.3% on the Gowalla dataset. It can be seen that the CLSR algorithm can effectively improve POI recommendation results.
    Patent text classification based on ALBERT and bidirectional gated recurrent unit
    WEN Chaodong, ZENG Cheng, REN Junwei, ZHANG Yan
    2021, 41(2):  407-412.  DOI: 10.11772/j.issn.1001-9081.2020050730
    With the rapid increase in the number of patent applications, the demand for automatic classification of patent text is growing. Most existing patent text classification algorithms use methods such as Word2vec and Global Vectors (GloVe) to obtain the word vector representation of the text, abandoning a great deal of word position information and failing to express the complete semantics of the text. To solve these problems, a multi-level patent text classification model named ALBERT-BiGRU was proposed, combining ALBERT (A Lite BERT) with a BiGRU (Bidirectional Gated Recurrent Unit). In this model, dynamic word vectors pre-trained by ALBERT replace the static word vectors trained by traditional methods such as Word2vec, improving the representation ability of the word vectors; the BiGRU network then preserves, to the greatest extent, the semantic associations between long-distance words in the patent text during training. In validation on the patent text dataset published by the State Information Center, compared with Word2vec-BiGRU and GloVe-BiGRU, the accuracy of ALBERT-BiGRU was increased by 9.1 and 10.9 percentage points respectively at the department level of the patent text, and by 9.5 and 11.2 percentage points respectively at the big class level. Experimental results show that ALBERT-BiGRU can effectively improve the classification of patent texts at different levels.
    Cyber security
    Verifiable privacy-preserving k-means clustering scheme
    ZHANG En, LI Huimin, CHANG Jian
    2021, 41(2):  413-421.  DOI: 10.11772/j.issn.1001-9081.2020060766
    Existing cloud-outsourced privacy-preserving k-means clustering schemes suffer from low efficiency and may return unreasonable clustering results when the cloud server is untrusted or attacked by hackers. Therefore, a verifiable cloud-outsourced privacy-preserving k-means clustering scheme applicable to multi-party privacy-preserving scenarios was proposed. Firstly, an improved clustering initialization method suitable for cloud outsourcing was proposed to effectively improve the iteration efficiency of the algorithm. Secondly, multiplicative-triple technology was used to design a secure Euclidean distance algorithm, and garbled-circuit technology was used to design an algorithm for securely computing the minimum value. Finally, a verification algorithm was proposed that lets users verify the clustering results with only one round of communication; after the data are outsourced, algorithm training is performed entirely in the cloud, effectively reducing the interaction between users and the cloud. Simulation results show that the accuracy of the proposed scheme is 97% and 93% on the Synthetic and S1 datasets respectively, indicating that the privacy-preserving k-means clustering performs similarly to plaintext k-means clustering and is suitable for the medical, social science and business fields.
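    The multiplicative-triple (Beaver triple) technique behind the secure Euclidean distance can be sketched for a single secure multiplication (a semi-honest two-party setting and a pre-distributed triple are assumed; per-coordinate products of differences would then be summed to obtain a squared distance):

```python
import random

def share(x, q, rng):
    """Additively secret-share x between two parties modulo q."""
    s0 = rng.randrange(q)
    return s0, (x - s0) % q

def beaver_mul(x_sh, y_sh, triple_sh, q):
    """Multiply two secret-shared values with a Beaver triple (a, b, c)
    where c = a*b mod q.  The parties jointly open d = x - a and
    e = y - b, then reconstruct x*y without either seeing x or y."""
    (x0, x1), (y0, y1) = x_sh, y_sh
    (a0, a1), (b0, b1), (c0, c1) = triple_sh
    d = (x0 - a0 + x1 - a1) % q               # opened masked value
    e = (y0 - b0 + y1 - b1) % q
    z0 = (c0 + d * b0 + e * a0 + d * e) % q   # party 0 adds d*e once
    z1 = (c1 + d * b1 + e * a1) % q
    return (z0 + z1) % q                      # reconstruction of x*y

q = 2**31 - 1
rng = random.Random(7)
a, b = rng.randrange(q), rng.randrange(q)     # dealer-generated triple
triple = (share(a, q, rng), share(b, q, rng), share(a * b % q, q, rng))
x, y = 1234, 5678
prod = beaver_mul(share(x, q, rng), share(y, q, rng), triple, q)
```

Correctness follows from x*y = (a + d)(b + e) = c + d*b + e*a + d*e, each term of which is computable from shares plus the opened d and e.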
    Efficient dynamic data audit scheme for resource-constrained users
    LI Xiuyan, LIU Mingxi, SHI Wenbo, DONG Guofang
    2021, 41(2):  422-432.  DOI: 10.11772/j.issn.1001-9081.2020050614
    Internet of Things (IoT) devices promote the rapid development of outsourced cloud storage data services, which are favored by more and more end users. Hence, ensuring the integrity verification of user data on cloud servers has become a hot issue that urgently needs to be solved. For resource-constrained users, current cloud data audit schemes suffer from complex computation, high cost and low efficiency. To solve these problems, an efficient dynamic data audit scheme for resource-constrained users was proposed. Firstly, a new data structure supporting dynamic audits, named NCBF-M-MHT, was proposed based on a Novel Counting Bloom Filter (NCBF) and a Multi-Merkle Hash Tree (M-MHT). In this data structure, the NCBF handles dynamic update requests in O(1) time, ensuring audit efficiency, while the root node of the M-MHT is signed under user authentication to ensure data security. Then, different allocation methods were adopted for different audit entities, and data evidence and label evidence were used to verify the correctness and integrity of the data. Experimental results show that, compared with audit schemes based on the Dynamic Hash Table (DHT), the Merkle Hash Tree (MHT) and the Location Array-Doubly Linked Info Table (LA-DLIT), the time cost of the proposed scheme in the audit verification phase is reduced by 45.40%, 23.71% and 13.85%, and in the dynamic update phase by 43.33%, 27.50% and 17.58% respectively.
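    A standard counting Bloom filter, the building block behind the NCBF, supports O(1) insert and delete by replacing bits with counters (this sketch is the textbook structure, not the paper's "novel" NCBF variant, whose details are not given in the abstract):

```python
import hashlib

class CountingBloomFilter:
    """Counting Bloom filter: counters instead of bits allow O(1)
    insert *and* delete, at the price of possible false positives."""

    def __init__(self, size=64, n_hashes=3):
        self.counters = [0] * size
        self.size = size
        self.n_hashes = n_hashes

    def _indexes(self, item):
        # derive n_hashes independent positions from salted SHA-256
        for i in range(self.n_hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def add(self, item):
        for idx in self._indexes(item):
            self.counters[idx] += 1

    def remove(self, item):
        for idx in self._indexes(item):
            self.counters[idx] -= 1

    def might_contain(self, item):
        return all(self.counters[idx] > 0 for idx in self._indexes(item))

cbf = CountingBloomFilter()
cbf.add("block-17")
present = cbf.might_contain("block-17")
cbf.remove("block-17")                       # O(1) dynamic delete
absent_after_delete = cbf.might_contain("block-17")
```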
    Abnormal flow detection based on improved one-dimensional convolutional neural network
    HANG Mengxin, CHEN Wei, ZHANG Renjie
    2021, 41(2):  433-440.  DOI: 10.11772/j.issn.1001-9081.2020050734
    To address the problems that traditional machine learning based abnormal flow detection methods rely heavily on hand-crafted features, and that deep learning based detection methods are inefficient and prone to overfitting, an abnormal flow detection method based on an Improved one-Dimensional Convolutional Neural Network (ICNN-1D), namely AFM-ICNN-1D, was proposed. Unlike the traditional "convolution-pooling-full connection" CNN structure, the ICNN-1D mainly consists of 2 convolutional layers, 2 global pooling layers, 1 dropout layer and 1 fully connected output layer. The preprocessed data were fed into the ICNN-1D, the result of the two convolutional layers was passed to both a global average pooling layer and a global max pooling layer, and the resulting outputs were merged and sent to the fully connected layer for classification. The model was optimized according to the classification results on the real dataset and then applied to abnormal flow detection. Experimental results on the CIC-IDS-2017 dataset show that the accuracy and recall of AFM-ICNN-1D reach 98%, outperforming the compared k-Nearest Neighbor (kNN) and Random Forest (RF) methods. Moreover, compared with a traditional CNN, the model parameters were reduced by about 97% and the training time was shortened by about 40%. Experimental results show that AFM-ICNN-1D offers high detection performance, reduces training time and avoids overfitting while better retaining the local characteristics of traffic data.
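    The merge of global average pooling and global max pooling described above amounts to concatenating two reductions over the convolutional feature maps (the toy feature-map values are illustrative):

```python
import numpy as np

def global_pool_merge(feature_maps):
    """Merge global average pooling and global max pooling over the
    length axis of 1-D convolution outputs shaped (channels, length)."""
    avg = feature_maps.mean(axis=1)          # global average pooling
    mx = feature_maps.max(axis=1)            # global max pooling
    return np.concatenate([avg, mx])         # fed to the fully connected layer

fm = np.array([[1., 3., 2.],
               [0., 4., 2.]])                # 2 channels, length 3
merged = global_pool_merge(fm)
```

Because the pooled vector's size depends only on the channel count, this layer replaces the large flattened fully-connected input of a traditional CNN, which is one way such a design can cut parameters so sharply.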
    Image steganalysis method based on saliency detection
    HUANG Siyuan, ZHANG Minqing, KE Yan, BI Xinliang
    2021, 41(2):  441-448.  DOI: 10.11772/j.issn.1001-9081.2020081323
    Aiming at the difficulty of image steganalysis and the fact that existing detection models can hardly perform targeted analysis of the steganography regions of an image, an image steganalysis method based on saliency detection was proposed. In the proposed method, saliency detection guides the steganalysis model to focus on the image features of steganography regions. Firstly, a saliency detection module was used to generate the salient regions of the image. Secondly, a region filter module was used to select the saliency images with a high degree of overlap with the steganography regions, and image fusion technology was used to fuse them with the original images. Finally, the quality of the training set was improved by replacing wrongly detected images with their corresponding saliency-fused images, improving the training effect and detection ability of the model. Experiments were carried out on the BOSSbase1.01 dataset, embedded by image-adaptive steganography algorithms in the spatial domain and the JPEG domain respectively; the results show that the proposed method can improve the detection accuracy of a deep learning based steganalysis model by up to 3 percentage points. A mismatch test was also carried out on the IStego100K dataset to further verify the generalization ability of the model and improve its application value. The mismatch test results show that the proposed method has a degree of generalization ability.
    Audio steganography detection model combining residual network and extreme gradient boosting
    CHEN Lang, WANG Rangding, YAN Diqun, LIN Yuzhen
    2021, 41(2):  449-455.  DOI: 10.11772/j.issn.1001-9081.2020060775
    Aiming at the problem that current audio steganography detection methods have low accuracy in detecting audio steganography based on Syndrome-Trellis Codes (STC), and considering the advantages of Convolutional Neural Network (CNN) in extracting abstract features, a model for audio steganography detection combining Deep Residual Network (DRN) and eXtreme Gradient Boosting (XGBoost) was proposed. Firstly, a fixed-parameter High-Pass Filter (HPF) was used to preprocess the input audio, and features were extracted through three convolutional layers. A Truncated Linear Unit (TLU) activation function was applied in the first convolutional layer to make the model adapt to the distribution of steganographic signals with low Signal-To-Noise Ratio (SNR). Then, the abstract features were further extracted by five stages of residual blocks and pooling operations. Finally, the extracted high-dimensional features were passed through fully connected layers and dropout layers and classified by the XGBoost model. The STC steganography and the Least Significant Bit Matching (LSBM) steganography were detected respectively. Experimental results showed that when the embedding rates were 0.5 bps (bit per sample), 0.2 bps and 0.1 bps, that is, when the average number of bits modified per audio sample was 0.5, 0.2 and 0.1 respectively, the proposed model achieved average detection accuracies of 73.27%, 70.16% and 65.18% for the STC steganography with a sub-check matrix of height 7, and average detection accuracies of 86.58%, 76.08% and 72.82% for the LSBM steganography. Compared with both traditional steganography detection methods based on handcrafted features and deep learning steganography detection methods, the proposed model improves the average detection accuracy for the two steganography algorithms by more than 10 percentage points.
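    The TLU activation mentioned above is, in the steganalysis literature, a linear function truncated to a fixed interval; a minimal sketch follows, with the threshold value 3.0 chosen as an illustrative assumption rather than the paper's setting.

```python
def tlu(x, threshold=3.0):
    # Truncated Linear Unit: identity inside [-threshold, threshold],
    # clipped to the boundary outside; keeps low-amplitude stego noise
    # while suppressing large image/audio content values.
    return max(-threshold, min(threshold, x))

activated = [tlu(v) for v in [-7.0, -1.5, 0.0, 2.0, 5.0]]
```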
    Video watermarking algorithm based on H.266/versatile video coding with intra frame dual mode
    LUO Zhiwei, LIU Chibiao
    2021, 41(2):  456-460.  DOI: 10.11772/j.issn.1001-9081.2020050603
    Concerning the high rate-distortion loss caused to the encoder by existing watermarking algorithms, and combined with the Versatile Video Coding (VVC) standard, a video watermarking algorithm based on the parity of intra frame prediction modes was proposed. Firstly, in the first round of rough selection, 35 even-numbered angle modes were calculated and 6 modes were selected from them. Secondly, during the second round of rough selection, whether to calculate the odd-numbered modes was determined according to the binary watermark state. Thirdly, when Matrix weighted Intra Prediction (MIP) and the Most Probable Mode (MPM) candidate list were processed, whether to calculate the odd- or even-numbered modes was determined according to the binary watermark state. Finally, the mode with the smallest rate-distortion cost was selected as the optimal mode to complete the watermark embedding, and the watermark was extracted by the decoder according to the parity of the prediction mode of the received 4×4 block. Experimental results showed that when embedding the watermark in 9 video sequences, compared with the original encoder, the Peak Signal-to-Noise Ratio (PSNR) of the proposed algorithm was decreased by 0.005 4 dB on average, the bit rate was increased by 0.07% on average, and the watermark capacity that could be embedded per frame was 625 b on average; in the comparison with the algorithm embedding the watermark in the parity of the prediction mode, the PSNR of the proposed algorithm was reduced by 0.005 4 dB on average, the bit rate was increased by 0.06% on average, and the total embeddable watermark capacity was 3 183 b on average, while the comparison algorithm had the PSNR reduced by 0.035 dB on average, the bit rate increased by 1.12% on average, and a total embeddable watermark capacity of 10 309 b on average. It can be seen that the rate-distortion performance loss of the proposed algorithm is smaller than that of the comparison algorithm.
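    The parity-based embedding rule described above, restricting the candidate modes to those whose parity matches the watermark bit and then taking the minimum rate-distortion cost, can be sketched as follows; the function names and the toy cost table are illustrative assumptions, not the VVC encoder interface.

```python
def embed_bit_by_parity(candidate_modes, rd_cost, bit):
    """Pick the candidate intra mode whose parity encodes the watermark bit.

    candidate_modes: intra prediction mode indices under consideration
    rd_cost: function mapping a mode to its rate-distortion cost
    bit: watermark bit (0 -> even-numbered mode, 1 -> odd-numbered mode)
    """
    allowed = [m for m in candidate_modes if m % 2 == bit]
    return min(allowed, key=rd_cost)          # smallest RD cost wins

def extract_bit(mode):
    # Decoder side: the watermark bit is simply the mode's parity.
    return mode % 2

costs = {2: 3, 3: 2, 6: 1, 9: 4}              # hypothetical RD costs
chosen = embed_bit_by_parity([2, 3, 6, 9], lambda m: costs[m], 1)
```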
    Advanced computing
    Binary particle swarm optimization algorithm based on novel S-shape transfer function for knapsack problem with a single continuous variable
    WANG Zekun, HE Yichao, LI Huanzhe, ZHANG Fazhan
    2021, 41(2):  461-469.  DOI: 10.11772/j.issn.1001-9081.2020050710
    In order to solve the Knapsack Problem with a single Continuous variable (KPC) efficiently, a novel S-shape transfer function based on the Gauss error function was proposed, and a new approach of transforming a real vector into a 0-1 vector by using the proposed transfer function was given, whereby a New Binary Particle Swarm Optimization algorithm (NBPSO) was proposed. Then, based on the second mathematical model of KPC and the combination of NBPSO with an effective algorithm for dealing with the infeasible solutions of KPC, a new approach to solving KPC was proposed. To validate the performance of NBPSO in solving KPC, NBPSO was utilized to solve four kinds of large-scale KPC instances, and the obtained results were compared with those of Binary Particle Swarm Optimization algorithms (BPSOs) based on other S- and V-shape transfer functions, Single-population Binary Differential Evolution with Hybrid encoding (S-HBDE), Bi-population Binary Differential Evolution with Hybrid encoding (B-HBDE) and the standard Binary Particle Swarm Optimization algorithm (BPSO). The comparison results show that NBPSO is superior to the comparison algorithms in both average calculation result and stability.
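    As an illustration of how an S-shape transfer function built on the Gauss error function binarizes a particle's real-valued velocity, consider the sketch below; the exact functional form and scaling used in the paper are not reproduced here, so this variant is an assumption.

```python
import math
import random

def s_shape_erf(v):
    # A candidate S-shape transfer function based on the Gauss error
    # function: monotone, s_shape_erf(0) = 0.5, limits 0 and 1.
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def binarize(velocities, rng=random.random):
    # Each velocity is mapped to a probability; the bit is 1 when a
    # uniform draw falls below that probability.
    return [1 if rng() < s_shape_erf(v) else 0 for v in velocities]

bits = binarize([-2.0, 0.1, 3.0])
```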
    Improved imperialist competitive algorithm inspired by historical facts of Spring and Autumn Period
    WANG Guilin, LI Bin
    2021, 41(2):  470-478.  DOI: 10.11772/j.issn.1001-9081.2020060974
    Aiming at the problem that the imperialist competitive algorithm easily falls into the curse of dimensionality because of premature convergence when solving high-dimensional functions, an improved imperialist competitive algorithm inspired by the historical facts of the vassal states vying for supremacy in the Spring and Autumn Period of China was proposed. Firstly, the competition strategy of "cooperative confrontation" was introduced in the process of country initialization to enhance information interaction and preserve high-quality populations. Secondly, a colonial rule strategy of gradual infiltration and assimilation at all levels of the state, learnt from history, was used in the empire assimilation process to improve the exploitation ability of the algorithm. Finally, a mechanism for judging and jumping out of local optima was added to avoid the impact of premature convergence on optimization performance. In the simulation experiments, the optimization ability, convergence speed and high-dimensional applicability of the improved algorithm were verified on 8 classic benchmark functions, and three schemes for jumping out of local optima were compared and analyzed. Furthermore, a CEC2017 test function experiment was carried out: the comparison of the proposed algorithm with five representative advanced algorithms of recent years in the field of algorithm improvement showed that the improved algorithm had higher optimization accuracy and stronger stability. The Kendall correlation coefficient verification showed a significant difference in optimization performance between the improved algorithm and the original algorithm, and that the improvement of the assimilation mechanism played a key role in the performance gain.
    Real-valued Cartesian genetic programming algorithm based on quasi-oppositional mutation
    FU Anbing, WEI Wenhong, ZHANG Yuhui, GUO Wenjing
    2021, 41(2):  479-485.  DOI: 10.11772/j.issn.1001-9081.2020060791
    Concerning the problems that the traditional Cartesian Genetic Programming (CGP) lacks diversity in its mutation operation and that the evolutionary strategy used in it has limitations, an ADvanced Real-Valued Cartesian Genetic Programming algorithm based on quasi-oppositional mutation (AD-RVCGP) was proposed. Firstly, the 1+λ evolutionary strategy was adopted in the evolution process of AD-RVCGP just as in the traditional CGP, that is, λ offspring were generated from a single parent solely through mutation. Secondly, three mutation operators, including a quasi-oppositional mutation operator, a terminal mutation operator and a single-point mutation operator, were dynamically selected during evolution, and the information of oppositional individuals was used in the mutation operation. Finally, in the evolution process, different parents were selected to generate the next generation of individuals according to the state of the evolution stage. In tests on symbolic regression problems, the convergence speed of the proposed AD-RVCGP was about 30% faster than that of the traditional CGP, and the running time was about 20% less. In addition, the error between the optimal solution obtained by AD-RVCGP and the true optimum was smaller than that between the optimal solution obtained by the traditional CGP and the true optimum. Experimental results show that the proposed AD-RVCGP achieves high convergence speed and precision in problem solving.
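    A common definition of a quasi-oppositional point, a uniform sample between the interval centre and the opposite point, can be sketched as below; whether the paper uses exactly this operator is an assumption, so the sketch is labeled as the standard quasi-opposition-based learning form.

```python
import random

def quasi_opposite(x, a, b, rng=random.uniform):
    """Quasi-opposite point of x on the interval [a, b].

    Standard quasi-opposition-based learning: sample uniformly between
    the interval centre (a + b) / 2 and the opposite point a + b - x.
    """
    centre = (a + b) / 2.0
    opposite = a + b - x
    lo, hi = min(centre, opposite), max(centre, opposite)
    return rng(lo, hi)

mutated = quasi_opposite(1.0, 0.0, 10.0)
```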
    F-X domain predictive filtering parallel algorithm based on compute unified device architecture
    YANG Xianfeng, GUI Hongjun, FU Chunchang
    2021, 41(2):  486-491.  DOI: 10.11772/j.issn.1001-9081.2020050688
    Concerning the high time complexity of traditional F-X domain predictive filtering in suppressing random noise of seismic data, a parallel algorithm based on Compute Unified Device Architecture (CUDA) was proposed. Firstly, the algorithm was analyzed module by module to find its computational bottleneck. Then, starting from the steps of calculating the correlation matrix, calculating the filter factor and filtering each data window, the filtering process was divided into multiple tasks for parallel processing on the Graphics Processing Unit (GPU). Finally, the efficiency of the algorithm was further improved by implementing the parallel algorithm and optimizing the redundant data reading in adjacent filter windows. Experimental results based on an NVIDIA Tesla K20c show that on the seismic data of a 250×250 work area, the proposed parallel algorithm achieves a speedup of 10.9 times over the original serial algorithm, while maintaining the calculation accuracy required in engineering.
    Network and communications
    Node redeployment strategy based on firefly algorithm for wireless sensor network
    SUN Huan, CHEN Hongbin
    2021, 41(2):  492-497.  DOI: 10.11772/j.issn.1001-9081.2020060803
    Node deployment is one of the important problems in Wireless Sensor Network (WSN). Concerning the energy hole problem in the process of node deployment, a Node Redeployment Based on the Firefly Algorithm (NRBFA) strategy was proposed. Firstly, the k-means algorithm was used to cluster nodes, and redundant nodes were introduced into the sensor network in which nodes were randomly deployed. Then, the Firefly Algorithm (FA) was used to move the redundant nodes to share the load of Cluster Heads (CHs) and balance the energy consumption of the nodes in the network. Finally, the redundant nodes were updated after finding the target node by reusing the FA. The proposed strategy reduces the moving distances of nodes and decreases the network energy consumption by moving the redundant nodes effectively. Experimental results show that the proposed strategy can alleviate the "energy hole" problem effectively. Compared with the partition node redeployment algorithm based on virtual force, the proposed strategy reduces the complexity of the algorithm, better improves the energy efficiency of the network, balances the network load, and prolongs the network lifetime by nearly 10 times.
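    The FA movement step used to relocate redundant nodes follows the standard firefly update, in which a node is attracted toward a brighter (better-placed) one; the coefficient values below are illustrative defaults, not the paper's tuned parameters.

```python
import math
import random

def firefly_move(xi, xj, beta0=1.0, gamma=1.0, alpha=0.1, rng=random.random):
    """One standard firefly update: move position xi toward brighter xj.

    beta0: attractiveness at distance zero; gamma: light absorption
    coefficient; alpha: randomization weight (all illustrative values).
    """
    r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))      # squared distance
    beta = beta0 * math.exp(-gamma * r2)                # attractiveness
    return [a + beta * (b - a) + alpha * (rng() - 0.5)  # attraction + noise
            for a, b in zip(xi, xj)]

new_pos = firefly_move([0.0, 0.0], [1.0, 2.0])
```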
    Forward collision warning strategy based on vehicle-to-vehicle communication
    HUI Fei, XING Meihua, GUO Jing, TANG Shuyu
    2021, 41(2):  498-503.  DOI: 10.11772/j.issn.1001-9081.2020060773
    Within the delay time of the Forward Collision Warning (FCW) system under Vehicle-to-Vehicle (V2V) communication, the traditional model assumes a uniform speed of the host vehicle and an error-free Global Positioning System (GPS), thereby significantly underestimating the risk of collision. Aiming at this problem, a new FCW strategy was proposed that corrects GPS errors and considers the movement state of the host vehicle within the delay time. Firstly, the overall workflow of the FCW system based on V2V communication was analyzed, and the key delays in the system were modeled with a Gaussian model. Then, a collision avoidance model was established that corrects GPS errors and takes the movement state of the host vehicle within the delay time into consideration, and different warning strategies were formulated for the three scenarios of constant speed, acceleration and deceleration of the remote vehicle. Finally, for the situation where the host vehicle accelerates within the delay time, Matlab was used to simulate the proposed FCW strategy. Simulation results show that the average successful collision avoidance rate of the proposed warning strategy reaches 96%, verifying its effectiveness in different scenarios.
    Multimedia computing and computer simulation
    Low-poly rendering for image and video
    HAN Yanru, YIN Mengxiao, QIN Zixuan, SU Peng, YANG Feng
    2021, 41(2):  504-510.  DOI: 10.11772/j.issn.1001-9081.2020050626
    Low-poly has recently become a popular style in the field of art design. In order to improve the quality of image and video low-poly stylization, an image and video low-poly rendering method based on edge features and superpixel segmentation was proposed. Firstly, the intersection points of adjacent superpixels and the uniform sampling points of the difference set between feature edges and superpixel boundaries were extracted as the vertices of the triangle mesh, and Delaunay triangulation was performed to generate the initial triangle mesh. Then, the constrained quadric error metric method was used to simplify the generated mesh and produce the final triangle mesh. Finally, the triangle mesh was filled with color to obtain the low-poly style image. For video low-poly rendering, temporally consistent superpixels were used to track the same part of an object across frames and establish associations between the video frames, reducing jitter after rendering. In addition, a video segmentation method was used to segment the moving objects in the video, so as to obtain sampling points with different densities for the moving objects and the background, and the locally stylized effect of the video was obtained by rendering the moving objects. Experimental results show that the proposed method can generate low-poly rendering results with better visual effects.
    Feature matching method based on weighted similarity measurement
    HU Lihua, ZUO Weijian, NIE Yaoyao
    2021, 41(2):  511-516.  DOI: 10.11772/j.issn.1001-9081.2020050747
    In order to solve the problems of poor robustness and high mismatch rate caused by noise, illumination and scale in the image feature matching process, a feature matching method based on Weighted Similarity Measurement (WSM) was proposed. Firstly, the FM_GMC (Feature Matching based on Grid and Multi-Density Clustering) algorithm was adopted to divide the image into several feature clustering blocks. Secondly, in each feature clustering block, the edge feature points were extracted by the Canny operator and described by Scale-Invariant Feature Transform (SIFT). Thirdly, a weighted similarity measurement was performed over the Hausdorff distance of the spatial context information between feature clustering blocks, the Euclidean distance between the appearance descriptors of image feature points, and the Normalized Cross Correlation (NCC). Finally, the similarity measurement results were further optimized according to the Nearest Neighbor Distance Ratio (NNDR) to determine the feature matching result. With ancient architecture images as the dataset, experimental results show that the WSM method achieves an average matching precision of 92% and is superior to common matching algorithms in matching number and matching precision, verifying its effectiveness and robustness.
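    The weighted combination of the three measures and the NNDR filtering step can be sketched as follows; the weight values are illustrative assumptions, and the distances are taken as precomputed scalars rather than derived from real descriptors.

```python
def weighted_similarity(hausdorff, euclidean, ncc, w=(0.4, 0.4, 0.2)):
    # Smaller distances and larger NCC mean more similar; NCC is folded
    # into a dissimilarity via (1 - ncc). Weights are illustrative only.
    return w[0] * hausdorff + w[1] * euclidean + w[2] * (1.0 - ncc)

def nndr_match(distances, ratio=0.8):
    # Nearest Neighbor Distance Ratio test: accept a match only if the
    # nearest distance is clearly smaller than the second nearest.
    d = sorted(distances)
    return len(d) >= 2 and d[0] / d[1] < ratio

score = weighted_similarity(0.2, 0.3, 0.9)
```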
    Global-local domain adaptive object detection based on single shot multibox detector
    JIANG Ning, FANG Jinglong, YANG Qing
    2021, 41(2):  517-522.  DOI: 10.11772/j.issn.1001-9081.2020050622
    In the field of object detection, it is hoped that a model trained in a domain with many labels can be applied to other domains without labels. However, the distributions of different domains always differ from each other, and such difference results in a sharp decline of model performance under domain transfer. To improve object detection performance under domain transfer, the domain transfer was addressed on two levels, the global level and the local level, which correspond to different feature alignment methods: selective alignment at the global level and full alignment at the local level. The proposed domain transfer framework was constructed based on the Single Shot MultiBox Detector (SSD) model and equipped with two domain adaptors corresponding to the global and local levels respectively to alleviate the domain difference. The training was implemented with an adversarial network algorithm, and consistency regularization was used to further improve the domain transfer performance of the model. The effectiveness of the proposed domain transfer model was verified by extensive experiments. Experimental results show that on various datasets, the proposed model outperforms existing common domain transfer models such as Domain Adaptation-Faster Region-based Convolutional Neural Network (DA-FRCNN), Adversarial Discriminative Domain Adaptation (ADDA) and Dynamic Adversarial Adaptation Network (DAAN) by 5%-10% in terms of mean Average Precision (mAP).
    Specified object tracking of unmanned aerial vehicle based on Siamese region proposal network
    ZHONG Sha, HUANG Yuqing
    2021, 41(2):  523-529.  DOI: 10.11772/j.issn.1001-9081.2020060762
    Object tracking based on Siamese network has made some progress, overcoming the limitation of the spatial invariance of the Siamese network in deep networks. However, factors such as appearance changes, scale changes and occlusions still affect tracking performance. Focusing on the problems of large changes in object scale, object motion blur and small object scale in the specified object tracking of Unmanned Aerial Vehicles (UAV), a new tracking algorithm based on a Siamese region proposal network with attention mechanism, namely Attention-SiamRPN+, was proposed. Firstly, an improved deep residual network ResNet-50 was employed as the feature extractor to extract feature maps. Secondly, a channel attention mechanism module was used to filter the semantic information of the different channel feature maps extracted by the residual network, and corresponding weights were reassigned to the different channel features. Thirdly, a hierarchical fusion of two Region Proposal Networks (RPN) was applied, where the RPN module consisted of channel-by-channel deep cross-correlation of feature maps, classification of positive and negative samples, and bounding box regression. Finally, the bounding box of the object position was selected. In the test on the VOT2018 platform, the proposed algorithm achieved an accuracy of 59.4% and an Expected Average Overlap (EAO) of 39.5%. In the experiment with one-pass evaluation mode on the OTB2015 platform, the algorithm achieved a success rate of 68.7% and a precision of 89.4%. Experimental results show that the evaluation results of the proposed algorithm are better than those of three excellent correlation filtering and Siamese network tracking algorithms of recent years, and that the proposed algorithm has good robustness and real-time processing speed when applied to the tracking of specified objects by UAVs.
    Video person re-identification based on non-local attention and multi-feature fusion
    LIU Ziyan, ZHU Mingcheng, YUAN Lei, MA Shanshan, CHEN Lingzhouting
    2021, 41(2):  530-536.  DOI: 10.11772/j.issn.1001-9081.2020050739
    Aiming at the fact that existing video person re-identification methods cannot effectively extract the spatiotemporal information between consecutive frames of a video, a person re-identification network based on non-local attention and multi-feature fusion was proposed to extract global and local representation features and time series information. Firstly, non-local attention modules were embedded to extract global features. Then, multi-feature fusion was realized by extracting the low-level and middle-level features as well as the local features, so as to obtain the salient features of the person. Finally, similarity measurement and sorting were performed on the person features to calculate the accuracy of video person re-identification. The proposed model has significantly improved performance compared with the existing Multi-scale 3D Convolution (M3D) and Learned Clip Similarity Aggregation (LCSA) models, with the mean Average Precision (mAP) reaching 81.4% and 93.4% and the Rank-1 reaching 88.7% and 95.3% on the large datasets MARS and DukeMTMC-VideoReID respectively. At the same time, the proposed model reaches a Rank-1 of 94.8% on the small dataset PRID2011.
    Crowd counting network based on multi-scale spatial attention feature fusion
    DU Peide, YAN Hua
    2021, 41(2):  537-543.  DOI: 10.11772/j.issn.1001-9081.2020060793
    Concerning the poor performance of crowd counting tasks in scenes of different density caused by severe scale changes and occlusions, a Multi-scale spatial Attention Feature fusion Network (MAFNet) was proposed based on the Congested Scene Recognition Network (CSRNet) by combining a multi-scale feature fusion structure and a spatial attention module. Before feature extraction with MAFNet, the scene images with head markers were processed with a Gaussian filter to obtain the ground truth density maps. In addition, a method of jointly using two basic loss functions was proposed to constrain the consistency of the estimated density map and the ground truth density map. Next, with the multi-scale feature fusion structure as the backbone of MAFNet, a strategy of simultaneously extracting and fusing multi-scale features was used to obtain the multi-scale fusion feature map, and the feature maps were then calibrated and refined by the spatial attention module. After that, an estimated density map was generated through dilated convolution, and the number of people in the scene was obtained by integrating the estimated density map pixel by pixel. To verify the effectiveness of the proposed model, evaluations were conducted on four datasets (ShanghaiTech, UCF_CC_50, UCF-QNRF and WorldExpo'10). Experimental results on Part B of the ShanghaiTech dataset show that, compared with CSRNet, MAFNet reduces the Mean Absolute Error (MAE) by 34.9% and the Mean Square Error (MSE) by 29.4%. Furthermore, experimental results on multiple datasets show that by using the attention mechanism and the multi-scale feature fusion strategy, MAFNet can extract more detailed information and reduce the impact of scale changes and occlusions.
    Dense crowd counting model based on spatial dimensional recurrent perception network
    FU Qianhui, LI Qingkui, FU Jingnan, WANG Yu
    2021, 41(2):  544-549.  DOI: 10.11772/j.issn.1001-9081.2020050623
    Considering the limitations of feature extraction from high-density crowd images with perspective distortion, a crowd counting model named LMCNN, which combines a Global Feature Perception Network (GFPNet) and a Local Association Feature Perception Network (LAFPNet), was proposed. GFPNet is the backbone network of LMCNN, and its output feature map was serialized and used as the input of LAFPNet. The characteristic that a Recurrent Neural Network (RNN) senses local association features on the time-series dimension was used to map the single spatial static feature to a feature space with local sequence association features, thus effectively reducing the impact of perspective distortion on crowd density estimation. To verify the effectiveness of the proposed model, experiments were conducted on the ShanghaiTech Part A and UCF_CC_50 datasets. The results show that compared with the Atrous Convolutions Spatial Pyramid Network (ACSPNet), the Mean Absolute Error (MAE) of LMCNN decreased by at least 18.7% and 20.3% respectively, and the Mean Square Error (MSE) decreased by at least 22.3% and 22.6% respectively. LMCNN focuses on the association between preceding and succeeding features along the spatial dimension, and by fully integrating the spatial dimension features and the sequence features within a single image, it reduces the crowd counting error caused by perspective distortion and predicts the number of people in dense areas more accurately, thereby improving the regression accuracy of crowd density.
    3D hand pose estimation based on label distribution learning
    LI Weiqiang, LEI Hang, ZHANG Jingyu, WANG Xupeng
    2021, 41(2):  550-555.  DOI: 10.11772/j.issn.1001-9081.2020050721
    Fast and reliable hand pose estimation has wide applications in fields such as human-computer interaction. In order to deal with the influences of light intensity changes, self-occlusions and large pose variations on hand pose estimation, a deep network framework based on label distribution learning was proposed. In the network, the point cloud of the hand was used as the input data and was normalized through farthest point sampling and an Oriented Bounding Box (OBB). Then, PointNet++ was utilized to extract features from the hand point cloud data. To deal with the highly non-linear relationship between the point cloud and the hand joint points, the positions of the hand joint points were predicted by the label distribution learning network. Compared with traditional depth map based approaches, the proposed method is able to effectively extract discriminative hand geometric features with low computation cost and high accuracy. A set of tests was conducted on the public MSRA dataset to verify the effectiveness of the proposed hand pose estimation network. Experimental results showed that the average error of the hand joints estimated by this network was 8.43 mm and the average processing time per frame was 12.8 ms, with the pose estimation error reduced by 11.82% and 0.83% compared with 3D CNN and Hand PointNet respectively.
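    Farthest point sampling, used above to normalize the input point cloud, greedily keeps the point farthest from the already selected set; a minimal sketch follows, with tuples standing in for 3D points and the first point taken as the seed (seed choice is an assumption).

```python
def farthest_point_sampling(points, k):
    """Greedy farthest point sampling over a list of 3D point tuples."""
    def d2(p, q):
        # Squared Euclidean distance (square root not needed for argmax).
        return sum((a - b) ** 2 for a, b in zip(p, q))

    chosen = [points[0]]                      # seed with the first point
    while len(chosen) < k:
        # Pick the point whose distance to the chosen set is largest.
        nxt = max(points, key=lambda p: min(d2(p, c) for c in chosen))
        chosen.append(nxt)
    return chosen

sampled = farthest_point_sampling(
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (5.0, 0.0, 0.0)], 2)
```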
    Automatic segmentation algorithm of two-stage mediastinal lymph nodes using attention mechanism
    XU Shaowei, QIN Pinle, ZENG Jianchao, ZHAO Zhikai, GAO Yuan
    2021, 41(2):  556-562.  DOI: 10.11772/j.issn.1001-9081.2020060809
    Judging whether mediastinal lymph node metastasis exists in the mediastinal lymph node region and correctly segmenting malignant lymph nodes are of great significance to the diagnosis and treatment of lung cancer. In view of the large differences in mediastinal lymph node size, the imbalance of positive and negative samples, and the feature similarity between surrounding soft tissues and lung tumors, a new cascaded two-stage mediastinal lymph node segmentation algorithm based on attention was proposed. First, a two-stage segmentation algorithm was designed based on medical priors to remove mediastinal interference tissues and then segment the suspicious lymph nodes, so as to reduce the interference of negative samples and the difficulty of training while enhancing the ability to segment mediastinal lymph nodes. Second, a global aggregation module and a dual attention module were introduced to improve the network's ability to classify multi-scale targets and backgrounds. Experimental results showed that the proposed algorithm achieved an accuracy of 0.707 9, a recall of 0.726 9 and a Dice score of 0.701 1 on the mediastinal lymph node dataset. It can be seen that the proposed algorithm is significantly better than other current mediastinal lymph node segmentation algorithms in terms of accuracy and Dice score, and can handle problems such as large size differences, sample imbalance and easily confused features of lymph nodes.
    Frontier and comprehensive applications
    Citrus disease and insect pest area segmentation based on superpixel fast fuzzy C-means clustering and support vector machine
    YUAN Qianqian, DENG Hongmin, WANG Xiaohang
    2021, 41(2):  563-570.  DOI: 10.11772/j.issn.1001-9081.2020050645
    Focusing on the existing problems that image datasets of citrus diseases and insect pests are few, that the targets of diseases and pests are complex and scattered, and that automatic location and segmentation are difficult to realize, a segmentation method for agricultural citrus disease and pest areas based on Superpixel Fast Fuzzy C-means Clustering (SFFCM) and Support Vector Machine (SVM) was proposed. This method made full use of the advantages of the SFFCM algorithm, which is fast and robust and integrates spatial information, and meanwhile did not require manual selection of samples for image segmentation as the traditional SVM does. Firstly, the improved SFFCM segmentation algorithm was used to pre-segment the image to obtain the foreground and background regions. Then, the erosion and dilation operations of morphology were used to narrow these two regions, and the training samples were automatically selected for SVM model training. Finally, the trained SVM classifier was used to segment the entire image. Experimental results show that compared with three methods, namely Fast and Robust Fuzzy C-means Clustering (FRFCM), the original SFFCM and Edge Guidance Network (EGNet), the proposed method achieves an average recall of 0.937 1, an average precision of 0.941 8 and an average accuracy of 0.930 3, all of which are better than those of the comparison methods.
    Macroscopic fundamental diagram traffic signal control model based on hierarchical control
    WANG Peng, LI Yanwen, YANG Di, YANG Huamin
    2021, 41(2):  571-576.  DOI: 10.11772/j.issn.1001-9081.2020050758
    Abstract ( )   PDF (1351KB) ( )  
    References | Related Articles | Metrics
    Aiming at the problem of coordinated control within urban traffic sub-areas and at their boundary intersections, a traffic signal control model based on Hierarchical multi-granularity and the Macroscopic Fundamental Diagram (HDMF) was proposed. First, the hierarchical multi-granularity characteristic of the urban traffic system and rough set theory were used to describe the real-time states of the traffic elements. Then, combining distributed intersection signal control based on the backpressure algorithm with the dynamic characteristics of the traffic elements, the pressures of the intersection phases were calculated and the phase decisions were made. Finally, the Macroscopic Fundamental Diagram (MFD) was used to maximize the total flow of vehicles driving out of the area and to keep the number of vehicles in each sub-area optimal. Experimental results showed that, compared with the EMP (Extended cooperative Max-Pressure control) model and the HGA model based on MFD and a hybrid genetic simulated annealing algorithm, the HDMF model reduced the average queue length by 6.35% and 10.01% respectively, and the average travel time by 6.55% and 11.15% respectively. It can be seen that the proposed HDMF model can effectively relieve traffic congestion inside sub-areas and at their boundaries, and maximize the traffic flow of the whole road network.
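The backpressure phase decision mentioned above has a standard core: a phase's pressure is the upstream queue minus the downstream queue, summed over the movements it serves, and the phase with the highest pressure is actuated. A minimal sketch of that core only, with made-up link names and queue lengths; the paper's model additionally couples this with MFD-based sub-area control.

```python
# Hedged sketch of backpressure phase selection at one intersection.
def phase_pressure(movements, queues):
    """movements: (upstream_link, downstream_link) pairs served by the phase."""
    return sum(queues[up] - queues[down] for up, down in movements)

queues = {"N_in": 12, "S_in": 4, "E_in": 7, "W_in": 2,
          "N_out": 1, "S_out": 3, "E_out": 0, "W_out": 2}
phases = {
    "NS": [("N_in", "S_out"), ("S_in", "N_out")],   # through movements only
    "EW": [("E_in", "W_out"), ("W_in", "E_out")],
}
best = max(phases, key=lambda p: phase_pressure(phases[p], queues))
print(best)   # the phase relieving the most queued vehicles is actuated
```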
    Cleaning scheduling model with constraints and its solution
    FAN Xiaomao, XIONG Honglin, ZHAO Gansen
    2021, 41(2):  577-582.  DOI: 10.11772/j.issn.1001-9081.2020050735
    Abstract ( )   PDF (876KB) ( )  
    References | Related Articles | Metrics
    Cleaning tasks of cleaning service companies often differ in level, duration and cycle, and a general model of the cleaning scheduling problem is lacking. At present, the cleaning scheduling problem is mainly solved by manual scheduling, which is time-consuming, labor-intensive and unstable in scheduling quality. Therefore, a mathematical model of the cleaning scheduling problem with constraints, an NP-hard problem, was proposed, and the Simulated Annealing algorithm (SA), Bee Colony Optimization algorithm (BCO), Ant Colony Optimization algorithm (ACO) and Particle Swarm Optimization algorithm (PSO) were used to solve it. Finally, an empirical analysis was carried out on the real scheduling data of a cleaning service company. Experimental results show that, compared with the manual scheduling scheme, the heuristic intelligent optimization algorithms have obvious advantages in solving the constrained cleaning scheduling problem, and the manpower demand of the obtained cleaning schedules is significantly reduced: the algorithms save 218.62 to 513.30 hours of cleaning manpower in a one-year scheduling cycle compared with the manual scheme. It can be seen that the mathematical model solved by heuristic intelligent optimization algorithms is feasible and efficient for the cleaning scheduling problem with constraints, and provides decision-making support for the scientific management of cleaning service companies.
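Of the four solvers listed, simulated annealing is the simplest to illustrate. Below is a toy version of a constrained schedule, assuming four tasks with fixed durations must fit a six-slot horizon and the objective is to minimise peak simultaneous staffing; the task data, cost function and cooling parameters are all invented for illustration, not the paper's model.

```python
# Hedged sketch: simulated annealing on a toy constrained cleaning schedule.
import math
import random

tasks = [2, 3, 2, 1]                 # task durations in time slots (illustrative)
SLOTS = 6                            # planning horizon

def peak_staff(starts):
    """Cost: peak number of cleaners needed in any single slot."""
    load = [0] * SLOTS
    for s, d in zip(starts, tasks):
        for t in range(s, s + d):
            load[t] += 1
    return max(load)

def anneal(seed=0, steps=2000, temp=2.0, cool=0.995):
    rng = random.Random(seed)
    starts = [0] * len(tasks)        # worst case: every task starts together
    cost = best = peak_staff(starts)
    for _ in range(steps):
        cand = starts[:]
        i = rng.randrange(len(tasks))
        cand[i] = rng.randrange(SLOTS - tasks[i] + 1)   # keep task inside horizon
        c = peak_staff(cand)
        # accept improvements always, worse moves with Boltzmann probability
        if c <= cost or rng.random() < math.exp((cost - c) / temp):
            starts, cost = cand, c
            best = min(best, c)
        temp *= cool
    return best

print(anneal())                      # peak staffing after annealing
```

The real model replaces `peak_staff` with a cost over levels, cycles and hard constraints, but the accept/cool loop is the same shape.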
    Fall detection algorithm integrating motion features and deep learning
    CAO Jianrong, LYU Junjie, WU Xinying, ZHANG Xu, YANG Hongjuan
    2021, 41(2):  583-589.  DOI: 10.11772/j.issn.1001-9081.2020050705
    Abstract ( )   PDF (1348KB) ( )  
    References | Related Articles | Metrics
    In order to use computer vision technology to accurately detect falls of the elderly, and aiming at the incompleteness of hand-crafted features in existing fall detection algorithms as well as the problems in the fall detection process such as the difficulty of separating foreground from background, the confusion of objects, the loss of moving objects and the low accuracy of fall detection, a deep learning fall detection algorithm fusing human motion information was proposed to detect the fall state of the human body. Firstly, foreground and background were separated by an improved YOLOv3 network, and the human object was marked by a minimum bounding rectangle according to the YOLOv3 detection results. Then, by analyzing the motion features of the human fall process, the motion features of the human body were vectorized and transformed into motion weights between 0 and 1 through the Sigmoid activation function. Finally, to classify human falls, the motion features and the features extracted by a Convolutional Neural Network (CNN) were spliced and fused through the fully connected layer. The proposed algorithm was compared with human object detection algorithms such as background difference, Gaussian mixture, VIBE (VIsual Background Extractor) and Histogram of Oriented Gradient (HOG), and with fall judgment schemes such as the threshold method, the grading method, Support Vector Machine (SVM) classification and CNN classification, and was tested under different lighting conditions and with the interference of mixed daily noise motions. The results show that the proposed algorithm is superior to traditional human fall detection algorithms in environmental adaptability and fall detection accuracy. The proposed algorithm can effectively detect the human body in video and accurately detect the fall state of the human body, which further verifies the feasibility and efficiency of the deep learning recognition method fused with motion information in video fall behavior analysis.
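The fusion step described above, squashing a raw motion feature into a 0-1 weight with a sigmoid and splicing it onto the CNN features before the classifier, can be sketched as follows. The feature values and dimensions are invented for illustration; in the paper the concatenated vector feeds a fully connected classification layer.

```python
# Hedged sketch: sigmoid motion weights concatenated with CNN features.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fuse(cnn_features, motion_features):
    """Concatenate CNN features with sigmoid-squashed motion features."""
    weights = [sigmoid(m) for m in motion_features]
    return cnn_features + weights

# e.g. fast downward bounding-box motion -> weight near 1, upward -> near 0
fused = fuse([0.12, 0.80, 0.45], [4.0, -2.0])
print(fused)
```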
    Diagnosis of mild cognitive impairment using deep learning and brain functional connectivities with different frequency dimensions
    KONG Lingxu, WU Haifeng, ZENG Yu, LU Xiaoling
    2021, 41(2):  590-597.  DOI: 10.11772/j.issn.1001-9081.2020060897
    Abstract ( )   PDF (1848KB) ( )  
    References | Related Articles | Metrics
    Accurate diagnosis of Mild Cognitive Impairment (MCI) is critical to the prevention and treatment of Alzheimer's Disease (AD). Currently, deep learning and resting-state functional Magnetic Resonance Imaging (rs-fMRI) are often used to assist the diagnosis of MCI. The commonly used Pearson correlation method and Window Pearson (WP) correlation method can represent brain Functional Connectivity (FC) in the time dimension, but cannot decompose and represent the information in different frequency dimensions. To solve this problem, a new method using FC coefficients in different frequency dimensions as input features of existing deep learning models was proposed to improve the accuracy of MCI classification. Firstly, the data of the subjects were spliced and then subjected to Multivariate Empirical Mode Decomposition (MEMD). Secondly, the FC coefficients in different frequency dimensions were obtained after segmentation. Finally, VGG16 and a Long Short-Term Memory (LSTM) network were used for testing. Experimental results show that, when the proposed FC coefficients are used, the classification accuracy of MCI reaches up to 84.33%, which is 18.33 to 21.00 percentage points higher than the accuracy with the traditional FC coefficients. In addition, the FC coefficients of different frequency dimensions have different resolving power for MCI.
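Once MEMD has split each region's signal into frequency bands, the per-band FC computation is ordinary Pearson correlation between region pairs, one matrix per band. A minimal sketch assuming the band-decomposed series are already given (MEMD itself is omitted); the two-region, two-band data are invented.

```python
# Hedged sketch: one Pearson FC matrix per frequency band.
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def fc_per_band(bands):
    """bands[b][r] = time series of region r in band b -> FC matrix per band."""
    return [[[pearson(rs[i], rs[j]) for j in range(len(rs))]
             for i in range(len(rs))] for rs in bands]

low = [[1, 2, 3, 4], [2, 4, 6, 8]]    # two regions, correlated in this band
high = [[1, 2, 3, 4], [4, 3, 2, 1]]   # anti-correlated in this band
fcs = fc_per_band([low, high])
print(round(fcs[0][0][1], 3), round(fcs[1][0][1], 3))
```

The two bands give opposite-signed FC for the same region pair, which is exactly the frequency-resolved information a single time-domain Pearson matrix would average away.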
    Respiratory sound recognition of chronic obstructive pulmonary disease patients based on HHT-MFCC and short-term energy
    CHANG Zheng, LUO Ping, YANG Bo, ZHANG Xiaoxiao
    2021, 41(2):  598-603.  DOI: 10.11772/j.issn.1001-9081.2020060881
    Abstract ( )   PDF (1298KB) ( )  
    References | Related Articles | Metrics
    In order to optimize the Mel-Frequency Cepstral Coefficient (MFCC) feature extraction algorithm, improve the recognition accuracy of respiratory sound signals, and thereby identify Chronic Obstructive Pulmonary Disease (COPD), a feature extraction algorithm fusing MFCC based on the Hilbert-Huang Transform (HHT) with short-term energy, named HHT-MFCC+Energy, was proposed. Firstly, the Hilbert marginal spectrum and marginal spectrum energy of the preprocessed respiratory sound signal were calculated through HHT. Secondly, the spectral energy was passed through a Mel filter bank to obtain the eigenvector, and the logarithm and discrete cosine transform of the eigenvector were then computed to obtain the HHT-MFCC coefficients. Finally, the short-term energy of the signal was fused with the HHT-MFCC eigenvector to form a new feature, and the signal was identified by Support Vector Machine (SVM). Three feature extraction algorithms, MFCC, HHT-MFCC and HHT-MFCC+Energy, were combined with SVM to recognize respiratory sound signals. Experimental results show that the proposed feature fusion algorithm recognizes the respiratory sounds of both COPD patients and healthy people better than the other two algorithms: its average recognition rate reaches 97.8% when extracting 24-dimensional features and selecting 100 training samples, which is 6.9 percentage points and 1.4 percentage points higher than those of MFCC and HHT-MFCC respectively.
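The final fusion step, appending per-frame short-term energy to the cepstral vector, is simple enough to sketch directly. The HHT/Mel pipeline is omitted and the numbers are illustrative; in the paper the cepstral part is the 24-dimensional HHT-MFCC vector.

```python
# Hedged sketch: short-term energy appended to a cepstral feature vector.
def short_term_energy(frame):
    """Sum of squared samples in one analysis frame."""
    return sum(s * s for s in frame)

def fuse_features(cepstral_vec, frame):
    return cepstral_vec + [short_term_energy(frame)]

frame = [0.1, -0.2, 0.3]                      # one windowed frame (illustrative)
feat = fuse_features([1.5, -0.7, 0.2], frame) # cepstral coefficients + energy
print(feat)
```

Since respiratory sounds of COPD patients and healthy subjects differ in loudness dynamics, the extra energy dimension gives the SVM information the cepstral shape alone does not carry.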
    Acceleration compensation based anti-swaying flight control for unmanned aerial vehicle with slung-load
    JIAO Hailin, GUO Yuying, ZHU Zhengwei
    2021, 41(2):  604-610.  DOI: 10.11772/j.issn.1001-9081.2020050740
    Abstract ( )   PDF (3609KB) ( )  
    References | Related Articles | Metrics
    In order to reduce load swing during slung-load flight of a quadrotor Unmanned Aerial Vehicle (UAV), a new anti-sway control method based on acceleration compensation was developed. First, the nonlinear dynamic equation of the quadrotor UAV slung-load system was established by the Lagrange method, and energy functions were constructed to design the flight control system so that the quadrotor UAV tracks the reference trajectory. Then, the generalized error of the slung-load's motion trajectory was used to design the anti-sway controller, and acceleration compensation of the quadrotor UAV was carried out to modify its motion trajectory, thereby reducing the slung-load swing caused by rapid motion of the quadrotor UAV. Finally, simulations were carried out to compare and analyze the slung-load flight control effect before and after acceleration compensation. Simulation results show that the flight control method based on acceleration compensation can not only ensure the stability of the quadrotor UAV slung-load flight, but also provide sufficient stability margin for the flight control system.
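The compensation idea, correcting the UAV's reference acceleration with a term driven by the load's swing error so the trajectory itself damps the swing, can be shown in one line of arithmetic. This is a PD-style stand-in under invented gains, not the paper's energy-function controller or its dynamics.

```python
# Hedged sketch of the acceleration-compensation idea only.
def compensated_accel(a_ref, swing_angle, swing_rate, kp=2.0, kd=0.8):
    """Correct the reference acceleration with a PD-style term on the swing state."""
    return a_ref + kp * swing_angle + kd * swing_rate

# load swinging forward (positive angle) but already swinging back:
a = compensated_accel(a_ref=1.0, swing_angle=0.1, swing_rate=-0.05)
print(a)   # the UAV accelerates slightly harder to move under the load
```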
Honorary Editor-in-Chief: ZHANG Jingzhong
Editor-in-Chief: XU Zongben
Associate Editor: SHEN Hengtao XIA Zhaohui
Domestic Post Distribution Code: 62-110
Foreign Distribution Code: M4616
No. 9, 4th Section of South Renmin Road, Chengdu 610041, China
Tel: 028-85224283-803
Website: www.joca.cn
E-mail: bjb@joca.cn