Journal of Computer Applications

Transfer learning based hierarchical attention neural network for sentiment analysis

QU Zhaowei, WANG Yuan, WANG Xiaoru

2018, 38(11): 3053-3056. DOI: 10.11772/j.issn.1001-9081.2018041363

Document-level sentiment analysis aims to predict the sentiment that users express in a document. Traditional neural-network-based methods rely on unsupervised word vectors, which cannot accurately represent contextual relationships or capture the meaning of the context. In addition, the Recurrent Neural Network (RNN) commonly used for sentiment analysis has a complex structure and a large number of parameters. To address these issues, a Transfer Learning based Hierarchical Attention Neural Network (TLHANN) was proposed. Firstly, an encoder was trained on a machine translation task so that it learned to understand context and generate hidden vectors. Then, the encoder was transferred to the sentiment analysis task by concatenating the hidden vector it generates with the corresponding unsupervised word vector, so that contextual relationships could be better represented by the distributed representation. Finally, a two-level hierarchical network was applied to the sentiment analysis task. A simplified RNN unit, the Minimal Gated Unit (MGU), was used at each level to reduce the number of parameters, and an attention mechanism was used to extract important information. The experimental results show that the accuracy of the proposed algorithm is increased by an average of 8.7% and 23.4% compared with the traditional neural network algorithm and Support Vector Machine (SVM) respectively.
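The MGU keeps a single forget gate where a GRU has two gates, which is where its parameter saving comes from. The following is a minimal pure-Python sketch of one commonly used MGU formulation, not the TLHANN authors' code; the class name `MGUCell`, the helper `parameter_count`, the uniform initialisation range and the layer sizes are illustrative assumptions.

```python
import math
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class MGUCell:
    """Minimal Gated Unit: one forget gate plus a candidate state (hypothetical sketch)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            # Assumed small uniform initialisation
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        self.W_f, self.U_f, self.b_f = init(hidden_size, input_size), init(hidden_size, hidden_size), [0.0] * hidden_size
        self.W_h, self.U_h, self.b_h = init(hidden_size, input_size), init(hidden_size, hidden_size), [0.0] * hidden_size

    def step(self, x, h_prev):
        # Forget gate: f_t = sigmoid(W_f x_t + U_f h_{t-1} + b_f)
        f = [sigmoid(a + b + c) for a, b, c in
             zip(matvec(self.W_f, x), matvec(self.U_f, h_prev), self.b_f)]
        # Candidate state uses the gated previous state f_t * h_{t-1}
        gated = [fi * hi for fi, hi in zip(f, h_prev)]
        h_tilde = [math.tanh(a + b + c) for a, b, c in
                   zip(matvec(self.W_h, x), matvec(self.U_h, gated), self.b_h)]
        # h_t = (1 - f_t) * h_{t-1} + f_t * candidate
        return [(1 - fi) * hi + fi * ci for fi, hi, ci in zip(f, h_prev, h_tilde)]

    def run(self, xs):
        """Process a sequence and return the hidden state after each step."""
        h = [0.0] * self.hidden_size
        states = []
        for x in xs:
            h = self.step(x, h)
            states.append(h)
        return states

def parameter_count(input_size, hidden_size, blocks):
    """Weights and biases of a gated RNN with `blocks` (W, U, b) groups."""
    return blocks * (hidden_size * input_size + hidden_size * hidden_size + hidden_size)
```

For a 100-dimensional input and 50 hidden units, `parameter_count` gives 15100 parameters with two blocks (MGU) versus 22650 with three (GRU), which illustrates why MGU has fewer parameters.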

Macroeconomic forecasting method fusing Weibo sentiment analysis and deep learning

ZHAO Junhao, LI Yuhua, HUO Lin, LI Ruixuan, GU Xiwu

2018, 38(11): 3057-3062. DOI: 10.11772/j.issn.1001-9081.2018041346

The rapid development of the modern market economy is accompanied by higher risks. Forecasting regional investment ahead of time can reveal investment risks early and thus provide a reference for the investment decisions of countries and enterprises. Aiming at the lag of statistical data and the complexity of internal relationships in macroeconomic forecasting, a Long Short-Term Memory prediction method based on Weibo Sentiment Analysis (SA-LSTM) was proposed. Firstly, considering the strong timeliness of Weibo texts, a procedure for crawling Weibo texts and analyzing their sentiment was designed to obtain sentiment propensity scores for the texts. Then, the total investment in the region was forecasted by combining these scores with structured economic indicators from government statistics and a Long Short-Term Memory (LSTM) network. The experimental results on four real datasets show that, after incorporating Weibo sentiment analysis, SA-LSTM reduces the relative prediction error by 4.95, 0.92, 1.21 and 0.66 percentage points respectively. Compared with the best of four methods, namely the AutoRegressive Integrated Moving Average model (ARIMA), Linear Regression (LR), Back Propagation Neural Network (BPNN) and LSTM, SA-LSTM reduces the relative prediction error by 0.06, 0.92, 0.94 and 0.66 percentage points respectively. In addition, the variance of its relative prediction error is the smallest, indicating that the proposed method is robust and adapts well to data jitter.

Hierarchical attention-based neural network model for spam review detection

LIU Yuxin, WANG Li, ZHANG Hao

2018, 38(11): 3063-3068. DOI: 10.11772/j.issn.1001-9081.2018041356

Existing methods for detecting spam reviews mainly focus on designing features from linguistic and psychological clues, which can hardly reveal the latent semantic information of reviews. To mine this latent semantic information, a Hierarchical Attention-based Neural Network (HANN) model was proposed. The model mainly consists of two layers: the Word2Sent layer, which uses a Convolutional Neural Network (CNN) to produce continuous sentence representations from word embeddings, and the Sent2Doc layer, which uses an attention pooling-based neural network to generate document representations from sentence representations. The generated document representations are used directly as features to identify spam reviews. The proposed hierarchical attention mechanism enables the model to fully preserve position and intensity information, so that comprehensive information, namely the history, future and local context of any position in a document, can be extracted. The experimental results show that the proposed method achieves higher accuracy: compared with methods based only on neural networks, the accuracy is increased by 5% on average, and the classification effect is significantly improved.
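Attention pooling in a Sent2Doc-style layer turns a variable number of sentence vectors into one document vector through a learned weighted average. A minimal sketch follows, assuming a simple score of tanh(context · sentence) against a context vector `context`; the function names and the scoring function are assumptions, since the abstract does not give HANN's exact attention formula.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(sentence_vecs, context):
    """Return (document vector, attention weights) for a list of sentence vectors."""
    # Assumed scoring function: tanh of the dot product with a context vector
    scores = [math.tanh(sum(c * x for c, x in zip(context, s))) for s in sentence_vecs]
    weights = softmax(scores)
    dim = len(sentence_vecs[0])
    # Weighted sum of the sentence vectors, one dimension at a time
    doc = [sum(w * s[d] for w, s in zip(weights, sentence_vecs)) for d in range(dim)]
    return doc, weights

doc, weights = attention_pool([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], context=[2.0, 0.0])
```

Sentences aligned with the context vector (the first and third here) receive larger weights, so they dominate the document representation.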

Optimization and implementation of parallel FP-Growth algorithm based on Spark

GU Junhua, WU Junyan, XU Xinyun, XIE Zhijian, ZHANG Suqi

2018, 38(11): 3069-3074. DOI: 10.11772/j.issn.1001-9081.2018041219

To further improve the execution efficiency of the Frequent Pattern-Growth (FP-Growth) algorithm on the Spark platform, a new Spark-based parallel FP-Growth algorithm named BFPG (Better Frequent Pattern-Growth) was presented. Firstly, the F-List grouping strategy was improved by taking into account the size of the Frequent Pattern-Tree (FP-Tree) and the amount of computation in each partition, so that the total load of each partition was approximately equal. Secondly, the dataset partitioning strategy was optimized by creating a list named P-List, which reduced the number of traversals and thus the time complexity. The experimental results show that BFPG improves the mining efficiency of the parallel FP-Growth algorithm and has good scalability.
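Balancing partition load, as in BFPG's improved F-List grouping, can be illustrated with a generic greedy heuristic. The sketch below assumes a load estimate per frequent item is already available (for example an estimate of its conditional FP-Tree size) and assigns the heaviest items first to the currently lightest group. This is the longest-processing-time-first heuristic, not BFPG's published strategy, and `balance_groups` is a hypothetical name.

```python
import heapq

def balance_groups(item_loads, num_groups):
    """Assign items to groups so that group loads are roughly equal.

    item_loads: dict item -> estimated load (assumed to be given).
    Returns (groups, loads) where groups maps group id -> list of items.
    """
    heap = [(0, g) for g in range(num_groups)]  # (current load, group id)
    heapq.heapify(heap)
    groups = {g: [] for g in range(num_groups)}
    # Heaviest items first; ties broken by item name for determinism
    for item, load in sorted(item_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, g = heapq.heappop(heap)          # lightest group so far
        groups[g].append(item)
        heapq.heappush(heap, (total + load, g))
    loads = {g: sum(item_loads[i] for i in items) for g, items in groups.items()}
    return groups, loads

groups, loads = balance_groups({"a": 8, "b": 7, "c": 6, "d": 5, "e": 4}, 2)
```

With these example loads and two groups, the sketch produces group loads of 17 and 13.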

Text sentiment analysis based on feature fusion of convolution neural network and bidirectional long short-term memory network

LI Yang, DONG Hongbin

2018, 38(11): 3075-3080. DOI: 10.11772/j.issn.1001-9081.2018041289

Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) are widely used in natural language processing. However, natural language has structural dependencies: relying only on CNN for text classification ignores the contextual meaning of words, while the traditional RNN suffers from vanishing gradients (gradient disappearance or gradient dispersion), and both problems limit the accuracy of text classification. A feature fusion model combining CNN and Bidirectional Long Short-Term Memory (BiLSTM) was presented. Local features of the text were extracted by the CNN, and global features of the text were extracted by the BiLSTM network. Merging the features extracted by these two complementary models solves the problem of a single CNN ignoring the contextual semantic and grammatical information of words, and the fusion model also effectively avoids the vanishing gradient problem of the traditional RNN. The experimental results on two kinds of datasets show that the proposed feature fusion model can effectively improve the accuracy of text classification.

Cross-domain personalized recommendation method based on scoring reliability

QU Liping, WU Jiaxi

2018, 38(11): 3081-3083. DOI: 10.11772/j.issn.1001-9081.2018041390

In cross-domain recommender systems, some users rate purchased items arbitrarily. Since such users are relatively few, their random ratings have little influence on the recommendation effect when the item they purchased has received a large total number of ratings. However, when the item has received relatively few ratings, random ratings have a much greater impact on the recommendation effect. To solve this problem, a cross-domain personalized recommendation method based on scoring reliability was proposed. Different thresholds were set for users according to their scoring reliability. When migrating ratings from the auxiliary domain to the target domain, if the total number of ratings of an item rated by a user was lower than that user's threshold, the user's rating of the item was not migrated to the target domain; otherwise it was migrated, thereby reducing the influence of random ratings on the recommendation effect. The experimental results show that setting personalized thresholds according to scoring reliability performs better than either setting a uniform threshold for all users or setting no threshold at all.
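The migration rule can be written as a simple filter. In this sketch, `aux_ratings` holds auxiliary-domain (user, item, rating) triples and `user_thresholds` maps each user to a threshold; how a threshold is derived from scoring reliability is not described in the abstract, so the thresholds are taken as given inputs, and all names are hypothetical.

```python
from collections import Counter

def migrate_ratings(aux_ratings, user_thresholds, default_threshold=0):
    """Keep only the auxiliary-domain ratings that pass the rater's threshold.

    A rating (user, item, rating) is migrated when the item's total number
    of ratings in the auxiliary domain is at least the user's threshold.
    """
    item_counts = Counter(item for _, item, _ in aux_ratings)
    migrated = []
    for user, item, rating in aux_ratings:
        threshold = user_thresholds.get(user, default_threshold)
        if item_counts[item] >= threshold:
            migrated.append((user, item, rating))
    return migrated

ratings = [("u1", "i1", 5), ("u2", "i1", 4), ("u3", "i1", 4),
           ("u1", "i2", 1), ("u2", "i3", 3)]
kept = migrate_ratings(ratings, {"u1": 3, "u2": 1})
```

Here the less reliable user `u1` (threshold 3) keeps the rating of the popular item `i1` but loses the rating of `i2`, which has only one rating in total.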

Sentiment analysis of movie reviews based on dictionary and weak tagging information

FAN Zhen, GUO Yi, ZHANG Zhenhao, HAN Meiqi

2018, 38(11): 3084-3088. DOI: 10.11772/j.issn.1001-9081.2018041245

Focusing on the time-consuming and laborious data annotation in sentiment analysis of review texts, a new automatic data annotation method was proposed. Firstly, the sentiment tendency of each review text was calculated based on a sentiment dictionary. Secondly, the review text was automatically annotated by combining the user's weak tagging information with the dictionary-based sentiment tendency. Finally, a Support Vector Machine (SVM) was used to classify the sentiment of the review texts. The proposed method reached sentiment classification accuracies of 77.2% and 77.8% on two types of datasets, which are 1.7 and 2.1 percentage points higher respectively than those of the method based only on user ratings. The experimental results show that the proposed method can improve the classification effect in sentiment analysis of movie reviews.

Knowledge graph completion algorithm based on similarity between entities

WANG Zihan, SHAO Mingguang, LIU Guojun, GUO Maozu, BI Jiandong, LIU Yang

2018, 38(11): 3089-3093. DOI: 10.11772/j.issn.1001-9081.2018041238

To solve the link prediction problem of knowledge graphs, a shared-variable network model named LCPE (Local Combination Projection Embedding) was proposed, which predicts links by embedding entities and relationships into a vector space. Analysis of the Unstructured Model shows that the embeddings of related entities are closer to each other in the vector space; in other words, similar entities are more likely to be related. In the LCPE model, the ProjE model was used together with the similarity between two entities to judge whether the two entities are related and what type of relation holds between them. The experiments show that, with the same number of parameters, LCPE improves Mean Rank by 11 and Hits@10 by 0.2 percentage points on the WN18 dataset, and improves Mean Rank by 7.5 and Hits@10 by 3.05 percentage points on the FB15k dataset, which proves that the similarity between entities, as auxiliary information, can improve the predictive capability of the ProjE model.
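Mean Rank and Hits@10 are standard link-prediction metrics. The sketch below shows how they are typically computed from the rank of the correct entity among all candidates, together with the Unstructured Model's distance score ||h - t||², which ignores the relation. It assumes that a lower score is better and that ties are counted optimistically (the "raw" setting); it does not implement ProjE or LCPE, and the function names are hypothetical.

```python
def unstructured_distance(h, t):
    # Unstructured Model: the relation is ignored, score = ||h - t||^2
    return sum((a - b) ** 2 for a, b in zip(h, t))

def rank_of_true(scores, true_index):
    """1-based rank of the correct candidate when lower scores are better."""
    true_score = scores[true_index]
    return 1 + sum(1 for i, s in enumerate(scores) if i != true_index and s < true_score)

def mean_rank_and_hits(ranks, k=10):
    """Return (Mean Rank, Hits@k in percent) over a list of test ranks."""
    mean_rank = sum(ranks) / len(ranks)
    hits = sum(1 for r in ranks if r <= k) / len(ranks)
    return mean_rank, 100.0 * hits
```

Because a lower Mean Rank is better, "improving Mean Rank by 11" means the average rank of the correct entity dropped by 11, whereas Hits@10 improves when it rises.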

Multi-source text topic mining model based on Dirichlet multinomial allocation model

XU Liyang, HUANG Ruizhang, CHEN Yanping, QIAN Zhisen, LI Wanying

2018, 38(11): 3094-3099. DOI: 10.11772/j.issn.1001-9081.2018041359

With the rapid increase in text data sources, topic mining over multi-source text data has become a research focus of text mining. Since traditional topic models are mainly designed for a single source, applying them directly to multiple sources has many limitations. Therefore, considering both the differences in topic-word distributions across sources and the nonparametric clustering ability of the Dirichlet Multinomial Allocation model (DMA), a multi-source topic model based on DMA, named MSDMA (Multi-Source Dirichlet Multinomial Allocation), was proposed. The main contributions of the proposed model are as follows: 1) it takes the characteristics of each source into account when modeling topics, and can learn the source-specific word distribution of each topic k; 2) it improves topic discovery on high-noise, low-information data through knowledge sharing; 3) it automatically learns the number of topics within each source without requiring it to be specified in advance. The experimental results on a simulated dataset and two real datasets indicate that the proposed model extracts topic information more effectively and efficiently than state-of-the-art topic models.

Web information timeliness evaluation based on clue characteristics

XU Jing, YANG Xiaoping

2018, 38(11): 3100-3104. DOI: 10.11772/j.issn.1001-9081.2018041355

The rapid development of the Internet has made online news an important source of information. Whether the information published on a Web site reflects the current focus of attention, and whether the latest progress of an event is updated in time, significantly affects the usability of the site. In this paper, the development trend of clues on the same subject as the Web information was obtained from topic clue sentences identified by a Conditional Random Field (CRF) model. The time range of the topic clues was then inferred from these development trends, and from it the effective time range of the Web information was estimated. On this basis, combined with the timeliness of publication and the freshness of the content on the Web site, the timeliness of Web information can be evaluated reasonably. The experimental results show that the proposed method is effective for evaluating the timeliness of information on Web sites.

Feature selection for multi-label distribution learning with streaming data based on rough set

CHENG Yusheng, CHEN Fei, WANG Yibin

2018, 38(11): 3105-3111. DOI: 10.11772/j.issn.1001-9081.2018041275

Traditional feature selection algorithms cannot process streaming feature data, their redundancy calculation is complicated, and their description of instances is not accurate enough. To solve these problems, a multi-label distribution learning Feature Selection with Streaming data using Rough Set (FSSRS) was proposed. Firstly, the online streaming feature selection framework was introduced into multi-label learning. Secondly, the original conditional probability was replaced by the dependency degree from rough set theory, which relies only on information computed from the data itself and thus makes the streaming feature selection algorithm more efficient and faster. Finally, since in the real world each label describes the same instance to a different degree, label distributions were used instead of traditional logical labels to describe instances more accurately. The experimental results show that the proposed algorithm retains features highly correlated with the label space, so that classification accuracy is improved to a certain extent compared with no feature selection.
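The rough-set dependency degree γ_B(D) = |POS_B(D)| / |U| is the fraction of objects whose values on the condition attributes B determine the decision D unambiguously. The sketch below computes it for a crisp decision attribute and uses it in a simple online rule that keeps an arriving feature only if it raises the dependency. This is an assumed simplification for illustration: FSSRS works with label distributions, and its exact selection criteria are not given in the abstract; the function names are hypothetical.

```python
from collections import defaultdict

def dependency_degree(rows, attrs, decision):
    """gamma_B(D) = |POS_B(D)| / |U| for a list of dict rows."""
    # Group objects into equivalence classes by their values on attrs
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[a] for a in attrs)].append(row[decision])
    # A class is in the positive region if all its objects share one decision
    positive = sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)
    return positive / len(rows)

def streaming_select(rows, feature_stream, decision):
    """Keep each arriving feature only if it increases the dependency degree."""
    selected, best = [], 0.0
    for f in feature_stream:
        gamma = dependency_degree(rows, selected + [f], decision)
        if gamma > best:
            selected.append(f)
            best = gamma
    return selected

rows = [
    {"a": 0, "b": 0, "c": 0, "d": "x"},
    {"a": 0, "b": 1, "c": 0, "d": "y"},
    {"a": 1, "b": 0, "c": 0, "d": "x"},
    {"a": 1, "b": 1, "c": 0, "d": "y"},
]
chosen = streaming_select(rows, ["a", "b", "c"], "d")
```

Feature `b` alone fully determines `d` (dependency 1.0), so the irrelevant `a` and the redundant `c` are both rejected.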

Genetic instance selection algorithm for K-nearest neighbor classifier

HUANG Yuyang, DONG Minggang, JING Chao

2018, 38(11): 3112-3118. DOI: 10.11772/j.issn.1001-9081.2018041337

Traditional instance selection algorithms may mistakenly remove non-noise samples and are inefficient. To address this issue, a genetic instance selection algorithm for the K-Nearest Neighbor (KNN) classifier was proposed. The algorithm uses a two-stage selection mechanism based on a decision tree and a genetic algorithm. Firstly, the decision tree was used to determine the range of noise samples, and the genetic algorithm was then used to precisely remove the noise samples within this range, which effectively reduces the risk of mistaken removal and improves efficiency. Secondly, a 1NN-based validation set selection strategy was proposed to improve the instance selection accuracy of the genetic algorithm. Finally, an objective function based on MSE (Mean Squared Error) was used as the fitness function, which improves the effectiveness and stability of the algorithm. Compared with PRe-classification based KNN (PRKNN), Instance and Feature Selection based on Cooperative Coevolution (IFS-CoCo) and K-Nearest Neighbors (KNN), the classification accuracy is improved by 0.07 to 26.9, 0.03 to 11.8 and 0.2 to 12.64 percentage points respectively, and the AUC (Area Under Curve) and Kappa are improved by 0.25 to 18.32, 1.27 to 23.29 and 0.04 to 12.82 percentage points respectively. The experimental results show that the proposed method has advantages in both classification accuracy and classification efficiency.

Adjusted cluster assumption and pairwise constraints jointly based semi-supervised classification method

HUANG Hua, ZHENG Jiamin, QIAN Pengjiang

2018, 38(11): 3119-3126. DOI: 10.11772/j.issn.1001-9081.2018041220

When samples from different classes seriously overlap near the classification boundary, the cluster assumption may not reflect the real data distribution well, so semi-supervised classification methods based on the cluster assumption may perform even worse than their supervised counterparts. For this unsafe semi-supervised classification problem, an Adjusted Cluster Assumption and Pairwise Constraints Jointly based Semi-Supervised Support Vector Machine classification method (ACA-JPC-S3VM) was proposed. On the one hand, the distance of each unlabeled instance to the distribution boundary was considered during learning, which alleviates performance degradation in such cases to some extent. On the other hand, pairwise constraint information was introduced into the algorithm to compensate for its insufficient use of supervision information. The experimental results on UCI datasets show that the performance of ACA-JPC-S3VM is never lower than that of SVM (Support Vector Machine), and its average accuracy is 5 percentage points higher than that of SVM when the number of labeled samples is 10. The experimental results on an image classification dataset show that semi-supervised methods such as TSVM (Transductive SVM) exhibit varying degrees of unsafe learning (performance similar to or worse than SVM), while ACA-JPC-S3VM learns safely. Therefore, ACA-JPC-S3VM has better safety and correctness.

Multi-dimensional text clustering with user behavior characteristics

LI Wanying, HUANG Ruizhang, DING Zhiyuan, CHEN Yanping, XU Liyang

2018, 38(11): 3127-3131. DOI: 10.11772/j.issn.1001-9081.2018041357

Traditional multi-dimensional text clustering generally extracts features from text content but seldom considers the interaction information between users and texts, such as likes, forwards, comments, follows and references. Moreover, it mainly combines multiple spatial dimensions linearly and fails to consider the relationships between attributes within each dimension. To make effective use of text-related user behavior information, a Multi-dimensional Text Clustering with User Behavior Characteristics (MTCUBC) model was proposed. Following the principle that the similarity between texts should be consistent across different spaces, the similarity was adjusted by using user behavior information as constraints on text content clustering, and the distance between texts was then improved by metric learning, thereby improving the clustering effect. Extensive experiments verify that the proposed MTCUBC model is effective, and it shows clear advantages on high-dimensional sparse data compared with linearly combined multi-dimensional clustering.

Balanced clustering based on simulated annealing and greedy strategy

TANG Haibo, LIN Yuming, LI You, CAI Guoyong

2018, 38(11): 3132-3138. DOI: 10.11772/j.issn.1001-9081.2018041338

Concerning the practical requirement that clustering results be balanced, a Balanced Clustering algorithm based on Simulated annealing and Greedy strategy (BCSG) was proposed. To improve clustering effectiveness at a lower time cost, the algorithm consists of two steps: Simulated Annealing Clustering Initialization (SACI) and Balanced Clustering based on Greedy Strategy (BCGS). First, K suitable data points were located by simulated annealing to serve as the initial centers of balanced clustering; then, in stages, the data points nearest to each center were greedily added to that center's cluster until the cluster size reached its upper limit. Experiments on six UCI real datasets and two public image datasets show that, when the number of clusters is large, the balance degree is increased by more than 50 percentage points compared with Fuzzy C-Means, and the clustering accuracy is increased by 8 percentage points compared with Balanced K-Means and BCLS (Balanced Clustering with Least Square regression), both of which have good balanced clustering performance. Meanwhile, BCSG also has lower time complexity: on large datasets its running time is decreased by nearly 40 percentage points compared with Balanced K-Means. Overall, BCSG achieves better clustering effectiveness at a lower time cost than other balanced clustering algorithms.
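The greedy stage can be sketched as a size-capped assignment: every cluster may hold at most ⌈n/K⌉ points, and point-center pairs are taken from nearest to farthest. The sketch assumes the K initial centers have already been chosen (by SACI in BCSG) and processes all pairs globally rather than in the stages the abstract mentions, so it approximates the idea rather than reproducing BCGS; `greedy_balanced_assign` is a hypothetical name.

```python
import math

def greedy_balanced_assign(points, centers):
    """Assign points to centers nearest-pair-first with a per-cluster cap.

    Returns (labels, sizes); each cluster holds at most ceil(n / k) points.
    """
    n, k = len(points), len(centers)
    cap = math.ceil(n / k)
    # All (distance, point index, center index) pairs, nearest first
    pairs = sorted(
        (math.dist(p, c), i, j)
        for i, p in enumerate(points)
        for j, c in enumerate(centers)
    )
    labels = [None] * n
    sizes = [0] * k
    for _, i, j in pairs:
        # Skip points already placed and clusters that are full
        if labels[i] is None and sizes[j] < cap:
            labels[i] = j
            sizes[j] += 1
    return labels, sizes

labels, sizes = greedy_balanced_assign(
    [(0, 0), (0, 1), (1, 0), (10, 10)], centers=[(0, 0), (10, 10)]
)
```

Because the total capacity K·⌈n/K⌉ is at least n, every point is assigned; in the example the third point is pushed to the far cluster once the near cluster is full.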

Semi-supervised K-means clustering algorithm based on active learning priors

CHAI Bianfang, LYU Feng, LI Wenbin, WANG Yao

2018, 38(11): 3139-3143. DOI: 10.11772/j.issn.1001-9081.2018041251

The Iteration-based Active Semi-Supervised Clustering Framework (IASSCF) is a popular semi-supervised clustering framework, but it has two problems. First, the initial prior information is too scarce, which leads to poor clustering results in the initial iteration and affects subsequent clustering. Second, only the single most informative sample is selected for labeling in each iteration, so the algorithm runs slowly and its performance improves slowly. To address these problems, a semi-supervised K-means clustering algorithm based on active learning priors was designed, consisting of an initialization phase and an iteration phase. In the initialization phase, representative samples were actively selected to build an initial neighborhood set and constraint set. Each iteration of the iteration phase includes three steps: 1) Pairwise Constrained K-means (PCK-means) was used to cluster the data based on the current constraints; 2) based on the clustering results, the most informative unlabeled sample in each cluster was selected; 3) the selected samples were added to the neighborhood set and the constraint set. The iteration phase ends when the convergence thresholds are reached. The experimental results show that the proposed algorithm runs faster and performs better than the algorithm based on the original IASSCF framework.

Probabilistic soft logic reasoning model with semi-automatic rule learning

ZHANG Jia, ZHANG Hui, ZHAO Xujian, YANG Chunming, LI Bo

2018, 38(11): 3144-3149. DOI: 10.11772/j.issn.1001-9081.2018041308

Probabilistic Soft Logic (PSL), a declarative rule-based probabilistic model, has strong extensibility and adapts to multiple domains. However, it requires a large amount of common-sense and domain knowledge as preconditions for establishing rules. Acquiring such knowledge is often very expensive, and any incorrect information it contains may reduce the correctness of reasoning. To alleviate this dilemma, the C5.0 algorithm and probabilistic soft logic were combined so that data and knowledge jointly drive the reasoning model, and a semi-automatic rule learning method was proposed. The C5.0 algorithm was used to extract rules, which, supplemented by manually defined rules and optimized, adjusted rules, served as the input of an improved probabilistic soft logic. The experimental results on student performance prediction show that the proposed method is more accurate than both the C5.0 algorithm and PSL without rule learning. Compared with previous methods that use purely hand-defined rules, the proposed method significantly reduces manual costs. Compared with Bayesian Network (BN), Support Vector Machine (SVM) and other algorithms, the proposed method also shows good results.
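In PSL, each weighted rule becomes a hinge-loss potential based on Łukasiewicz logic: for a rule body → head with soft truth values in [0, 1], the distance to satisfaction is max(0, truth(body) - truth(head)), where truth(B₁ ∧ ... ∧ Bₙ) = max(0, ΣBᵢ - (n - 1)). The sketch below computes these penalties for rules written as (weight, body values, head value); the function names are hypothetical, and the student-performance rule in the usage note is an invented example, not one mined by C5.0 in the paper.

```python
def lukasiewicz_and(values):
    """Lukasiewicz t-norm: max(0, sum(v) - (n - 1))."""
    return max(0.0, sum(values) - (len(values) - 1))

def distance_to_satisfaction(body, head):
    """For a rule body -> head: max(0, truth(body) - truth(head))."""
    return max(0.0, lukasiewicz_and(body) - head)

def weighted_penalty(rules, squared=False):
    """Sum of weighted hinge-loss potentials w * d (or w * d**2)."""
    total = 0.0
    for weight, body, head in rules:
        d = distance_to_satisfaction(body, head)
        total += weight * (d ** 2 if squared else d)
    return total
```

For a hypothetical rule LowAttendance(s) ∧ LowHomework(s) → Fail(s) with truth values 0.9, 0.8 and 0.3, the body has truth 0.7 and the distance to satisfaction is 0.4; with weight 2 the rule contributes a penalty of 0.8, which PSL inference tries to minimise over the unknown truth values.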

Stipend prediction based on enhanced-discriminative canonical correlations analysis and classification ensemble

ZHANG Fangjuan, YANG Yan, DU Shengdong

2018, 38(11): 3150-3155. DOI: 10.11772/j.issn.1001-9081.2018041259

To address the low efficiency and heavy workload of stipend management in higher education institutions, an Enhanced-Discriminative Canonical Correlations Analysis (EN-DCCA) algorithm was proposed and combined with classification ensemble to predict undergraduates' stipends. The multi-dimensional data of undergraduates at school were divided into two different views. Existing multi-view discriminative canonical correlation analysis algorithms do not consider both the correlation between view categories and the discriminability of the combined view features. The optimization goal of EN-DCCA is to minimize inter-class correlation while maximizing intra-class correlation, and it also considers the discriminability of the combined view features, which further enhances attribute identification and benefits classification prediction. Stipend prediction proceeds as follows: firstly, the data were preprocessed into two views according to undergraduates' learning behavior and living behavior at school; then, the two views were learned by EN-DCCA; finally, classification ensemble was used to complete the prediction. In experiments on a real dataset, the prediction accuracy of the proposed method reached 90.01%, 2 percentage points higher than that of the Combined-feature-discriminability Enhanced Canonical Correlation Analysis (CECCA) method. The experimental results show that the proposed method can effectively predict stipends for higher education institutions.

Feature selection algorithm based on multi-objective bare-bones particle swarm optimization

ZHANG Cuijun, CHEN Beibei, ZHOU Chong, YIN Xinge

2018, 38(11): 3156-3160. DOI: 10.11772/j.issn.1001-9081.2018041358

Concerning the large number of redundant features in classification data, which not only affect classification accuracy but also slow down classification, a feature selection algorithm based on multi-objective Bare-bones Particle Swarm Optimization (BPSO) was proposed to obtain a tradeoff between the size of the feature subset and the classification accuracy. To improve the efficiency of multi-objective BPSO, firstly an external archive was used to guide the update direction of the particles, and then the search space of the particles was improved by a mutation operator. Finally, the multi-objective BPSO was applied to feature selection, with the classification performance of the K Nearest Neighbors (KNN) classifier and the number of selected features used as the feature selection criteria. Experiments were performed on 12 datasets drawn from UCI datasets and gene expression datasets. The experimental results show that the feature subsets selected by the proposed algorithm have better classification performance: the minimum classification error rate can be reduced by up to 7.4%, and the execution time of the classification algorithm can be shortened by up to 12 s.
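Two ingredients of a multi-objective bare-bones PSO can be sketched compactly: Pareto dominance, which decides what enters the external archive, and the bare-bones position update, which samples each dimension from a Gaussian centred midway between the personal best and a leader, with standard deviation equal to their distance. The sketch assumes both objectives (for example classification error rate and number of selected features) are minimised and that a feature is selected when its position value exceeds 0.5; the mutation operator is omitted, and all names are hypothetical.

```python
import random

def dominates(a, b):
    """Minimisation: a dominates b if no worse everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_archive(archive, candidate):
    """Insert (objectives, position) into a list of non-dominated entries."""
    objs, _ = candidate
    if any(dominates(o, objs) or o == objs for o, _ in archive):
        return archive  # candidate is dominated or duplicated
    # Drop entries the candidate dominates, then add it
    return [(o, p) for o, p in archive if not dominates(objs, o)] + [candidate]

def bare_bones_position(pbest, leader, rng):
    """Sample each dimension from N((pbest + leader) / 2, |pbest - leader|)."""
    return [rng.gauss((p + g) / 2, abs(p - g)) for p, g in zip(pbest, leader)]

def selected_features(position, threshold=0.5):
    """Indices of features whose position value exceeds the threshold."""
    return [i for i, v in enumerate(position) if v > threshold]

archive = []
for entry in [((0.2, 5), "p1"), ((0.1, 3), "p2"), ((0.05, 8), "p3"), ((0.3, 9), "p4")]:
    archive = update_archive(archive, entry)
```

After the loop the archive holds the two non-dominated trade-offs (0.1, 3) and (0.05, 8); a leader drawn from it would then guide `bare_bones_position`.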

Person re-identification based on siamese network and reranking

CHEN Shoubing, WANG Hongyuan, JIN Cui, ZHANG Wei

2018, 38(11): 3161-3166. DOI: 10.11772/j.issn.1001-9081.2018041223

Person Re-Identification (Re-ID) across non-overlapping cameras is easily affected by illumination, pose and occlusion, and image mismatches occur in practice. A Re-ID method based on a siamese network and reranking was proposed. Firstly, given a pair of pedestrian training images, the siamese network simultaneously learned a discriminative Convolutional Neural Network (CNN) feature and a similarity measure, so as to predict the pedestrian identity of each input image and determine whether the two images belong to the same pedestrian. Then, the k-reciprocal nearest neighbor method was used to reduce image mismatches. Finally, the Euclidean distance and the Jaccard distance were weighted to rerank the sorted list. Experiments were performed on the Market1501 and CUHK03 datasets. The results show that under Single Query on Market1501, Rank1 (the probability of a correct match at the first position) reaches 83.44% and mAP (mean Average Precision) is 68.75%; under single-shot on CUHK03, Rank1 reaches 85.56% and mAP is 88.32%. Both are significantly higher than those of traditional methods based on feature representation and metric learning.
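k-reciprocal re-ranking keeps only neighbours that list each other among their k nearest neighbours, compares these sets with the Jaccard distance, and mixes the result with the original distance. The sketch below is a simplified, set-based version that assumes a precomputed Euclidean distance matrix (here from toy 1-D features) and a mixing weight `lam`; published k-reciprocal re-ranking additionally uses weighted encodings and neighbour-set expansion, and the function names are hypothetical.

```python
import math

def pairwise(features):
    """Euclidean distance matrix for a list of feature tuples."""
    return [[math.dist(a, b) for b in features] for a in features]

def k_nearest(dist, i, k):
    """Indices of the k nearest neighbours of i (i itself included)."""
    return sorted(range(len(dist)), key=lambda j: dist[i][j])[:k]

def k_reciprocal(dist, i, k):
    """R(i, k) = {j in N(i, k) : i in N(j, k)}."""
    return {j for j in k_nearest(dist, i, k) if i in k_nearest(dist, j, k)}

def jaccard_distance(a, b):
    return 1.0 - len(a & b) / len(a | b)

def rerank_distance(dist, probe, gallery, k, lam):
    """Final distance = (1 - lam) * Jaccard distance + lam * Euclidean distance."""
    dj = jaccard_distance(k_reciprocal(dist, probe, k), k_reciprocal(dist, gallery, k))
    return (1 - lam) * dj + lam * dist[probe][gallery]

dist = pairwise([(0.0,), (1.0,), (1.2,), (5.0,), (5.1,)])
```

Images 1 and 2 share the same k-reciprocal set for k = 2, so their re-ranked distance is driven only by the small Euclidean term, while images from different groups get a Jaccard penalty of 1.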

Multi-label feature selection algorithm based on Laplacian score

HU Minjie, LIN Yaojin, WANG Chenxi, TANG Li, ZHENG Liping

2018, 38(11): 3167-3174. DOI: 10.11772/j.issn.1001-9081.2018041354

Aiming at the problem that the traditional Laplacian score for feature selection cannot be directly applied to multi-label tasks, a multi-label feature selection algorithm based on Laplacian score was proposed. Firstly, the sample similarity matrix was reconstructed from the labels that each pair of samples shares as relevant and as irrelevant in the overall label space. Then, the correlation and redundancy between features were introduced into the Laplacian score, and a forward greedy search strategy was designed to evaluate how well each candidate feature cooperates with the already selected features, which was used to assess the importance of candidate features. Finally, extensive experiments were conducted on six multi-label datasets with five different evaluation criteria. The experimental results show that, compared with Multi-label Dimensionality reduction via Dependence Maximization (MDDM), Feature selection for Multi-Label Naive Bayes classification (MLNB) and feature selection for multi-label classification using multivariate mutual information (PMU), the proposed algorithm not only achieves the best classification performance but also delivers a remarkable improvement of up to 65%.
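For reference, the standard single-label Laplacian score ranks a feature f by Σᵢⱼ (fᵢ - fⱼ)² Sᵢⱼ / (2 Σᵢ dᵢ f̃ᵢ²), where S is a sample similarity matrix, dᵢ = Σⱼ Sᵢⱼ, and f̃ is f centred by its degree-weighted mean; smaller scores mean the feature better preserves the similarity structure. The sketch below plugs in an assumed multi-label similarity, the fraction of labels on which two samples agree (both relevant or both irrelevant). It does not include the paper's correlation and redundancy terms or its forward greedy search, and the function names are hypothetical.

```python
def label_similarity(labels):
    """S_ij = fraction of labels on which samples i and j agree (assumed)."""
    n, q = len(labels), len(labels[0])
    return [[sum(a == b for a, b in zip(labels[i], labels[j])) / q
             for j in range(n)] for i in range(n)]

def laplacian_score(feature, S):
    """Laplacian score of one feature; smaller is better."""
    n = len(feature)
    d = [sum(row) for row in S]                    # degrees d_i
    mean = sum(f * di for f, di in zip(feature, d)) / sum(d)
    centred = [f - mean for f in feature]          # degree-weighted centring
    num = 0.5 * sum(S[i][j] * (feature[i] - feature[j]) ** 2
                    for i in range(n) for j in range(n))
    den = sum(c * c * di for c, di in zip(centred, d))
    return num / den if den > 0 else float("inf")

S = label_similarity([[1, 0], [1, 0], [0, 1], [0, 1]])
good = laplacian_score([1.0, 1.0, 0.0, 0.0], S)  # follows the label structure
bad = laplacian_score([1.0, 0.0, 1.0, 0.0], S)   # cuts across it
```

The feature that separates the two label groups scores 0.0, while the one that mixes them scores 1.0, so a forward search would prefer the former.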

Automatic cloud detection algorithm based on deep belief network-Otsu hybrid model

QIU Meng, YIN Haoyu, CHEN Qiang, LIU Yingjian

2018, 38(11): 3175-3179. DOI: 10.11772/j.issn.1001-9081.2018041350

More than half of the earth's surface is covered by cloud. Current cloud detection methods for satellite remote sensing imagery are mainly manual or semi-automatic; they depend on manual intervention, have low efficiency, and can hardly be used in real-time or near-real-time applications. To improve the availability of satellite remote sensing data, an automatic cloud detection method based on a Deep Belief Network (DBN) and Otsu's method, named DOHM (DBN-Otsu Hybrid Model), was proposed. The main contribution of DOHM is replacing empirical fixed thresholds with adaptive ones, thereby achieving fully automatic cloud detection and increasing the accuracy to above 95%. In addition, a 9-dimensional feature vector was adopted in network training; the diversity of the input features helps capture the characteristics of cloud more effectively.
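Otsu's method picks the threshold that maximises the between-class variance of a histogram, which is proportional to w₀w₁(μ₀ - μ₁)² for class counts w and class means μ. The sketch below assumes it is applied to per-pixel cloud scores in [0, 1], such as DBN output probabilities; how DOHM actually combines the DBN output with Otsu's method is not specified in the abstract, and `otsu_threshold` is a hypothetical helper.

```python
def otsu_threshold(values, bins=256, lo=0.0, hi=1.0):
    """Return the threshold on [lo, hi] that maximises between-class variance."""
    # Histogram of the input values
    hist = [0] * bins
    for v in values:
        idx = int((v - lo) / (hi - lo) * bins)
        hist[min(bins - 1, max(0, idx))] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0, sum0 = 0, 0
    best_var, best_idx = -1.0, 0
    for i in range(bins):
        # Class 0 = bins 0..i, class 1 = bins i+1..bins-1
        w0 += hist[i]
        sum0 += i * hist[i]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_idx = between, i
    # Upper edge of the last bin assigned to class 0
    return lo + (best_idx + 1) * (hi - lo) / bins

scores = [0.1] * 50 + [0.9] * 50
t = otsu_threshold(scores)
cloud_mask = [s > t for s in scores]
```

Because the threshold is recomputed from each image's own histogram, no empirically fixed cut-off is needed.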

Analysis of key factors in heat demand prediction based on NARX neural network

XIE Jiyang, YAN Dong, XIE Yao, MA Zhanyu

2018, 38(11): 3180-3187. DOI: 10.11772/j.issn.1001-9081.2018041222

In District Heating (DH) networks, accurate heat demand prediction is considered an important means of improving efficiency and saving costs. To improve prediction accuracy, it is essential to study how different factors influence heat load forecasting. In this paper, Nonlinear AutoRegressive with eXogenous input (NARX) neural network models were trained on datasets containing different key factors. Their prediction performance was compared to investigate the impact of direct solar radiation and wind speed on heat demand prediction. The experimental results show that direct solar radiation and wind speed are both key factors in heat demand prediction. With wind speed as the only added factor, the Mean Absolute Percentage Error (MAPE) and Root Mean Squared Error (RMSE) of the prediction model are lower than with direct solar radiation as the only added factor. Including both wind speed and direct solar radiation gives the best model performance, but it does not reduce MAPE and RMSE substantially further.
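A NARX model predicts y(t) from two sets of lagged values. These are its own past outputs y(t-1), ..., y(t-n_y) and past exogenous inputs u(t-1), ..., u(t-n_u). Comparing key factors therefore amounts to building the regressors with or without the corresponding input columns. The sketch below shows only this data preparation and the two error metrics. The lag orders and the column layout are hypothetical, and the neural network that learns the nonlinear mapping is omitted.

```python
import math


def narx_regressors(y, u, ny=2, nu=2):
    """Build NARX training pairs: y[t] is predicted from the lagged outputs
    y[t-1..t-ny] and the lagged exogenous vectors u[t-1..t-nu].

    y: heat load per time step; u: per-step list of exogenous inputs
    (for example outdoor temperature, wind speed, direct solar radiation)."""
    start = max(ny, nu)
    X, target = [], []
    for t in range(start, len(y)):
        past_y = [y[t - k] for k in range(1, ny + 1)]
        past_u = [v for k in range(1, nu + 1) for v in u[t - k]]
        X.append(past_y + past_u)
        target.append(y[t])
    return X, target


def drop_factor(u, index):
    """Remove one exogenous factor (column) to build an ablation dataset."""
    return [[v for j, v in enumerate(row) if j != index] for row in u]


def mape(actual, pred):
    """Mean Absolute Percentage Error in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)


def rmse(actual, pred):
    """Root Mean Squared Error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))
```

Training one model on `narx_regressors(y, u)` and another on `narx_regressors(y, drop_factor(u, k))` shows how much factor k contributes.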

Path planning algorithm based on distance and slope in regular grid digital elevation model

ZHANG Runlian, ZHANG Xin, ZHANG Chuyun, XI Yuang

2018, 38(11): 3188-3192. DOI: 10.11772/j.issn.1001-9081.2018041340

Aiming at the low efficiency of the A* algorithm in Digital Elevation Model (DEM) path planning, an improved A* algorithm based on distance and slope was proposed. A new evaluation function was designed with distance and slope as evaluation indexes in a regular grid DEM, and the passability of surface obstacles was judged. To make the improved algorithm adapt to changes in the resolution of the DEM data, the parameters of the evaluation function were calculated from the DEM data of the actual scene during path searching. Finally, a dynamic weight that changes as the search proceeds was introduced. It optimizes path selection by adjusting the relative influence of the completeness function and the heuristic function on the evaluation result. The simulation results show that the improved algorithm adapts to changes in DEM resolution through parameter adjustment. It also finds an optimized path, reduces the search time and improves the search efficiency.
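An evaluation function that accounts for both distance and slope can be illustrated with a plain A* search on a grid DEM. In this sketch:

- The step cost is the 3-D step length multiplied by (1 + k * tan(slope)).
- Steps steeper than a maximum slope are treated as impassable obstacles.
- The heuristic is the horizontal straight-line distance to the goal.

The cost form and the values of `k_slope`, `max_slope_deg` and `cell` are hypothetical. The paper's DEM-derived parameters and dynamic weight are not reproduced. Because every step costs at least its horizontal length, the heuristic never overestimates, and the search stays optimal for this cost model.

```python
import heapq
import math


def astar_dem(dem, start, goal, cell=30.0, max_slope_deg=30.0, k_slope=1.0):
    """A* over a regular-grid DEM (8-neighbourhood) with a distance-and-slope cost.

    dem[r][c] is the elevation of a cell and cell is the grid spacing (same units).
    Returns (path as a list of (row, col), total cost), or (None, inf)."""
    rows, cols = len(dem), len(dem[0])
    max_tan = math.tan(math.radians(max_slope_deg))

    def h(p):
        # straight-line horizontal distance: never more than the true cost
        return cell * math.hypot(p[0] - goal[0], p[1] - goal[1])

    g = {start: 0.0}
    parent = {start: None}
    open_heap = [(h(start), start)]
    closed = set()
    while open_heap:
        _, cur = heapq.heappop(open_heap)
        if cur in closed:
            continue                      # stale heap entry
        if cur == goal:
            path, node = [], cur
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], g[goal]
        closed.add(cur)
        r, c = cur
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr, dc) == (0, 0) or not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                run = cell * math.hypot(dr, dc)
                rise = abs(dem[nr][nc] - dem[r][c])
                tan_s = rise / run
                if tan_s > max_tan:
                    continue              # too steep: treated as an obstacle
                step = math.hypot(run, rise) * (1.0 + k_slope * tan_s)
                ng = g[cur] + step
                if ng < g.get((nr, nc), math.inf):
                    g[(nr, nc)] = ng
                    parent[(nr, nc)] = cur
                    heapq.heappush(open_heap, (ng + h((nr, nc)), (nr, nc)))
    return None, math.inf


dem = [[0, 0, 0],
       [0, 100, 0],
       [0, 0, 0]]
path, cost = astar_dem(dem, (0, 0), (2, 2))
```

In the toy DEM, the direct diagonal over the 100 m peak exceeds the slope limit. The search therefore detours around the peak, at a cost of 60 + 30√2.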

Deep sparse auto-encoder method using extreme learning machine for facial features

ZHANG Huanhuan, HONG Min, YUAN Yubo

2018, 38(11): 3193-3198. DOI: 10.11772/j.issn.1001-9081.2018041274

Focused on the low recognition rates of recognition systems caused by inaccurate input features, an efficient Deep Sparse Auto-Encoder (DSAE) method using Extreme Learning Machine (ELM) for facial features was proposed. Firstly, the truncated nuclear norm was used to construct a loss function, and sparse features of face images were extracted by minimizing this loss function. Secondly, the facial features were self-encoded by an Extreme Learning Machine Auto-Encoder (ELM-AE) model to achieve data dimension reduction and noise filtering. Thirdly, the optimal depth structure was obtained by minimizing the empirical risk. The experimental results on the ORL, IMM, Yale and UMIST datasets show two things. On high-dimensional face images, the DSAE method achieves a higher recognition rate than ELM, Random Forest (RF) and other methods, and it also has good generalization performance.

Image automatic annotation based on transfer learning and multi-label smoothing strategy

WANG Peng, ZHANG Aofan, WANG Liqin, DONG Yongfeng

2018, 38(11): 3199-3203. DOI: 10.11772/j.issn.1001-9081.2018041349

To address the imbalanced label distribution in image datasets and improve the annotation performance of rare labels, a Multi-Label Smoothing Unit (MLSU) based on a label smoothing strategy was proposed. High-frequency labels in the dataset were automatically smoothed while training the network model. As a result, the network appropriately raised its output values for low-frequency labels, which improved their annotation performance. To address the shortage of images in image annotation datasets, a Convolutional Neural Network (CNN) model based on transfer learning was proposed. Firstly, the deep convolutional neural network was pre-trained on large public image datasets from the Internet. Then, the target dataset was used to fine-tune the network parameters, and a Convolutional Neural Network model using Multi-Label Smoothing Unit (CNN-MLSU) was established. Experiments were carried out on the benchmark image annotation datasets Corel5K and IAPR TC-12. On the Corel5K dataset, the average accuracy and average recall of the proposed method are 5 and 8 percentage points higher, respectively, than those of Convolutional Neural Network Regression (CNN-R). On the IAPR TC-12 dataset, the average recall of the proposed method is 6 percentage points higher than that of Two-Pass K-Nearest Neighbor (2PKNN_ML). The results show that the CNN-MLSU method based on transfer learning can effectively prevent network over-fitting and improve the annotation performance of low-frequency labels.
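The abstract does not give the MLSU formula. One hypothetical reading of frequency-dependent smoothing is sketched below. Each label gets a smoothing strength proportional to its training frequency, so frequent labels receive soft targets while rare labels keep nearly hard 0/1 targets. The network is then trained against these targets with binary cross-entropy. The parameter `eps_max` and the linear frequency rule are assumptions, not the paper's method.

```python
import math


def label_frequencies(Y):
    """Fraction of training images carrying each label (Y: rows of 0/1)."""
    n = len(Y)
    return [sum(row[k] for row in Y) / n for k in range(len(Y[0]))]


def smoothed_targets(y, freqs, eps_max=0.2):
    """Hypothetical frequency-aware smoothing of one image's 0/1 label vector.

    Label k is pulled towards 0.5 by eps_k = eps_max * freq_k / max(freqs):
    the most frequent label is smoothed most, rare labels stay almost hard."""
    top = max(freqs)
    out = []
    for yk, fk in zip(y, freqs):
        eps = eps_max * fk / top if top > 0 else 0.0
        out.append(yk * (1.0 - eps) + 0.5 * eps)
    return out


def bce(targets, probs):
    """Mean binary cross-entropy between soft targets and sigmoid outputs."""
    tiny = 1e-12                      # guards log(0)
    return -sum(t * math.log(p + tiny) + (1 - t) * math.log(1 - p + tiny)
                for t, p in zip(targets, probs)) / len(targets)


Y = [[1, 0], [1, 0], [1, 1], [1, 0]]       # label 0 is frequent, label 1 is rare
freqs = label_frequencies(Y)               # [1.0, 0.25]
targets = smoothed_targets([1, 1], freqs)  # roughly [0.9, 0.975]
```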

Time series anomaly detection method based on frequent pattern discovery

LI Hailin, WU Xianli

2018, 38(11): 3204-3210. DOI: 10.11772/j.issn.1001-9081.2018041252

Aiming at the low efficiency of traditional anomaly detection methods in processing incremental time series, a Time Series Anomaly Detection method based on frequent pattern discovery (TSAD) was proposed. Firstly, the historical input time series data were transformed into symbols. Secondly, the frequent patterns of the historical sequence data sets were found from the symbolic features. Finally, the similarity between the frequent patterns and the newly arrived time series data was measured by longest common subsequence matching, which identified the abnormal patterns in the newly added data. TSAD was compared with two methods on three types of time series data selected for the experiments: Time Series Outlier Detection based on sliding window prediction (TSOD), and Extended Symbolic Aggregate Approximation based anomaly mining of hydrological time series (ESAA). The detection rate of TSAD exceeds 90% on all three types. TSOD reaches a higher detection rate, up to 99%, on more regular sequences. However, it has a lower detection rate on noisy sequences and a stronger data bias. The detection rate of ESAA does not exceed 70% on any of the three types. The experimental results show that TSAD can detect abnormal patterns in time series well.
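The symbolization and matching steps can be sketched with Symbolic Aggregate approXimation (SAX) and a dynamic-programming LCS. The abstract does not specify TSAD's symbolization. Standard SAX with a four-symbol alphabet is therefore an assumption, as are the similarity normalization and the threshold `min_sim`. The frequent-pattern mining step is omitted, and the patterns are passed in directly.

```python
import math

BREAKPOINTS = [-0.67, 0.0, 0.67]   # N(0,1) quartiles: standard SAX, alphabet of 4
ALPHABET = "abcd"


def sax(series, n_segments):
    """Z-normalise, average into n_segments PAA frames, map each frame to a symbol."""
    n = len(series)
    mu = sum(series) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in series) / n) or 1.0  # flat series: avoid /0
    z = [(x - mu) / sd for x in series]
    word = []
    for s in range(n_segments):
        frame = z[s * n // n_segments:(s + 1) * n // n_segments]
        m = sum(frame) / len(frame)
        word.append(ALPHABET[sum(m > b for b in BREAKPOINTS)])
    return "".join(word)


def lcs_len(a, b):
    """Length of the longest common subsequence, O(len(a) * len(b)) DP."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def is_anomalous(word, frequent_patterns, min_sim=0.75):
    """A new window is abnormal if it matches no frequent pattern well enough."""
    best = max(lcs_len(word, p) / max(len(word), len(p)) for p in frequent_patterns)
    return best < min_sim


word = sax([1, 2, 3, 4, 5, 6, 7, 8], 4)   # "abcd"
```

LCS tolerates small insertions and deletions in the symbol string, so a new window can still match a frequent pattern that it does not reproduce exactly.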

Novel virtual boundary detection method based on deep learning

LAI Chuanbin, HAN Yuexing, GU Hui, WANG Bing

2018, 38(11): 3211-3215. DOI: 10.11772/j.issn.1001-9081.2018041347

Traditional edge detection methods cannot accurately detect the Virtual Boundary (VB) between different regions in microscopic images of materials. To solve this problem, a virtual boundary detection model based on Convolutional Neural Network (CNN), called Virtual Boundary Net (VBN), was proposed. The VGGNet (Visual Geometry Group Net) deep learning model was simplified, and dropout and the Adam algorithm were applied in the training process. An image patch centered on each pixel was extracted as the input, and the predicted class of the patch decided whether its center pixel belongs to a virtual boundary. In virtual boundary detection experiments on two kinds of material images, the method reached an average detection precision of 92.5% and an average recall of 89.5%. The experimental results prove that VBN can detect virtual boundaries in images accurately and effectively, providing an alternative to inefficient manual analysis.
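Per-pixel patch extraction turns boundary detection into patch classification and can be sketched as below. The patch size and the edge-replication padding at image borders are assumptions; the abstract specifies neither.

```python
def extract_patches(img, size=5):
    """Return ((row, col), patch) for every pixel of a 2-D image (list of rows).

    Each patch is size x size and centred on its pixel; coordinates outside the
    image are clamped to the border (edge replication). A classifier then labels
    each patch, i.e. decides whether its centre pixel is on a virtual boundary."""
    if size % 2 == 0:
        raise ValueError("patch size must be odd so the centre pixel is defined")
    h, w, r = len(img), len(img[0]), size // 2
    patches = []
    for y in range(h):
        for x in range(w):
            patch = [[img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                      for dx in range(-r, r + 1)]
                     for dy in range(-r, r + 1)]
            patches.append(((y, x), patch))
    return patches
```

Because every pixel yields one sample, even a few annotated images provide many training patches, at the cost of heavily overlapping inputs.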

Image aesthetic quality assessment method based on semantic perception

YANG Wenya, SONG Guangle, CUI Chaoran, YIN Yilong

2018, 38(11): 3216-3220. DOI: 10.11772/j.issn.1001-9081.2018041221

Current research on image aesthetic quality assessment derives its results from the visual content of images alone. It ignores the fact that aesthetic judgment is a human cognitive activity and does not consider the user's understanding of image semantics during evaluation. To solve this problem, an image aesthetic quality assessment approach based on semantic perception was proposed, which applies both the object category information and the scene category information of images to aesthetic quality assessment. Using transfer learning, a hybrid network integrating multiple image features was constructed. For each input image, the network extracted object category features, scene category features and aesthetic features, and the three kinds of features were combined to achieve better aesthetic quality evaluation. The classification accuracy of the method on the AVA data set reached 89.5%, 19.9% higher than that of the traditional method, and the generalization performance on the CUHKPQ data set was greatly improved. The experimental results show that the proposed approach achieves better classification performance in image aesthetic quality evaluation.

Prediction of Parkinson’s disease based on multi-task regression of model filtering

LIU Feng, JI Wei, LI Yun

2018, 38(11): 3221-3224. DOI: 10.11772/j.issn.1001-9081.2018041329

Traditional speech-based Parkinson's Disease (PD) prediction methods predict the motor Unified Parkinson's Disease Rating Scale (motor-UPDRS) and the total Unified Parkinson's Disease Rating Scale (total-UPDRS) separately. Such methods cannot use the information shared between tasks, and single-task prediction performs poorly. To address this, a multi-task regression method based on model filtering was proposed to predict the motor-UPDRS and total-UPDRS of Parkinson's disease patients. Firstly, since the speech features of the subtasks affect the predicted motor-UPDRS and total-UPDRS differently, an L1 regularization term was added for feature selection. Secondly, since different Parkinson's patients are distributed in different domains, a filtering mechanism was added to improve prediction accuracy. In simulation experiments on the remote Parkinson data set, the Mean Absolute Error (MAE) of motor-UPDRS prediction is improved by 67.2% compared with the Least Squares (LS) method. Compared with single-task Classification And Regression Tree (CART), the results for motor-UPDRS and total-UPDRS improve by 64% and 78.4% respectively. The experimental results show that multi-task regression based on model filtering is superior to single-task regression algorithms for UPDRS prediction.

Lung tumor image recognition algorithm based on cuckoo search and deep belief network

YANG Jian, ZHOU Tao, GUO Lifang, ZHANG Feifei, LIANG Mengmeng

2018, 38(11): 3225-3230. DOI: 10.11772/j.issn.1001-9081.2018041244

Because the weights of a Deep Belief Network (DBN) are randomly initialized, the network easily falls into local optima. To address this, the Cuckoo Search (CS) algorithm was introduced into the traditional DBN model, and a lung cancer image recognition algorithm based on CS-DBN was proposed. Firstly, the global optimization ability of CS was used to optimize the initial weights of the DBN, and layer-by-layer pre-training of the DBN was then performed on this basis. Secondly, the whole network was fine-tuned with the Back Propagation (BP) algorithm to optimize the network weights. Finally, CS-DBN was applied to lung tumor image recognition and compared with traditional DBN to verify the feasibility and effectiveness of the algorithm. The comparison covered four settings: the number of Restricted Boltzmann Machine (RBM) training iterations, the training batch size, the number of DBN hidden layers and the number of hidden-layer nodes. The experimental results show that the recognition accuracy of CS-DBN is clearly higher than that of traditional DBN. Across these four settings, the accuracy gain of CS-DBN over traditional DBN ranges from 1.13 to 4.33, 2 to 3.34, 1.07 to 3.34 and 1.4 to 3.34 percentage points respectively. CS-DBN can therefore improve the accuracy of lung tumor recognition to a certain extent, and with it the performance of computer-aided diagnosis of lung tumors.
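Cuckoo Search itself can be sketched as below. Each nest takes a Lévy-flight step (drawn with Mantegna's algorithm) scaled by its distance to the current best. A fraction pa of nests is then rebuilt by random walks, and greedy acceptance keeps only improvements. In CS-DBN the objective would be evaluated on candidate DBN initial weights; here a simple sphere function stands in for it. All parameter values (`n_nests`, `pa`, `alpha`, `iters`, bounds) are illustrative assumptions.

```python
import math
import random


def levy_step(beta, rng):
    """One Lévy-distributed step drawn with Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def cuckoo_search(f, dim, n_nests=15, pa=0.25, alpha=0.01, iters=300,
                  lo=-5.0, hi=5.0, beta=1.5, seed=0):
    """Minimise f over the box [lo, hi]^dim with a basic Cuckoo Search."""
    rng = random.Random(seed)

    def clip(x):
        return min(hi, max(lo, x))

    nests = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_nests)]
    fit = [f(x) for x in nests]
    b = min(range(n_nests), key=fit.__getitem__)
    best, best_f = nests[b][:], fit[b]
    for _ in range(iters):
        # 1) Lévy flights, step scaled by each nest's distance to the best;
        #    the new egg is kept only if it beats the nest it came from.
        for i in range(n_nests):
            new = [clip(x + alpha * levy_step(beta, rng) * (x - bx))
                   for x, bx in zip(nests[i], best)]
            fn = f(new)
            if fn < fit[i]:
                nests[i], fit[i] = new, fn
        # 2) A fraction pa of nests is "discovered" and rebuilt by a random
        #    walk along the difference of two other nests (greedy acceptance).
        for i in range(n_nests):
            if rng.random() < pa:
                r1, r2 = rng.sample(range(n_nests), 2)
                new = [clip(x + rng.random() * (a - c))
                       for x, a, c in zip(nests[i], nests[r1], nests[r2])]
                fn = f(new)
                if fn < fit[i]:
                    nests[i], fit[i] = new, fn
        b = min(range(n_nests), key=fit.__getitem__)
        if fit[b] < best_f:
            best, best_f = nests[b][:], fit[b]
    return best, best_f


def sphere(x):
    return sum(v * v for v in x)


best, best_f = cuckoo_search(sphere, dim=2)
```

The heavy-tailed Lévy steps occasionally produce long jumps, which is the property CS-DBN relies on to move the DBN's initial weights away from poor local optima before pre-training.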

Network intrusion detection system based on improved moth-flame optimization algorithm

XU Hui, FANG Ce, LIU Xiang, YE Zhiwei

2018, 38(11): 3231-3235. DOI: 10.11772/j.issn.1001-9081.2018041315

Because network intrusion detection currently involves large volumes of high-dimensional data, the Moth-Flame Optimization (MFO) algorithm was applied to feature selection for network intrusion detection. Since the MFO algorithm converges quickly but easily falls into local optima, a Binary Moth-Flame Optimization integrated with Particle Swarm Optimization (BPMFO) algorithm was proposed. On the one hand, the spiral flight formula of the MFO algorithm was adopted to obtain strong local search ability. On the other hand, the velocity update formula of the Particle Swarm Optimization (PSO) algorithm was incorporated to make individuals move towards the global optimal solution and their historical optimal solutions, increasing global convergence and avoiding local optima. The experiments used the KDD CUP 99 data set and three classifiers: Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and Naive Bayesian Classifier (NBC). BPMFO was compared with Binary Moth-Flame Optimization (BMFO), Binary Particle Swarm Optimization (BPSO), Binary Genetic Algorithm (BGA), Binary Grey Wolf Optimization (BGWO) and Binary Cuckoo Search (BCS). The experimental results show that, for feature selection in network intrusion detection, BPMFO has clear advantages in overall performance. These cover accuracy, operational efficiency, stability, convergence speed and the ability to escape local optima.
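BPMFO combines two ingredients, both of which can be written down from their standard definitions:

- The MFO logarithmic spiral is S = D * e^(bt) * cos(2πt) + F, with D = |F - M|, moth M, flame F and t in [-1, 1].
- In the binary PSO update, the velocity is pulled towards the personal and global bests, and a sigmoid of the velocity gives the probability that a feature bit is 1.

The abstract does not say exactly how BPMFO fuses the two, so they are kept separate below. The coefficient values (`b`, `w`, `c1`, `c2`, `vmax`) are assumptions.

```python
import math
import random


def mfo_spiral(moth, flame, t, b=1.0):
    """MFO logarithmic spiral flight of a moth around its flame:
    S = D * e^(b t) * cos(2 pi t) + F, with D = |F - M| and t in [-1, 1]."""
    return [abs(f - m) * math.exp(b * t) * math.cos(2 * math.pi * t) + f
            for m, f in zip(moth, flame)]


def bpso_update(x, v, pbest, gbest, rng, w=0.7, c1=1.5, c2=1.5, vmax=4.0):
    """Binary PSO step for a 0/1 feature mask: update the velocity towards the
    personal and global bests, then set each bit to 1 with prob. sigmoid(v)."""
    new_x, new_v = [], []
    for xi, vi, pi, gi in zip(x, v, pbest, gbest):
        vi = w * vi + c1 * rng.random() * (pi - xi) + c2 * rng.random() * (gi - xi)
        vi = max(-vmax, min(vmax, vi))          # velocity clamping
        new_v.append(vi)
        new_x.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-vi)) else 0)
    return new_x, new_v


rng = random.Random(1)
mask, vel = bpso_update([0, 1, 0, 1], [0.0] * 4, [1, 1, 0, 0], [1, 0, 1, 1], rng)
```

In feature selection, each bit of `mask` indicates whether the corresponding feature is kept. The resulting mask would then be scored with the chosen classifier.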

Pigmented skin lesion recognition and classification based on deep convolutional neural network

HE Xueying, HAN Zhongyi, WEI Benzheng

2018, 38(11): 3236-3240. DOI: 10.11772/j.issn.1001-9081.2018041224

Currently, the recognition and classification of skin lesions faces two major challenges. First, skin lesions, especially pigmented skin lesions, come in a wide variety, with high similarity between different classes and large differences within the same class, which makes them difficult to identify and classify. Second, owing to the limitations of existing skin lesion recognition algorithms, their recognition rates need further improvement. To this end, an end-to-end deep Convolutional Neural Network (CNN) model based on the VGG19 network was trained to achieve automated recognition and classification of pigmented skin lesions. Firstly, data augmentation (random cropping, flipping and mirroring) was used for data preprocessing. Then, a model pre-trained on ImageNet was transferred to the augmented data samples and its parameters were fine-tuned. Meanwhile, weights were set in the Softmax loss to increase the loss incurred by misclassifying minority-class samples. This effectively alleviates the class imbalance problem in the dataset and improves the recognition rate of the model. Experiments were implemented on the ISIC2017 dataset using the deep learning framework PyTorch. The experimental results show that the recognition rate and sensitivity of the proposed method reach 71.34% and 70.01% respectively. These are 2.84 and 11.68 percentage points higher than those obtained without the Softmax loss weights. This confirms that the method is effective in the recognition and classification of skin lesions.
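A class-weighted Softmax (cross-entropy) loss multiplies each sample's loss by the weight of its true class, so errors on minority classes cost more. The sketch below uses inverse-frequency weights and made-up class counts. Both are assumptions, since the abstract does not give the weights. In PyTorch, which the experiments used, the same effect is available through the `weight` argument of `torch.nn.CrossEntropyLoss`.

```python
import math


def inverse_frequency_weights(counts):
    """Hypothetical class weights w_c = N / (K * n_c): rarer classes weigh more."""
    total, k = sum(counts), len(counts)
    return [total / (k * c) for c in counts]


def weighted_softmax_loss(logits, label, weights):
    """Cross-entropy on softmax(logits), scaled by the weight of the true class."""
    m = max(logits)                               # subtract max for numerical stability
    log_sum = math.log(sum(math.exp(z - m) for z in logits))
    log_prob = (logits[label] - m) - log_sum      # log softmax of the true class
    return -weights[label] * log_prob


weights = inverse_frequency_weights([100, 100, 800])   # made-up class counts
loss_minority = weighted_softmax_loss([0.0, 0.0, 0.0], 0, weights)
loss_majority = weighted_softmax_loss([0.0, 0.0, 0.0], 2, weights)
```

With identical (uninformative) logits, a sample from a rare class contributes eight times the loss of a sample from the dominant class in this example. This pushes the network to stop ignoring the minority classes.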

Plant image recognition based on family priority strategy

CAO Xiangying, SUN Weimin, ZHU Youxiang, QIAN Xin, LI Xiaoyu, YE Ning

2018, 38(11): 3241-3245. DOI: 10.11772/j.issn.1001-9081.2018041309

Plant recognition includes two kinds of tasks: specimen recognition and real-environment recognition. Because of background noise, real-environment plant image recognition is more difficult. To make Convolutional Neural Networks (CNN) more lightweight, alleviate over-fitting and improve the recognition rate and generalization ability, a plant identification method with Family Priority (FP) was proposed. Combined with the lightweight CNN model MobileNet, a plant recognition model, Family Priority MobileNet (FP-MobileNet), was established by means of transfer learning. On the single-background plant dataset flavia, the MobileNet model achieved 99.8% accuracy. On the more challenging real-environment flower dataset flower102, FP-MobileNet achieved 99.56% accuracy when the training set contained more samples than the test set. It still obtained 95.56% accuracy when the training set contained fewer samples than the test set. The experimental results show that under both data set partitioning schemes the accuracy of FP-MobileNet is higher than that of the plain MobileNet model. In addition, the FP-MobileNet weights occupy only 13.7 MB while maintaining a high recognition rate. The model balances accuracy and latency, and is suitable for deployment on mobile devices that require lightweight models.

Improvement and analysis of LAN security association scheme based on pre-shared key

XIAO Yuelei, WU Junsheng, ZHU Zhixiang

2018, 38(11): 3246-3251. DOI: 10.11772/j.issn.1001-9081.2018040896

The Local Area Network (LAN) security association scheme based on pre-shared key wastes communication in its exchange key establishment process; to reduce this overhead, an improved LAN security association scheme was proposed. The authentication and unicast key agreement process based on pre-shared key was improved so that it generates a pairwise key between a newly added switch and the authentication server. This key was then used in the exchange key agreement processes between the newly added switch and other non-adjacent switches. Then, on the basis of this improved scheme, a LAN security association scheme for the trusted computing environment was put forward. Its improved authentication and unicast key agreement process based on pre-shared key adds platform authentication of the terminal device. This realizes trusted network access of the terminal device and effectively enhances LAN security. Finally, both LAN security association schemes were proved secure in the Strand Space Model (SSM). Performance comparison and analysis show that the improved scheme reduces the number of exchanged messages and the computational complexity of the exchange key agreement processes.

Trajectory privacy-preserving method based on information entropy suppression

WANG Yifei, LUO Yonglong, YU Qingying, LIU Qingqing, CHEN Wen

2018, 38(11): 3252-3257. DOI: 10.11772/j.issn.1001-9081.2018040861

Aiming at the poor data anonymity and large data loss caused by excessive suppression in traditional high-dimensional trajectory privacy protection models, a new trajectory privacy-preserving method based on information entropy suppression was proposed. An entropy-based flowgraph was generated for the trajectory dataset, and a reasonable cost function was designed according to the information entropy of spatio-temporal points. Privacy was then preserved by local suppression of spatio-temporal points. Meanwhile, an improved algorithm for comparing the similarity of flowgraphs before and after suppression was proposed, and a function for evaluating privacy gains was introduced. Finally, the proposed method was compared with the LK-Local (Length K-anonymity based on Local suppression) approach in terms of trajectory privacy and data utility. The experiments used a synthetic subway transportation system dataset. With the same anonymity parameter value, the proposed method increases the similarity measure by about 27%, reduces data loss by about 25% and increases privacy gain by about 21%.

Anti-sniffing attack method based on software defined network

ZHANG Chuanhao, GU Xuehui, MENG Caixia

2018, 38(11): 3258-3262. DOI: 10.11772/j.issn.1001-9081.2018040836

In network sniffing attacks, attackers capture and analyze network communication data from network nodes or links, monitor the network status and steal sensitive data such as usernames and passwords. Because the attacker usually stays silent while the attack is under way, traditional network protection methods find it difficult to detect and defend against it. Such methods include firewalls, Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS). To solve this problem, a Dynamic Path Hopping (DPH) mechanism based on Software Defined Network (SDN) was proposed. In DPH, the paths between communicating nodes are changed dynamically under spatial and temporal constraints, and the communication traffic is distributed evenly across multiple transmission paths. This makes it harder for a sniffing attack to obtain complete data. The experimental and performance simulation results show that, for a given network scale, DPH can effectively defend against sniffing attacks without significantly reducing network transmission performance.
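The core idea of spreading one flow over several paths in time can be sketched without any SDN controller. Precompute candidate paths, assign each time slot to the next path in round-robin order, and measure which share of slots crosses a sniffed link. Round-robin is an assumption standing in for DPH's spatial and temporal constraints, and the host and switch names are hypothetical. A real deployment would install the chosen path as flow rules through the controller.

```python
import itertools


def hopping_schedule(paths, n_slots):
    """Assign each time slot to one candidate path in round-robin order, so the
    traffic is spread evenly and, with more than one path, consecutive slots
    never reuse the same path."""
    cycle = itertools.cycle(range(len(paths)))
    return [paths[next(cycle)] for _ in range(n_slots)]


def fraction_observed(schedule, link):
    """Share of time slots whose path traverses the given (sniffed) link."""
    def uses(path):
        return any((a, b) == link or (b, a) == link for a, b in zip(path, path[1:]))
    return sum(1 for path in schedule if uses(path)) / len(schedule)


paths = [["h1", "s1", "s2", "h2"],     # hypothetical disjoint middle hops
         ["h1", "s1", "s3", "h2"],
         ["h1", "s1", "s4", "h2"]]
schedule = hopping_schedule(paths, 9)
share = fraction_observed(schedule, ("s1", "s2"))
```

A sniffer on s1-s2 sees only a third of the slots. Links shared by every candidate path, such as h1-s1 here, are still observed in every slot, so the protection applies to the interior links that the paths do not share.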

Trajectory privacy protection method based on district partitioning

GUO Liangmin, WANG Anxin, ZHENG Xiaoyao

2018, 38(11): 3263-3269. DOI: 10.11772/j.issn.1001-9081.2018050975

Methods based on k-anonymity are vulnerable to continuous query attacks, and they struggle to construct an anonymous region when there are few users. To address these problems, a trajectory privacy protection method based on district partitioning was proposed. A third-party auxiliary server was used to obtain a user group holding historical query points of a particular district. The historical query points were then downloaded from the users in this group via a P2P protocol. The query result was then looked up in the historical query information to improve query efficiency. In addition, a pseudo query point was sent to confuse attackers. Multiple query points were also hidden in the same sub-district by district partitioning, so that attackers cannot reconstruct the user's real trajectory. The experimental results show that the security of user trajectory privacy provided by the proposed method increases with distance and cache time. Compared with the Collaborative Trajectory Privacy Preserving (CTPP) method, with 1500 users and 400 sub-districts, the proposed method improves security by about 50% and query efficiency by about 35% on average.

Analysis and design of uplink resource scheduling in narrow band Internet of things

CHEN Fatang, XING Pingping, YANG Yanjuan

2018, 38(11): 3270-3274. DOI: 10.11772/j.issn.1001-9081.2018040849

Narrow Band Internet of Things (NB-IoT) technology is developing rapidly. Compared with conventional wireless communication, NB-IoT has a spectrum bandwidth of only 180 kHz. Using resources and spectrum more efficiently (i.e., resource allocation and scheduling) is therefore a key issue for NB-IoT. To address this, the factors involved in NB-IoT uplink resource scheduling were analyzed, including resource allocation, power control and the uplink transmission gap. Different options for each were compared to select the optimal scheme. In addition, the selection of the modulation and coding scheme and of the number of repeated transmissions was analyzed in detail. A greedy-stable modulation and coding selection strategy based on coverage level and power headroom report was proposed to select the initial modulation and coding level. A compensation factor was introduced for selecting the number of retransmissions and updating the modulation and coding level. Finally, the proposed scheme was simulated. The simulation results show that, compared with the direct transmission method, the proposed scheme saves more than 56% of activity time and 46% of resource consumption.

Broadcasting energy-efficiency optimization algorithms for asymmetric duty-cycled sensor network

XU Lijie

2018, 38(11): 3275-3281. DOI: 10.11772/j.issn.1001-9081.2018040793

This work addresses the energy efficiency of broadcasting under a minimum end-to-end delay constraint in asymmetric duty-cycled sensor networks. First, a spatial-temporal state graph representing the spatial-temporal characteristics of broadcasting was constructed, and the target problem was modeled as a forwarding decision subset coverage problem. Then a Minimum-Cost forwarding decision Subset Coverage Algorithm (MC-SCA) and a Cost-Balanced forwarding decision Subset Coverage Algorithm (CB-SCA) were proposed. Both determine the optimal forwarding decision subset iteratively. In each round, MC-SCA greedily selects the forwarding decision with the smallest ratio of forwarding cost to the number of newly covered nodes. CB-SCA instead greedily selects the forwarding decision that brings less forwarding load and more newly covered nodes. In comparison experiments with the typical Random Parent Node Selection Algorithm (RPNS-A), the total broadcasting energy consumption of MC-SCA decreases by 24.23% on average. CB-SCA was compared with RPNS-A, MC-SCA and the minimum-load-first greedy algorithm. Its maximum per-node broadcasting load decreases by 48.69%, 65.21% and 10.64% on average respectively, indicating that CB-SCA consistently achieves better broadcasting energy fairness.
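MC-SCA's selection rule, as described, is the classic greedy rule for weighted set cover: repeatedly pick the decision with the least cost per newly covered node. The sketch below applies it to an abstract universe of nodes and a dictionary of candidate forwarding decisions. Each decision has a cost and the set of nodes it would cover. In the paper a decision lives in the spatial-temporal state graph, which is not modelled here, and the decision names and costs are made up. CB-SCA would replace the cost-per-node ratio with a load-aware score.

```python
def min_cost_cover(universe, decisions):
    """Greedy weighted set cover.

    decisions: dict name -> (cost, set of nodes covered by that decision).
    Each round picks the decision with the smallest cost per newly covered
    node, until every node in universe is covered."""
    uncovered = set(universe)
    chosen, total = [], 0.0
    while uncovered:
        best, best_ratio = None, float("inf")
        for name, (cost, nodes) in decisions.items():
            gain = len(nodes & uncovered)
            if gain and cost / gain < best_ratio:
                best, best_ratio = name, cost / gain
        if best is None:
            raise ValueError("remaining nodes cannot be covered")
        cost, nodes = decisions[best]
        chosen.append(best)
        total += cost
        uncovered -= nodes
    return chosen, total


decisions = {                       # hypothetical forwarding decisions
    "A": (2.0, {1, 2, 3}),
    "B": (1.5, {3, 4, 5}),
    "C": (4.0, {1, 2, 3, 4, 5}),
    "D": (1.0, {1}),
}
chosen, total = min_cost_cover({1, 2, 3, 4, 5}, decisions)  # ['B', 'A'], 3.5
```

The rule is greedy, so it does not guarantee a minimum-cost subset in general. It is the standard approximation for weighted set cover.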

Downlink beamforming design based user scheduling for MIMO-NOMA systems

LIU Yi, HU Zhe, JING Xiaorong

2018, 38(11): 3282-3286. DOI: 10.11772/j.issn.1001-9081.2018040876

Focused on the large inter-user interference in Multiple Input Multiple Output-Non-Orthogonal Multiple Access (MIMO-NOMA) systems, an algorithm combining user scheduling and BeamForming (BF) was proposed. Firstly, user scheduling takes both intra-cluster and inter-cluster user interference into account. All user groupings were initially sparsified by L1-norm regularization according to the channel differences among users. In terms of channel correlation, two users with large channel correlation were grouped into one cluster. Secondly, Fractional Transmit Power Control (FTPC) was used to allocate power among the users within each cluster. Finally, an objective function based on the sum rate maximization criterion was constructed and solved by the Successive Convex Approximation (SCA) method to obtain the BF matrix. Compared with OMA (Orthogonal Multiple Access), the proposed scheme improves system capacity by 84.3%. Compared with the traditional correlation-based user clustering method, it improves fairness by 20.2%. Theoretical analysis and simulation results show that the proposed scheme not only effectively suppresses intra-cluster and inter-cluster user interference, but also ensures fairness among users.
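FTPC, as commonly defined in the NOMA literature, shares a cluster's power budget in proportion to g_k^(-α). Here g_k is user k's channel gain, often normalized by the noise power, and the decay factor α in [0, 1] controls how strongly weaker users are favoured. A minimal sketch follows, with α = 0.4 as an assumed default; the abstract does not state the value used.

```python
def ftpc(gains, p_total, alpha=0.4):
    """Fractional transmit power control within one NOMA cluster:
    P_k = P_total * g_k^(-alpha) / sum_j g_j^(-alpha).

    alpha = 0 gives equal power; larger alpha gives weaker users a larger
    share, which helps them decode despite superposed interference."""
    weights = [g ** (-alpha) for g in gains]
    s = sum(weights)
    return [p_total * w / s for w in weights]


powers = ftpc([1.0, 16.0], 1.0, alpha=0.5)   # weak user first, strong user second
```

Here the weak user receives 80% of the budget and the strong user 20%. The strong user can remove the weak user's signal by successive interference cancellation.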

RFID tag number estimation algorithm based on sequential linear Bayes method

WANG Shuai, YANG Xiaodong

2018, 38(11): 3287-3292. DOI: 10.11772/j.issn.1001-9081.2018040854

Existing tag number estimation algorithms trade estimation precision against complexity. To resolve this conflict, a Radio Frequency IDentification (RFID) tag number estimation algorithm based on sequential linear Bayes was proposed after analyzing and comparing existing algorithms. Firstly, a linear model for estimating the number of tags was established based on linear Bayesian theory. The model makes full use of the numbers of idle, successful and collision time slots and their correlations. Then, a closed-form expression for the tag number estimate was derived, and a sequential method for solving the statistics was given. Finally, the computational complexity of the sequential Bayesian algorithm was analyzed and compared. The simulation results show that the sequential Bayesian method improves estimation accuracy and recognition efficiency. The error is only 4% when the number of time slots is half of the frame length. The algorithm updates the estimate of the number of tags in a linear analytic form and thus avoids exhaustive search. Compared with the high-precision maximum a posteriori probability and Mahalanobis distance algorithms, the computational complexity is reduced from O(n²) and O(n) respectively to O(1). Theoretical analysis and simulation show that the algorithm combines high precision with low complexity. It can therefore meet practical estimation requirements under hardware resource constraints.
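For context, consider framed slotted ALOHA with frame length L and n tags. The expected numbers of idle and success slots are L(1-1/L)^n and n(1-1/L)^(n-1), and the collision slots make up the remainder. Exhaustive-search estimators try each candidate n and keep the best match to the observed counts. The sketch below is a minimal distance-minimising version of that kind of baseline, in the style of Vogt's estimator with its usual search range. It is not the proposed sequential linear Bayes method, whose update rule the abstract does not give.

```python
def expected_slots(n, L):
    """Expected (idle, success, collision) slot counts when n tags each pick
    one of L slots uniformly at random (framed slotted ALOHA)."""
    q = 1.0 - 1.0 / L
    e0 = L * q ** n
    e1 = n * q ** (n - 1) if n > 0 else 0.0
    return e0, e1, L - e0 - e1


def distance_estimate(c0, c1, ck, L):
    """Exhaustive-search baseline: try every n in [c1 + 2*ck, 2*(c1 + 2*ck)]
    and return the one whose expected slot counts are closest (Euclidean
    distance) to the observed idle/success/collision counts."""
    lo = c1 + 2 * ck                 # each collision slot holds at least 2 tags
    hi = max(2 * lo, lo + 1)

    def dist(n):
        e0, e1, ek = expected_slots(n, L)
        return (e0 - c0) ** 2 + (e1 - c1) ** 2 + (ek - ck) ** 2

    return min(range(lo, hi + 1), key=dist)


estimate = distance_estimate(8, 6, 2, 16)   # observed counts in a 16-slot frame
```

The per-frame cost of this baseline grows with the search range. That cost is what a closed-form sequential update avoids.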

Resource allocation framework for heterogeneous wireless network based on software defined network

WU Shikui, WANG Yan

2018, 38(11): 3293-3298. DOI: 10.11772/j.issn.1001-9081.2018040826

Given the popularity of smart devices in mobile cellular networks and the growth of mobile traffic, the control of radio bandwidth and its assignment to multi-radio user equipment were studied. A resource allocation framework based on Software Defined Network (SDN) and a heterogeneous resource allocation algorithm for LTE/WLAN (Long Term Evolution/Wireless Local Area Network) radio networks were proposed. The SDN framework was applied to, and extended for, heterogeneous resource allocation in the integrated LTE-WLAN network. This allows the heterogeneous radio frequency bandwidth of the LTE/WLAN multi-radio network to be allocated holistically. Heterogeneous resources were handled by decomposing the functions of the centralized solution across designated network entities. Simulation experiments show that the proposed framework better balances network throughput and user fairness, and that the algorithm has better convergence.

Hierarchical PCE-based and bimatrix game-based multicast dedicated protection algorithm in multi-domain optical network under static state

CHEN Hao, WU Qiwu, LI Fang, JIANG Lingzhi

2018, 38(11): 3299-3304. DOI: 10.11772/j.issn.1001-9081.2018051099

Ensuring the survivability of static multicast services has become a widespread concern in multi-domain optical networks with pre-configured multicast services. To address this problem, the global topology information and the scheduling computation model of the hierarchical Path Computation Element (PCE) architecture were adopted, and a bimatrix game model was used to generate link-disjoint multicast working trees and multicast protection trees. On this basis, a hierarchical PCE-based and bimatrix game-based multicast dedicated protection algorithm under static state was proposed, and concrete examples of the algorithm were given. Theoretical analysis and experimental results show that, given a certain amount of redundant network resources, the proposed algorithm has low time complexity and markedly improves the survivability of static multicast services in multi-domain optical networks. At the same time, it optimizes the resource allocation structure for protection in the optimal multicast working trees and multicast protection trees.
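
The abstract does not say how the players, strategies or payoffs of the bimatrix game are defined. The sketch below therefore shows only the generic equilibrium test that such a model relies on. It finds every pure-strategy Nash equilibrium of a two-player game with payoff matrices A (row player) and B (column player). In a tree-selection setting, one might hypothetically let the working tree and the protection tree each choose among candidate routes, with payoffs that reward link-disjointness. That mapping is an assumption, not the authors' model. Games with only mixed equilibria would need a different method, such as Lemke-Howson.

```python
import numpy as np

def pure_nash_equilibria(A, B):
    """Return all (i, j) where row strategy i and column strategy j are mutual best responses.

    A[i, j] is the row player's payoff and B[i, j] is the column player's payoff.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    row_best = A >= A.max(axis=0, keepdims=True)   # i is a best response to column j
    col_best = B >= B.max(axis=1, keepdims=True)   # j is a best response to row i
    return [(int(i), int(j)) for i, j in zip(*np.nonzero(row_best & col_best))]
```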

Data augmentation method based on conditional generative adversarial net model

CHEN Wenbing, GUAN Zhengxiong, CHEN Yunjie

2018, 38(11): 3305-3311. DOI: 10.11772/j.issn.1001-9081.2018051008

Deep Convolutional Neural Networks (CNNs) are trained on large-scale labelled datasets, after which they can achieve high recognition rates or good classification performance. However, CNN models trained on smaller datasets are prone to overfitting. To solve this problem, a novel data augmentation method called GMM-CGAN was proposed, which integrates a Gaussian Mixture Model (GMM) with a Conditional Generative Adversarial Net (CGAN). Firstly, the number of samples was increased by random sliding-window sampling around the core region. Secondly, the random noise vector was assumed to follow a GMM distribution and was used as the initial input to the CGAN generator. The image label served as the CGAN condition, and the parameters of both the CGAN and the GMM were trained. Finally, the trained CGAN was used to generate a new dataset that matched the real distribution of the samples. The original dataset contained 386 items in 12 classes; after GMM-CGAN was applied, the new dataset contained 38600 items in total. The experimental results show that a CNN trained on the dataset augmented by the proposed method achieves an average classification accuracy of 89.1%. This is 18.2% and 14.1% higher than CNNs trained on datasets augmented by affine transformation and by CGAN, respectively.
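
The "noise follows a GMM" step can be sketched as a reparameterized draw. A mixture component k is chosen with probability w_k, and the noise is z = μ_k + σ_k ⊙ ε with ε ~ N(0, I). Written this way, μ_k and σ_k could be updated by gradient descent together with the generator, which is one plausible reading of "train the parameters of both the CGAN and the GMM". The diagonal covariances and the one-hot label concatenation used as the CGAN condition are assumptions, not details from the abstract. The helper names are hypothetical.

```python
import numpy as np

def sample_gmm_noise(n, means, stds, weights, rng):
    """Draw n noise vectors from a diagonal-covariance Gaussian mixture.

    Each vector picks component k with probability weights[k], then uses the
    reparameterisation z = mu_k + sigma_k * eps with eps ~ N(0, I).
    """
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    comp = rng.choice(len(weights), size=n, p=weights)
    eps = rng.standard_normal((n, means.shape[1]))
    return means[comp] + stds[comp] * eps, comp

def generator_input(z, labels, num_classes):
    """Concatenate noise with one-hot class labels, a common way to condition a CGAN generator."""
    one_hot = np.eye(num_classes)[labels]
    return np.concatenate([z, one_hot], axis=1)
```

With the 12 classes mentioned above, `generator_input(z, labels, 12)` would append a 12-dimensional one-hot code to each noise vector.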

Rumor spread model considering difference of individual interest degree and refutation mechanism

RAN Maojie, LIU Chao, HUANG Xianying, LIU Xiaoyang, YANG Hongyu, ZHANG Guangjian

2018, 38(11): 3312-3318. DOI: 10.11772/j.issn.1001-9081.2018040890

The impact of differences in individual interest and of the refutation mechanism on rumor spread was investigated, and a new IWSR (Ignorant-Weak spreader-Strong spreader-Removal) rumor spread model was proposed. The basic reproduction number and the equilibrium points of the model were calculated. Using the Lyapunov stability theorem, the Hurwitz criterion and the LaSalle invariance principle, the local and global stability of some equilibrium points were proved. Numerical simulations show that increasing the effectiveness of government refutation or improving people's ability to judge rumors can effectively suppress rumor spread. Finally, numerical simulations on the WS (Watts-Strogatz) small-world network and the BA (Barabási-Albert) scale-free network show that network topology has a significant influence on rumor spread.
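
Local stability of an equilibrium is commonly established by showing that the characteristic polynomial of the Jacobian at that point is Hurwitz, meaning all of its roots have negative real parts. The sketch below builds the Hurwitz matrix for a0·sⁿ + a1·sⁿ⁻¹ + … + an and checks that all leading principal minors are positive, given a0 > 0. The IWSR equations and Jacobian are not given in the abstract, so the test polynomials are generic examples rather than the paper's model.

```python
import numpy as np

def hurwitz_matrix(coeffs):
    """Hurwitz matrix of a0*s^n + a1*s^(n-1) + ... + an, with coeffs = [a0, a1, ..., an]."""
    n = len(coeffs) - 1

    def a(k):
        return coeffs[k] if 0 <= k <= n else 0.0   # coefficients outside 0..n are zero

    return np.array([[a(2 * j - i + 1) for j in range(n)] for i in range(n)], dtype=float)

def is_hurwitz_stable(coeffs):
    """True if every root has a negative real part (Routh-Hurwitz leading-minor test)."""
    coeffs = [float(c) for c in coeffs]
    if coeffs[0] < 0:                      # the criterion is stated for a leading coefficient a0 > 0
        coeffs = [-c for c in coeffs]
    H = hurwitz_matrix(coeffs)
    return all(np.linalg.det(H[:k, :k]) > 0 for k in range(1, len(H) + 1))
```

For example, (s+1)(s+2)(s+3) = s³ + 6s² + 11s + 6 passes the test. The polynomial s³ + s² + s + 2 fails because its second minor is 1·1 − 1·2 < 0.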

Evaluation of susceptibility to debris flow hazards based on geological big data

ZHANG Yonghong, GE Taotao, TIAN Wei, XIA Guanghao, HE Jing

2018, 38(11): 3319-3325. DOI: 10.11772/j.issn.1001-9081.2018040789

In the context of geological big data, a neural network-based model for regional debris flow susceptibility assessment was proposed to assess debris flow susceptibility more accurately and objectively. Its accuracy was improved using the Mean Impact Value (MIV) algorithm, a Genetic Algorithm (GA) and the Borderline-SMOTE (Synthetic Minority Oversampling TEchnique) algorithm. In the preprocessing phase, Borderline-SMOTE was used to handle the class imbalance of the dataset. Next, a neural network was used to fit the non-linear relationship between the main indicators and the degree of proneness, and GA was used to speed up the fitting. Finally, the MIV algorithm was used to quantify the correlation between the indicators and proneness. The middle and upper reaches of the Yarlung Zangbo River were selected as the study area. The experimental results show that the model effectively reduces overfitting on imbalanced datasets, optimizes the original input dimension, and greatly improves fitting speed. When the evaluation results were tested with the AUC (Area Under the Curve) metric, the classification accuracy on the test set reached 97.95%. This indicates that the model can provide a reference for assessing the degree of debris flow proneness in the study area under imbalanced datasets.
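
The MIV step can be sketched as follows. A common formulation scales each input feature up and down by a fixed fraction, often ±10%, across all samples. The trained model's outputs are then compared, and the mean difference is taken as that feature's impact value. The ±10% default and the `predict` callable, which stands in for the trained network, are assumptions. The abstract does not state the perturbation size used.

```python
import numpy as np

def mean_impact_values(predict, X, delta=0.10):
    """Mean change in prediction when each feature is scaled by (1 + delta) versus (1 - delta)."""
    X = np.asarray(X, dtype=float)
    miv = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        up, down = X.copy(), X.copy()
        up[:, j] *= 1 + delta          # increase feature j for every sample
        down[:, j] *= 1 - delta        # decrease feature j for every sample
        miv[j] = np.mean(predict(up) - predict(down))
    return miv
```

Features are then ranked by |MIV|. Indicators with a small absolute impact are candidates for removal, which is one way to reduce the input dimension.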

Mining method of trajectory interval pattern based on spatial proximity searching

ZHANG Haitao, ZHOU Huan, ZHANG Guonan

2018, 38(11): 3326-3331. DOI: 10.11772/j.issn.1001-9081.2018051023

To address the slow mining and high peak memory usage of traditional trajectory pattern mining methods, a method for mining trajectory interval patterns based on spatial proximity searching was proposed. The method consists of five phases:
1) Space-time discretization is performed on the trajectories to obtain the space-time cell sequence of each trajectory.
2) All space-time cell sequences are scanned to obtain all distinct spatial cells, and all frequent spatial cells are found by checking which cell sequences contain each spatial cell.
3) Frequent spatial cells are transformed into frequent interval patterns of length one.
4) Candidate interval patterns, with frequent spatial cells as units, are generated by spatial proximity searching, and the support of each candidate is calculated by matching it against the space-time cell sequences.
5) All frequent interval patterns are obtained according to the given support threshold.
The experimental results show that the proposed method mines faster and uses less peak memory than traditional methods. In terms of running time, it also shows better stability and scalability than traditional methods. The method can therefore help trajectory pattern mining increase mining speed and reduce peak memory usage.
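
The five phases above can be sketched in simplified form. This version makes several assumptions that the abstract does not specify. Trajectories are lists of (x, y, t) points on a uniform grid. A pattern is an ordered tuple of spatial cells matched as a subsequence, and time intervals are not constrained. "Spatial proximity" means the 8-neighbourhood of the last cell in the pattern. All function names are hypothetical.

```python
from collections import Counter

def discretize(traj, cell=1.0, slot=1.0):
    """Phase 1: map (x, y, t) points to a time-ordered sequence of space-time cells (cx, cy, ct)."""
    cells = []
    for x, y, t in sorted(traj, key=lambda p: p[2]):
        c = (int(x // cell), int(y // cell), int(t // slot))
        if not cells or cells[-1] != c:
            cells.append(c)
    return cells

def contains(seq, pattern):
    """True if the spatial cells of `pattern` occur in `seq` in order (as a subsequence)."""
    it = iter((cx, cy) for cx, cy, _ in seq)
    return all(any(c == s for s in it) for c in pattern)

def neighbours(c):
    """8-neighbourhood of a spatial cell (the assumed notion of spatial proximity)."""
    x, y = c
    return {(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)}

def mine(trajs, minsup, cell=1.0, slot=1.0):
    seqs = [discretize(t, cell, slot) for t in trajs]                           # phase 1
    counts = Counter(c for s in seqs for c in {(cx, cy) for cx, cy, _ in s})    # phase 2
    frequent = {c for c, n in counts.items() if n >= minsup}
    level = [(c,) for c in frequent]                                            # phase 3
    result = {p: counts[p[0]] for p in level}
    while level:
        nxt = []
        for p in level:
            for c in neighbours(p[-1]) & frequent:                              # phase 4
                cand = p + (c,)
                sup = sum(contains(s, cand) for s in seqs)
                if sup >= minsup:                                               # phase 5
                    result[cand] = sup
                    nxt.append(cand)
        level = nxt
    return result
```

Growing candidates only through neighbouring frequent cells keeps the candidate set small, which is consistent with the speed and memory gains reported above.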

Vessel traffic pattern extraction based on automatic identification system data and Hough transformation

CHEN Hongkun, CHA Hao, LIU Liguo, MENG Wei

2018, 38(11): 3332-3335. DOI: 10.11772/j.issn.1001-9081.2018040841

Traditional trajectory clustering algorithms are not applicable to vessel traffic pattern extraction over large sea areas, where continuous ship navigation data are lacking. To solve this problem, a vessel traffic pattern extraction technique using the Hough transformation was proposed. Based on Automatic Identification System (AIS) data, the target area was divided into grid cells to analyze the ship density distribution. Because of the limited resolution of the density distribution, median filtering and morphological filtering were used to refine it. A method combining the Hough transformation and kernel density estimation was then proposed to extract vessel traffic patterns and estimate their width. Experiments on real historical AIS data show that the trajectory clustering method cannot extract vessel traffic patterns in areas of low ship density. The ship trajectories in its extracted clusters account for only 29.81% of all trajectories in the area, compared with 95.89% for the proposed method. These results validate the effectiveness of the proposed method.

Financial time series prediction by long short term memory neural network with tree structure

YAO Xiaoqiang, HOU Zhisen

2018, 38(11): 3336-3341. DOI: 10.11772/j.issn.1001-9081.2018040742

Because traditional methods cannot effectively predict noisy, nonlinear time series, a prediction method based on a tree-structured Long Short-Term Memory (LSTM) neural network, focusing on multi-scale feature fusion, was proposed and verified. First, the core methods for realizing the prediction were proposed, and their inherent advantages were analyzed. Second, the prediction model based on the tree-structured LSTM neural network was constructed. Finally, the model was verified on international spot gold trading data from the last ten years. The results show that the prediction accuracy is nearly 10 percentage points higher than the minimum success rate, which demonstrates the usability of the method.

Processing method of INS/GPS information delay based on factor graph algorithm

GAO Junqiang, TANG Xiaqing, ZHANG Huan, GUO Libin

2018, 38(11): 3342-3347. DOI: 10.11772/j.issn.1001-9081.2018040814

To address the poor real-time performance of Inertial Navigation System (INS)/Global Positioning System (GPS) integrated navigation caused by GPS information delay, a processing method was proposed that exploits the ability of the factor graph algorithm to fuse various asynchronous measurements at one information fusion time. Before the system received GPS information, factor nodes for the INS information were added to the factor graph model, and the integrated navigation results were obtained by incremental inference to ensure real-time performance. After the system received the GPS information, factor nodes for the GPS information were added to the factor graph model to correct the INS error, which keeps the system operating with high precision over long periods. The simulation results show that the navigation state that has just been updated with GPS information corrects the INS error effectively. In contrast, the correction effect of the real-time navigation state on the INS error worsens as the GPS information delay grows. The factor graph algorithm thus avoids the adverse effects of GPS information delay on the real-time performance of the INS/GPS integrated navigation system while ensuring its accuracy.
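
The delayed-measurement idea can be illustrated with a one-dimensional linear factor graph. INS factors link consecutive states and give the real-time estimate before GPS arrives. A GPS fix that arrives late is attached to the state at its own time stamp, and re-solving corrects the current state as well. This sketch re-solves the whole graph in batch with least squares. The paper describes incremental inference, for which libraries such as GTSAM provide iSAM2. The noise levels, the 0.1 INS drift and the helper name are hypothetical.

```python
import numpy as np

def solve_linear_factor_graph(n_states, factors):
    """Least-squares MAP estimate for linear Gaussian factors.

    Each factor is (coeffs, z, sigma): it asserts sum(coeffs[i] * x[i]) = z with std dev sigma.
    """
    A = np.zeros((len(factors), n_states))
    b = np.zeros(len(factors))
    for row, (coeffs, z, sigma) in enumerate(factors):
        for i, c in coeffs.items():
            A[row, i] = c / sigma           # whiten each row by its noise std dev
        b[row] = z / sigma
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

N = 10
INS_STEP = 1.1   # hypothetical INS increment per epoch: true motion 1.0 plus a 0.1 drift
factors = [({0: 1.0}, 0.0, 0.01)]                                       # tight prior on x0
factors += [({k + 1: 1.0, k: -1.0}, INS_STEP, 1.0) for k in range(N)]   # one INS factor per epoch
x_realtime = solve_linear_factor_graph(N + 1, factors)                  # before GPS arrives

# A GPS fix time-stamped at epoch 5 arrives late (at epoch 10); it is attached to state 5.
factors.append(({5: 1.0}, 5.0, 0.1))
x_corrected = solve_linear_factor_graph(N + 1, factors)
```

Before the fix arrives, x_realtime[10] is 11.0 (accumulated drift). After it is added, the INS chain propagates the correction forward and x_corrected[10] drops to about 10.5.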

Segmentation of cervical nuclei based on fully convolutional network and conditional random field

LIU Yiming, ZHANG Pengcheng, LIU Yi, GUI Zhiguo

2018, 38(11): 3348-3354. DOI: 10.11772/j.issn.1001-9081.2018050988

To address inaccurate segmentation of cervical nuclei with complex and diverse shapes in cervical cancer screening, a new nucleus segmentation method combining a Fully Convolutional Network (FCN) and a dense Conditional Random Field (CRF) was proposed. Firstly, a Tiny-FCN (T-FCN) was built according to the characteristics of the Herlev dataset. Using pixel-level prior information about the nucleus region, multi-level features were learned automatically to obtain a rough segmentation of the cell nucleus. Then, small incorrectly segmented regions in the rough segmentation were eliminated and the segmentation was refined. This was done by minimizing the energy function of the dense CRF, which contains the label, intensity and position information of all pixels in a cell image. Experimental results on the Herlev Pap smear dataset show that precision, recall and the Zijdenbos Similarity Index (ZSI) are all higher than 0.9. This indicates that the nucleus boundaries obtained by the proposed method match the ground truth closely and that the segmentation is accurate. Whereas traditional methods score lower on abnormal nuclei than on normal nuclei, the proposed method achieves higher segmentation scores on abnormal nuclei than on normal nuclei.
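
The three reported metrics can be computed from binary masks as shown below. ZSI is defined as 2|A∩B| / (|A| + |B|), which is equivalent to the Dice coefficient. The function name and the flat-array interface are illustrative choices. If a mask has no positive pixels, the divisions produce NaN, and callers may want to guard against that case.

```python
import numpy as np

def seg_scores(pred, truth):
    """Precision, recall and Zijdenbos Similarity Index for binary segmentation masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)            # nucleus pixels found
    fp = np.sum(pred & ~truth)           # background labelled as nucleus
    fn = np.sum(~pred & truth)           # nucleus pixels missed
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    zsi = 2 * tp / (2 * tp + fp + fn)    # equals 2|A ∩ B| / (|A| + |B|)
    return precision, recall, zsi
```

For example, pred = [1, 1, 1, 0] and truth = [1, 1, 0, 0] give precision 2/3, recall 1 and ZSI 0.8.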
