Journal of Computer Applications

Review of typical machine learning platforms for big data

JIAO Jiafeng, LI Yun

2017, 37(11): 3039-3047. DOI: 10.11772/j.issn.1001-9081.2017.11.3039

Because big data is large in volume, complex, and fast-changing, traditional machine learning platforms are not applicable to it. Designing an efficient and general machine learning platform for big data has therefore become an important research issue. By introducing and analyzing the characteristics of machine learning algorithms and the data parallelization and model parallelization used in large-scale machine learning, some common parallel computing models were presented. The Bulk Synchronous Parallel (BSP) and Stale Synchronous Parallel (SSP) computing models were introduced, together with the differences between BSP, SSP, and the Asynchronous Parallel (AP) model. Then the typical machine learning platforms based on these parallel models were introduced, along with their advantages and disadvantages, and the kind of big data each platform is best suited for was pointed out. Finally, the typical machine learning platforms were summarized in terms of abstract data structure, parallel computing model and fault tolerance mechanism, and some suggestions and prospects were put forward.
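
The difference between BSP, SSP and AP can be illustrated by the rule that decides whether a worker may start its next iteration. The following is a minimal sketch, not code from the reviewed platforms: the function name `may_advance`, the clock list and the `staleness` parameter are hypothetical, and the rule used is the usual bounded-staleness condition (a worker may run ahead of the slowest worker by at most `staleness` iterations).

```python
import math


def may_advance(worker_clock, all_clocks, staleness):
    """Return True if a worker at `worker_clock` may start its next iteration.

    The worker may be at most `staleness` iterations ahead of the slowest
    worker. staleness=0 behaves like BSP (lockstep), a finite positive value
    like SSP, and math.inf like AP (no waiting at all).
    """
    return worker_clock - min(all_clocks) <= staleness


clocks = [5, 3, 4]  # hypothetical iteration counters of three workers
print(may_advance(5, clocks, 0))         # BSP-like: the fastest worker waits
print(may_advance(5, clocks, 2))         # SSP with bound 2: it may proceed
print(may_advance(5, clocks, math.inf))  # AP-like: it never waits
```

Seen this way, BSP and AP are the two extremes of one parameter, and SSP trades some consistency for less waiting by choosing a bound in between.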

Relative importance index of dummy variables in regression model

LI Haichao, WANG Kaijun, HU Miao, CHEN Lifei

2017, 37(11): 3048-3052. DOI: 10.11772/j.issn.1001-9081.2017.11.3048

Describing qualitative attributes in a regression model usually requires introducing dummy variables. For regression equations with dummy variables, a method was proposed to describe the relative importance of the different dummy variables. The regression sum of squares was decomposed into a dummy variable part and a non-dummy variable part, the proportions of the two parts in the regression equation were calculated, and the proportion was taken as the index of relative importance of each dummy variable. On Lending Club and Prosper online lending datasets with nearly 100 thousand lending records, experiments on the influence of loan purpose on the borrowing success rate and of credit grade on the borrowing rate show that, whereas the traditional regression equation only provides a coefficient for each dummy variable and cannot show its importance, the proposed method can show the importance of different dummy variables and provides an important means of quantitatively analyzing how strongly qualitative independent variables influence the dependent variable.

Rumor detection based on convolutional neural network

LIU Zheng, WEI Zhihua, ZHANG Renxian

2017, 37(11): 3053-3056. DOI: 10.11772/j.issn.1001-9081.2017.11.3053

Manual rumor detection consumes a lot of manpower and material resources and suffers from a long detection delay. Existing rumor detection models construct features manually from the content, user attributes, and propagation pattern of rumors, which makes one-sided feature selection and wasted human effort hard to avoid. To solve this problem, a rumor detection model based on Convolutional Neural Network (CNN) was presented. The rumor events in microblogs were vectorized, and the deep features of the text were mined through learning and training in the hidden layers of the CNN. This avoided manual feature construction and uncovered features that are not easily found by hand, producing better results. The experimental results show that the proposed method can accurately identify rumor events, and that it is better than Support Vector Machine (SVM), Recurrent Neural Network (RNN) and other comparison algorithms in accuracy, precision and F1 score.

Dynamic forecasting model of short-term PM2.5 concentration based on machine learning

DAI Lijie, ZHANG Changjiang, MA Leiming

2017, 37(11): 3057-3063. DOI: 10.11772/j.issn.1001-9081.2017.11.3057

The concentrations forecast by PM2.5 forecasting models deviate greatly from the measured concentrations. To solve this problem, data from February 2015 to July 2015 provided by Shanghai Pudong Meteorological Bureau were used, consisting of measured PM2.5 concentration, PM2.5 concentration forecast by the WRF-Chem model, and model forecasts of 5 main meteorological factors. Support Vector Machine (SVM) and Particle Swarm Optimization (PSO) were combined to build a rolling forecasting model of hourly PM2.5 concentration 24 hours in advance. The rolling model was also used to forecast the nighttime, daytime and daily average concentrations of the upcoming day. The experimental results show that, compared with Radial Basis Function Neural Network (RBFNN), Multiple Linear Regression (MLR) and WRF-Chem, the proposed SVM model improves the forecasting accuracy of PM2.5 concentration one hour in advance (in accordance with the conclusions of previous research), forecasts PM2.5 concentration 24 hours in advance comparatively well, and effectively forecasts the nighttime, daytime and daily average concentrations of the upcoming day. In addition, the proposed model has comparatively high forecasting accuracy for hourly PM2.5 concentration 12 hours in advance and for the nighttime average concentration of the upcoming day.

Mining of accompanying vehicle group from trajectory data based on analogous automatic number plate recognition

WANG Baoquan, JIANG Tonghai, ZHOU Xi, MA Bo, ZHAO Fan

2017, 37(11): 3064-3068. DOI: 10.11772/j.issn.1001-9081.2017.11.3064

Automatic Number Plate Recognition (ANPR) data is easier to obtain than private Global Positioning System (GPS) data and contains more useful information. However, the relatively mature methods for mining vehicle groups from GPS trajectory data do not apply to ANPR data, and the existing accompanying vehicle group mining algorithms focus on trajectory similarity and ignore the time factor when dealing with small amounts of ANPR data. A clustering method based on trajectory features was therefore proposed to mine accompanying vehicle groups. Because the sampling points are fixed and the sampling times are uncertain in ANPR data, whether two objects accompany each other was determined by the number of their co-occurrences in the trajectories. The definition of co-occurrence introduced the Hausdorff distance and took into account the location, direction and time characteristics of the trajectory. Accompanying vehicle groups with different but adjacent sampling points and similar trajectories were also mined to improve the mining efficiency. The experimental results show that the proposed method mines accompanying vehicle groups more effectively than the existing method, and improves the efficiency by nearly two times when identifying non-accompanying mode data.
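
The Hausdorff distance mentioned in the co-occurrence definition can be sketched as follows. This is only the standard symmetric Hausdorff distance between two point sets; the abstract states that the authors' definition also accounts for direction and time, which this sketch does not attempt. The function names and the sample coordinates are hypothetical.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Two hypothetical trajectories given as sampling-point coordinates.
track_a = [(0.0, 0.0), (1.0, 0.0)]
track_b = [(0.0, 0.0), (3.0, 0.0)]
print(hausdorff(track_a, track_b))
```

A small Hausdorff distance means every sampling point of one trajectory has a nearby point on the other, which is why it suits trajectories recorded at different but adjacent sampling points.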

Robust L1-norm non-parallel proximal support vector machine via efficient iterative algorithm

ZHAO Caiyun, WU Changqing, GE Hua

2017, 37(11): 3069-3074. DOI: 10.11772/j.issn.1001-9081.2017.11.3069

Considering that the robust L1-norm Non-parallel Proximal Support Vector Machine (L1-NPSVM) cannot guarantee a reliable solution, a new iterative algorithm was proposed to solve the objective of L1-NPSVM. Since the objective of L1-NPSVM is invariant to the scale of the solution, it can be transformed into a maximization problem with an equality constraint, which was then solved by the proposed iterative algorithm. In each iteration, the solution was updated by a weight updating mechanism, and the problem was reduced to solving two linear equations that can be solved quickly. The convergence of the algorithm was proved theoretically. Experiments on common UCI datasets show that the proposed algorithm is not only superior to L1-NPSVM in classification performance, but also has a considerable computational advantage.

Incremental fuzzy associative classification method based on evolving vector quantization clustering algorithm

HUO Weigang, QU Feng, CHENG Zhen

2017, 37(11): 3075-3079. DOI: 10.11772/j.issn.1001-9081.2017.11.3075

In order to improve the efficiency of building a Fuzzy Associative Classifier (FAC) on dynamic data sets, an incremental fuzzy associative classification method based on the eVQ (evolving Vector Quantization) clustering algorithm was proposed. Firstly, the eVQ clustering algorithm was adopted to incrementally update the parameters of the Gaussian membership functions of quantitative attributes. Secondly, the Update With Early Pruning (UWEP) algorithm was extended to incrementally mine fuzzy frequent itemsets. Finally, the Fuzzy CORRelation (FCORR) of each Fuzzy Associative Classification Rule (FACR) and the length of its antecedent were used as measures to prune and update the fuzzy associative classification rule base. The experimental results on four UCI benchmark data sets show that, compared with the batch fuzzy associative classification modeling method, the proposed method reduces the time of training the FAC without decreasing accuracy or interpretability. The Gaussian membership function updating method based on the eVQ clustering algorithm helps to improve the classification accuracy of the FAC on dynamic data sets.

Grid clustering algorithm based on density peaks

YANG Jie, WANG Guoyin, WANG Fei

2017, 37(11): 3080-3084. DOI: 10.11772/j.issn.1001-9081.2017.11.3080

The Density Peak Clustering (DPC) algorithm, proposed in 2014, is simple and novel: it requires few parameters and no iteration. In this paper, a grid clustering algorithm based on DPC was proposed to deal efficiently with large-scale data. Firstly, the N-dimensional space was divided into disjoint rectangular cells, and the information of each cell was counted. Then the central cells were found based on DPC, namely, cells that are surrounded by grid cells of lower local density and are relatively far from grid cells of higher local density. Finally, the grid cells adjacent to the central cells were merged to obtain the clustering results. The experimental results on UCI artificial data set show that the proposed algorithm can quickly find the clustering centers and effectively deal with the clustering of large-scale data. Compared with the original density peak clustering algorithm on different data sets, it reduces the time cost by a factor of 10 to 100 while keeping the loss of accuracy at 5% to 8%.
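
The two quantities that DPC uses to pick centers, local density and distance to the nearest denser element, can be sketched as below. This is a minimal illustration on raw points with a cutoff kernel, not the grid-based variant of the paper; in the grid setting the same computation would run over cells instead of points. The function name `density_peaks`, the cutoff `d_c` and the sample data are assumptions made for the example.

```python
import math


def density_peaks(points, d_c):
    """Return (rho, delta) for each point.

    rho[i]   : number of other points closer than the cutoff d_c.
    delta[i] : distance to the nearest point with strictly higher rho;
               for points of maximal rho, the largest distance to any point.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


# Two hypothetical groups of points far apart from each other.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
rho, delta = density_peaks(pts, d_c=1.5)
print(rho)
print(delta)
```

Candidate centers are the elements for which both rho and delta are large, which matches the description of central cells as dense cells far from any denser cell.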

Semi-supervised community detection algorithm using active link selection based on iterative framework

CHEN Yiying, CHAI Bianfang, LI Wenbin, HE Yichao, WU Congcong

2017, 37(11): 3085-3089. DOI: 10.11772/j.issn.1001-9081.2017.11.3085

The semi-supervised community detection methods based on Non-negative Matrix Factorization (NMF) select prior information randomly and therefore need large amounts of supervised information to achieve satisfactory performance. To solve this problem, an Active Link Selection algorithm for semi-supervised community detection based on Graph regularization NMF (ALS_GNMF) was proposed. Firstly, within the iterative framework, the most uncertain and informative links were actively selected as prior information links. Secondly, must-link constraints on these links, which generated the prior matrix, were added to strengthen the connections within a community. At the same time, cannot-link constraints, which modified the adjacency matrix, were added to weaken the connections between communities. Finally, the prior matrix was incorporated as a graph regularization term into the optimization objective function of NMF. Combined with network topology information, higher community detection accuracy and robustness were achieved with less prior information. At the same prior ratio on both synthetic and real networks, experimental results demonstrate that the ALS_GNMF algorithm significantly outperforms the existing semi-supervised NMF algorithms in terms of efficiency, and that it is stable, especially on networks with unclear structure.

Active semi-supervised community detection method based on link model

CHAI Bianfang, WANG Jianling, XU Jiwei, LI Wenbin

2017, 37(11): 3090-3094. DOI: 10.11772/j.issn.1001-9081.2017.11.3090

Link models are able to model the community detection problem on networks. Compared with other similar models, including symmetric models and conditional models, the PPL (Popularity and Productivity Link) model handles more types of networks and detects communities more accurately. However, PPL is an unsupervised model and performs poorly when the network structure is unclear. In addition, PPL cannot make use of priors that are easily captured. To improve its performance using as few priors as possible, an Active Node Prior Learning (ANPL) algorithm was proposed. ANPL selected the pairwise constraints with the highest utility that are easy to label, and automatically generated more informative labeled nodes from the labeled pairwise constraints. Based on the PPL model, a Semi-supervised PPL (SPPL) model was proposed for community detection, which combined the network topology with the node labels learned by the ANPL algorithm. Experiments on synthetic and real networks demonstrate that, using node priors from the ANPL algorithm together with the network topology, the SPPL model outperforms the unsupervised PPL model and popular semi-supervised community detection models based on Non-negative Matrix Factorization (NMF).

User discovery based on loyalty in social networks

XUE Yun, LI Guohe, WU Weijiang, HONG Yunfeng, ZHOU Xiaoming

2017, 37(11): 3095-3100. DOI: 10.11772/j.issn.1001-9081.2017.11.3095

To improve user stickiness in social networks, an algorithm for discovering users based on their loyalty in a social network system was proposed. In the proposed algorithm, a double Recency Frequency Monetary (RFM) model was used to mine users of different loyalty types. Firstly, according to the double RFM model, the users' consumption value and behavior value were calculated dynamically, and their loyalty over a certain period was obtained. Secondly, typical loyal and disloyal users were identified by using the established standard curve and similarity calculation. Lastly, potential loyal and disloyal users were identified by using modularity-based community discovery and the independent cascade propagation model. On microblog datasets of a social network, the quantitative representation of user loyalty in Social Network Service (SNS) was verified, so that users could be distinguished by their loyalty. The experimental results show that the proposed algorithm can effectively discover users of different loyalty types, and can be applied to personalized recommendation, marketing, etc. in social network systems.

Efficient approach for selecting key users in large-scale social networks

ZHENG Yongguang, YUE Kun, YIN Zidu, ZHANG Xuejie

2017, 37(11): 3101-3106. DOI: 10.11772/j.issn.1001-9081.2017.11.3101

To select key users with strong information dissemination capability efficiently and effectively from large-scale social networks and the corresponding historical user messages, an approach for selecting key users was proposed. Firstly, the structure information of the social network was used to construct a directed graph with users as nodes. Based on the Spark computing framework, the weights of user activity, transmission interaction and information quantity were calculated quantitatively from the historical message data, so as to construct a dynamic weighted graph model of the social network. Then, a measure of a user's information dissemination capability was established based on PageRank, and a corresponding Spark-based algorithm was given for large-scale social networks. Furthermore, an algorithm for d-distance selection of key users was given, which uses multiple iterations to make the information dissemination ranges of different key users overlap as little as possible. The experimental results on Sina Weibo datasets show that the proposed approach is efficient, feasible and scalable, and can provide underlying techniques to control the spread of bad news and monitor public opinions to a certain extent.
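
A PageRank-style measure on a weighted directed graph can be sketched as follows. This is a single-machine illustration of weighted PageRank by power iteration, not the authors' Spark-based algorithm or their weighting scheme; the function name `weighted_pagerank`, the edge dictionary format, the damping factor 0.85 and the sample edges are assumptions made for the example.

```python
def weighted_pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a weighted directed graph.

    edges: dict mapping (src, dst) to a positive weight. Each node spreads
    its rank to its successors in proportion to the edge weights; the rank
    of nodes without outgoing edges is spread evenly over all nodes.
    """
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_weight = {node: 0.0 for node in nodes}
    for (src, _), w in edges.items():
        out_weight[src] += w
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        new_rank = {v: (1 - damping) / n + damping * dangling / n
                    for v in nodes}
        for (src, dst), w in edges.items():
            if w > 0:  # zero-weight edges carry no rank
                new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank


# Hypothetical interaction weights between three users.
ranks = weighted_pagerank({("a", "c"): 1.0, ("b", "c"): 1.0, ("c", "a"): 1.0})
print(ranks)
```

In this toy graph user `c` receives rank from both other users and ends up ranked highest, while `b`, which nobody points to, keeps only the baseline share.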

Conditional preference mining based on MaxClique

TAN Zheng, LIU JingLei, YU Hang

2017, 37(11): 3107-3114. DOI: 10.11772/j.issn.1001-9081.2017.11.3107

To solve the problem that conditional constraints (context constraints) for personalized queries in databases were not fully considered, a constraint model was proposed in which the contextual preference i⁺ ≻ i⁻ | X means that the user prefers i⁺ to i⁻ under the constraint of context X. An association rule mining algorithm based on MaxClique was used to obtain user preferences, and a Conditional Preference Mining (CPM) algorithm, which combines the context with the obtained preference rules, was proposed to obtain user preferences. The experimental results show that the context preference mining model has strong preference expression ability. In addition, under different settings of minimum support, minimum confidence and data scale, comparisons of the CPM algorithm with the Apriori algorithm and the CONTENUM algorithm show that the proposed CPM algorithm clearly improves the efficiency of generating user preferences.

Community detection algorithm based on belief propagation in complex networks

YOU Xinxin, GE Meng

2017, 37(11): 3115-3118. DOI: 10.11772/j.issn.1001-9081.2017.11.3115

The classical Belief Propagation (BP) algorithm can infer the marginal probability distributions and maximum likelihood probability of all nodes in a finite number of iterations. However, the BP algorithm is prone to strong oscillation during iteration, and it passes messages synchronously, which seriously slows convergence. Based on extensive study, three main factors causing oscillation were identified: strong energy, closed loops and contradictory directions. A new update formula and an asynchronous way of passing messages were then proposed to solve these two problems. The stochastic block model was used to model the network generation process, and the community division was obtained by using the classical expectation maximization algorithm combined with BP. Extensive experimental results on real-world networks show the superior performance of the new method over the state-of-the-art approaches.

Research and implementation of mobile robot path planning method

SHI Jin, DONG Yao, BAI Zhendong, CUI Zechen, DONG Yongfeng

2017, 37(11): 3119-3123. DOI: 10.11772/j.issn.1001-9081.2017.11.3119

In an environment with a target point and dynamic obstacles whose motion is unknown, the radius of the repulsive force is often larger than the radius of the obstacle when the path is planned by the artificial potential field method, which leads to collisions between dynamic obstacles and the robot. An improved dynamic path planning strategy for the artificial potential field method, based on the Morphine algorithm and a non-completely waiting strategy, was proposed. The non-completely waiting strategy was adopted when a dynamic obstacle would collide with the robot from the side, and the Morphine algorithm was used to plan the path locally when a dynamic obstacle would collide with the robot face to face. Moreover, the rolling window theory was introduced to improve the accuracy of avoiding dynamic obstacles. Simulation tests show that, compared with the traditional artificial potential field method, the proposed algorithm shortens the path by 12 steps in the event of a side collision and by 6 steps in the event of a face-to-face collision. Therefore, the improved algorithm is more effective in path smoothness and number of planning steps.
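
For reference, the traditional artificial potential field that the paper improves on can be sketched as follows. This is the classic formulation (linear attraction to the goal, repulsion that acts only inside an influence radius around each obstacle), not the authors' improved strategy; the gains `k_att` and `k_rep`, the function names and the sample positions are hypothetical.

```python
import math


def attractive_force(pos, goal, k_att=1.0):
    """Force pulling the robot toward the goal, proportional to the offset."""
    return (k_att * (goal[0] - pos[0]), k_att * (goal[1] - pos[1]))


def repulsive_force(pos, obstacle, influence_radius, k_rep=1.0):
    """Force pushing the robot away from an obstacle inside its influence radius."""
    dx, dy = pos[0] - obstacle[0], pos[1] - obstacle[1]
    d = math.hypot(dx, dy)
    # Outside the influence radius the obstacle exerts no force. The d == 0
    # case is undefined in the formula and is simply ignored in this sketch.
    if d == 0 or d >= influence_radius:
        return (0.0, 0.0)
    magnitude = k_rep * (1.0 / d - 1.0 / influence_radius) / d ** 2
    return (magnitude * dx / d, magnitude * dy / d)


def total_force(pos, goal, obstacles, influence_radius):
    """Sum of the attractive force and all repulsive forces at `pos`."""
    fx, fy = attractive_force(pos, goal)
    for obstacle in obstacles:
        rx, ry = repulsive_force(pos, obstacle, influence_radius)
        fx, fy = fx + rx, fy + ry
    return (fx, fy)


print(total_force((0.0, 0.0), (10.0, 0.0), [(1.0, 0.0)], influence_radius=2.0))
```

The influence radius in this sketch plays the role of the repulsive-force radius discussed in the abstract: the robot starts reacting to an obstacle as soon as it enters that radius, regardless of the obstacle's physical size.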

High-performance image super-resolution reconstruction based on cascade deep convolutional network

GUO Xiao, TAN Wenan

2017, 37(11): 3124-3127. DOI: 10.11772/j.issn.1001-9081.2017.11.3124

In order to further improve the resolution achieved by existing image super-resolution methods, a High-performance Deep Convolutional neural Network (HDCN) was proposed to reconstruct super-resolution images at a fixed scale. By cascading several HDCN models, the problem that many traditional models cannot upscale images by alternative scale factors was solved. A deep edge-aware filter was introduced in the cascade process to reduce cascading errors and highlight edge information, giving the High-performance Cascade Deep Convolutional neural Network (HCDCN). Super-resolution image reconstruction experiments were carried out with the HCDCN model on the Set5 and Set14 datasets. The experimental results prove the effectiveness of introducing the deep edge-aware filter. Comparison of the performance evaluation results of the HCDCN method with those of other image super-resolution reconstruction methods demonstrates the superior performance of HCDCN.

Object tracking based on foreground discrimination and circle search

LIN Lingpeng, HUANG Tianqiang, LIN Jing

2017, 37(11): 3128-3133. DOI: 10.11772/j.issn.1001-9081.2017.11.3128

Aiming at the low accuracy and even loss of the object in moving object tracking under occlusion, deformation, rotation and illumination changes, and at the poor real-time performance of traditional tracking algorithms, a target tracking algorithm based on foreground discrimination and Circle Search (CS) was proposed. The image perceptual hashing technique was used to describe and match the tracked object, and the tracking process combined the two tracking strategies above, which could effectively solve these problems. Firstly, because the direction of motion is uncertain and the inter-frame motion is slow, the CS algorithm was used to search for the locally best matching position (around the tracked object) in the current frame. Then, the foreground discrimination algorithm PBAS (Pixel-Based Adaptive Segmenter) was adopted to search for the globally optimal object foreground in the current frame. Finally, whichever of the two had the higher similarity to the object template was selected as the tracking result, and whether to update the target template was determined according to the matching threshold. The experimental results show that the proposed algorithm is better than the MeanShift algorithm in precision and accuracy, and has a greater tracking advantage when the target is not moving fast.
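
Matching a candidate region against the object template with a perceptual hash can be sketched as follows. The abstract does not say which perceptual hash the authors use, so this example assumes the simple average hash (each pixel compared with the mean intensity) and Hamming-distance similarity; the function names and the tiny 2x2 patches are hypothetical, and real inputs would first be resized to a small fixed size.

```python
def average_hash(gray):
    """Average hash of a small grayscale image given as a 2D list of intensities."""
    pixels = [value for row in gray for value in row]
    mean = sum(pixels) / len(pixels)
    return [1 if value > mean else 0 for value in pixels]


def hamming_distance(h1, h2):
    """Number of positions at which two equal-length hashes differ."""
    return sum(b1 != b2 for b1, b2 in zip(h1, h2))


def similarity(h1, h2):
    """Fraction of matching bits, 1.0 for identical hashes."""
    return 1.0 - hamming_distance(h1, h2) / len(h1)


template = [[10, 200], [30, 220]]   # hypothetical object template
brighter = [[20, 210], [40, 230]]   # same pattern under brighter lighting
flipped = [[200, 10], [220, 30]]    # different pattern

print(similarity(average_hash(template), average_hash(brighter)))
print(similarity(average_hash(template), average_hash(flipped)))
```

Because the hash only records whether each pixel is above the mean, a uniform brightness change leaves it unchanged, which is one reason hash-based matching tolerates illumination changes.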

Unsupervised video segmentation by fusing multiple spatio-temporal feature representations

LI Xuejun, ZHANG Kaihua, SONG Huihui

2017, 37(11): 3134-3138. DOI: 10.11772/j.issn.1001-9081.2017.11.3134

To cope with random movement of the segmented target, rapid background change, and arbitrary variation and shape deformation of object appearance, a new unsupervised video segmentation algorithm based on multiple spatio-temporal feature representations was presented. By combining salient features with other features obtained from pixels and superpixels, a coarse-to-fine robust feature representation was designed to represent each frame in a video sequence. Firstly, a set of superpixels was generated to represent foreground and background, which improved computational efficiency, and segmentation results were obtained by the graph-cut algorithm. Then, the optical flow method was used to propagate information between adjacent frames, and the appearance of each superpixel was updated by its non-local spatio-temporal features, generated by nearest neighbor search with an efficient K-Dimensional tree (K-D tree) algorithm, so as to improve the robustness of segmentation. After that, for the segmentation results generated at the superpixel level, a new pixel-based Gaussian mixture model was constructed to achieve pixel-level refinement. Finally, the salient features of the image were combined with the segmentation results generated by graph-cut and the Gaussian mixture model through a voting scheme to obtain more accurate segmentation results. The experimental results show that the proposed algorithm is a robust and effective segmentation algorithm, which is superior to most unsupervised video segmentation algorithms and some semi-supervised video segmentation algorithms.

Video keyframe extraction based on users' interests

YU Huangyue, WANG Han, GUO Mengting

2017, 37(11): 3139-3144. DOI: 10.11772/j.issn.1001-9081.2017.11.3139

At present, video key information extraction technology mainly extracts key frames according to low-level video features and ignores the semantic information related to users' interests. Semantic modeling of video requires a large number of labeled video training samples, which is time-consuming and laborious. To alleviate this problem, a large amount of Internet image data, which is rich in content and covers a large amount of event information, was used to construct a semantic model based on users' interests. However, images obtained from the Internet are diverse and often noisy, so the final video extraction would be greatly affected by brute-force migration. The synonym-weight model was therefore used to measure the differences between semantically similar image groups on the Internet, and these image groups were used to construct the semantic model. The weight of each image group in knowledge migration was determined by its weight value. The experimental results on several challenging video datasets demonstrate that semantic modeling based on users' interests combined with weights is more comprehensive and accurate, and thus effectively guides video key frame extraction.

Weighted sparse representation based on self-paced learning for face recognition

WANG Xuejun, WANG Wenjian, CAO Feilong

2017, 37(11): 3145-3151. DOI: 10.11772/j.issn.1001-9081.2017.11.3145

In recent years, Sparse Representation based Classifier (SRC) has become a hot topic and has been very successful in face recognition. However, when SRC reconstructs test samples, it may use training samples that differ greatly from the test samples; meanwhile, SRC tends to lose locality information and thus produces unstable classification results. A Self-Paced Learning Weighted Sparse Representation based Classifier (SPL-WSRC) was proposed. It could effectively eliminate the training samples that differ greatly from the test samples. In addition, locality information between samples was taken into account by weighting, to improve classification accuracy and stability. The experimental results on three classical face databases show that the proposed SPL-WSRC algorithm is better than the original SRC algorithm, and the improvement is especially obvious when there are enough training samples.

Face detection in bus environment based on cost-sensitive deep quadratic tree

LOU Kang, XUE Yanbing, ZHANG Hua, XU Guangping, GAO Zan, WANG Zhigang

2017, 37(11): 3152-3156. DOI: 10.11772/j.issn.1001-9081.2017.11.3152

The problems of face detection in the bus environment include changing ambient illumination, image distortion, occlusion by human bodies, abnormal postures, etc. To alleviate these problems, a face detection method based on cost-sensitive Deep Quadratic Tree (DQT) was proposed. First of all, the Normalized Pixel Difference (NPD) feature was used to construct and train a single DQT. According to the classification result of the current decision tree, the cost-sensitive Gentle Adaboost method was used to update the sample weights, and a number of deep decision trees were trained. Finally, the classifier was produced from the multiple boosted deep quadratic trees by the Soft-Cascade method. The experimental results on Face Detection Data set and Benchmark (FDDB) and bus video show that, compared with the existing deep decision tree algorithm, the proposed algorithm improves the detection rate and detection speed.

Forest fire image segmentation algorithm with adaptive threshold based on smooth spline function

YANG Xubing, TAN Xinyi, ZHANG Fuquan

2017, 37(11): 3157-3161. DOI: 10.11772/j.issn.1001-9081.2017.11.3157

Based on the smoothing spline principle, a self-adaptive multi-threshold segmentation algorithm HistSplineReg (Spline Regression for Histogram) was proposed. HistSplineReg is a two-step method: firstly, a smoothing spline function was regressed to fit the one-dimensional image histogram; then the extreme values of the regression function were found to achieve automatic multi-threshold segmentation of the image. Compared with the existing multi-threshold methods, the advantages of HistSplineReg lie in 5 aspects: 1) it is quite consistent with human intuition; 2) it is built on the smoothing spline, which gives it a solid mathematical basis; 3) both the number and the values of the thresholds can be determined automatically; 4) HistSplineReg can be solved analytically, and its computing burden is mainly concentrated on the Cholesky decomposition of a matrix whose size depends on the number of pixel levels of the image rather than on the image size; 5) it has only one trade-off parameter, which balances the empirical error against the regressor's smoothness. Furthermore, an experimental reference value was provided for the forest fire recognition task. Finally, experiments were conducted on some digital forest fire images in the RGB (Red, Green, Blue) mode. The experimental results show that HistSplineReg is more effective than Support Vector Regression (SVR) and Polynomial Fitting (PolyFit) on grayscale images, on individual color channels, and on color images synthesized from the per-channel segmentations. All three methods also indicate that the red channel information contributes most to the segmentation of forest fire images.
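
The two-step idea (smooth the histogram, then read thresholds off its extreme values) can be sketched as follows. Note the substitutions: a moving average stands in for the smoothing spline, and thresholds are taken at local minima (valleys), which is a common choice but an assumption here, since the abstract only says extreme values. The function names, the `radius` parameter and the toy histogram are hypothetical.

```python
def smooth(hist, radius=2):
    """Moving-average smoothing of a 1-D histogram (stand-in for a smoothing spline)."""
    n = len(hist)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        smoothed.append(sum(hist[lo:hi]) / (hi - lo))
    return smoothed


def valley_thresholds(hist, radius=2):
    """Indices of interior local minima of the smoothed histogram.

    For a flat valley only its left edge is reported.
    """
    s = smooth(hist, radius)
    return [i for i in range(1, len(s) - 1)
            if s[i] < s[i - 1] and s[i] <= s[i + 1]]


# Toy bimodal histogram over 11 intensity levels.
hist = [0, 2, 8, 2, 0, 0, 0, 2, 8, 2, 0]
print(valley_thresholds(hist, radius=1))
```

As in the abstract, the number of thresholds is not fixed in advance: it is simply the number of valleys that survive smoothing, so the smoothing strength acts as the single trade-off parameter.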

Remote sensing image segmentation with texture removal

ZHOU Mingfei, WANG Xili

2017, 37(11): 3162-3167. DOI: 10.11772/j.issn.1001-9081.2017.11.3162

Focusing on the difficulty of precisely segmenting remote sensing images that contain complex textures, a novel algorithm combining remote sensing image segmentation with texture removal was proposed. Firstly, the texture removal method based on relative total variation was improved: a new norm constraint was introduced into the relative total variation algorithm, which helped to enhance the major structures in images while removing textures, so that the improved texture removal could assist the subsequent image segmentation. Secondly, the mean shift algorithm was used to segment the texture-removed remote sensing images by unsupervised clustering. The proposed segmentation algorithm was tested on different remote sensing images. The experimental results demonstrate that the proposed method can segment the main objects in very high resolution remote sensing images, and that it obtains better results than remote sensing image segmentation methods that use no texture removal or that are combined with other texture removal methods. The proposed method can reduce the influence of texture on image segmentation and improve the accuracy of remote sensing image segmentation.

Iterative adaptive weighted-mean filter for image denoising

ZHANG Xinming, CHENG Jinfeng, KANG Qiang, WANG Xia

2017, 37(11): 3168-3175. DOI: 10.11772/j.issn.1001-9081.2017.11.3168

Aiming at the deficiencies of current filters in removing salt-and-pepper noise from images, such as low denoising performance and slow running speed, an image denoising method based on Iterative Adaptive Weighted-mean Filter (IAWF) was proposed. Firstly, the neighborhood weights were constructed from the similarity between the neighborhood pixels and the pixel being processed. Then a new weighted-mean filter algorithm was formed by combining the neighborhood weights with the switching trimmed mean filter, which made full use of the correlation among image pixels and the advantages of the switching trimmed filter, and effectively improved the denoising effect. At the same time, the window size of the filter was adjusted automatically to protect details as much as possible. Finally, the filter was applied iteratively until all noisy points had been processed, so that the procedure ran automatically with less manual intervention. The simulation results show that, compared with several state-of-the-art denoising algorithms, the proposed algorithm is better in Peak Signal-to-Noise Ratio (PSNR), collateral distortion and subjective denoising effect under various noise densities, and runs faster, which makes it more suitable for practical applications.
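The switching, adaptive-window and iterative ingredients can be sketched roughly as follows. This is only an illustration under stated assumptions: every pixel equal to 0 or 255 is treated as noise, the weight is the inverse city-block distance rather than the similarity-based weight of the paper (which the abstract does not specify), and `iawf_sketch` and `max_radius` are hypothetical names.

```python
def iawf_sketch(img, max_radius=2):
    """Restore salt-and-pepper pixels with a weighted mean of clean neighbours.

    img is a list of rows of integer grey levels in 0..255.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    # Switching step: only extreme-valued pixels are considered noisy.
    noisy = {(y, x) for y in range(h) for x in range(w) if img[y][x] in (0, 255)}
    while noisy:
        fixed = {}
        for (y, x) in noisy:
            # Adaptive window: grow the radius until a clean neighbour is found.
            for r in range(1, max_radius + 1):
                num = den = 0.0
                for yy in range(max(0, y - r), min(h, y + r + 1)):
                    for xx in range(max(0, x - r), min(w, x + r + 1)):
                        if (yy, xx) in noisy:
                            continue  # trimmed: noisy pixels never vote
                        wgt = 1.0 / (abs(yy - y) + abs(xx - x))
                        num += wgt * out[yy][xx]
                        den += wgt
                if den > 0:
                    fixed[(y, x)] = round(num / den)
                    break
        if not fixed:
            break  # remaining noisy pixels have no clean pixel within reach
        # Iterative step: restored pixels become usable in the next pass.
        for (y, x), v in fixed.items():
            out[y][x] = v
        noisy -= set(fixed)
    return out
```

Restored values are written only after a full pass, so the result does not depend on the order in which noisy pixels are visited.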

Smoke recognition based on deep transfer learning

WANG Wenpeng, MAO Wentao, HE Jianliang, DOU Zhi

2017, 37(11): 3176-3181. DOI: 10.11772/j.issn.1001-9081.2017.11.3176

For the smoke recognition problem, traditional recognition methods based on sensors and image features are easily affected by the external environment, which leads to low recognition precision when the flame scene or type changes. Recognition methods based on deep learning require a large amount of data, so the recognition ability of the model is weak when smoke data are missing or the data source is restricted. To overcome these drawbacks, a new smoke recognition method based on deep transfer learning was proposed. The main idea was to transfer smoke features by means of the VGG-16 (Visual Geometry Group) model with the ImageNet dataset as source data. Firstly, all image data were pre-processed, including random rotation, cropping and flipping. Secondly, the VGG-16 network was introduced to transfer the features in the convolutional layers, which were connected to fully connected layers pre-trained on smoke data. Finally, the smoke recognition model was obtained. Experiments were conducted on open datasets and real-world smoke images. The experimental results show that the accuracy of the proposed method exceeds 96% and is higher than those of current smoke image recognition methods.

Automatic detection of pulmonary nodules based on 3D shape index

DONG Linjia, QIANG Yan, ZHAO Juanjuan, YUAN Jie, ZHAO Wenting

2017, 37(11): 3182-3187. DOI: 10.11772/j.issn.1001-9081.2017.11.3182

Aiming at the problems of high misdiagnosis rate, high false positive rate and low detection accuracy in computer-aided detection of pulmonary nodules, a nodule detection method based on three-dimensional shape index and Hessian matrix eigenvalues was proposed. Firstly, the parenchyma region was extracted, and the eigenvalues and eigenvectors of the Hessian matrix were calculated. Secondly, the three-dimensional shape index formula was derived from the two-dimensional shape index, and an improved three-dimensional sphere-like filter was constructed. Finally, the suspected nodule regions were detected in the parenchyma volume, and more false-positive regions were removed. The nodules were detected in the three-dimensional volume data, the detected coordinates were used as the seeds of belief connect, and the three-dimensional data were split to pick out the three-dimensional nodules. The experimental results show that the proposed algorithm can effectively detect different types of pulmonary nodules, has a better detection effect on ground glass nodules, which are more difficult to detect, reduces the false positive rate, and reaches an accuracy of 92.36% and a sensitivity of 96.52%.

IPTV implicit scoring model based on Hadoop

GU Junhua, GUAN Lei, ZHANG Jian, GAO Xing, ZHANG Suqi

2017, 37(11): 3188-3193. DOI: 10.11772/j.issn.1001-9081.2017.11.3188

According to the implicit characteristics of IPTV (Internet Protocol Television) user viewing behavior data, a novel implicit rating model was proposed. Firstly, the main features of IPTV user viewing behavior data were introduced, and a new mixed-feature implicit scoring model was proposed, which combined the viewing ratio, a user interest bias factor and a video type influence factor. Secondly, a viewing behavior strategy based on viewing time and viewing ratio was proposed. Finally, a distributed model architecture based on Hadoop was designed and implemented. The experimental results show that the proposed model effectively improves the quality of the recommendation results in the IPTV system, improves time efficiency, and scales well to large amounts of data.

Particle swarm optimization algorithm with cross opposition learning and particle-based social learning

ZHANG Xinming, KANG Qiang, WANG Xia, CHENG Jinfeng

2017, 37(11): 3194-3200. DOI: 10.11772/j.issn.1001-9081.2017.11.3194

To address the slow convergence and low search efficiency of the Social Learning Particle Swarm Optimization (SLPSO) algorithm, a Cross opposition learning and Particle-based social learning Particle Swarm Optimization (CPPSO) algorithm was proposed. Firstly, a cross opposition learning mechanism was formulated by combining general opposition learning, random opposition learning and vertical random crossover on the optimal solution. Secondly, cross opposition learning was applied to the optimal particle to improve population diversity and exploration ability, and to avoid the slow convergence and low search efficiency of SLPSO. Finally, a novel social learning mechanism was adopted for the non-optimal particles in the swarm; this mechanism used a particle-based approach instead of the dimension-based one of SLPSO, which improved not only exploration capacity but also exploitation and optimization efficiency. The simulation results on a set of benchmark functions with different dimensions show that the optimization performance, search efficiency and generalizability of the CPPSO algorithm are much better than those of SLPSO and of advanced PSO algorithms such as Crisscross Search PSO (CSPSO), Self-Regulating PSO (SRPSO), Heterogeneous Comprehensive Learning PSO (HCLPSO) and Reverse learning and Local learning PSO (RLPSO).
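As a rough illustration of opposition learning applied to the best particle, the sketch below reflects a point inside the search box and keeps the reflected point only if it improves the objective. It covers only the generalized opposition form x' = k(a + b) - x and greedy selection; the random opposition and vertical crossover components of CPPSO are not reproduced, and `opposite_point`, `opposition_step` and the clipping of out-of-range values are assumptions made for this example.

```python
def opposite_point(x, lower, upper, k=1.0):
    """Generalized opposite of x in the box [lower, upper]; k=1 is plain opposition."""
    candidate = [k * (lo + hi) - xi for xi, lo, hi in zip(x, lower, upper)]
    # Assumption: values that leave the box are clipped back to the bounds.
    return [min(max(c, lo), hi) for c, lo, hi in zip(candidate, lower, upper)]

def opposition_step(best, objective, lower, upper, k=1.0):
    """Replace the best particle by its opposite only when the objective decreases."""
    candidate = opposite_point(best, lower, upper, k)
    return candidate if objective(candidate) < objective(best) else best
```

For instance, with bounds [-5, 5] in each dimension and a minimum near (-4, -4), a best particle stuck at (4, 4) would jump to its opposite (-4, -4) in one step.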

Aspect rating prediction based on heterogeneous network and topic model

JI Yugang, LI Yitong, SHI Chuan

2017, 37(11): 3201-3206. DOI: 10.11772/j.issn.1001-9081.2017.11.3201

Concerning the problem that traditional aspect rating prediction methods pay attention only to textual information while ignoring the structural information in the review network, a novel Aspect rating prediction method based on Heterogeneous Information Network and Topic model (HINToAsp) was proposed to integrate textual and structural information effectively. Firstly, a new review topic model of opinion phrases called Phrase-PLSA (Phrase-based Probabilistic Latent Semantic Analysis) was put forward to integrate the textual information of reviews and ratings for mining aspect topics. Then, considering the rich structural information among users, reviews and items, a topic propagation model was designed by constructing a "User-Review-Item" heterogeneous information network. Finally, a random walk framework was used to combine textual and structural information effectively, which ensured accurate aspect rating prediction. The experimental results on both the Dianping and TripAdvisor corpora demonstrate that HINToAsp is more effective than recent methods such as the Quad-tuples PLSA (QPLSA) model, the Gaussian distribution for RAting Over Sentiments (GRAOS) model and the Sentiment-Aligned Topic Model (SATM), and performs better in recommendation systems.

Density clustering method based on complex learning classification system

HUANG Hongwei, GE Xiaotian, CHEN Xuansong

2017, 37(11): 3207-3211. DOI: 10.11772/j.issn.1001-9081.2017.11.3207

A density clustering method based on eXtended Classifier Systems (XCS) was proposed, which could be used to cluster two-dimensional data sets with arbitrary shapes and noise. The proposed method, called Density XCS Clustering (DXCSc), mainly included the following three processes: 1) based on the learning classifier system, a rule population was generated from the input data and compacted; 2) the generated rules were regarded as two-dimensional data points, which were then clustered based on the idea of density clustering; 3) the rule population after density clustering was properly aggregated to generate the final rule population. In the first process, the learning classifier system framework was used to generate and compact the rule population. In the second process, the rule cluster centers were characterized by a higher density than their neighbors and by a relatively large distance from points with higher densities. In the third process, the relevant clusters were properly merged using the graph segmentation method. In the experiments, the proposed DXCSc was compared with K-means, Affinity Propagation (AP) and Voting-XCSc on a number of challenging data sets. The experimental results show that the proposed approach outperforms K-means and Voting-XCSc in precision.
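The center criterion used in the second process (high local density and large distance to any denser point) can be sketched on plain two-dimensional points. The sketch omits the XCS rule generation and the final merging step; the cutoff `d_c`, the product `rho * delta` as a ranking score, and the function name `density_peaks` are assumptions made for illustration.

```python
import math

def density_peaks(points, d_c, n_centers):
    """Pick cluster centers by density and distance to denser points, then label."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho: number of other points closer than the cutoff distance d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Stable sort by decreasing density; ties keep their index order.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    nearest_denser = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # densest point: use its largest distance
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], nearest_denser[i] = dist[i][j], j
    # Centers combine high density with a large distance to any denser point.
    centers = sorted(range(n), key=lambda i: -rho[i] * delta[i])[:n_centers]
    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    # Remaining points inherit the label of their nearest denser neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return centers, labels
```

Because points are labeled in order of decreasing density, each point's nearest denser neighbour already has a label when the point is processed.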

Small-size fingerprint matching based on deep learning

ZHANG Yongliang, ZHOU Bing, ZHAN Xiaosi, QIU Xiaoguang, LU Tianpei

2017, 37(11): 3212-3218. DOI: 10.11772/j.issn.1001-9081.2017.11.3212

Focused on the issue that traditional minutiae-based fingerprint matching methods are mainly applicable to large-size fingerprints and that their accuracy drops significantly on small-size fingerprints from smart phones, a small-size fingerprint matching method based on deep learning was proposed. Firstly, the detailed information of minutiae was extracted from fingerprint images. Secondly, the Regions Of Interest (ROI) were searched and labeled based on the minutiae. Then a lightweight deep neural network was built by improving the original residual module. In addition, a binary feature pattern and triplet loss were used to optimize and train the proposed model respectively. Finally, small-size fingerprint matching was accomplished with a fusion strategy of registration and matching. The experimental results show that the Equal Error Rate (EER) of the proposed method reaches 0.50% and 0.58% on the public FVC_DB1 database and an in-house database respectively, which is much lower than those of traditional minutiae-based fingerprint matching methods; the method effectively improves the performance of small-size fingerprint matching and meets the requirements of smart phones.
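The triplet loss mentioned above is commonly written as max(0, d(a, p)^2 - d(a, n)^2 + margin), where a is an anchor embedding, p an embedding of the same finger and n an embedding of a different finger. The sketch below shows that common form on plain lists; the margin value and the use of squared Euclidean distance are assumptions, since the abstract does not state the exact variant used.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero when the negative is farther from the anchor than the positive by the margin."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)
```

Training on this loss pulls embeddings of the same finger together and pushes embeddings of different fingers apart, so that matching can be reduced to a distance threshold.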

Cooperative differential evolution algorithm for large-scale optimization problems

DONG Xiaogang, DENG Changshou, TAN Yucheng, PENG Hu, WU Zhijian

2017, 37(11): 3219-3225. DOI: 10.11772/j.issn.1001-9081.2017.11.3219

A new large-scale optimization method based on the divide-and-conquer strategy was proposed. Firstly, based on the principle of additive separability, an improved variable grouping method was proposed. The randomly accessing point method was used to check the correlation between all pairs of variables. At the same time, by making full use of the learned interdependency information, the large groups of separable variables were re-grouped. Secondly, a new subcomponent optimizer was designed based on an improved differential evolution algorithm to enhance subspace optimization performance. Finally, these two improvements were introduced into the co-evolutionary framework to construct the DECC-NDG-CUDE (Cooperative differential evolution with New Different Grouping and enhancing Differential Evolution with Commensal learning and Uniform local search) algorithm. Two experiments, on grouping and on optimization, were carried out on 10 large-scale optimization problems. The experimental results show that the interdependency between variables can be effectively identified by the new grouping method, and that DECC-NDG-CUDE performs better than two state-of-the-art algorithms, DECC-D (Differential Evolution with Cooperative Co-evolution and differential Grouping) and DECCG (Differential Evolution with Cooperative Co-evolution and Random Grouping).
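The pairwise check behind additive separability can be sketched as follows: if changing variable j alters the effect that a step in variable i has on the objective, the two variables interact and belong to the same group. This is a generic differential-grouping style test, not the paper's improved grouping method; `interacts`, `group_variables`, the step `delta` and the tolerance `eps` are hypothetical.

```python
def interacts(f, base, i, j, delta=1.0, eps=1e-6):
    """True if the effect of stepping x_i depends on the value of x_j."""
    x = list(base)
    f0 = f(x)
    x[i] += delta
    d1 = f(x) - f0              # effect of the step in x_i at the base point
    x = list(base)
    x[j] += delta
    f1 = f(x)
    x[i] += delta
    d2 = f(x) - f1              # same step in x_i after x_j has been moved
    return abs(d1 - d2) > eps

def group_variables(f, base, delta=1.0, eps=1e-6):
    """Group variable indices so that interacting variables share a group."""
    n = len(base)
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path compression
            a = parent[a]
        return a

    for i in range(n):
        for j in range(i + 1, n):
            if interacts(f, base, i, j, delta, eps):
                parent[find(j)] = find(i)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())
```

Each resulting group can then be handed to its own subcomponent optimizer, which is the divide-and-conquer step of cooperative co-evolution.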

Super-resolution and frontalization in unconstrained face images

SUN Qiang, TAN Xiaoyang

2017, 37(11): 3226-3230. DOI: 10.11772/j.issn.1001-9081.2017.11.3226

Concerning the problem that face recognition is affected by factors such as pose, occlusion and resolution, a method for image super-resolution and face frontalization in unconstrained images was proposed, which could generate high-quality, standard frontal images. The projection matrix between the input image and a 3D model was estimated to generate the standard frontal image. In addition, by exploiting face symmetry, the pixels missing because of occlusion and pose could be filled in. In order to avoid the loss of pixel information while generating the standard frontal image and to improve image quality, a 16-layer deeply-recursive convolutional network was introduced for image super-resolution. To ease the difficulty of training, two extensions were proposed: recursive supervision and skip connection. The experimental results on the processed LFW datasets show that the method is surprisingly effective when used for face recognition and gender estimation.

Foreground extraction with genetic mechanism and difference of Gaussian

CHEN Kaixing, LIU Yun, WANG Jinhai, YUAN Yubo

2017, 37(11): 3231-3237. DOI: 10.11772/j.issn.1001-9081.2017.11.3231

Aiming at the difficult problem of unsupervised or automatic foreground extraction, an automatic foreground extraction method based on genetic mechanism and difference of Gaussian, named GFO, was proposed. Firstly, Gaussian variation was used to extract the relatively important regions in the image, which were defined as candidate seed foregrounds. Secondly, based on the edge information of the original image and the candidate seed foregrounds, the contour of the foreground object, called the star convex contour, was generated according to connectivity and the convex sphere principle. Thirdly, the adaptive function was constructed, the seed foreground was selected, and the genetic mechanism of selection, crossover and mutation was used to obtain an accurate and valid final foreground. The experimental results on the Achanta database and multiple videos show that the GFO method outperforms the existing automatic foreground extraction method based on difference of Gaussian (FMDOG), and achieves a good extraction effect in recognition accuracy, recall rate and F_β index.

Color based compact hierarchical image representation

ZHU Jie, WU Shufang, XIE Bojun, MA Liyan

2017, 37(11): 3238-3243. DOI: 10.11772/j.issn.1001-9081.2017.11.3238

The spatial pyramid matching method provides spatial information by splitting an image into different cells. However, spatial pyramid matching cannot match different parts of objects well. A hierarchical image representation method based on Color Level (CL) was proposed. In the CL algorithm, the class-specific discriminative colors of different levels were obtained from the viewpoint of feature fusion, and then an image was iteratively split into different levels based on these discriminative colors. Finally, the image representation was constructed by concatenating the histograms of different levels. To reduce the dimensionality of the image representation, the Divisive Information-Theoretic feature Clustering (DITC) method was used to cluster the dictionary, and the generated compact dictionary was used for the final image representation. Classification results on the Soccer, Flower 17 and Flower 102 datasets demonstrate that the proposed method obtains satisfactory results.

Deep face age classification under unconstrained conditions

ZHANG Ke, GAO Ce, GUO Liru, YUAN Jinsha, ZHAO Zhenbing

2017, 37(11): 3244-3248. DOI: 10.11772/j.issn.1001-9081.2017.11.3244

Concerning the low accuracy of age classification of face images under unconstrained conditions, a new face age classification method based on deep Residual Networks (ResNets) and large-dataset pre-training was proposed. Firstly, deep residual networks were used as the basic convolutional neural network model for face age classification. Secondly, the deep residual networks were trained on the ImageNet dataset to learn the representation of basic image features. Thirdly, the large-scale face age image dataset IMDB-WIKI was cleaned, and the IMDB-WIKI-8 dataset was established for fine-tuning the deep residual networks; transfer learning from general object images to face age images was thereby achieved, which adapted the model to the distribution of the age groups and improved the learning capability of the network. Finally, the fine-tuned network model was trained and tested on the unconstrained Adience dataset, and the age classification accuracy was obtained by cross-validation. The comparison of 34-, 50-, 101- and 152-layer residual networks shows that the more layers the network has, the higher the accuracy of age classification is. The best result, a state-of-the-art age classification accuracy of 65.01% on the Adience dataset, was achieved by the 152-layer residual network. The experimental results show that the combination of a deeper residual network and large-dataset pre-training can effectively improve the accuracy of face age classification.

Clothing retrieval based on landmarks

CHEN Aiai, LI Lai, LIU Guangcan, LIU Qingshan

2017, 37(11): 3249-3255. DOI: 10.11772/j.issn.1001-9081.2017.11.3249

At present, retrieval of clothing of the same or similar style is mainly text-based or content-based. Text-based algorithms often require massive labeled samples and suffer from missing labels and annotation differences caused by human subjectivity. Content-based algorithms usually extract image features such as color, shape and texture and then measure similarity, but they have difficulty dealing with background color interference and with clothing deformation caused by different viewing angles, poses, etc. Aiming at these problems, a clothing retrieval method based on landmarks was proposed. The proposed method used a cascaded deep convolutional neural network to locate the key points, and combined the low-level visual information of the key point regions with the high-level semantic information of the whole image. Compared with traditional methods, the proposed method can effectively deal with clothing distortion caused by viewing angle and pose and with complex background interference. Meanwhile, it does not need huge numbers of labeled samples, and is robust to background and deformation. Experiments on two large-scale datasets, Fashion Landmark and BDAT-Clothes, show that the proposed algorithm can effectively improve precision and recall.

Compressed sensing based data gathering in wireless sensor networks: a survey

QIAO Jianhua, ZHANG Xueying

2017, 37(11): 3261-3269. DOI: 10.11772/j.issn.1001-9081.2017.11.3261

In order to have a comprehensive understanding and evaluation for the Compressive Data Gathering (CDG) in Wireless Sensor Network (WSN), a systematic introduction to the related research results at home and abroad so far was made. Firstly, the establishment of the frameworks of CDG and improved methods was introduced. Secondly, according to the transmission modes of WSN and Compressed Sensing (CS) theory respectively, the various methods of CDG were elaborated by classification. Then the problems of adaptation and optimization of CDG, the application of CS combined with other methods, and some examples of practical application were illustrated. Finally, the disadvantages in CDG and the development directions of CDG were pointed out.

Upper bounds on sum rate of 3D distributed MIMO systems over K fading composite channels

PENG Hongxing, HU Yiwen, YANG Xueqing, LI Xingwang

2017, 37(11): 3270-3275. DOI: 10.11772/j.issn.1001-9081.2017.11.3270

Concerning the problems that Two-Dimensional Multiple-Input Multiple-Output (2D MIMO) systems consider only the effects of the horizontal radiation pattern while ignoring those of the vertical radiation pattern, and that the closed-form expression for the sum rate of 2D MIMO systems over K (Rayleigh/Gamma) fading channels involves special functions, two closed-form upper bounds on the achievable sum rate of Three-Dimensional Distributed Multiple-Input Multiple-Output (3D D-MIMO) systems with Zero-Forcing (ZF) receivers over K composite fading channels were proposed. The upper bounds took into account Rayleigh multipath fading, Gamma shadow fading, geometric path loss, 3D antenna radiation loss and user distribution. The experimental results show that the obtained expressions closely match the Monte Carlo simulation results.

Indoor positioning technology based on improved access point selection and K nearest neighbor algorithm

LI Xinchun, HOU Yue

2017, 37(11): 3276-3280. DOI: 10.11772/j.issn.1001-9081.2017.11.3276

Since the indoor environment is complex and the traditional K Nearest Neighbor (KNN) approach assumes that equal signal differences correspond to equal physical distances, a new Access Point (AP) selection method and a KNN indoor positioning algorithm based on scaling weight were proposed. Firstly, in the improved AP selection method, a box plot was used to filter Received Signal Strength (RSS) outliers and create a fingerprint database. The APs with a high loss rate in the fingerprint database were removed. The standard deviation was used to analyze the variations of RSS, and the TOP-N APs with less interference were selected. Secondly, the scaling weight was introduced into the traditional KNN algorithm to construct a scaling weight model based on RSS. Finally, the first K reference points with the minimum effective signal distance were used to calculate the unknown position coordinates. In the localization simulation experiments, the mean error distance of the improved AP selection method is 21.9% lower than that of KNN. The mean error distance of the algorithm with scaling weight is 1.82 m, which is 53.6% lower than that of KNN.
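Two of the steps above, box-plot filtering of RSS samples and position estimation from the K nearest fingerprints, can be sketched as follows. The 1.5 x IQR fence is the usual box-plot convention, and the inverse signal-distance weight is only a stand-in for the paper's scaling weight, whose form the abstract does not give; `boxplot_mean` and `weighted_knn` are hypothetical names.

```python
import math
import statistics

def boxplot_mean(samples):
    """Average RSS samples (dBm) after dropping values outside the box-plot fences."""
    q1, _, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    kept = [s for s in samples if q1 - 1.5 * iqr <= s <= q3 + 1.5 * iqr]
    return sum(kept) / len(kept)

def weighted_knn(fingerprints, query, k=3, eps=1e-6):
    """Estimate (x, y) from the k fingerprints closest to the query in signal space.

    fingerprints is a list of ((x, y), [rss per AP]) reference points.
    """
    ranked = sorted(fingerprints, key=lambda fp: math.dist(fp[1], query))[:k]
    # Assumption: closer fingerprints get larger weights (inverse distance).
    weights = [1.0 / (math.dist(rss, query) + eps) for _, rss in ranked]
    total = sum(weights)
    x = sum(w * pos[0] for w, (pos, _) in zip(weights, ranked)) / total
    y = sum(w * pos[1] for w, (pos, _) in zip(weights, ranked)) / total
    return x, y
```

In a full pipeline each fingerprint entry would be built with `boxplot_mean` per AP, after discarding APs with a high loss rate or a large RSS standard deviation.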

DCI control model of digital works based on blockchain

LI Yue, HUANG Junqin, WANG Ruijin

2017, 37(11): 3281-3287. DOI: 10.11772/j.issn.1001-9081.2017.11.3281

In order to solve the problems of copyright registration, rampant piracy and copyright disputes faced by digital intellectual property in the Internet ecosystem, a Digital Copyright Identifier (DCI) control model of digital works without a trusted third party was proposed. Firstly, a Peer-to-Peer (P2P) architecture based on the decentralization concept of blockchain was constructed. The blockchain replaced the traditional database as the core of the storage mechanism. Through the creation of transactions, the construction of blocks, legitimacy validation and the linking of blocks, a blockchain storage structure for digital work transaction information was built, which guaranteed that the copyright information could not be tampered with and was traceable. Secondly, a digital distribution protocol based on smart contracts was proposed; three types of contracts, for copyright registration, inquiry and transfer, were designed, and transactions were generated by automatically executing preset instructions to ensure the transparency and efficiency of the model. Theoretical analysis and simulation show that the probability of forged block attack is close to zero in the digital work blockchain network, and that the model has better architectural security than the traditional copyright authentication mechanism based on a trusted third party. The experimental results show that the model lowers the threshold of digital copyright registration, enhances the authority of copyright certification, and has better real-time performance and robustness.
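Why linking blocks by hash makes stored copyright records tamper-evident can be shown with a minimal sketch. It covers only block construction, linking and validation on a single node; consensus, signatures, P2P networking and smart contracts are out of scope, and the field names (`index`, `prev_hash`, `transactions`) and the sample transaction contents are hypothetical.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder predecessor hash for the first block

def block_hash(block):
    """SHA-256 over the block's index, predecessor hash and transactions."""
    payload = json.dumps(
        {key: block[key] for key in ("index", "prev_hash", "transactions")},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append_block(chain, transactions):
    """Build a block that points at the current tip and append it to the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    block = {"index": len(chain), "prev_hash": prev_hash, "transactions": transactions}
    block["hash"] = block_hash(block)
    chain.append(block)
    return block

def is_valid(chain):
    """Check every block's own hash and its link to the previous block."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        expected_prev = chain[i - 1]["hash"] if i else GENESIS_PREV
        if block["prev_hash"] != expected_prev:
            return False
    return True
```

Changing a registration record in an early block changes that block's hash, so validation fails unless every later block is recomputed as well.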

Mining denial of service vulnerability in Android applications automatically

ZHOU Min, ZHOU Anmin, LIU Liang, JIA Peng, TAN Cuijiang

2017, 37(11): 3288-3293. DOI: 10.11772/j.issn.1001-9081.2017.11.3288

Concerning the fact that the process crashes and causes denial of service when the receiver of an Intent does not validate empty or abnormal data, an automated Android component vulnerability mining framework based on static analysis and fuzzing techniques was proposed. In this framework, reverse analysis and static data flow analysis were used to extract the package name, the components, the Intent and the data it carries, and the data flow paths from exported components to private components, in order to assist fuzzing. In addition, more mutation strategies on the attributes of the Intent (such as Action, Category, Data and Extra) were added when generating Intent test cases, and the Accessibility technology was adopted to close crash windows so as to automate the process. Finally, a tool named DroidRVMS was implemented, and a comparative experiment with Intent Fuzzer was designed to verify the validity of the framework. The experimental results show that DroidRVMS can find denial of service vulnerabilities resulting from dynamic broadcast receivers and most types of exceptions.

DEX unpacking technology in ART virtual machine

JIANG Zhongqing, ZHOU Anmin, JIA Peng

2017, 37(11): 3294-3298. DOI: 10.11772/j.issn.1001-9081.2017.11.3294

Based on a systematic study of existing DEX packing and unpacking technologies, a DEX unpacking scheme based on the Android ART Virtual Machine (VM) was proposed and implemented. The method could extract the original DEX file from a hardened Android application. The core idea is to accomplish zero-knowledge unpacking in a highly compatible way by combining simulation execution with static instrumentation. Firstly, the unpacking point was located by inserting monitoring code into the interpreter of ART. Then, the memory location of the data belonging to the original DEX file was obtained by performing simulation execution and analyzing the related structs. Finally, the original DEX file was restored by collecting and reassembling the data according to the DEX file format. The experimental results indicate that the proposed automatic unpacking method performs zero-knowledge unpacking well while introducing only a small delay at application launch.

Efficient verifiable outsourced decryption based on attribute-based encryption and fixed ciphertext length

LI Cong, YANG Xiaoyuan, WANG Xu'an, BAI Ping

2017, 37(11): 3299-3303. DOI: 10.11772/j.issn.1001-9081.2017.11.3299

The traditional key-policy attribute-based encryption and decryption scheme has the disadvantages that the ciphertext length increases linearly with the number of attributes and that a large amount of the user's communication bandwidth is consumed during communication. An improved attribute-based encryption scheme was proposed. Based on key-policy attribute-based encryption, a verifiable outsourced decryption scheme with fixed ciphertext length was proposed. In the non-monotonic access structure, the ciphertext length was fixed, and the communication bandwidth was effectively saved. Through the improvement of the outsourced key generation algorithm, a primary modular exponentiation operation was realized, and the key generation time was effectively shortened. The hash function was used to realize the verification of the decryption, and the security of the scheme was proved.

Load-aware dynamic scheduling mechanism based on security strategies

GU Zeyu, ZHANG Xingming, LIN Senjie

2017, 37(11): 3304-3310. DOI: 10.11772/j.issn.1001-9081.2017.11.3304

Concerning flow rule tampering attacks and other single-point vulnerability threats to the Software Defined Network (SDN) controller, traditional security solutions such as backup and fault-tolerance mechanisms suffer from the defects of passive defense and cannot fundamentally solve the security issues of the control layer. Combining current moving target defense and cyberspace mimic defense, a dynamic security scheduling mechanism based on a heterogeneous redundant structure was proposed. A controller scheduling model was established, in which the dynamic scheduling strategy was designed based on the security principle combined with attack exceptions and heterogeneity. By considering the system load, the scheduling problem was transformed into a dynamic two-objective optimization problem, which was solved by LA-SSA (Load-Aware Security Scheduling Algorithm) to obtain an optimal scheduling scheme. Simulation results show that, compared with the static structure, the dynamic scheduling mechanism has obvious advantages in cumulative number of exceptions and output safety rate, and that the dynamics and diversity of the security scheduling mechanism can significantly improve the system's ability to resist attacks. The load variance of LA-SSA is more stable than that of safety priority scheduling, the security imbalance is avoided, and the effectiveness of the security scheduling mechanism is verified.

Haze forecast based on time series analysis and Kalman filtering

ZHANG Hengde, XIAN Yunhao, XIE Yonghua, YANG Le, ZHANG Tianhang

2017, 37(11): 3311-3316. DOI: 10.11772/j.issn.1001-9081.2017.11.3311

In order to improve the accuracy of haze forecasting and to resolve the time lag and low accuracy of the temporal model, a mixed forecast method based on time series analysis and Kalman filtering was proposed. Firstly, the stationarity of the time series was tested by graph analysis and the Augmented Dickey-Fuller (ADF) unit root test. Non-stationary time series were converted to stationary ones by differencing. A statistical model was established based on the stationary time series. Then, the obtained model equations were used as the state and observation equations for Kalman filtering. The final haze forecast was obtained by Kalman filter recursion. The experimental results show that the accuracy of haze forecasting is effectively improved by the mixed forecast method based on time series analysis and Kalman filtering.
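The pipeline of differencing followed by Kalman recursion can be sketched with a scalar filter. The sketch assumes a first-order autoregressive state equation x_t = phi * x_{t-1} + w_t on the differenced series and a direct noisy observation z_t = x_t + v_t; the coefficient `phi`, the noise variances `q` and `r`, and the function names are illustrative assumptions, not values or equations from the paper.

```python
def difference(series):
    """First-order differencing, used to remove a trend from the series."""
    return [b - a for a, b in zip(series, series[1:])]

def kalman_ar1(observations, phi, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x_t = phi * x_{t-1} + w_t, z_t = x_t + v_t."""
    x, p = x0, p0
    estimates = []
    for z in observations:
        # Predict step: propagate the state and its variance.
        x_pred = phi * x
        p_pred = phi * phi * p + q
        # Update step: blend the prediction with the new observation.
        gain = p_pred / (p_pred + r)
        x = x_pred + gain * (z - x_pred)
        p = (1.0 - gain) * p_pred
        estimates.append(x)
    return estimates

def forecast_next(series, phi=0.5, q=1.0, r=1.0):
    """One-step forecast: last value plus the predicted next difference."""
    filtered = kalman_ar1(difference(series), phi, q, r)
    return series[-1] + phi * filtered[-1]
```

In practice `phi`, `q` and `r` would be estimated from the fitted time series model rather than fixed by hand.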

Short-term power load forecasting method combining with multi-algorithm & multi-model and online second learning

ZHOU Mo, JIN Min

2017, 37(11): 3317-3322. DOI: 10.11772/j.issn.1001-9081.2017.11.3317

In order to improve the forecasting accuracy of short-term power load, a forecasting method combining multi-algorithm & multi-model forecasting with online second learning was proposed. First, the input variables were selected by using mutual information and statistical information, and a dataset was constructed. Then, multiple training sets were generated by diverse bootstrap sampling of the original training set, and multiple models were obtained using different artificial intelligence and machine learning algorithms. Finally, the offline second-learning method was improved: a new training set was generated from the actual load and the multi-model forecasts for a recent period close to the forecast time, and it was trained by online second learning to obtain the final forecasting results. The load in Guangzhou, China was studied. Compared with the optimal single-model, single-algorithm & multi-model, and multi-algorithm & single-model methods, the Mean Absolute Percentage Error (MAPE) of the proposed model was reduced by 21.07%, 7.64% and 5.00%, respectively, in daily total load forecasting, and by 16.02%, 7.60% and 13.14%, respectively, in daily peak load forecasting. The experimental results show that the proposed method can improve the prediction accuracy of the power load, reduce costs, implement optimal scheduling management, and ensure security with early warnings in smart grids.
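
As a rough illustration of the ideas in this abstract, the sketch below shows bootstrap resampling, the MAPE metric, and a very simple second-level combiner that re-weights base models by their error on a recent window. The inverse-MAPE weighting is a stand-in chosen for brevity, not the second-learning method of the paper, and all function names are hypothetical.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement to form one training set."""
    return [rng.choice(data) for _ in data]


def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def second_level_weights(actual_recent, forecasts_recent):
    """Assumed combiner: weight each base model by the inverse of its recent MAPE."""
    errors = [mape(actual_recent, f) for f in forecasts_recent]
    inverse = [1.0 / (e + 1e-9) for e in errors]  # small constant avoids division by zero
    total = sum(inverse)
    return [w / total for w in inverse]


def combine(weights, forecasts_next):
    """Weighted combination of the base models' forecasts for the next period."""
    return sum(w * f for w, f in zip(weights, forecasts_next))
```

Recomputing the weights each time new actual load arrives gives the combiner an online character: base models that tracked the recent load well contribute more to the next forecast.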

College enrollment consultation algorithm based on deep autoencoders

FENG Shizhou, ZHOU Shangbo

2017, 37(11): 3323-3329. DOI: 10.11772/j.issn.1001-9081.2017.11.3323

College enrollment consultation services usually rely on manual replies or keyword-matching Question and Answer (Q&A) systems, which suffer from low efficiency and irrelevant answers. In addition, a consultation text is often a short statement, so its vectorized representation easily leads to the high-dimensional sparsity problem. To solve these problems, an enrollment consultation algorithm based on Stacked Denoising Sparse AutoEncoders (SDSAE) was proposed. First, to improve the generalization ability of the algorithm, an autoencoder network was used to extract features and reduce the data dimension; at the same time, dataset enhancement and noise-adding techniques were introduced to address the small training sample set and uneven class distribution. After the low-dimensional representation of the short texts was obtained, text classification was performed by using the Back Propagation (BP) algorithm. The experimental results show that the proposed algorithm achieves better classification performance than BP, Support Vector Machine (SVM), Extreme Learning Machine (ELM) and other algorithms, and significantly improves the classification of enrollment consultation texts.

Production scheduling and preventive maintenance integrated optimization based on catastrophe mechanism

WU Qingsong, YANG Hongbing, FANG Jia

2017, 37(11): 3330-3334. DOI: 10.11772/j.issn.1001-9081.2017.11.3330

For the integrated optimization of production scheduling and preventive maintenance for multi-product tasks in production workshops, an integrated optimization model was established. On the premise that there are sufficient orders, the model takes into account processing sequence, batch quantity, preventive maintenance measures and other factors, with the joint optimization objective of minimizing overall manufacturing cost and processing time. In view of the characteristics of the model, and building on the non-dominated sorting genetic algorithm, a single-parent genetic algorithm with variable-length genome was put forward as the solution method; it is based on the catastrophe mechanism and glory space, combined with interruption and splice operators. Simulation experiments under different parameter settings and problem scales were conducted to verify the efficiency of the proposed algorithm in solving complex integrated optimization problems of production scheduling and preventive maintenance.

Session identification algorithm based on dynamic time threshold of adjacent requests

ZENG Ling, XIAO Ruliang

2017, 37(11): 3335-3338. DOI: 10.11772/j.issn.1001-9081.2017.11.3335

To improve the efficiency of session sequence modeling in anomaly detection analysis on big data platforms, a session identification algorithm based on Dynamic Adjustive Interval Time threShold of adjacent requests (DAITS) was proposed. Firstly, the website page factor and the average factor of users' access time to the page were combined. Then, an appropriate weighting factor was used to dynamically adjust the time threshold. Finally, sessions were divided according to whether the interval between adjacent requests exceeded the time threshold. The experimental results show that, compared with traditional methods using fixed thresholds, the precision of session identification was increased by 14.8% and the recall by 13.2%; compared with existing methods using dynamically adjusted thresholds, the precision was increased by 6.2% and the recall by 3.2%.
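
The general idea of splitting sessions with a per-request threshold can be sketched as follows. The abstract does not state how the page factor and the user access-time factor are combined, so the weighted-sum rule in `dynamic_threshold`, the base threshold of 600 seconds, and the tuple layout of a request are all assumptions made for illustration.

```python
def dynamic_threshold(base, page_factor, user_factor, weight):
    """Hypothetical rule: scale a base threshold by a weighted mix of two factors."""
    return base * (weight * page_factor + (1.0 - weight) * user_factor)


def split_sessions(requests, base=600.0, weight=0.5):
    """Split one user's requests into sessions.

    requests: list of (timestamp_seconds, page_factor, user_factor),
              sorted by timestamp.
    A new session starts when the gap to the previous request exceeds the
    threshold computed from the previous request's factors.
    """
    sessions = []
    current = []
    previous = None
    for request in requests:
        timestamp = request[0]
        if previous is not None:
            threshold = dynamic_threshold(base, previous[1], previous[2], weight)
            if timestamp - previous[0] > threshold:
                sessions.append(current)
                current = []
        current.append(timestamp)
        previous = request
    if current:
        sessions.append(current)
    return sessions
```

With a fixed 600-second threshold, a gap of 1050 seconds always ends a session; here a request whose factors are both 2.0 doubles the threshold to 1200 seconds, so the same gap keeps the next request in the current session.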

Brain network analysis and classification for patients of Alzheimer's disease based on high-order minimum spanning tree

GUO Hao, LIU Lei, CHEN Junjie

2017, 37(11): 3339-3344. DOI: 10.11772/j.issn.1001-9081.2017.11.3339

Using resting-state functional magnetic resonance imaging to study the functional connectivity network of the brain is one of the important methods in current brain disease research. This method can accurately detect a variety of brain diseases, including Alzheimer's disease. However, the traditional network only captures the correlation between pairs of brain regions, and lacks the deeper interactions between brain regions and the associations between functional connections. To solve these problems, a method was proposed to construct a high-order minimum spanning tree functional connectivity network, which not only preserves the physiological significance of the functional connectivity network, but also captures more complex interaction information in the network and improves classification accuracy. The classification results show that the resting-state functional magnetic resonance imaging classification method based on the high-order minimum spanning tree functional connectivity network greatly improves the accuracy of Alzheimer's disease detection.
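
The basic building block named in this abstract, a minimum spanning tree over a functional connectivity matrix, can be sketched with Kruskal's algorithm. This covers only the ordinary (low-order) tree, not the high-order construction of the paper; the distance `1 - |correlation|` and the function name are assumptions chosen for illustration.

```python
def minimum_spanning_tree(corr):
    """Kruskal's algorithm on a symmetric correlation matrix.

    Edge weight is taken as 1 - |corr[i][j]| (an assumed distance), so the
    tree keeps the strongest connections that join all regions without cycles.
    Returns a list of (i, j) edges with i < j.
    """
    n = len(corr)
    edges = sorted(
        (1.0 - abs(corr[i][j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for _, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            tree.append((i, j))
    return tree
```

For a network of n brain regions the result always has n - 1 edges, which gives every subject a network of the same sparsity without choosing a correlation threshold.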
