Journal of Computer Applications

Ensemble learning based on probability calibration

JIANG Zhengshen, LIU Hongzhi

2016, 36(2): 291-294. DOI: 10.11772/j.issn.1001-9081.2016.02.0291

Since a lack of diversity may lead to poor performance in ensemble learning, a new two-phase ensemble learning method based on probability calibration was proposed, together with two methods to reduce the impact of multicollinearity. In the first phase, the probabilities given by the original classifiers were calibrated using different calibration methods. In the second phase, another classifier was trained on the calibrated probabilities and used to predict the final result. The different calibration methods used in the first phase provided diversity for the second phase, and diversity has been shown to be an important factor in enhancing ensemble learning. To address the limited improvement caused by the correlation between base classifiers, two methods to reduce multicollinearity were also proposed: the choose-best method and the bootstrap sampling method. The choose-best method simply selected the best base classifier among the original and calibrated classifiers; the bootstrap method combined a set of classifiers chosen from the base classifiers with replacement. The experimental results showed that using different calibrated probabilities indeed improved the effectiveness of the ensemble, and that the choose-best and bootstrap sampling methods achieved further improvement. This indicates that probability calibration provides a new way to produce diversity, and that the multicollinearity it causes can be addressed by sampling.
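
The bootstrap step described above can be illustrated with a minimal Python sketch. The abstract does not say how the sampled classifiers are combined, so simple averaging is an assumption here, and the function name and the probability values are hypothetical.

```python
import random

def bootstrap_ensemble(prob_columns, n_draws, seed=0):
    """Average the probabilities of classifiers drawn with replacement.

    prob_columns holds one list of calibrated probabilities per base
    classifier, all of the same length (one entry per sample).
    Averaging is an assumed combination rule, not the paper's.
    """
    rng = random.Random(seed)
    # Draw classifier indices with replacement, as in bootstrap sampling.
    chosen = [rng.randrange(len(prob_columns)) for _ in range(n_draws)]
    n_samples = len(prob_columns[0])
    combined = [
        sum(prob_columns[c][i] for c in chosen) / n_draws
        for i in range(n_samples)
    ]
    return chosen, combined

# Hypothetical calibrated outputs of three base classifiers on four samples.
probs = [
    [0.9, 0.2, 0.6, 0.4],
    [0.8, 0.1, 0.7, 0.5],
    [0.7, 0.3, 0.5, 0.3],
]
chosen, combined = bootstrap_ensemble(probs, n_draws=5, seed=42)
labels = [1 if p >= 0.5 else 0 for p in combined]
```

Because the same classifier can be drawn several times, strongly correlated classifiers do not all have to enter the combination at once, which is the intuition behind using sampling against multicollinearity.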

Research and application of high-precision indoor location-aware big data

DENG Zhongliang, ZHANG Senjie, JIAO Jichao, XU Lianming

2016, 36(2): 295-300. DOI: 10.11772/j.issn.1001-9081.2016.02.0295

With the development of indoor positioning technology, a large amount of indoor location data and user data for consumer behavior makes the indoor Location Big Data (LBD) research and application possible. High-precision indoor location technology breaks the bottleneck of indoor location data with low accuracy. By clustering the indoor location data and dimension reduction pretreatment, a mining model was set up to extract the characteristics of custom and flow in the indoor shopping area. Then using the associated user consumption behavior to predict the characteristics of consumer behaviors, a collaborative mining method and architecture for large data of indoor location was put forward. Experiments on location dataset of billions of users in an airport and a shopping mall in Xidan were conducted. The results verify the accuracy and feasibility of the mining method based on this architecture of indoor positioning data.

Space coordinate transformation algorithm for built-in accelerometer data of smartphone

ZHAO Hong, GUO Lilu

2016, 36(2): 301-306. DOI: 10.11772/j.issn.1001-9081.2016.02.0301

Because the coordinate system of a smartphone's built-in acceleration sensor is fixed to the device itself, the data collected by the smartphone drifts constantly as its posture changes. As a result, even for the same movement, the acceleration is difficult to keep consistent with the previous measurement. To solve this problem, the acceleration was mapped from the smartphone coordinate system to the inertial coordinate system by a space coordinate transformation algorithm, ensuring that the sensor data accurately reflects the actual motion state regardless of the smartphone's posture. To verify the effectiveness of this method, a new method for online acquisition and real-time processing of the smartphone's sensor data was designed. With this method, the feasibility of the direction cosine algorithm and the quaternion algorithm was tested in rotation experiments. Then, the performance of the quaternion algorithm was further tested in pedometer experiments. The experimental results show that the direction cosine algorithm fails to achieve a comprehensive coordinate transformation because of its measurement range limit, while the quaternion algorithm based on rotation vector sensor data achieves full conversion; the gait recognition rate using the transformed acceleration is over 95%, which shows that it accurately reflects the actual state of motion.
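
As an illustration of the quaternion transformation, the following Python sketch rotates a device-frame acceleration vector by a unit quaternion using v' = q v q*. It assumes that q describes the rotation from the device frame to the inertial frame and is stored as (w, x, y, z); the function names are hypothetical and the paper's own conventions are not given in the abstract.

```python
import math

def quat_multiply(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def rotate_to_inertial(q, accel):
    """Rotate a device-frame vector by the unit quaternion q: v' = q v q*."""
    w, x, y, z = q
    conj = (w, -x, -y, -z)
    v = (0.0, *accel)  # embed the vector as a pure quaternion
    _, rx, ry, rz = quat_multiply(quat_multiply(q, v), conj)
    return (rx, ry, rz)

# Assumed example: the device is rotated 90 degrees about the z axis.
half = math.radians(90) / 2
q = (math.cos(half), 0.0, 0.0, math.sin(half))
rotated = rotate_to_inertial(q, (1.0, 0.0, 0.0))
```

With this orientation, an acceleration along the device x axis is reported along the inertial y axis, so the same physical movement yields the same inertial-frame signal whatever the posture of the phone.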

Application of factorization machine in mobile App recommendation based on deep packet inspection

SUN Liangjun, FAN Jianfeng, YANG Wanqi, SHI Yinhuan

2016, 36(2): 307-310. DOI: 10.11772/j.issn.1001-9081.2016.02.0307

To extract features from Deep Packet Inspection (DPI) data for mobile application recommendation, DPI data collected from an Internet Service Provider (ISP), Jiangsu Telecom, was used. The access history of active users, as defined by the communications operator, was processed by matrix factorization recommendation (including Singular Value Decomposition (SVD) and Non-negative Matrix Factorization (NMF)), SVD recommendation and factorization machine recommendation algorithms to recommend mobile phone applications. The results show that the factorization machine algorithm achieves better performance, which suggests that it better describes the latent connections in the user-item relationship.

Parallel optimization sampling clustering K-means algorithm for big data processing

ZHOU Runwu, LI Zhiyong, CHEN Shaomiao, CHEN Jing, LI Renfa

2016, 36(2): 311-315. DOI: 10.11772/j.issn.1001-9081.2016.02.0311

Focusing on the low accuracy and slow convergence of the K-means clustering algorithm, an improved K-means algorithm based on optimization sampling clustering, named OSCK (Optimization Sampling Clustering K-means Algorithm), was proposed. Firstly, multiple samples were obtained from mass data by probability sampling. Secondly, based on the Euclidean distance similarity principle of the optimal clustering center, the results of sample clustering were modeled and evaluated, and the sub-optimal solutions of the sample clustering results were removed. Finally, the final k clustering centers were obtained by weighted integration evaluation of the clustering results and used as the cluster centers of the big data set. Theoretical analysis and experimental results show that, for mass data analysis, the proposed method has better clustering accuracy than the comparison algorithms, as well as strong robustness and scalability.

Parallelized recommendation algorithm in location-based social network

ZENG Xuelin, WU Bin

2016, 36(2): 316-323. DOI: 10.11772/j.issn.1001-9081.2016.02.0316

Since the traditional collaborative filtering algorithm cannot make full use of information implied in check-ins of users in recommendation process, which contains users' preference, location and social relationship, a recommendation algorithm was proposed, which exploits past user behavior, the check-in information and social relation of users to improve the precision of Point of Interests (POI) recommendation, namely Location-Friendship Based Collaborative Filtering (LFBCF). And the recommendation was implemented on distributed computing platform Spark to support large scale dataset in experiments. Two real datasets in Location-based Social Network (LBSN) including Gowalla and Brightkite were employed in experiments. The amount of check-ins, the distance between locations and the social relationship were analyzed to verify the proposed algorithm. The comparison of precision and F-measure with traditional algorithm confirms the effectiveness of the proposed algorithm; and the comparison of speed-up ratio between the parallelized algorithm and serial algorithm demonstrates the significance of parallelization and superiority of performance.

Multi-objective immune system algorithm for task scheduling in cloud computing

DUAN Kairong, ZHANG Gongxuan

2016, 36(2): 324-329. DOI: 10.11772/j.issn.1001-9081.2016.02.0324

To address the task scheduling problem in cloud, a new multi-objective immune system algorithm named IMISA (Improved Multi-objective Immune System Algorithm) was proposed to optimize completion time and monetary cost simultaneously. In this work, the assignment of the fitness value was different from the traditional way, the antibodies were divided into non-dominated antibodies and dominated antibodies. The exclusive dominated area was regarded as the antigen affinity, and the polygonal area surrounded by all the non-dominated solutions and a selection of the dominated antibody were treated as the corresponding antibody-antigen affinity. After that, clonal probability was calculated according to these affinities and new offspring was generated by cloning and mutation. Simulation experiments on CloudSim indicate that, compared with NSGA-Ⅱ and Multi-objective Immune System Algorithm (MISA), the proposed algorithm can produce schemes with shorter completion time and lower monetary cost, and it also can achieve Pareto fronts with better quality in both convergence and diversity.
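
The division into non-dominated and dominated antibodies can be sketched in Python as follows. Both objectives (completion time and monetary cost) are minimized; the function names and the candidate schedules are hypothetical, and the affinity and cloning steps of IMISA are not reproduced.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def split_by_dominance(population):
    """Separate objective vectors into non-dominated and dominated sets."""
    non_dominated, dominated = [], []
    for cand in population:
        if any(dominates(other, cand) for other in population):
            dominated.append(cand)
        else:
            non_dominated.append(cand)
    return non_dominated, dominated

# Hypothetical schedules as (completion time, monetary cost) pairs.
schedules = [(10, 5.0), (8, 7.0), (12, 4.0), (11, 6.0), (9, 7.5)]
front, rest = split_by_dominance(schedules)
```

Here (11, 6.0) is dominated by (10, 5.0) and (9, 7.5) by (8, 7.0), so the remaining three schedules form the current Pareto front.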

Distributed deduplication storage system based on Hadoop platform

LIU Qing, FU Yinjin, NI Guiqiang, MEI Jianmin

2016, 36(2): 330-335. DOI: 10.11772/j.issn.1001-9081.2016.02.0330

Because data centers hold a large amount of redundant data, and backup data in particular wastes a tremendous amount of storage space, a deduplication prototype based on the Hadoop platform was proposed. Deduplication technology, which detects and eliminates redundant data in a particular data set, can greatly reduce the required storage capacity and optimize the utilization of storage space. Using two big data management tools, Hadoop Distributed File System (HDFS) and the non-relational database HBase, a scalable and distributed deduplication storage system was designed and implemented. In this system, the MapReduce parallel programming framework was responsible for parallel deduplication, and HDFS was responsible for storing data after deduplication. The index table was stored in HBase for efficient chunk fingerprint indexing. The system was tested with virtual machine image file sets. The results demonstrate that the Hadoop-based distributed deduplication system ensures high throughput and excellent scalability while guaranteeing a high deduplication rate.
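
The idea of chunk fingerprint indexing can be shown with a single-machine Python sketch. Fixed-size chunking and SHA-1 fingerprints are assumptions, since the abstract names neither; a dict stands in for the HBase index table and a list for HDFS storage, and all names are hypothetical.

```python
import hashlib

def deduplicate(data, chunk_size, index, store):
    """Split data into fixed-size chunks and store each unique chunk once.

    index maps fingerprint -> position in store (stand-in for the HBase
    index table); store is a list of chunks (stand-in for HDFS).
    Returns the list of fingerprints needed to rebuild the data.
    """
    recipe = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        fingerprint = hashlib.sha1(chunk).hexdigest()
        if fingerprint not in index:
            # First time this content is seen: keep one physical copy.
            index[fingerprint] = len(store)
            store.append(chunk)
        recipe.append(fingerprint)
    return recipe

def restore(recipe, index, store):
    """Rebuild the original bytes from the fingerprint list."""
    return b"".join(store[index[fp]] for fp in recipe)

index, store = {}, []
image = b"ABCD" * 3 + b"WXYZ"  # hypothetical file with repeated content
recipe = deduplicate(image, chunk_size=4, index=index, store=store)
```

In this example four logical chunks are reduced to two stored chunks, and the recipe of fingerprints is enough to restore the file.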

Strength model of user relationship based on latent regression

HAN Zhongming, TAN Xusheng, CHEN Yan, YANG Weijie

2016, 36(2): 336-341. DOI: 10.11772/j.issn.1001-9081.2016.02.0336

To effectively measure the strength of the directed relationship among the users in social network, based on the directed interaction frequency, a smooth model for computing the interaction strength of the user was proposed. Furthermore, user interaction strength was taken as dependent variable and user relationship strength was taken as latent variable, a latent regression model was constructed, and an Expectation-Maximization (EM) algorithm for parameter estimation of the latent regression model was given. Comprehensive experiments were conducted on two datasets extracted from Renren and Sina Weibo in the aspects of the best friends and the intensity ranking. On Renren dataset, the result of TOP-10 best friends chosen by the proposed model was compared with that of manual annotation, the mean of Normalized Discounted Cumulative Gain (NDCG) of the model was 69.48%, the average of Mean Average Precision (MAP) of the model was 66.3%, both of the parameters were significantly improved; on Sina Weibo dataset, the range of infection spread by nodes with higher relationship strength increased by 80% compared to the other nodes. The experimental results show that the proposed model can effectively measure user relationship strength.
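
For readers unfamiliar with the NDCG metric reported above, a minimal Python sketch follows. It uses the common linear-gain form with a log2 discount; which variant the paper used is not stated, and the relevance values below are hypothetical.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg_at_k(ranked_relevances, k):
    """DCG of the top-k ranking divided by the DCG of the ideal ordering."""
    ideal = sorted(ranked_relevances, reverse=True)
    best = dcg(ideal[:k])
    return dcg(ranked_relevances[:k]) / best if best > 0 else 0.0

# Hypothetical relevance of the friends ranked first and second by a model.
score = ndcg_at_k([1, 3], k=2)
```

A ranking that places the most relevant friend first scores 1.0; swapping the order, as in the example, lowers the score.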

Parallel fuzzy C-means clustering algorithm in Spark

WANG Guilan, ZHOU Guoliang, SA Churila, ZHU Yongli

2016, 36(2): 342-347. DOI: 10.11772/j.issn.1001-9081.2016.02.0342

With the growing data volume and timeliness requirement, the clustering algorithms need to be adaptive to big data and higher performance. A new algorithm named Spark Fuzzy C-Means (FCM) was proposed based on Spark distributed in-memory computing platform. Firstly, the matrix was partitioned into vector set horizontally and distributedly stored, which meant different vectors were distributed in different nodes. Then based on the characteristics of FCM algorithm, matrix operations were redesigned considering distributed storage and cache sensitivity, including multiplication, addition and transpose. Finally, Spark-FCM algorithm which combined with matrix operations and Spark platform was implemented. The primary data structures of the algorithm adopted distributed matrix storage with fewer moving data between nodes and distributed computing in each step. The test results in stand-alone and cluster environments show that Spark-FCM has good scalability and can adjust to large-scale data sets, the performance and the size of data shows a linear relationship, and the performance in cluster environment is 2 to 3 times higher than that in stand-alone.
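
The core update of standard Fuzzy C-Means is shown below as a single-machine Python sketch, not the Spark implementation. Each point's membership row depends only on that point and the current centers, which is why rows can be computed independently on different nodes. The function names and sample points are hypothetical.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """Return u[i][j], the membership of point i in cluster j."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # A point lying exactly on a center belongs fully to it.
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(row)
            row = [r / total for r in row]
        else:
            row = [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]
        memberships.append(row)
    return memberships

def fcm_centers(points, memberships, m=2.0):
    """Recompute each center as the u^m-weighted mean of all points."""
    dims = len(points[0])
    centers = []
    for j in range(len(memberships[0])):
        weights = [u[j] ** m for u in memberships]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dims)
        ))
    return centers

points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
u = fcm_memberships(points, centers=[(0.0, 0.0), (10.0, 0.0)])
new_centers = fcm_centers(points, u)
```

The center update is a weighted sum over all points, so in a distributed setting it reduces to partial sums per partition followed by one aggregation.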

Parallel cube computing in Spark

SA Churila, ZHOU Guoliang, SHI Lei, WANG Liuwang, SHI Xin, ZHU Yongli

2016, 36(2): 348-352. DOI: 10.11772/j.issn.1001-9081.2016.02.0348

In view of the poor real-time response capability of traditional OnLine Analytical Processing (OLAP) when processing big data, the acceleration of data cube computation based on Spark was investigated, and a memory-based distributed computing framework was put forward. To improve the degree of parallelism and the performance of Bottom-Up Construction (BUC), a novel algorithm for data cube computation was designed based on Spark and BUC, referred to as BUCPark (BUC on Spark). Moreover, to avoid the expansion of the iterative data cube in memory, BUCPark was further improved to LBUCPark (Layered BUC on Spark), which takes full advantage of the reused and shared memory mechanism. The experimental results show that LBUCPark outperforms the BUC and BUCPark algorithms in terms of computing performance, and that it is capable of computing data cubes efficiently in the big data era.

Star join algorithm based on multi-dimensional Bloom filter in Spark

ZHOU Guoliang, SA Churila, ZHU Yongli

2016, 36(2): 353-357. DOI: 10.11772/j.issn.1001-9081.2016.02.0353

To meet the high-performance analysis requirements for real-time data in On-Line Analytical Processing (OLAP) systems, a new star join algorithm suitable for the Spark platform was proposed based on Multi-Dimensional Bloom Filter (MDBF), namely SMDBFSJ (Spark Multi-Dimensional Bloom Filter Star Join). First of all, the MDBF was built according to the dimension tables and, because of its small size, broadcast to all the nodes. Then the fact table was filtered completely on the local node, with no data movement between nodes. Finally, the filtered fact table and the dimension tables were joined using the repartition join model to get the final result. The SMDBFSJ algorithm avoids moving the fact table data, reduces the size of the broadcast data by using MDBF, and fully combines the advantages of broadcast join and repartition join. The experimental results in stand-alone and cluster environments prove the validity of the SMDBFSJ algorithm, which obtains about a threefold performance improvement over repartition join in Spark.
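
The filtering step can be illustrated with a single-machine Python sketch that keeps one Bloom filter per dimension and drops fact rows that cannot join. It is a simplification under stated assumptions: the class, the hash construction and the table contents are hypothetical, and the broadcast and repartition join steps on Spark are not reproduced.

```python
import hashlib

class BloomFilter:
    """Small Bloom filter; uses one byte per bit for simplicity."""

    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits)

    def _positions(self, key):
        # Derive several hash positions by salting SHA-256 with a seed.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))

def filter_fact_rows(fact_rows, filters):
    """Keep rows whose every foreign key passes its dimension's filter."""
    return [
        row for row in fact_rows
        if all(f.might_contain(row[col]) for col, f in filters.items())
    ]

# Hypothetical dimension keys that satisfy the query predicates.
filters = {"cust": BloomFilter(), "prod": BloomFilter()}
for key in ("c1", "c2"):
    filters["cust"].add(key)
filters["prod"].add("p1")

facts = [
    {"cust": "c1", "prod": "p1", "amount": 10},
    {"cust": "c3", "prod": "p1", "amount": 5},
    {"cust": "c2", "prod": "p9", "amount": 7},
    {"cust": "c2", "prod": "p1", "amount": 3},
]
kept = filter_fact_rows(facts, filters)
```

A Bloom filter never rejects a key that was added, so every joinable row survives; occasional false positives are removed later by the actual join.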

Similarity nodes query algorithm on large dynamic graph based on the snapshots

SONG Baoyan, JI Wanting, DING Linlin

2016, 36(2): 358-363. DOI: 10.11772/j.issn.1001-9081.2016.02.0358

In the evolution of dynamic graph topology, in order to quantify the change of the relation between the nodes within a certain time, a concept, namely ubiquitous similarity node, was defined, and the level of ubiquitous similarity with the current node was measured by the frequent degree of interaction with the current node and the uniformity of distribution, and a similarity node query processing algorithm for large dynamic graph based on the snapshots was proposed. The concrete content includes: the snapshot expression of the dynamic evolution of graph, namely evolution dynamic graph; the semantic representation and its formal representation of the nodes' ubiquitous similarity in the dynamic evolution of graph, which was characterized by the frequent degree of interaction and uniformity coefficient of distribution; the matrix representation and processing method of the semantic of the nodes' ubiquitous similarity; the query algorithm for ubiquitous similarity nodes. The experimental results on the synthetic dataset and the real dataset show that the proposed algorithm can deal with the nodes' ubiquitous similarity query on the large dynamic graph, and be implemented in the practical applications.

Mobile social network oriented user feature recognition of age and sex

LI Yuanhao, LU Ping, WU Yifan, WEI Wei, SONG Guojie

2016, 36(2): 364-371. DOI: 10.11772/j.issn.1001-9081.2016.02.0364

Mobile social network data has complex network structure, mutual label influence between nodes, variety of information including interactive information, location information, and other complex information. As a result, it brings many challenges to identify the characteristics of the user. In response to these challenges, a real mobile network was studied, the differences between the tagged users with different characteristics were extracted using statistical analysis, then the user's features of age and sex were recognized using relational Markov network prediction model. Analysis shows that the user of different age and sex has significant difference in call probability at different times, call entropy, distribution and discreteness of location information, gather degree in social networks, as well as binary and ternary interaction frequency. With these features, an approach for inferring the user's age and gender was put forward, which used the binary and ternary interaction relation group template, combined with the user's own temporal and spatial characteristics, and calculated the total joint probability distribution by relational Markov network. The experimental results show that the prediction accuracy of the proposed recognition model is at least 8% higher compared to the traditional classification methods, such as C4.5 decision tree, random forest, Logistic regression and Naive Bayes.
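
One of the features mentioned above, call entropy, can be sketched in Python as follows. The abstract does not define it, so Shannon entropy over the distribution of a user's call partners is an assumption, and the call logs are hypothetical.

```python
import math
from collections import Counter

def call_entropy(callees):
    """Shannon entropy (in bits) of the distribution of call partners."""
    counts = Counter(callees)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical call logs: one user calls a single contact, another
# spreads calls evenly over four contacts.
focused = call_entropy(["a", "a", "a", "a"])
spread = call_entropy(["a", "b", "c", "d"])
```

Under this definition, a user who always calls the same contact has entropy 0, while evenly spread calls over four contacts give 2 bits.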

Parallel sparse subspace clustering via coordinate descent minimization

WU Jieqi, LI Xiaoyu, YUAN Xiaotong, LIU Qingshan

2016, 36(2): 372-376. DOI: 10.11772/j.issn.1001-9081.2016.02.0372

The rapidly increasing data scale imposes a great computational challenge on Sparse Subspace Clustering (SSC), while existing optimization algorithms for SSC, e.g. the Alternating Direction Method of Multipliers (ADMM), are implemented sequentially and cannot make use of multi-core processors to improve computational efficiency. To address this issue, a parallel SSC based on coordinate descent was proposed, inspired by the simple observation that SSC can be formulated as a sequence of sample-based sparse self-expression sub-problems. The proposed algorithm solves individual sub-problems by using a coordinate descent algorithm with fewer parameters and fast convergence. Because the self-expression sub-problems are independent, a strategy was adopted to solve them simultaneously on different processor cores, which brings the benefits of low computing resource consumption and fast running speed and makes the proposed algorithm suitable for large-scale clustering. Experiments on simulated data and the Hopkins-155 motion segmentation dataset demonstrate that the proposed parallel SSC method on multi-core processors significantly improves computational efficiency while ensuring accuracy when compared with ADMM.

Stochastic nonlinear dimensionality reduction based on nearest neighbors

TIAN Shoucai, SUN Xili, LU Yonggang

2016, 36(2): 377-381. DOI: 10.11772/j.issn.1001-9081.2016.02.0377

As linear dimensionality reduction methods usually cannot produce satisfactory low-dimensional embedding when applied to data with nonlinear structure, a new nonlinear dimensionality reduction method named NNSE was proposed to keep the local nearest neighbor information in the high-dimensional space. Firstly, the nearest neighbor points were found by calculating the Euclidean distance between the sample points in the high-dimensional space, then a random initial distribution of the data points was generated in the low-dimensional space. Secondly, by moving the data points towards the mean position of their nearest neighbors found in the high-dimensional space, the data point positions were iteratively optimized until the embedding becomes stable. In the comparison with a state-of-the-art nonlinear stochastic dimensionality reduction method named t-SNE (t-distributed Stochastic Neighbor Embedding), the low-dimensional embedding produced by NNSE method is similar to the visualization produced by the t-SNE method. However, it is shown that the NNSE method is superior to t-SNE in preserving the local nearest neighbor information in the low-dimensional embedding by using a quantitative indicator.

Universal steganalysis scheme of advanced audio coding

XIONG Hao, REN Yanzhen, WANG Lina

2016, 36(2): 382-386. DOI: 10.11772/j.issn.1001-9081.2016.02.0382

Focusing on the issue that Advanced Audio Coding (AAC) is being transmitted unsafely while the development of steganalysis schemes has fallen relatively behind, a universal steganalysis scheme for the AAC compressed domain was proposed. Based on the embedding influence caused by several known AAC Modified Discrete Cosine Transform (MDCT) steganography methods, several steganalysis sub-features were constructed from multi-order differential correlations of AAC's inter-frame and intra-frame MDCT coefficients. The sub-features were then fused with different weights according to AAC's own coding principle, and a random forest classifier was used to distinguish embedded audio from normal audio. The experimental results show that the proposed steganalysis scheme performs well in general detection compared with the existing algorithm; in particular, the detection rate for each steganography method is higher than 80% when the relative embedding rate is 50%.

Semi-supervised extreme learning machine and its application in analysis of near-infrared spectroscopy data

JING Shibo, YANG Liming, LI Junhui, ZHANG Siyun

2016, 36(2): 387-391. DOI: 10.11772/j.issn.1001-9081.2016.02.0387

When insufficient training information is available, the supervised Extreme Learning Machine (ELM) is difficult to use. Therefore, by applying semi-supervised learning to ELM, a Semi-Supervised ELM (SSELM) framework was proposed. However, it is difficult to find the optimal solution of SSELM because of its nonconvexity and nonsmoothness. Using a combinatorial optimization method, SSELM was solved by reformulating it as a linear mixed integer program. Furthermore, SSELM was used for the direct recognition of medicine and seed datasets using Near-InfraRed spectroscopy (NIR) technology. The experimental results show that, compared with the traditional ELM methods, SSELM can improve generalization when insufficient training information is available, which indicates the feasibility and effectiveness of the proposed method.

Classification algorithm of support vector machine with privacy preservation based on information concentration

DI Lan, YU Xiaotong, LIANG Jiuzhen

2016, 36(2): 392-396. DOI: 10.11772/j.issn.1001-9081.2016.02.0392

The classification decision process of Support Vector Machine (SVM) involves the study of original training samples, which easily causes privacy disclosure. To solve this problem, a classification approach with privacy preservation called IC-SVM (Information Concentration Support Vector Machine) was proposed based on information concentration. Firstly, the original training data was concentrated using the Fuzzy C-Means (FCM) clustering algorithm according to each sample point and its neighbors. Then the clustering centers were reconstructed to get new samples through information concentration. Finally, the new samples were used to train the decision function, by which classification was done. The experimental results on UCI and PIE show that the proposed method achieves good classification accuracy while preventing privacy disclosure.

k-nearest neighbor data imputation algorithm combined with locality sensitive Hashing

ZHENG Qibin, DIAO Xingchun, CAO Jianjun, ZHOU Xing, XU Yongping

2016, 36(2): 397-401. DOI: 10.11772/j.issn.1001-9081.2016.02.0397

The k-Nearest Neighbor (kNN) algorithm is commonly used in data imputation, but it is inefficient because similarity must be computed between every two records. To solve the efficiency problem, an improved kNN data imputation algorithm combined with Locality Sensitive Hashing (LSH), named LSH-kNN, was proposed. First, all the complete records were indexed using LSH. Then corresponding LSH schemes for nominal, numeric and mixed-type incomplete data were put forward, and LSH values for all the incomplete records were computed with the proposed schemes to find candidate similar records. Finally, the real distances from the incomplete records to the candidate similar records were calculated, and the top-k similar records for kNN imputation were found. The experimental results show that the proposed LSH-kNN method is more efficient than traditional kNN while keeping almost the same accuracy.

Cross-domain access control for e-government cloud based on classification

CHI Yaping, WANG Yan, WANG Huili, LI Xin

2016, 36(2): 402-407. DOI: 10.11772/j.issn.1001-9081.2016.02.0402

Since access control is not fine-grained enough when users share resources during cross-domain access in the e-government cloud, a cross-domain access control scheme based on user classification was proposed. In this scheme, a typical cloud computing access control mechanism, Identity and Access-control Management (IAM), was adopted; assertion attribute authentication based on user classification was implemented; the obstruction caused by heterogeneity during resource sharing was eliminated; and a fine-grained cross-domain access control mechanism was provided. Finally, a cross-domain system for the cloud computing environment was built based on Shibboleth and Keystone, the security component of OpenStack, and the feasibility of the scheme was proved by a test comparing the inter-domain and outer-domain tokens of a user.

Recognition of Chinese news event correlation based on grey relational analysis

LIU Panpan, HONG Xudong, GUO Jianyi, YU Zhengtao, WEN Yonghua, CHEN Wei

2016, 36(2): 408-413. DOI: 10.11772/j.issn.1001-9081.2016.02.0408

Concerning the low accuracy of identifying relevant Chinese events, a correlation recognition algorithm for Chinese news events based on Grey Relational Analysis (GRA) was proposed, which is a multiple factor analysis method. Firstly, three factors that affect the event correlation, including co-occurrence of triggers, shared nouns between events and the similarity of the event sentences, were proposed through analyzing the characteristics of Chinese news events. Secondly, the three factors were quantified and the influence weights of them were calculated. Finally, GRA was used to combine the three factors, and the GRA model between events was established to realize event correlation recognition. The experimental results show that the three factors for event correlation recognition are effective, and compared with the method only using one influence factor, the proposed algorithm improves the accuracy of event correlation recognition.
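
The way GRA combines several factors into one score can be sketched with the standard grey relational coefficient, (Δmin + ρΔmax) / (Δ + ρΔmax), followed by a weighted mean. The Python sketch below assumes a distinguishing coefficient ρ = 0.5, equal weights by default and an ideal reference of 1 for each factor; the factor scores are hypothetical and the paper's own quantification and weights are not reproduced.

```python
def grey_relational_grades(reference, candidates, weights=None, rho=0.5):
    """Grey relational grade of each candidate sequence to the reference."""
    n = len(reference)
    weights = weights or [1.0 / n] * n
    # Absolute differences between the reference and every candidate.
    deltas = [[abs(r - c) for r, c in zip(reference, cand)] for cand in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        return [1.0] * len(candidates)
    grades = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(w * c for w, c in zip(weights, coeffs)))
    return grades

# Hypothetical factor scores for two event pairs: trigger co-occurrence,
# shared nouns, and sentence similarity, each scaled to [0, 1].
ideal = [1.0, 1.0, 1.0]
pairs = [[0.9, 0.8, 0.7], [0.2, 0.1, 0.3]]
grades = grey_relational_grades(ideal, pairs)
```

The pair whose factor scores lie closer to the ideal sequence receives the higher grade, and a threshold on the grade could then decide whether two events are treated as correlated.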

News recommendation method by fusion of content-based recommendation and collaborative filtering

YANG Wu, TANG Rui, LU Ling

2016, 36(2): 414-418. DOI: 10.11772/j.issn.1001-9081.2016.02.0414

To solve poor diversity problem of user interests in content-based news recommendation and cold-start problem in hybrid recommendation, a new method of news recommendation based on fusion of content-based recommendation and collaborative filtering was proposed. Firstly, the content-based method was used to find the user's interest. Secondly, similar user group of the target user was found out by using hybrid similarity pattern which contains content similarity and behavior similarity, and the user's potential interest was found by predicting the user's interest in feature words. Next, the user interest model with characteristics of personalization and diversity was obtained by fusing user's existed interest and potential interest. Lastly, the recommendation list was output after calculating the similarity of candidate news and fusion model. The experimental results show that, compared with the content-based recommendation methods, the proposed method obviously increases F-measure and Diversity; and it has equivalent performance with hybrid recommendation method, however it does not need time to accumulate enough user clicks of candidate news and has no cold start problem.
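
A hybrid similarity that mixes content similarity and behavior similarity might look like the Python sketch below. The abstract does not give the actual measures, so cosine similarity over feature-word weights, Jaccard similarity over clicked news, and the mixing weight alpha are all assumptions; the user data is hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse feature-word weight dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def jaccard(a, b):
    """Jaccard similarity of two sets of clicked news identifiers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def hybrid_similarity(profile_a, profile_b, clicks_a, clicks_b, alpha=0.5):
    """Weighted mix of content similarity and behavior similarity."""
    content = cosine(profile_a, profile_b)
    behavior = jaccard(clicks_a, clicks_b)
    return alpha * content + (1 - alpha) * behavior

# Hypothetical users: interest profiles and clicked news items.
user_a = ({"sports": 0.8, "tech": 0.2}, {"n1", "n2", "n3"})
user_b = ({"sports": 0.6, "finance": 0.4}, {"n2", "n3", "n4"})
sim = hybrid_similarity(user_a[0], user_b[0], user_a[1], user_b[1])
```

Because the content term depends only on interest profiles, two users can still be matched when they have few clicks in common.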

Personalized recommendation algorithm based on location bitcode tree

LIANG Junjie, GAN Wenting, YU Dunhui

2016, 36(2): 419-423. DOI: 10.11772/j.issn.1001-9081.2016.02.0419

Since the collaborative filtering recommendation algorithm is inefficient in big data environments, a personalized recommendation algorithm based on a location bitcode tree, called LB-Tree, was developed. Combined with the characteristics of the MapReduce framework, a novel approach that applied the index structure to personalized recommendation processing was proposed. For efficient parallel computing in MapReduce, a novel storage strategy based on the differences between clusters was presented. According to the distribution, each cluster was partitioned into several layers by concentric circles with the same centroid, and each layer was expressed by binary bitcodes of different lengths. To shorten the search path for frequently recommended data and to determine the search space quickly by using the index structure, an index tree was constructed from the bitcodes of all the layers. Compared with the Top-N recommendation algorithm and the Similarity-Based Neighborhood Method (SBNM), LB-Tree has the highest accuracy and the slowest growth in running time, which verifies the effectiveness and efficiency of LB-Tree.

Automatic identification of new sentiment word about microblog based on word association

CHEN Xin, WANG Suge, LIAO Jian

2016, 36(2): 424-427. DOI: 10.11772/j.issn.1001-9081.2016.02.0424

Aiming at new sentiment word identification, a method for automatically extracting new words from microblogs based on word association was proposed. Firstly, a new word that had been incorrectly split into several words by the Chinese automatic segmentation system was reassembled as a candidate word. Secondly, to make full use of the semantic information in the word context, the spatial representation vectors of the candidate words were obtained by training a neural network. Finally, using the existing sentiment vocabulary as a guide and combining the association-sort algorithm based on a vocabulary list with the max association-sort algorithm, the final new sentiment words were selected from the candidate words. The experimental results on task No. 3 of COAE2014 show that the precision of the proposed method is at least 22% higher than that of Pointwise Mutual Information (PMI), Enhanced Mutual Information (EMI), Normalized Multi-word Expression Distance (NMED), New Word Probability (NWP), and new sentiment word identification based on word embedding, which proves the effectiveness of the proposed method.
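
For orientation, the PMI baseline named above can be sketched as follows. This shows only the standard pointwise mutual information formula, not the proposed association-sort method; the function name and the counts in the usage note are hypothetical.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information, log2( p(x, y) / (p(x) * p(y)) ).

    count_xy: number of windows in which x and y co-occur
    count_x, count_y: number of windows containing x, respectively y
    total: total number of windows observed
    """
    if count_xy == 0:
        return float("-inf")  # the pair was never observed together
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))
```

A pair that co-occurs ten times more often than chance would predict, for example `pmi(10, 20, 50, 1000)`, scores log2(10), whereas statistically independent fragments score about zero.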

Multimedia sentiment analysis based on convolutional neural network

CAI Guoyong, XIA Binbin

2016, 36(2): 428-431. DOI: 10.11772/j.issn.1001-9081.2016.02.0428

In recent years, more and more multimedia content has been used on social media to share users' experiences and emotions. Compared with text or image alone, the complementary combination of text and image reveals users' real emotions more fully. To address the insufficient sentiment information in text or image alone, a method based on Convolutional Neural Network (CNN) was proposed for multimedia sentiment analysis. To explore the influence of semantic representation at different levels, image features were combined with text features at different levels (word level, phrase level and sentence level) to construct the CNN. The experimental results on two real-world datasets demonstrate that the proposed method predicts multimedia sentiment more accurately by capturing the internal relations between text and image.

Automatic short text summarization method based on multiple mapping

LU Ling, YANG Wu, CAO Qiong

2016, 36(2): 432-436. DOI: 10.11772/j.issn.1001-9081.2016.02.0432

Traditional automatic text summarization generally has no word count requirement, while many social network platforms impose word count limits. Because of such limits, traditional summarization techniques can hardly achieve balanced performance on short text summarization. In view of this problem, a new automatic short text summarization method was proposed. Firstly, the values of relationship mapping, length mapping, title mapping and position mapping were calculated to form the respective sets of candidate sentences. Secondly, the candidate sentence sets were mapped to the summary sentence set by multiple mapping strategies according to a series of multiple mapping rules, and the recall ratio was increased by putting central sentences into the summary sentence set. The experimental results show that multiple mapping obtains stable performance in short text summarization: the F-measures of the ROUGE-1 and ROUGE-2 tests are 0.49 and 0.35 respectively, which are better than the average level of the NLP&CC2015 evaluation, proving the effectiveness of the method.
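
The ROUGE-N F-measure reported above can be approximated with a short sketch. It is a simplified version that assumes whitespace tokenization and a single reference summary, with no stemming or stop-word handling, so its scores will not match the official ROUGE toolkit exactly; the function name is illustrative.

```python
from collections import Counter


def rouge_n_f(candidate, reference, n=1):
    """F-measure of clipped n-gram overlap between a candidate and a reference."""

    def ngrams(text):
        tokens = text.split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())  # each n-gram counted at most min(count) times
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2.0 * precision * recall / (precision + recall)
```

For the candidate "the cat sat" against the reference "the cat sat on the mat", every candidate unigram is matched (precision 1.0) but only half of the reference is covered (recall 0.5), giving a ROUGE-1 F-measure of about 0.67.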

Face recognition based on deep neural network and weighted fusion of face features

SUN Jinguang, MENG Fanyu

2016, 36(2): 437-443. DOI: 10.11772/j.issn.1001-9081.2016.02.0437

It is difficult to extract face features suitable for classification, and face recognition accuracy is low under unconstrained conditions. To solve these problems, a new method based on deep neural network and weighted fusion of face features, namely DLWF, was proposed. Firstly, facial feature points were located using Active Shape Model (ASM), and different facial organs were then sampled according to those feature points. Secondly, the corresponding Deep Belief Network (DBN) was trained on the regional samples to obtain the optimal network parameters. Finally, the similarity vector of each organ was obtained using Softmax regression, and the similarity vectors of the multiple regions were fused with weights for face recognition. The recognition accuracy reached 97% and 88.76% on the ORL and LFW face databases respectively; compared with traditional recognition algorithms including Principal Component Analysis (PCA), Support Vector Machine (SVM), DBN, and Face Identity-Preserving (FIP) + Linear Discriminant Analysis (LDA), the recognition rate was improved under both constrained and unconstrained conditions. The experimental results show that the proposed algorithm has high efficiency in face recognition.
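
The final fusion step can be sketched as a weighted sum of per-region similarity vectors. The region names, the weights and the example scores below are hypothetical; the abstract does not state how the weights are chosen.

```python
def fuse_similarities(region_scores, weights):
    """Weighted sum of per-region similarity vectors (one score per identity)."""
    n_identities = len(next(iter(region_scores.values())))
    fused = [0.0] * n_identities
    for region, scores in region_scores.items():
        for identity, score in enumerate(scores):
            fused[identity] += weights[region] * score
    return fused


def predict(region_scores, weights):
    """Return the index of the identity with the highest fused similarity."""
    fused = fuse_similarities(region_scores, weights)
    return max(range(len(fused)), key=fused.__getitem__)
```

In this sketch a region that disagrees with the others can still decide the outcome if its weight is large enough, which is the reason for weighting regions rather than averaging them.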

Vehicle logo recognition using convolutional neural network combined with multiple layer feature

ZHANG Li, ZHANG Dongming, ZHENG Hong

2016, 36(2): 444-448. DOI: 10.11772/j.issn.1001-9081.2016.02.0444

Concerning the inaccurate vehicle information obtained from license plates in existing intelligent traffic systems, a vehicle logo recognition method based on Convolutional Neural Network (CNN) combined with features from different layers, namely Multi-CNN, was proposed. Firstly, the features of different layers were obtained using the CNN. Secondly, the extracted features were concatenated and used as the input of the fully connected layer to obtain classifiers. The theoretical analysis and simulation results show that, compared with the traditional method, the Multi-CNN method reduces the training time and increases the recognition accuracy to 98.7%.

Enterprise abbreviation prediction based on constitution pattern and conditional random field

SUN Liping, GUO Yi, TANG Wenwu, XU Yongbin

2016, 36(2): 449-454. DOI: 10.11772/j.issn.1001-9081.2016.02.0449

With the continuous development of enterprise marketing, enterprise abbreviations have been widely used. Nevertheless, as one of the main sources of unknown words, enterprise abbreviations cannot be effectively identified. A method for predicting enterprise abbreviations based on constitution pattern and Conditional Random Field (CRF) was proposed. Firstly, the constitution patterns of enterprise names and abbreviations were summarized from the perspective of linguistics, and the Bi-gram algorithm was improved by combining lexicon and rules, yielding CBi-gram. The CBi-gram algorithm was used to segment enterprise names automatically and to improve the recognition accuracy of the company's core word. Then the enterprise type was subdivided by CBi-gram, and the abbreviation rule sets were collected by manual summarization and self-learning to reduce the noise caused by unsuitable rules. Besides, to make up for the limitations of manually built rules on abbreviations and mixed abbreviations, the CRF was introduced to generate enterprise abbreviations statistically, with word, tone and word position used as features to train a supplementary model. The experimental results show that the method performs well and that its output basically covers the usual range of enterprise abbreviations.

Personal title and career attributes extraction based on distant supervision and pattern matching

YU Dong, LIU Chunhua, TIAN Yue

2016, 36(2): 455-459. DOI: 10.11772/j.issn.1001-9081.2016.02.0455

Focusing on the issue of extracting the title and career attributes of a specific person from unstructured text, a method based on distant supervision and pattern matching was proposed. Features of personal attributes were described from two aspects: string pattern and dependency pattern. Title and career attributes were extracted in two stages. Firstly, both distant supervision and human-annotated knowledge were used to build a high-coverage pattern base to discover and extract a candidate attribute set. Then the literal connections among multiple attributes and the dependency relations between the specific person and the candidate attributes were used to design a filtering rule set. Tests on the CLP-2014 PAE shared task show that the F-score of the proposed method reaches 55.37%, which is significantly higher than the best result of the evaluation (F-measure 34.38%), and it also outperforms the supervised Conditional Random Field (CRF) sequence tagging method, whose F-measure is 43.79%. The experimental results show that, with the filtering process, the proposed method can mine and extract title and career attributes from unstructured documents with a high coverage rate.

Micro-blog clustering and topic word extraction based on hashtag and forwarding relationship

SHU Jue, CHENG Weiqing, DENG Cong

2016, 36(2): 460-464. DOI: 10.11772/j.issn.1001-9081.2016.02.0460

Concerning the low accuracy of micro-blog clustering, on the basis of a study of micro-blog data, micro-blog hashtags were used to enhance the vector space model, and micro-blog forwarding relationships were used to improve the clustering accuracy. Topic keywords of the clusters were then extracted using information such as the forwarding number and comment number of a micro-blog and information about the user who posted it. Clustering experiments on a Sina micro-blog dataset show that, compared with the k-means algorithm and ICST-WSNB (a short Chinese text incremental clustering algorithm based on weighted semantics and Naive Bayes), the accuracy of the proposed clustering method based on hashtags and forwarding relationships increases by 18.5% and 6.63% respectively; the recall and F-value are also improved. The experimental results show that the proposed clustering algorithm can effectively improve the accuracy of micro-blog clustering and thereby obtain more appropriate topic words.

Patent knowledge extraction method for innovation design

MA Jianhong, ZHANG Mingyue, ZHAO Yanan

2016, 36(2): 465-471. DOI: 10.11772/j.issn.1001-9081.2016.02.0465

Patents contain a great deal of information about background, technology, function and so on, and play an important role in the field of innovation. Patents are created from innovation knowledge; at the same time, they encourage fuller use of innovation knowledge and help break through inherent thinking and the limits of knowledge, which inspires designers in the process of product design. From the perspective of innovation design, a new method for extracting innovation knowledge was proposed based on combination features and a maximum entropy classifier. Using natural language processing, a patent term recognition algorithm was given, and the word features and the syntactic features of the closed package tree on the shortest path were joined to compute the intermediate result. After that, the maximum entropy algorithm was applied to extract innovation knowledge based on semantic analysis and to mark the attributes of the knowledge. The results show that the combination features can effectively handle the problems that patents need to solve and the relationships among the semantic roles of innovation knowledge (target function, function principle and position feature) in the technical scheme.

Application of trust model in evaluation of haze perception source

CHEN Zhenguo, TIAN Liqin

2016, 36(2): 472-477. DOI: 10.11772/j.issn.1001-9081.2016.02.0472

As the source of haze data, haze monitoring sites must be reliable for the resulting big data to be reliable. Due to the lack of an effective evaluation method for haze monitoring points, the monitoring data is not reliable enough. To solve the problem of unreliable perceived data, a perceptual source trust evaluation and selection model was proposed based on a data-triggered detection method. When perceived data arrived, the K-Means clustering algorithm and statistical results were firstly used to calculate the benchmark data, and the trust degree of the data was then calculated from the current perceived data, the benchmark data and the threshold values. Secondly, the neighbor relationship was determined according to the location of the perceptual source. The current perceived data was compared with the data of the neighbors, and the neighbor recommendation trust degree was calculated from the absolute value of the difference and the threshold value. Finally, the comprehensive trust degree was calculated from the trust degree of the perceived data, the historical trust degree and the neighbor recommendation trust degree. The initial value of the historical trust degree was set to the number of monitoring items and was then updated by the comprehensive trust degree. Theoretical analysis and simulation results prove that the proposed method can effectively evaluate the perceptual source, avoid abnormal data, and reduce the post-processing overhead.

Autonomous mobile strategy of carrier in wireless sensor network

TANG Haijian, BAO Yu, MIN Xuan, LUO Yuxuan, ZOU Yuchi

2016, 36(2): 478-482. DOI: 10.11772/j.issn.1001-9081.2016.02.0478

Concerning the risks to personal safety and the difficulties in node repair, placement, and search and rescue caused by the complex or unreachable special areas where Wireless Sensor Networks (WSN) are deployed, an autonomous mobile strategy of the carrier in WSN was proposed. Firstly, the carrier was localized with fewer anchor nodes by combining the maximum likelihood method with Received Signal Strength Indication (RSSI). Then, relying on the mathematical model, the carrier moved autonomously in the WSN by acquiring its current position and the target node coordinates to correct its direction angle and select the next target node. The simulation results show that the proposed strategy enables the carrier to reach the destination along a shorter path and in less time, and that the higher the density of sensor nodes, the more likely the strategy is to succeed. WSNs with 130, 180 and 300 nodes were simulated respectively, and the success rate was as high as 96.7%.
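
The localization step can be illustrated with a minimal sketch. It assumes a log-distance path-loss model for converting RSSI to distance and the common linearized least-squares form of multilateration that WSN literature often calls the maximum likelihood method; the reference power of -40 dBm at 1 m and the path-loss exponent of 2 are hypothetical values, and the paper's exact formulation may differ.

```python
import math


def rssi_to_distance(rssi, rssi_at_1m=-40.0, path_loss_exp=2.0):
    """Invert the log-distance model rssi = rssi_at_1m - 10 * n * log10(d)."""
    return 10 ** ((rssi_at_1m - rssi) / (10.0 * path_loss_exp))


def locate(anchors, distances):
    """Least-squares 2D position from three or more anchors and their ranges."""
    (xk, yk), dk = anchors[-1], distances[-1]
    rows, rhs = [], []
    # Subtract the last anchor's circle equation from the others to linearize.
    for (xi, yi), di in zip(anchors[:-1], distances[:-1]):
        rows.append((2.0 * (xi - xk), 2.0 * (yi - yk)))
        rhs.append(xi ** 2 - xk ** 2 + yi ** 2 - yk ** 2 + dk ** 2 - di ** 2)
    # Solve the 2x2 normal equations (A^T A) p = A^T b by Cramer's rule.
    a11 = sum(r[0] * r[0] for r in rows)
    a12 = sum(r[0] * r[1] for r in rows)
    a22 = sum(r[1] * r[1] for r in rows)
    b1 = sum(r[0] * b for r, b in zip(rows, rhs))
    b2 = sum(r[1] * b for r, b in zip(rows, rhs))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear or too few")
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)
```

With noise-free ranges from four anchors at the corners of a 10 m square, `locate` recovers the true position; with real RSSI readings the estimate degrades with ranging noise, which is why more anchors or denser nodes help.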

Cooperative behavior based on evolutionary game in delay tolerant networks

XU Xiaoqiong, ZHOU Zhaorong, MA Xiaoxia, YANG Liu

2016, 36(2): 483-487. DOI: 10.11772/j.issn.1001-9081.2016.02.0483

Due to limited resources, nodes in Delay Tolerant Network (DTN) behave selfishly, i.e., they refuse to help forward messages for others. In order to improve the cooperative behavior of nodes and enhance the overall network performance, a new incentive mechanism for node behavior based on Evolutionary Game Theory (EGT) was proposed. In the proposed mechanism, the prisoner's dilemma model was employed to establish the payoff matrix between a node and its neighbors. Then, the social authority of a node was defined based on degree centrality. Further, the influence of social authority was considered when designing the strategy update rule: nodes with higher social authority were selected from the current neighborhood to imitate and learn from. Finally, simulation experiments were conducted on a real dynamic network topology using the Opportunistic Network Environment (ONE) simulator. The simulation results show that, compared with the Fermi update rule, which chooses neighbors randomly, the strategy update rule that considers social authority promotes cooperative behavior and accordingly improves the overall performance of the network.
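
The contrast between the two update rules can be sketched as follows. The Fermi imitation probability is the standard form used in evolutionary game theory; treating social authority as a plain per-node score (for example the node degree) and picking the single highest-scoring neighbor are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def fermi_probability(payoff_self, payoff_neighbor, noise=0.1):
    """Probability that a node adopts its neighbor's strategy (Fermi rule)."""
    exponent = (payoff_self - payoff_neighbor) / noise
    exponent = min(exponent, 700.0)  # keep math.exp within floating-point range
    return 1.0 / (1.0 + math.exp(exponent))


def pick_model_by_authority(neighbors, authority):
    """Choose the neighbor to imitate: the one with the highest authority score."""
    return max(neighbors, key=lambda node: authority[node])
```

Under the plain Fermi rule the neighbor to compare against is drawn at random; in this sketch the random draw is replaced by `pick_model_by_authority`, after which the same imitation probability can be applied.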

Wi-Fi fingerprinting clustering for indoor place of interest positioning

WANG Yufan, AI Haojun, TU Weiping

2016, 36(2): 488-491. DOI: 10.11772/j.issn.1001-9081.2016.02.0488

Wi-Fi fingerprint acquisition and modeling is time-consuming work, and crowdsourcing is an effective way to solve this problem. The feasibility of unsupervised clustering for Place of Interest (POI) positioning was demonstrated, which is beneficial to generating radio maps by crowdsourcing. Firstly, a framework of the Wi-Fi fingerprint localization algorithm was given; then k-means, affinity propagation and adaptive propagation were applied to this framework. Using a BP neural network as a supervised learning reference, an evaluation was carried out in a laboratory to analyze the relationship between indoor POI partition and spatial division, with Received Signal Strength Indications (RSSI) collected in the POIs. When the clustering results were compared with the POI spatial division, the recall and precision of all three clustering algorithms were over 90%. The experimental results show that unsupervised clustering is an effective solution for coarse-grained indoor POI positioning applications.
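
As an illustration of the simplest of the three clustering algorithms, the following is a minimal k-means sketch over RSSI fingerprints. Each fingerprint is assumed to be a tuple of RSSI values in dBm, one per access point; the deterministic initialization from the first `k` points and the sample readings in the usage note are simplifications for illustration, not the paper's setup.

```python
def kmeans(points, k, iterations=20):
    """Cluster equal-length RSSI vectors; returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # naive deterministic initialization
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each fingerprint to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Given three readings taken near one access point, such as (-40, -70), and three taken near another, such as (-75, -45), the two groups end up in separate clusters, each of which would then be mapped to a POI.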

Fourier spectrum analysis for non-uniform sampled signals

FANG Jianchao, MAO Xuesong

2016, 36(2): 492-494. DOI: 10.11772/j.issn.1001-9081.2016.02.0492

To deal with the problem that beat signals cannot be sampled at equal intervals, which is inherent in Pseudo-random Noise (PN) code modulated Doppler laser radar, a new Discrete Fourier Transform (DFT) method applicable to non-uniformly sampled data was proposed. Firstly, the system model of a Doppler laser radar that measures range and speed simultaneously was provided, and the reason why beat signals cannot be sampled at equal intervals was pointed out. Then, through theoretical derivation, a new spectrum analysis method was proposed for processing non-uniformly sampled signals. Finally, simulations were performed to verify the feasibility of applying the proposed method to non-uniformly sampled data. The results show that, within the Doppler frequency range produced by moving targets on roads, the method can efficiently obtain the frequency of non-uniformly sampled Doppler signals even when the Signal-to-Noise Ratio (SNR) is as low as 0 dB.
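
To make the setting concrete, the sketch below evaluates the Fourier sum directly at the actual sample instants, X(f) = Σ x_n · exp(-j2πf·t_n). This direct-sum form is a generic non-uniform DFT and is not necessarily the method derived in the paper; the function names and the jittered sampling grid in the usage note are hypothetical.

```python
import cmath
import math


def nudft(times, samples, freqs):
    """Direct Fourier sum of samples taken at arbitrary time instants."""
    return [
        sum(x * cmath.exp(-2j * math.pi * f * t) for t, x in zip(times, samples))
        for f in freqs
    ]


def dominant_frequency(times, samples, freqs):
    """Return the candidate frequency with the largest spectral magnitude."""
    spectrum = nudft(times, samples, freqs)
    peak = max(range(len(freqs)), key=lambda i: abs(spectrum[i]))
    return freqs[peak]
```

For a 5 Hz cosine sampled on a jittered grid such as `t_n = 0.01 * n + 0.003 * sin(n)`, searching candidate frequencies from 1 Hz to 10 Hz recovers 5 Hz, because the sum uses the true sample times rather than assuming equal spacing.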

Optimization design of preventing fault injection attack on distributed embedded systems

WEN Liang, JIANG Wei, PAN Xiong, ZHOU Keran, DONG Qi, WANG Junlong

2016, 36(2): 495-498. DOI: 10.11772/j.issn.1001-9081.2016.02.0495

Security-critical distributed systems face the challenges of malicious snooping and fault injection attacks. Traditional research mainly focuses on preventing malicious snooping and disregards the threat of fault injection attacks. Concerning this problem, fault detection for message encryption/decryption was considered, with the goals of maximizing the fault coverage and minimizing the heterogeneity of the messages' fault coverage. Firstly, Advanced Encryption Standard (AES) was used to protect confidentiality. Secondly, five fault detection schemes were proposed, and their fault coverage rates and time overheads were derived and measured respectively. Finally, an efficient heuristic algorithm based on Simulated Annealing (SA) under the real-time constraint was proposed, which maximizes the fault coverage and minimizes the heterogeneity. The experimental results show that the objective function value achieved by the proposed algorithm is at least 18% higher than that of the greedy algorithm, verifying the efficiency and robustness of the proposed algorithm.
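
The optimization step can be sketched with a generic simulated annealing loop. Everything specific here is an assumption made for illustration: the four schemes in `SCHEMES` with their coverage and time values, the objective (mean coverage minus a weighted standard deviation as a stand-in for heterogeneity), and the single time budget standing in for the real-time constraint. The paper's five schemes and its objective function are not given in the abstract.

```python
import math
import random

# Hypothetical (fault_coverage, time_cost) pairs, one per detection scheme.
SCHEMES = [(0.0, 0.0), (0.5, 1.0), (0.8, 2.0), (0.95, 4.0)]


def objective(assignment, weight=0.5):
    """Mean coverage minus weighted spread of coverage across messages."""
    coverage = [SCHEMES[s][0] for s in assignment]
    mean = sum(coverage) / len(coverage)
    spread = math.sqrt(sum((c - mean) ** 2 for c in coverage) / len(coverage))
    return mean - weight * spread


def total_time(assignment):
    return sum(SCHEMES[s][1] for s in assignment)


def anneal(n_messages, budget, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Pick one scheme per message without exceeding the time budget."""
    rng = random.Random(seed)
    current = [0] * n_messages  # start with no detection at all (always feasible)
    best = current[:]
    temperature = t0
    for _ in range(steps):
        candidate = current[:]
        candidate[rng.randrange(n_messages)] = rng.randrange(len(SCHEMES))
        if total_time(candidate) <= budget:  # infeasible moves are rejected
            delta = objective(candidate) - objective(current)
            # Always accept improvements; accept worse moves with Boltzmann probability.
            if delta >= 0 or rng.random() < math.exp(delta / temperature):
                current = candidate
                if objective(current) > objective(best):
                    best = current[:]
        temperature *= cooling
    return best
```

Accepting occasional worse moves while the temperature is high is what lets the search escape the local optima that a greedy assignment can get stuck in.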

Network alerts depth information fusion method based on time confrontation

QIU Hui, WANG Kun, YANG Haopu

2016, 36(2): 499-504. DOI: 10.11772/j.issn.1001-9081.2016.02.0499

Because they use a single point in time as the processing unit, current network alert information fusion methods cannot adapt to network attacks with high concealment and long duration. Aiming at this problem, a network alert depth information fusion method based on time confrontation was proposed. For multi-source heterogeneous alert data flows, firstly, the alerts were collected and saved in a long time window. Then the alerts were clustered using a clustering algorithm based on sliding window. Finally, the alerts were fused by introducing a window attenuation factor. The experimental results on a real data set show that, compared with Basic-DS and EWDS (Exponential Weight DS), the proposed method has higher True Positive Rate (TPR) and False Positive Rate (FPR) as well as lower Data to Information Rate (DIR) because of the longer time window. Actual tests and theoretical analysis show that the proposed method is more effective at detecting network attacks and can satisfy real-time processing with less time delay.

Byzantine fault tolerance schema for service-oriented computing and its correctness proof

CHEN Liu, ZHOU Wei

2016, 36(2): 505-510. DOI: 10.11772/j.issn.1001-9081.2016.02.0505

A Byzantine Fault Tolerance (BFT) schema was proposed to solve the problem that most BFT protocols are not suitable for service-oriented computing and other emerging computing models because they assume that services are passive and independent. Service replicas were created on the sides of both the service requester and the service provider. A state machine replication algorithm was used to reach agreement on the ID and content of a request after three rounds of communication among service replicas. After receiving a request, replicas submitted it to the upper application logic. After receiving the reply, replicas on the service requester side reached agreement on the ID and content of the reply after three rounds of communication among service replicas and then accepted the reply. To address the problem that existing work offers only simple correctness reasoning and lacks formal verification, an I/O automaton was used to model the protocol, and the simulation relation method was used to prove the correctness of the protocol more formally and rigorously. A highly abstract, simple I/O automaton S that satisfies safety and liveness was constructed. The parties of the protocol were broken down into several simple member I/O automata, including the front-end automaton, the back-end automaton and the multicast channel automaton. It is proved that the system composed of the member I/O automata implements the automaton S. The I/O automaton can describe the protocol accurately, which makes the correctness proof more standard than inductive reasoning.

Windows clipboard operations monitoring based on virtual machine monitor

ZHOU Dengyuan, LI Qingbao, ZHANG Lei, KONG Weiliang

2016, 36(2): 511-515. DOI: 10.11772/j.issn.1001-9081.2016.02.0511

The existing methods for monitoring clipboard operations cannot defend against kernel-level attacks and, because of their simple protection strategies, cannot satisfy practical needs. To mitigate these disadvantages, a clipboard operation monitoring technique for document contents based on Virtual Machine Monitor (VMM) was proposed, together with a classification protection strategy for electronic documents based on clipboard operation monitoring. Firstly, system calls were intercepted and identified in the VMM by modifying the shadow registers. Secondly, a mapping table between process identifiers and document paths was created by monitoring document open operations, so that the document path could be obtained from the process identifier when a clipboard operation was intercepted. Finally, clipboard operations were filtered according to the classification protection strategy. The experimental results show that the performance loss of the Guest OS file system caused by the monitoring system decreases as the record size increases; when the record size exceeds 64 KB, the performance loss is within 10%, which has little effect on the user.

Computational complexity optimization for mobile audio bandwidth extension

HANG Bo, WANG Yi, KANG Changqing

2016, 36(2): 516-520. DOI: 10.11772/j.issn.1001-9081.2016.02.0516

Mobile devices are mostly sensitive to computational complexity because of their limited computing resources. The BandWidth Extension (BWE) algorithm in AVS P10, the Chinese audio codec standard for mobile communication, was proposed to improve mobile audio quality, but its computational complexity is too high for implementation on mobile devices. The process of the original BWE algorithm was analyzed, and the main cause of the high computational complexity was identified as the frequent use of time-frequency transformation. Based on this analysis, a computational complexity optimization scheme was proposed, which includes algorithm optimization and code optimization. The complexity of the algorithm was reduced by reducing the number of calls to the Fast Fourier Transform (FFT), and its time consumption was reduced by methods such as trading memory space for speed. The experimental results show that the share of computation time consumed by the BWE module in the encoder and decoder decreases by 4.5 and 14.3 percentage points respectively, without reducing the overall subjective quality of the audio codec; the computational complexity of the algorithm is significantly reduced, which benefits the application of the codec algorithm in the field of mobile audio.

Super-resolution image reconstruction algorithm based on image patch iteration and sparse representation

YANG Cunqiang, HAN Xiaojun, ZHANG Nan

2016, 36(2): 521-525. DOI: 10.11772/j.issn.1001-9081.2016.02.0521

Concerning the slow reconstruction speed and the differences among the contents of the image to be reconstructed, an improved super-resolution image reconstruction algorithm based on image patch iteration and sparse representation was proposed. In the proposed method, image patches were firstly divided into three forms by threshold features, and the three forms were then treated separately during reconstruction: image patches of 4N×4N were reconstructed by Bicubic Interpolation (BI); for image patches of 2N×2N, the corresponding high- and low-resolution dictionary pairs were obtained by the K-Singular Value Decomposition (K-SVD) algorithm, and reconstruction was completed using the Orthogonal Matching Pursuit (OMP) algorithm; image patches of N×N were divided into a smooth layer and a texture layer by the Morphological Component Analysis (MCA) algorithm, and reconstruction was completed using OMP with the corresponding dictionary pairs of each layer. Compared with the methods based on sparse representation group, MCA, and two-stage multi-frequency-band dictionaries, the proposed algorithm shows a significant improvement in subjective visual effect, evaluation indexes and reconstruction speed. The experimental results show that the proposed algorithm can recover more details in edge patches and irregular structure regions, with a better reconstruction effect.

Finger-vein image segmentation based on level set

WANG Baosheng, CHEN Yufei, ZHAO Weidong, ZHOU Qiangqiang

2016, 36(2): 526-530. DOI: 10.11772/j.issn.1001-9081.2016.02.0526

To deal with the weak edges, intensity inhomogeneity and low contrast that may appear in finger-vein images, a new segmentation algorithm based on even-symmetric Gabor filter and level set method was proposed. Firstly, the even-symmetric Gabor filter was used to filter the finger-vein image in 8 different orientations; secondly, the finger-vein image was reconstructed from the 8 filtered results to obtain a high-quality image with significantly improved gray contrast between target and background; finally, the level set algorithm combining local and global features was applied to segment the finger-vein image. Compared with the level set algorithm proposed by Li, et al. (LI C, HUANG R, DING Z, et al. A variational level set approach to segmentation and bias correction of images with intensity inhomogeneity. MICCAI'08: Proceedings of the 11th International Conference on Medical Image Computing and Computer-Assisted Intervention, Part II. Berlin: Springer, 2008: 1083-1091) and the Legendre Level Set (L2S) algorithm, the percentage of Area Difference (AD) of the proposed algorithm decreased by 1.116% and 0.370% respectively, and the Relative Difference Degree (RDD) decreased by 1.661% and 1.379% respectively. The experimental results show that the proposed algorithm achieves better results than traditional level set image segmentation algorithms that consider only local information or only global information.
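
The first step can be illustrated by building a bank of even-symmetric Gabor kernels. The sketch uses the usual cosine-modulated Gaussian with an isotropic envelope; the kernel size, `sigma` and `freq` values are hypothetical, since the abstract gives no parameters, and the kernel size is assumed to be odd.

```python
import math


def even_gabor_kernel(size, theta, sigma=3.0, freq=0.1):
    """Even-symmetric (cosine-phase) Gabor kernel at orientation theta (radians)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates so the cosine wave runs along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * freq * xr))
        kernel.append(row)
    return kernel


def gabor_bank(size=15, n_orientations=8):
    """Kernels at orientations 0, pi/8, ..., 7*pi/8."""
    return [even_gabor_kernel(size, k * math.pi / n_orientations)
            for k in range(n_orientations)]
```

Each kernel would be convolved with the finger-vein image; because the kernel is unchanged under a 180-degree rotation, it responds to ridge-like vein structures regardless of which side is brighter along the wave direction.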

Image retrieval algorithm based on convolutional neural network and manifold ranking

LIU Bing, ZHANG Hong

2016, 36(2): 531-534. DOI: 10.11772/j.issn.1001-9081.2016.02.0531

In Content-Based Image Retrieval (CBIR), low-level visual features are not consistent with the high-level semantic features perceived by humans, and traditional distance measures can hardly reflect the similarity of images. To solve these problems, an image retrieval algorithm based on Convolutional Neural Network (CNN) and manifold ranking was proposed. Firstly, the image dataset was fed into the CNN, and after supervised learning, image features were extracted from the fully connected layers of the network; secondly, the image features were normalized, and the Efficient Manifold Ranking (EMR) algorithm was used to compute ranking scores for query images; finally, the most similar images were returned to users according to the scores. On the Corel dataset, the mean Average Precision (mAP) of the deep image features was 53.74% higher than that of the scene descriptor features, and the mAP of EMR was 18.34% higher than that of the cosine distance. The experimental results show that the proposed algorithm can effectively improve the accuracy of image retrieval.
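
The mAP metric used in the comparison can be sketched as follows. This is one common variant that averages precision over the relevant items actually present in the ranked list; the paper may use a different cutoff or normalization, and the function names are illustrative.

```python
def average_precision(ranked_relevance):
    """Average precision of one ranked result list of 0/1 relevance flags."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries):
    """Mean of the average precision over all query result lists."""
    return sum(average_precision(q) for q in queries) / len(queries)
```

For a query whose results are relevant at ranks 1 and 3, the average precision is (1/1 + 2/3) / 2, or about 0.83; moving the second relevant image up to rank 2 would raise it to 1.0, which is how a better ranking function improves mAP.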

Mesh simplification algorithm combined with edge collapse and local optimization

LIU Jun, FAN Hao, SUN Yu, LU Xiangyan, LIU Yan

2016, 36(2): 535-540. DOI: 10.11772/j.issn.1001-9081.2016.02.0535

Aiming at the problems that the detail features of mesh models are lost and mesh quality is poor when three-dimensional models are simplified to a low resolution by mesh simplification algorithms, a high-quality, feature-preserving mesh simplification algorithm was proposed. By introducing the concept of approximate vertex curvature and combining it with the error matrix of edge collapse, the detail features of the simplified model were preserved to a great extent. At the same time, by analyzing the quality of the simplified triangular mesh, optimizing the triangular mesh locally and reducing the number of narrow triangles, the quality of the simplified model was improved. The proposed algorithm was tested on the Apple and Horse models and compared with two algorithms: a classical mesh simplification algorithm based on edge collapse and an improved version of it. The experimental results show that, when the models are simplified to a low resolution, the triangular meshes of the two comparison algorithms are too evenly distributed and the local details are unclear, while the triangular meshes of the proposed algorithm are dense in areas of large curvature but sparse in flat areas, and the local details are clearly visible. Compared with the two comparison algorithms, the geometric errors of the simplified models of the proposed algorithm are of the same order of magnitude, while the average quality of its simplified meshes is much higher. The results verify that the proposed algorithm not only maintains the detail features of the original model efficiently but also yields a simplified model of high quality and better appearance.
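
The notion of a narrow triangle can be made concrete with a quality measure. The sketch below uses a widely used ratio, 4·√3·area divided by the sum of squared edge lengths, which equals 1 for an equilateral triangle and approaches 0 for a sliver; the abstract does not say which measure the paper uses, so this choice is an assumption.

```python
import math


def triangle_quality(a, b, c):
    """Quality in [0, 1] of the 3D triangle with vertices a, b, c."""
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    bc = [c[i] - b[i] for i in range(3)]
    # Area is half the norm of the cross product of two edge vectors.
    cross = (
        ab[1] * ac[2] - ab[2] * ac[1],
        ab[2] * ac[0] - ab[0] * ac[2],
        ab[0] * ac[1] - ab[1] * ac[0],
    )
    area = 0.5 * math.sqrt(sum(v * v for v in cross))
    edge_sq = sum(v * v for v in ab) + sum(v * v for v in ac) + sum(v * v for v in bc)
    if edge_sq == 0.0:
        return 0.0  # all three vertices coincide
    return 4.0 * math.sqrt(3.0) * area / edge_sq
```

A local optimization pass could, for instance, flag triangles whose quality falls below a chosen threshold and try alternative collapse positions for them; the threshold and the repair strategy would both be design choices outside what the abstract states.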

Kinect depth map preprocessing based on uncertainty evaluation

YU Yaling, ZHANG Hua, LIU Guihua, SHI Jinfang

2016, 36(2): 541-545. DOI: 10.11772/j.issn.1001-9081.2016.02.0541

To address the low accuracy of the original Kinect depth information in three-Dimensional (3D) scene measurement for robot perception, a new Kinect depth map preprocessing algorithm was presented. Firstly, a measuring and sampling model of the depth map was developed to realize a Monte Carlo uncertainty evaluation model. Secondly, the depth value intervals were calculated to identify and filter the noise pixels. Finally, the noise points were repaired with the mean value of the estimated intervals. The experimental results show that the algorithm can effectively suppress and repair the noise pixels while keeping the depth gradient and the values of non-noise pixels. The Mean Square Error (MSE) of the depth map after preprocessing is reduced by 15.25% to 28.79%, and the object profiles remain unchanged, compared with Joint Bilateral Filtering (JBF) based on color and depth maps. Therefore, the algorithm improves the accuracy of depth information in 3D scenes.
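
The three steps can be sketched for a single pixel as follows. This is an illustration under stated assumptions, not the authors' model: the measurement is assumed Gaussian around the mean of repeated readings, the interval is a 95% Monte Carlo coverage interval, and "mean value of the interval" is taken to be its midpoint. All function and parameter names are hypothetical.

```python
import random
import statistics


def depth_interval(readings, n_draws=2000, coverage=0.95, seed=0):
    """Monte Carlo coverage interval for one pixel's depth.

    Assumes (for illustration only) a Gaussian measurement model whose
    mean and spread are estimated from repeated readings of the pixel.
    """
    rng = random.Random(seed)
    mu = statistics.fmean(readings)
    sigma = statistics.pstdev(readings)
    draws = sorted(rng.gauss(mu, sigma) for _ in range(n_draws))
    lo_idx = int((1 - coverage) / 2 * (n_draws - 1))
    hi_idx = int((1 + coverage) / 2 * (n_draws - 1))
    return draws[lo_idx], draws[hi_idx]


def repair_pixel(value, interval):
    """Keep a value inside the interval; replace an outlier by the midpoint."""
    lo, hi = interval
    if lo <= value <= hi:
        return value
    return (lo + hi) / 2
```

Values inside the interval are returned untouched, which mirrors the abstract's claim that non-noise pixels keep their depth values.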

Mesh slicer: novel algorithm for 3D mesh compression

HE Chen, WANG Lei, WANG Chunmeng

2016, 36(2): 546-550. DOI: 10.11772/j.issn.1001-9081.2016.02.0546

To address the storage and network transmission problems of three-Dimensional (3D) mesh models, a new 3D model compression algorithm based on mesh slicing was proposed. The algorithm consists of three steps: slice vertex calculation, slice boundary sampling, and encoding of the images obtained by slicing. For a given 3D mesh model, the bounding box of the model was calculated first; then the model was sliced along the longest direction of the bounding box. During slicing, the intersection points of each slice with the edges of the mesh were calculated, so that all the intersection points in the same slice constituted a polygon. The boundary of the polygon was then uniformly resampled so that every slice had the same number of vertices. After resampling, the coordinates of the vertices in each slice were converted into polar form. In this way, the ρ-coordinates and the θ-coordinates of the vertices each constituted one image, and the original 3D model could be represented by these two images. The new representation has two obvious advantages: first, the dimension of the data is reduced, so the amount of data is effectively reduced; second, the data in the two images are highly correlated, so the entropy of the data is further reduced. Based on these two advantages, the proposed algorithm compressed the two images by difference coding and arithmetic coding to obtain the compressed files. Compared with the Incremental Parametric Refinement (IPR) method, the coding efficiency of the proposed algorithm was 23% higher at the same quality of the decoded model. The experimental results show that the proposed algorithm achieves good compression efficiency and effectively reduces the amount of data in 3D model storage and transmission.
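
Two of the steps described above, polar conversion and difference coding, can be sketched as follows. The choice of the slice centroid as the pole is an assumption (the abstract does not name the pole), the function names are hypothetical, and the arithmetic coding stage is omitted.

```python
import math


def to_polar(points):
    """Convert 2-D slice vertices to (rho, theta) about the slice centroid.

    Using the centroid as the pole is an assumption made for this sketch.
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [(math.hypot(x - cx, y - cy), math.atan2(y - cy, x - cx))
            for x, y in points]


def difference_encode(values):
    """Keep the first value, then store each value minus its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def difference_decode(deltas):
    """Invert difference_encode by running a cumulative sum."""
    out = []
    for d in deltas:
        out.append(d if not out else out[-1] + d)
    return out
```

When neighboring samples are strongly correlated, the differences cluster near zero, which is what makes a subsequent entropy coder effective.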

Image denoising algorithm based on sparse representation and nonlocal similarity

ZHAO Jingkun, ZHOU Yingyue, LIN Maosong

2016, 36(2): 551-555. DOI: 10.11772/j.issn.1001-9081.2016.02.0551

To denoise images corrupted by mixed noise, such as Additive White Gaussian Noise (AWGN) combined with Salt-and-Pepper Impulse Noise (SPIN) and Random-Valued Impulse Noise (RVIN), an improved image restoration algorithm based on the existing weighted encoding method was proposed. The image priors of sparse representation and non-local similarity were integrated. Firstly, dictionary-based sparse representation was used to build a variational denoising model, and a weighting factor was designed for the data fidelity term to suppress impulse noise. Secondly, the non-local means method was used to obtain an initial denoised image, and a mask matrix was built to remove impulse noise points so as to obtain reliable non-local similarity prior knowledge. Finally, the image sparsity prior and the non-local similarity prior were integrated into the regularization of the variational model, and the final denoised image was obtained by solving the variational model. The experimental results show that, at different noise ratios, the Peak Signal-to-Noise Ratio (PSNR) of the proposed algorithm is 1.7 dB higher than that of the fuzzy weighted non-local means filter, and the Feature Similarity Index (FSIM) is 0.06 higher. Compared with the weighted encoding method, the PSNR is 0.64 dB higher and the FSIM is 0.03 higher. The proposed method has better recovery performance, especially for strongly textured images, and can retain the real information of the image.
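
For readers unfamiliar with the PSNR figures quoted above, the standard definition, 10·log10(peak² / MSE), can be sketched as below. The 8-bit peak value of 255 and the list-of-rows image layout are assumptions of this sketch.

```python
import math


def psnr(reference, restored, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized images.

    Images are lists of rows of pixel values; peak=255 assumes 8-bit data.
    """
    flat_ref = [p for row in reference for p in row]
    flat_res = [p for row in restored for p in row]
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_res)) / len(flat_ref)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)
```

Because the scale is logarithmic, a gain of 1.7 dB corresponds to roughly a one-third reduction in mean squared error.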

Non-local means denoising algorithm with hybrid similarity weight

HUANG Zhi, FU Xingwu, LIU Wanjun

2016, 36(2): 556-562. DOI: 10.11772/j.issn.1001-9081.2016.02.0556

In the traditional Non-Local Means (NLM) algorithm, the weighted Euclidean norm cannot truly reflect the similarity between two neighborhoods when the noise standard deviation is large. To address this problem, a new NLM denoising algorithm with a hybrid similarity weight was proposed. Firstly, the noisy image was decomposed by stationary wavelet transform, and a filtering function was used to pre-denoise each detail subband. Secondly, the similarity reference factor between patches was calculated from the refined image and used to replace the Gaussian kernel function of the traditional NLM algorithm. Finally, to bring the similarity weights more in line with the characteristics of the Human Visual System (HVS), the block Singular Value Decomposition (SVD) method based on image structure perception was used to define the neighborhood similarity measure, which reflects the similarity between neighborhoods more accurately than the traditional NLM. The experimental results demonstrate that the hybrid similarity weighted NLM algorithm performs better than the traditional NLM in retaining texture details and edge information, and that the Structural SIMilarity (SSIM) index is also improved in comparison with the traditional NLM algorithm. The proposed approach remains effective and robust when the noise standard deviation is large.
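
As background, the traditional NLM baseline that the abstract improves on can be sketched on a one-dimensional signal. This is not the proposed hybrid weight: the sketch uses a plain (unweighted) mean squared patch distance with an exponential kernel, and the parameter names and defaults are hypothetical.

```python
import math


def nlm_1d(signal, patch_radius=1, search_radius=3, h=10.0):
    """Baseline non-local means on a 1-D signal (illustrative sketch).

    Each output sample is a weighted average of samples in a search
    window, weighted by exp(-d2 / h^2) where d2 is the mean squared
    difference between the patches around the two samples.
    """
    n = len(signal)

    def patch(i):
        # Clamp indices at the borders so every patch has the same length.
        return [signal[min(max(i + k, 0), n - 1)]
                for k in range(-patch_radius, patch_radius + 1)]

    out = []
    for i in range(n):
        ref = patch(i)
        weight_sum, value_sum = 0.0, 0.0
        for j in range(max(0, i - search_radius), min(n, i + search_radius + 1)):
            d2 = sum((a - b) ** 2 for a, b in zip(ref, patch(j))) / len(ref)
            w = math.exp(-d2 / (h * h))
            weight_sum += w
            value_sum += w * signal[j]
        out.append(value_sum / weight_sum)
    return out
```

Under heavy noise the patch distance d2 is itself dominated by noise, which is the weakness the abstract attributes to the traditional weight.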

Keyword extraction method for microblog based on hashtag

YE Jingjing, LI Lin, ZHONG Luo

2016, 36(2): 563-567. DOI: 10.11772/j.issn.1001-9081.2016.02.0563

A hashtag-based method was proposed to solve the problem of how to accurately extract keywords from microblogs. The hashtag, a social feature of microblogs, was used to extract keywords from microblog content. Firstly, a word-post weighted graph was built; then a random walker was run on the graph, repeatedly jumping to any hashtag node. Finally, each word was ranked by its probability once that probability no longer changed between walker iterations. The experiments were conducted on real microblogs from the Sina platform. The results show that, compared with the word-word graph method, the proposed hashtag-based approach improves the accuracy of keyword extraction by 50%.
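
One way to read the procedure above is as a random walk with restart, where the restart distribution is concentrated on hashtag nodes. The sketch below follows that reading; the damping value, the uniform restart over hashtags, the treatment of hashtags as nodes of the same graph, and all names are assumptions rather than details taken from the paper.

```python
def hashtag_random_walk(edges, hashtags, damping=0.85, tol=1e-10, max_iter=1000):
    """Stationary visit probabilities of a walk that restarts at hashtags.

    edges: dict mapping (node_a, node_b) to a positive weight (undirected).
    hashtags: set of node names the walker jumps back to; at least one of
    them is assumed to appear in the graph.
    """
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    nodes = sorted(adj)
    present = [n for n in nodes if n in hashtags]
    restart = {n: (1.0 / len(present) if n in hashtags else 0.0) for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        new = {n: (1.0 - damping) * restart[n] for n in nodes}
        for u in nodes:
            out_weight = sum(adj[u].values())
            for v, w in adj[u].items():
                new[v] += damping * rank[u] * w / out_weight
        delta = sum(abs(new[n] - rank[n]) for n in nodes)
        rank = new
        if delta < tol:  # probabilities no longer change between iterations
            break
    return rank
```

Words that co-occur with hashtagged posts accumulate more probability than words reachable only through distant posts, so sorting word nodes by rank yields keyword candidates.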

Three-dimensional spatio-temporal feature extraction method for action recognition

XU Haining, CHEN Enqing, LIANG Chengwu

2016, 36(2): 568-573. DOI: 10.11772/j.issn.1001-9081.2016.02.0568

To address the high cost of traditional action recognition algorithms on color video and the poor recognition performance caused by insufficient two-dimensional information, a new human action recognition method based on three-dimensional depth image sequences was put forward. In the temporal dimension, the Temporal Depth Model (TDM) was proposed to describe the action. Specifically, the entire sequence of depth maps was divided into several sub-actions under three orthogonal Cartesian planes, and the absolute differences between consecutive projected maps were accumulated to form a depth motion map that describes the dynamic features of an action. In the spatial dimension, the Spatial Pyramid Histogram of Oriented Gradient (SPHOG) was computed from the TDM to obtain the final descriptor of an action. Finally, a Support Vector Machine (SVM) was used to classify the proposed descriptors. The proposed method was tested on two authoritative datasets, MSR Action3D and MSRGesture3D, and the recognition rates were 94.90% (cross-subject test) and 94.86% respectively. The experimental results demonstrate that the proposed method is fast, achieves better recognition, and basically meets the real-time requirement of depth video sequence systems.
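
The accumulation step that forms a depth motion map can be sketched for one projection plane as follows. The projection onto the three planes and the split into sub-actions are left out, and the function name and list-of-rows layout are assumptions of the sketch.

```python
def depth_motion_map(projected_maps):
    """Accumulate |map[t+1] - map[t]| over a sequence of 2-D projected maps.

    projected_maps: list of equally sized maps, each a list of rows.
    Returns a map of the same size holding the summed absolute differences.
    """
    rows, cols = len(projected_maps[0]), len(projected_maps[0][0])
    dmm = [[0] * cols for _ in range(rows)]
    for prev, curr in zip(projected_maps, projected_maps[1:]):
        for r in range(rows):
            for c in range(cols):
                dmm[r][c] += abs(curr[r][c] - prev[r][c])
    return dmm
```

Pixels that never change contribute zero, so the resulting map highlights where motion occurred over the sequence.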

Latent Dirichlet allocation model integrated with texture structure for railway fastener detection

LUO Jianqiao, LIU Jiajia, LI Bailin, DI Shilei

2016, 36(2): 574-579. DOI: 10.11772/j.issn.1001-9081.2016.02.0574

To address the fact that the Latent Dirichlet Allocation (LDA) model ignores image structure, an LDA fastener detection model integrated with image texture information, namely TS_LDA, was proposed. Firstly, a single-channel Local Binary Pattern (LBP) method was designed to acquire the image texture structure, and the texture information was treated as a label of each visual word, so that the joint distribution of words and labels reflected the structural characteristics of an image. Secondly, the labels were embedded into LDA, and image topics were derived from both words and labels, so that the improved topic distribution took the image structure into account. Finally, the classifier was trained and fastener states were identified on the basis of the topic distribution. Compared with the LDA method, in the topic space of TS_LDA the differences between normal and disabled fasteners increased by 5%-35%, and the average misdetection rate decreased by 1.8%-2.4%. The experimental results show that TS_LDA enhances the accuracy of fastener image modeling and thus inspects fastener states more precisely.
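
The abstract does not define its single-channel LBP variant, so the sketch below shows only the basic 8-neighbor LBP code on which such variants are usually built. The neighbor ordering and the function name are assumptions.

```python
def lbp_code(image, r, c):
    """Basic 8-neighbor Local Binary Pattern code of pixel (r, c).

    Each neighbor that is >= the center sets one bit; neighbors are
    visited clockwise starting at the top-left (an assumed ordering).
    The pixel must not lie on the image border.
    """
    center = image[r][c]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code
```

A code like this could serve as the texture label attached to a visual word extracted at the same location.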

Very low resolution face recognition via super-resolution based on extreme learning machine

LU Tao, YANG Wei, WAN Yongjing

2016, 36(2): 580-585. DOI: 10.11772/j.issn.1001-9081.2016.02.0580

A very low-resolution image contains little discriminant information and is easily corrupted by noise, which reduces the recognition rate of existing face recognition algorithms. To solve this problem, a very low-resolution face recognition algorithm via Super-Resolution (SR) based on Extreme Learning Machine (ELM) was proposed. Firstly, the sparse representation dictionaries of Low-Resolution (LR) and High-Resolution (HR) images were learned from the sample base, and the HR image was reconstructed by exploiting the manifold consistency of the LR and HR representation coefficients. Secondly, the ELM model was built on the reconstructed HR images, and the connection weights of the feedforward neural network were obtained by training. Lastly, the ELM was used to predict the category of the very low-resolution image. The experimental results show that, compared with the traditional face recognition algorithm based on Collaborative Representation Classification (CRC), the recognition rate of the proposed algorithm on the reconstructed HR images increases by 2%, and the recognition time is greatly shortened. The simulation results show that the proposed algorithm effectively addresses the face recognition problem caused by limited discriminant information in very low-resolution images and has better recognition ability.

Fuzzy Chinese character recognition of license plate based on histogram of oriented gradients and Gaussian pyramid

LIU Jun, BAI Xue

2016, 36(2): 586-590. DOI: 10.11772/j.issn.1001-9081.2016.02.0586

To address the low recognition rate of fuzzy license plates in existing license plate recognition methods, a new license plate recognition algorithm combining Gaussian pyramid and Histogram of Oriented Gradients (HOG) was proposed. Firstly, by utilizing the multi-scale representation of the Gaussian pyramid, a two-layer Gaussian pyramid model was established for fuzzy Chinese characters in license plates. Details of the fuzzy characters were described in the first layer. The second layer was obtained by smoothing and down-sampling the first layer, and the main features were highlighted by describing details of the fuzzy characters. By extracting HOG from the two-layer Gaussian pyramid, the feature dimension of the image was expanded and the ability to recognize fuzzy Chinese characters was enhanced. Finally, the fuzzy Chinese characters in license plates were recognized by a Back Propagation (BP) neural network classifier. The simulation results show that the recognition rate of the proposed method is higher than that of the HOG feature method and the K-L (Karhunen-Loeve) transform method in the same sample space, which indicates that the proposed method can improve the effective recognition rate of fuzzy Chinese characters in video surveillance.
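
The construction of the second pyramid layer (smoothing followed by down-sampling) can be sketched as below. The 3x3 binomial kernel, the border clamping, and the factor-of-two subsampling are assumptions chosen for brevity, and the HOG extraction and BP classifier are not shown.

```python
def smooth(image):
    """Blur with a 3x3 binomial kernel (a small Gaussian approximation).

    Border pixels are handled by clamping coordinates to the image.
    """
    kernel = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]  # weights sum to 16
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    acc += kernel[dr + 1][dc + 1] * image[rr][cc]
            out[r][c] = acc / 16.0
    return out


def two_layer_pyramid(image):
    """Return (layer 1, layer 2): the input and its blurred, halved copy."""
    blurred = smooth(image)
    second = [row[::2] for row in blurred[::2]]
    return image, second
```

Features extracted from both layers would then be concatenated, which is how the feature dimension grows relative to single-scale HOG.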
