Table of Contents

    10 April 2017, Volume 37 Issue 4
    Advances on virtualization technology of cloud computing
    WU Zhixue
    2017, 37(4):  915-923.  DOI: 10.11772/j.issn.1001-9081.2017.04.0915
    Cloud computing is a new computing model centered on data and data processing capability. It integrates a number of information and communication technologies, including virtualization, distributed data storage, distributed parallel programming models, big data management and distributed resource management. After more than a decade of development, cloud computing has entered a period of rapid growth, and more and more enterprises have adopted cloud computing services. At the same time, the key technologies of cloud computing have advanced as well, with a new generation of technologies enhancing and even replacing the existing ones. Containers are a new type of virtualization technology. With the advantages of being lightweight, elastic and fast, containers challenge traditional virtual machine technology and bring changes to the architecture and implementation of both Infrastructure as a Service (IaaS) and Platform as a Service (PaaS). Container virtualization technology was described in detail, the advantages, disadvantages and suitable use cases of container and virtual machine technologies were compared and analyzed, and the future research directions and development trends of cloud computing virtualization technology were discussed.
    Cascaded and low-consuming online method for large-scale Web page category acquisition
    WANG Yaqiang, TANG Ming, ZENG Qin, TANG Dan, SHU Hongping
    2017, 37(4):  924-927.  DOI: 10.11772/j.issn.1001-9081.2017.04.0924
    To balance accuracy against resource cost when constructing an automatic system for collecting massive numbers of well-classified Web pages, a cascaded and low-consuming online method for large-scale Web page category acquisition was proposed, which uses a cascaded strategy to integrate online and offline Web page classifiers so as to make full use of their respective advantages. An online Web page classifier trained on features from the anchor text was used as the first-level classifier, and the confidence of its classification results was computed from the information entropy of the posterior probability. The second-level classifier was triggered when the confidence was larger than a predefined threshold obtained by Multi-Objective Particle Swarm Optimization (MOPSO). In the second level, features were extracted from the downloaded Web pages and classified by an offline classifier pre-trained on Web pages. In comparison experiments with single online classification and single offline classification, the proposed method increased the F1 measure of classification by 10.85% and 4.57% respectively. Moreover, compared with single online classification, the efficiency of the proposed method decreased only modestly (by less than 30%), while compared with single offline classification the efficiency was improved by about 70%. The results demonstrate that the proposed method not only has a more powerful classification ability, but also significantly reduces computing overhead and bandwidth consumption.
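The cascade described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the entropy of the first-level posterior is the score compared against the threshold, and the names `cascade_classify`, `online_clf`, `offline_clf` and `fetch_page` are hypothetical stand-ins. In the paper the threshold comes from MOPSO; here it is simply a parameter.

```python
import math

def entropy(posterior):
    """Shannon entropy (in bits) of a posterior probability vector."""
    return -sum(p * math.log2(p) for p in posterior if p > 0)

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

def cascade_classify(anchor_text, url, online_clf, offline_clf, fetch_page, threshold):
    """Return (label, level). All callables are hypothetical stand-ins.

    online_clf(anchor_text) -> posterior over categories (cheap, no download)
    fetch_page(url)         -> page content (costs bandwidth)
    offline_clf(page)       -> posterior over categories (more expensive)
    """
    posterior = online_clf(anchor_text)
    # Assumption: a high-entropy posterior means the first level is unsure,
    # so the second level is triggered only above the threshold.
    if entropy(posterior) <= threshold:
        return argmax(posterior), "online"
    page = fetch_page(url)
    return argmax(offline_clf(page)), "offline"
```

Only the pages that the first level cannot classify decisively are downloaded, which is where the bandwidth saving in such a cascade comes from.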
    Real-time detailed classification energy consumption measurement system based on Spark Streaming
    WU Zhixue
    2017, 37(4):  928-935.  DOI: 10.11772/j.issn.1001-9081.2017.04.0928
    Detailed classification energy consumption measurement can reveal energy consumption problems more accurately, promptly and effectively, so that the most effective energy-saving measures can be formulated and implemented. A detailed classification energy measurement system needs to calculate energy consumption at multiple time scales according to the detailed classification coding. It must not only complete these tasks in a timely manner, but also handle data aggregation, data de-duplication and data joining operations. Because the data are generated quickly, must be processed in real time, and are large in volume, it is difficult to first store the data in a database system and process them afterwards. Therefore, the traditional data processing infrastructure cannot fulfil the requirements of a detailed classification energy consumption measurement system. A new real-time detailed classification energy consumption measurement system based on Spark Streaming was designed and implemented, its infrastructure and internal structure were introduced in detail, and its real-time data processing capability was verified through experiments. Unlike traditional approaches, the proposed system processes energy consumption data in real time to capture any unusual behaviour promptly; at the same time, it separates the data, calculates consumption according to the detailed classification coding, and stores the results in a database system for offline analysis and data mining, which effectively solves the aforementioned data processing problems.
    Array erasure codes based on coding chains with multiple slopes
    TANG Dan, YANG Haopeng, WANG Fuchao
    2017, 37(4):  936-940.  DOI: 10.11772/j.issn.1001-9081.2017.04.0936
    To address the problem that most existing array erasure codes have low fault tolerance and must satisfy strong constraint conditions in their construction, a new type of array erasure code based on coding chains was proposed. In the new array erasure codes, coding chains with different slopes were used to organize the relationship among data elements and check elements, so as to achieve theoretically unlimited fault tolerance; strong constraint conditions such as the prime number limitation were avoided in the construction, which makes the codes practical and extensible. Simulation results show that, compared with Reed-Solomon codes (RS codes), the efficiency of the proposed array erasure codes based on coding chains is improved by more than 2 orders of magnitude; under fixed fault tolerance, the storage efficiency can be improved by increasing the strip size. In addition, the update penalty and repair cost of the array codes are fixed constants, which do not increase with the expansion of the storage system scale or the increase of fault tolerance.
    Three-dimensional storm tracking method based on distributed computing architecture
    ZENG Qin, LI Yongsheng
    2017, 37(4):  941-944.  DOI: 10.11772/j.issn.1001-9081.2017.04.0941
    In recent years, the volume of meteorological data has increased dramatically, reaching the level of terabytes per hour. Traditional relational databases and file storage systems have trouble storing and managing such massive data, so large-scale, heterogeneous meteorological data cannot be used effectively in meteorological services. Furthermore, it is difficult for researchers to explore this huge amount of heterogeneous meteorological data efficiently. To tackle these problems, researchers have developed many types of distributed computing frameworks based on MapReduce, HBase and similar technologies, which provide an effective way to exploit large-scale meteorological data. Distributed computing and distributed storage techniques have been tested separately in meteorological applications. However, to the best of our knowledge, these techniques have not been carefully studied jointly. Therefore, a new 3D storm tracking method based on the combination of MapReduce and HBase was studied using a large amount of weather radar data accumulated in recent years. Moreover, based on the original REST interface, a series of distributed service interfaces were implemented for exploring a variety of point, line and surface data. Compared with the standard single data storage and access interface based on REST, the proposed method has better overall performance, and the efficiency is improved by about 100%. A practical application of 3D storm tracking in the Zhujiang River urban agglomeration from 2007 to 2009 was used to further validate the performance of the proposed method.
    D2D power allocation based on max-min fairness underlying cellular systems
    NI Junhong, SHEN Zhentao, YANG Huifeng
    2017, 37(4):  945-947.  DOI: 10.11772/j.issn.1001-9081.2017.04.0945
    Concerning the fairness problem of multiple Device-to-Device (D2D) users reusing the spectrum resources allocated to cellular subscribers, a power allocation algorithm based on max-min fairness was proposed under the premise of guaranteeing the rate of cellular users. First, the nonconvex optimization problem was transformed into a Difference between Convex functions (DC) programming problem, then the global optimization algorithm of convex approximation and the bisection algorithm were used to achieve power optimization of D2D. Simulation results show that compared with the global optimization algorithm which only uses convex approximation, the proposed algorithm has better convergence and maximizes the bottleneck rate of D2D users.
    Virtual network function backup method based on resource utility maximization
    ZHOU Qiao, YI Peng, MEN Haosong
    2017, 37(4):  948-953.  DOI: 10.11772/j.issn.1001-9081.2017.04.0948
    Concerning the network service failures caused by virtual network function failures in the service function chain in a network function virtualization environment, a virtual network function backup method based on resource utility maximization was put forward to improve network reliability. Firstly, the backup problem of virtual network functions was analyzed in detail and a reliability evaluation model was established; a corresponding backup mechanism was then put forward, and its advantages over other mechanisms were proved. Secondly, a global backup method and a backup selection strategy were designed to select the corresponding virtual network functions until the reliability requirement was satisfied. Finally, simulation and experimental results showed that, compared with GREP, JP+random selection and DSP+random selection, the proposed method achieved excellent performance in terms of reliability and resource utilization; in particular, it improved the request acceptance rate by 18.8%-25% and the resource utilization ratio by 15%-20%. The experimental results demonstrate that the proposed method can effectively utilize resources to improve network reliability.
    Automatic protocol format signature construction algorithm based on discrete series protocol message
    LI Yang, LI Qing, ZHANG Xia
    2017, 37(4):  954-959.  DOI: 10.11772/j.issn.1001-9081.2017.04.0954
    To deal with discrete series protocol messages without session information, a new Separate Protocol Message based Format Signature Construction (SPMbFSC) algorithm was proposed. First, the separate protocol messages were clustered, then the keywords of the protocol were extracted by an improved frequent pattern mining algorithm. Finally, the format signature was acquired by filtering and choosing the keywords. Simulation results show that SPMbFSC is quite accurate and reliable: the recognition rate of SPMbFSC for six protocols (DNS, FTP, HTTP, IMAP, POP3 and IMAP) is above 95% when a single message is used as the identification unit, and above 90% when a session is used as the identification unit. SPMbFSC performs better than the Adaptive Application Signature (AdapSig) extraction algorithm under the same experimental conditions. Experimental results indicate that the proposed SPMbFSC does not depend on the integrity of session data, and is more suitable for processing discrete series protocol messages that are incomplete due to reception limitations.
    Joint spectrum sensing algorithm for multi-user based on coherent multiple-access channels in cognitive radio
    WANG Sixiu, GUO Wenqiang, WANG Xiaojie
    2017, 37(4):  960-964.  DOI: 10.11772/j.issn.1001-9081.2017.04.0960
    For joint sensing of multiple Cognitive Users (CUs), considering the case of fading channels between the CU and the decision center, a joint spectrum sensing algorithm based on Multiple-Access Channels (MAC) was proposed. On the basis of the system structure and signal modeling, the asymptotic behavior and outage probability of the traditional MAC algorithm were analyzed. Under the constraint of the average transmit power of the CU, the transmit gain of the MAC algorithm was optimized to maximize the detection probability; and the problem of minimizing the number of CUs was also studied in the case of certain Quality of Service (QoS). Simulation results show that the proposed MAC algorithm can ensure good detection performance; in particular, it achieves exponential performance improvement in detection error probability.
    Broadcast routing algorithm for WSN based on improved discrete fruit fly optimization algorithm
    XU Tongwei, HE Qing, WU Yile, GU Haixia
    2017, 37(4):  965-969.  DOI: 10.11772/j.issn.1001-9081.2017.04.0965
    In Wireless Sensor Networks (WSN), to address the limited energy of nodes and the energy consumption of broadcast routing, a new WSN broadcast routing algorithm based on an improved Discrete Fruit fly Optimization Algorithm (DFOA) was proposed. Firstly, the swap and swap sequence were introduced into the Fruit fly Optimization Algorithm (FOA) to obtain DFOA, which expands the application field of FOA. Secondly, the step size of the fruit flies was controlled by Lévy flight to increase the diversity of the samples, and the position updating strategy of the population was improved by roulette selection to avoid local optima. Finally, the improved DFOA was used to optimize the broadcast routing of WSN and find the broadcast path with minimum energy consumption. Simulation results show that the improved DFOA reduces the energy consumption of broadcasting and outperforms the comparison algorithms, including the original DFOA, Simulated Annealing Genetic Algorithm (SAGA), Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO), in different networks. The improved DFOA can increase the diversity of the samples, enhance the ability to escape from local optima and improve network performance.
    Improved Louvain method with strategy of separating isolated nodes
    LI Lei, YAN Guanghui, YANG Shaowen, ZHANG Haitao
    2017, 37(4):  970-974.  DOI: 10.11772/j.issn.1001-9081.2017.04.0970
    The Louvain Method (LM) is an algorithm for detecting communities in complex networks based on modularity optimization. Firstly, since existing research offers no method to calculate the modularity gain after nodes leave their community, such a method was presented based on the definition of modularity and the method for calculating the modularity gain after nodes merge. Secondly, aiming at the problem that LM requires large memory space, an improved algorithm was proposed with the strategy of separating isolated nodes. In each iteration of the algorithm, the isolated nodes of the input network were separated in advance, so that only the connected nodes of the input network actually participated in the iterative process. Isolated nodes and non-isolated nodes were stored separately when storing the detected communities. Experimental results on real networks show that the memory requirement of the improved algorithm is reduced by more than 40%, and the running time is further reduced. The results indicate that the improved algorithm has more advantages in dealing with real networks.
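As an illustration of the modularity gain when a node leaves its community, the sketch below derives the gain directly from the standard modularity definition for an undirected, unweighted graph without self-loops. It is not necessarily the formula used in the paper; the function names are hypothetical. Moving node i out of community C into its own singleton community gives a gain of -k_in/m + k_i(d_C - k_i)/(2m^2), where m is the number of edges, k_i the degree of i, k_in the number of edges from i to other members of C, and d_C the summed degree of C (including i).

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity Q of an undirected, unweighted graph without self-loops."""
    m = len(edges)
    intra = defaultdict(int)  # number of edges inside each community
    total = defaultdict(int)  # summed degree of each community
    for u, v in edges:
        total[community[u]] += 1
        total[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (total[c] / (2 * m)) ** 2 for c in total)

def leave_gain(edges, community, node):
    """Change in Q when `node` leaves its community to form a singleton."""
    m = len(edges)
    home = community[node]
    k_i = k_in = d_c = 0
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            if community[a] == home:
                d_c += 1          # degree contribution to the home community
            if a == node:
                k_i += 1          # edge incident to the node
                if community[b] == home:
                    k_in += 1     # edge from the node into its own community
    return -k_in / m + k_i * (d_c - k_i) / (2 * m * m)
```

The closed form needs only local counts, so it avoids recomputing Q over the whole network for each candidate move.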
    Improved algorithm for multiplication and division error detection based on delta code
    SUN Zongqi, ZANG Haijuan, ZHANG Chunhua, PAN Yong
    2017, 37(4):  975-979.  DOI: 10.11772/j.issn.1001-9081.2017.04.0975
    To ensure the correctness of program execution in safety-critical systems, error control theory is used to encode computer instructions, but the algorithm involves the modular operation, resulting in high additional complexity and making it difficult to use in real-time systems. To reduce the additional complexity, the multiplication and division algorithms of delta code were improved. The ideas of redundancy encoding and differentiation were introduced to ensure security, while the inverse element was introduced into division to transform division into multiplication, thus avoiding the overhead of the modular operation and reducing the additional complexity while improving the security of the algorithm. Theoretical analysis shows that the undetected error rate is 2.3×10^-10. Simulation results show that the undetected error rate of the proposed algorithm is consistent with the theoretical value, and that its complexity is 6.4-7.2 times that of the original algorithm, but 7%-19% lower than that of the original delta code. The proposed algorithm satisfies the requirements of safety-critical application systems in terms of error detection rate and complexity.
    New PSO particle filter method based on likelihood-adjustment
    GAO Guodong, LIN Ming, XU Lan
    2017, 37(4):  980-985.  DOI: 10.11772/j.issn.1001-9081.2017.04.0980
    The traditional Particle Filter (PF) algorithm based on Particle Swarm Optimization (PSOPF), which moves the particles to the high-likelihood region, destroys the prediction distribution. When the likelihood function has many peaks, it requires a large amount of computation while the filtering performance is not significantly improved. To solve this problem, a new PSOPF based on Likelihood Adjustment (LA-PSOPF) was proposed. Under the premise of preserving the prediction distribution, the Particle Swarm Optimization (PSO) algorithm was used to adjust the likelihood distribution so as to increase the number of effective particles and improve the filtering performance. Meanwhile, a local optimization strategy was introduced to scale down the PSO swarm, reduce the amount of calculation and balance the accuracy and speed of estimation. Simulation results show that the proposed algorithm outperforms PF and PSOPF when the measurement error is small and the likelihood function has many peaks, and that its computing time is less than that of PSOPF.
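The "number of effective particles" mentioned above is commonly estimated as N_eff = 1 / sum(w_i^2) over the normalized weights w_i. The sketch below computes that standard estimate; whether the paper uses exactly this measure is an assumption, and the function name is hypothetical.

```python
def effective_sample_size(weights):
    """Standard N_eff estimate for a set of (possibly unnormalized) particle weights."""
    total = sum(weights)
    normalized = [w / total for w in weights]
    return 1.0 / sum(w * w for w in normalized)
```

With N equally weighted particles the estimate equals N; when a single particle carries all the weight it drops to 1, which is the degeneracy that a filter tries to avoid.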
    Efficient virtualization-based approach to improve system availability
    LI Jinjin, JIA Xiaoqi, DU Haichao, WANG Lipeng
    2017, 37(4):  986-992.  DOI: 10.11772/j.issn.1001-9081.2017.04.0986
    To address the problem that a safety-critical system is paused, inspected and resumed whenever security tools raise an alert, and that the delay between the occurrence and the discovery of a false alarm (false positive or false negative) affects the availability of the guest Operating System (OS), a virtualization-based scheme was proposed. When a false alarm occurred, the operations of the suspicious application were quarantined to avoid substantial system-wide damage. The operations of the suspicious application were then logged, and application inter-dependency information was generated according to its interactions with other applications. Once the false alarm was confirmed, measures such as resuming the application's operations and killing the relevant applications were taken according to the operation logs and inter-dependency information, so that the guest OS could quickly reach the correct operating state. Experimental results show that the scheme can reduce the overhead caused by rollback and recovery when a false alarm occurs. Compared with the situation without the proposed scheme, the overhead of handling a false alarm is reduced by 20%-50%. The proposed scheme can effectively reduce the effect of false alarms on the availability of clients, and can be applied to cloud platforms that provide services to safety-critical clients.
    Classical cipher model based on rough set
    TANG Jianguo, WANG Jianghua
    2017, 37(4):  993-998.  DOI: 10.11772/j.issn.1001-9081.2017.04.0993
    Although classical ciphers are simple and efficient, they have the serious defect of being easily cracked with the computing power available today. A new classical cipher model based on rough sets was developed to solve this problem. Firstly, two features of rough sets were integrated into the model to weaken its statistical regularity: one is that certainty contains uncertainty in rough sets, and the other is that the scale of the approximation space tends to increase sharply with a slight increase of the domain size. Secondly, the model's ability to produce random sequences was improved by using the mixed congruence method. Finally, part of the plaintext information was involved in the encryption process by using self-defined arithmetic and the congruence method to enhance the anti-attack ability of the model. The analysis shows that the model not only has the same level of time and space complexity as traditional classical ciphers, but also has nearly ideal diffusion and confusion performance, which overcomes the defect that classical ciphers can be easily cracked, and can effectively resist attacks such as exhaustive search and statistical analysis.
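The mixed congruence method mentioned above is the mixed linear congruential recurrence x_{n+1} = (a·x_n + c) mod m with a nonzero increment c. The sketch below shows the recurrence only; the constants are illustrative and not taken from the paper, and the function name is hypothetical. A plain generator of this kind is not cryptographically secure on its own.

```python
def mixed_congruential(seed, a=1103515245, c=12345, m=2**31):
    """Yield the sequence x_{n+1} = (a * x_n + c) mod m (illustrative constants)."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x
```

For example, `gen = mixed_congruential(1)` followed by repeated `next(gen)` calls produces a deterministic sequence of integers in the range [0, m).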
    Predicate encryption scheme supporting secure multi-party homomorphic multiplicative computation
    LI Zhenlin, ZHANG Wei, DAI Xiaoming
    2017, 37(4):  999-1003.  DOI: 10.11772/j.issn.1001-9081.2017.04.0999
    In traditional Secure Multi-party Computation (SMC), every participant can obtain the final result, but this coarse-grained access control may not suit scenarios in which only specific users should be able to decrypt the ciphertexts. Therefore, a new encryption scheme with more precise access control over the decryption authority of computation results was put forward. By combining SMC with predicate encryption, a predicate encryption scheme with the multiplicative homomorphic property for secure multi-party computation was constructed. Compared with existing predicate encryption schemes, it supports homomorphic operations and provides more precise access control over the decryption authority of computation results. In the context of the current cloud environment, the scheme realizes secure multi-party computation with more fine-grained access control over computation results, and is proved secure under INDistinguishable Attribute-Hiding against Chosen Plaintext Attacks (IND-AH-CPA).
    Chinese signature authentication based on accelerometer
    LIU Wei, WANG Yang, ZHENG Jianbin, ZHAN Enqi
    2017, 37(4):  1004-1007.  DOI: 10.11772/j.issn.1001-9081.2017.04.1004
    Acceleration data on 3 axes can be collected during the signing process to authenticate users. Because of the complex structure of Chinese signatures, an in-air signature is hard to forge, but this complexity also increases the differences between signatures performed by the same user, which makes authentication more difficult. Classical verification methods applied to 2-D signatures or hand gestures cannot solve this problem. To improve the performance of in-air Chinese signature verification, the classical Global Sequence Alignment (GSA) algorithm was improved, and interpolation was applied to the matching sequences. Unlike the classical GSA algorithm, which uses a matching score to measure the similarity between sequences, two distance indexes, Euclidean distance and absolute value distance, were introduced to calculate the differences between sequences after interpolation. Experimental results show that both improved GSA algorithms improve authentication accuracy, with the Equal Error Rate (EER) decreased by 37.6% and 52.6% respectively compared with the classical method.
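The two distance indexes named above can be sketched for a single acceleration axis as follows. This is a simplified illustration that swaps the GSA alignment step for plain linear resampling to a common length, so it is not the paper's method; the function names are hypothetical.

```python
import math

def resample(seq, n):
    """Linearly interpolate `seq` (at least one sample) to exactly n points."""
    if n == 1 or len(seq) == 1:
        return [float(seq[0])] * n
    out = []
    for k in range(n):
        pos = k * (len(seq) - 1) / (n - 1)   # fractional index into seq
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(seq) - 1)
        frac = pos - lo
        out.append(seq[lo] * (1 - frac) + seq[hi] * frac)
    return out

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def absolute_distance(a, b):
    """Sum of absolute differences between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

A verifier would compare the distance between a test signature and an enrolled template against a decision threshold; how that threshold is chosen is outside this sketch.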
    Approach to network security situational element extraction based on parallel reduction
    ZHAO Dongmei, LI Hong
    2017, 37(4):  1008-1013.  DOI: 10.11772/j.issn.1001-9081.2017.04.1008
    The quality of network security situational element extraction plays a crucial role in network security situation assessment. However, most existing network security situational element extraction methods rely on prior knowledge and are not suitable for processing network security situational data. For effective and accurate extraction of network security situational elements, a parallel reduction algorithm based on a matrix of attribute importance was proposed. Parallel reduction was introduced into the classical rough set, and a single decision information table was expanded into multiple ones without affecting the classification. Conditional entropy was used to calculate attribute importance, and redundant attributes were deleted according to reduction rules, so that the network security situational elements were extracted efficiently. To verify the efficiency of the proposed algorithm, classification prediction was implemented on Weka. Compared with using all the attributes, the classification modeling time on the NSL-KDD dataset was reduced by 16.6% when using the attributes reduced by the proposed algorithm. Compared with three existing element extraction algorithms (Genetic Algorithm (GA), Greedy Search Algorithm (GSA), and Attribute Reduction based on Conditional Entropy (ARCE) algorithm), the proposed algorithm has a higher recall rate and a lower false positive rate. The experimental results show that the data set reduced by the proposed algorithm has better classification performance, which realizes an efficient extraction of network security situational elements.
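A minimal sketch of conditional-entropy-based attribute importance on a single decision table follows. It assumes a common rough-set definition in which the importance of attribute a within a condition set B is H(D | B without a) minus H(D | B); the paper's exact definition and its parallel-reduction matrix are not reproduced here, and the function names are hypothetical.

```python
import math
from collections import Counter

def conditional_entropy(rows, cond_idx, dec_idx):
    """H(D | C) in bits, where C is the set of columns in cond_idx."""
    n = len(rows)
    cond = Counter(tuple(r[i] for i in cond_idx) for r in rows)
    joint = Counter((tuple(r[i] for i in cond_idx), r[dec_idx]) for r in rows)
    return -sum(c / n * math.log2(c / cond[key]) for (key, _), c in joint.items())

def attribute_importance(rows, cond_idx, dec_idx, attr):
    """Entropy increase caused by dropping `attr` from the condition set."""
    reduced = [i for i in cond_idx if i != attr]
    return (conditional_entropy(rows, reduced, dec_idx)
            - conditional_entropy(rows, cond_idx, dec_idx))
```

Under this definition an attribute with importance 0 is redundant with respect to the remaining attributes and is a candidate for deletion.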
    Data cleaning method based on dynamic configurable rules
    ZHU Huijuan, JIANG Tonghai, ZHOU Xi, CHENG Li, ZHAO Fan, MA Bo
    2017, 37(4):  1014-1020.  DOI: 10.11772/j.issn.1001-9081.2017.04.1014
    Traditional data cleaning approaches usually implement the cleaning rules specified by business requirements through hard-coding, which leads to well-known issues in terms of reusability, scalability and flexibility. To address these issues, a new Dynamic Rule-based Data Cleaning Method (DRDCM) was proposed, which supports complex logic operations between various types of rules and three kinds of dirty data repair behavior. It integrates data detection, error correction and data transformation in one system and offers several distinctive characteristics, including domain-independence, reusability and configurability. In addition, the formal concepts and terms regarding data detection and correction were defined, and the necessary procedures and algorithms were introduced. In particular, the multiple rule types and rule configurations supported in DRDCM were presented in detail. Finally, the DRDCM approach was implemented. Experimental results on real-life data sets show that the implemented system achieves high accuracy for the discard behavior of dirty data repair; in particular, for attributes required to comply with statutory coding rules (such as ID card numbers), the accuracy can reach 100%. Moreover, the results indicate that this reference implementation of DRDCM can successfully support multiple data sources in cross-domain scenarios, and that its performance does not decrease sharply as the number of rules increases. These results further validate that the proposed DRDCM is practical in real-world scenarios.
    Range query authentication for outsourced spatial databases
    HU Xiaoyan, WANG Jingyu, LI Hairong
    2017, 37(4):  1021-1025.  DOI: 10.11772/j.issn.1001-9081.2017.04.1021
    In existing spatial range query authentication methods such as VR-tree and MR-tree, the transmission cost from the server to the client is high and the verification efficiency of the client is low because the Verification Object (VO) contains too much authentication information. To resolve these problems, a new index structure, MGR-tree, was proposed. First, by embedding an R-tree in each leaf node of a Grid-tree, the size of the VO was decreased and the efficiency of query and authentication was improved. In addition, an optimized index, MHGR-tree, which takes advantage of the properties of the Hilbert curve, and a filter policy were proposed to accelerate verification. Experimental results show that the proposed method performs better than MR-tree. In the best case, the verification object size and authentication time of MHGR are 63% and 19% of those of MR respectively.
    Bayesian clustering algorithm for categorical data
    ZHU Jie, CHEN Lifei
    2017, 37(4):  1026-1031.  DOI: 10.11772/j.issn.1001-9081.2017.04.1026
    To address the difficulty of defining a meaningful distance measure for categorical data clustering, a new categorical data clustering algorithm was proposed based on Bayesian probability estimation. Firstly, a probability model with automatic attribute-weighting was proposed, in which each categorical attribute is assigned an individual weight to indicate its importance for clustering. Secondly, a clustering objective function was derived using maximum likelihood estimation and Bayesian transformation, then a partitioning algorithm was proposed to optimize the objective function which groups data according to the weighted likelihood between objects and clusters instead of the pairwise distances. Thirdly, an expression for estimating the attribute weights was derived, indicating that the weight should be inversely proportional to the entropy of category distribution. The experiments were conducted on some real datasets and a synthetic dataset. The results show that the proposed algorithm yields higher clustering accuracy than the existing distance-based algorithms, achieving 5%-48% improvements on the Bioinformatics data with meaningful attribute-weighting results for the categorical attributes.
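The relationship between attribute weights and category entropy described above can be illustrated with a short sketch. It assumes the simplest reading, weights proportional to the reciprocal of each attribute's category entropy and normalized to sum to 1; the paper's exact expression is not given in the abstract, and the `eps` guard against zero entropy and the function names are hypothetical.

```python
import math
from collections import Counter

def category_entropy(values):
    """Shannon entropy (natural log) of the category distribution of one attribute."""
    n = len(values)
    return -sum(c / n * math.log(c / n) for c in Counter(values).values())

def attribute_weights(columns, eps=1e-9):
    """Weights inversely proportional to entropy, normalized to sum to 1."""
    inverse = [1.0 / (category_entropy(col) + eps) for col in columns]
    total = sum(inverse)
    return [v / total for v in inverse]
```

An attribute whose categories are concentrated (low entropy) receives a larger weight than one whose categories are spread almost uniformly.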
    Imbalanced telecom customer data classification method based on dissimilarity
    WANG Lin, GUO Nana
    2017, 37(4):  1032-1037.  DOI: 10.11772/j.issn.1001-9081.2017.04.1032
    It is difficult for conventional classification techniques to identify churn customers in imbalanced telecom customer datasets. Therefore, an Improved Dissimilarity-Based imbalanced data Classification (IDBC) algorithm was proposed by introducing an improved prototype selection strategy into the Dissimilarity-Based Classification (DBC) algorithm. In the prototype selection stage, an improved sample subset optimization method was adopted to select the most valuable prototype set from the whole dataset, thus avoiding the uncertainty caused by random selection. In the classification stage, a new feature space was constructed from the dissimilarities between training samples and the prototype set and between test samples and the prototype set, and the dissimilarity-based datasets mapped into this feature space were then learnt with conventional classification algorithms. Finally, the telecom customer dataset and six other ordinary imbalanced datasets from the UCI database were selected to test the performance of IDBC. Compared with the traditional feature-based imbalanced data classification algorithm, the recognition rate of the DBC algorithm for the rare class was improved by 8.3% on average, and the recognition rate of the IDBC algorithm for the rare class was increased by 11.3%. The experimental results show that the IDBC algorithm is not affected by the category distribution, and that its discriminative ability outperforms existing state-of-the-art approaches.
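The dissimilarity-based feature space described above can be sketched as a mapping from each sample to its vector of distances to the prototypes. The sketch assumes Euclidean distance as the dissimilarity measure and takes the prototype set as given; the paper's prototype selection strategy is not reproduced, and the function name is hypothetical.

```python
import math

def dissimilarity_features(samples, prototypes):
    """Map each sample to its vector of distances to every prototype."""
    return [[math.dist(s, p) for p in prototypes] for s in samples]
```

Training and test samples are both mapped with the same prototype set, and a conventional classifier is then trained on the resulting vectors, whose dimension equals the number of prototypes.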
    Dynamic prediction model on export sales based on controllable relevance big data of cross-border e-commerce
    WANG Xuerong, WAN Nianhong
    2017, 37(4):  1038-1043.  DOI: 10.11772/j.issn.1001-9081.2017.04.1038
    Current popular methods for predicting foreign trade product sales approach the problem only from the angle of either the third-party platform or big data, and do not consider dynamic, evolving prediction of product sales that combines the Internet platform, big data and cross-border e-commerce. To improve the efficiency of export sales prediction and to make prediction systems scalable and able to evolve dynamically, a dynamic prediction model of export sales based on "Internet+foreign trade"-driven controllable relevance big data of cross-border e-commerce was proposed. The model mines controllable relevance big data of cross-border e-commerce export sales in the "Internet+foreign trade" environment, uses a personalized prediction mechanism and smart prediction algorithms, and improves the corresponding algorithms such as distributed quantitative calculation and centralized qualitative calculation. Finally, the model was verified and analyzed. The performance analysis results show that the model fully integrates the openness and extensibility of "Internet+" with the dynamic prediction advantages of big data, achieving dynamic, smart, quantitative and qualitative prediction of export sales. The comprehensive prediction efficiency of the proposed model is clearly better than that of traditional models, and the model has stronger dynamic evolution capability and higher utility.
    New words detection method for microblog text based on integrating of rules and statistics
    ZHOU Shuangshuang, XU Jin'an, CHEN Yufeng, ZHANG Yujie
    2017, 37(4):  1044-1050.  DOI: 10.11772/j.issn.1001-9081.2017.04.1044
    The formation rules of microblog new words are extremely complex and highly dispersed, and the results extracted by the traditional C/NC-value method suffer from several problems, including relatively low accuracy of the boundaries of identified new words and low detection accuracy for low-frequency new words. To solve these problems, a method integrating heuristic rules, a modified C/NC-value method and a Conditional Random Field (CRF) model was proposed. On the one hand, the heuristic rules included abstracted classification information and inductive rules focusing on the components of microblog new words. The rules were summarized manually from Part Of Speech (POS), character types and symbols by observing a large number of microblog documents. On the other hand, to improve the boundary accuracy of identified new words and the detection accuracy for low-frequency new words, the traditional C/NC-value method was modified by merging word frequency, branch entropy, mutual information and other statistical features to reconstruct the objective function. Finally, the CRF model was used to train and detect new words. The experimental results show that the proposed method effectively improves the F value of new word detection.
    Bilingual collaborative Chinese relation extraction based on parallel corpus
    GUO Bo, FENG Xupeng, LIU Lijun, HUANG Qingsong
    2017, 37(4):  1051-1055.  DOI: 10.11772/j.issn.1001-9081.2017.04.1051
    In relation extraction from Chinese resources, long Chinese sentences have complex structure, so syntactic features are difficult to extract and extraction accuracy is low. A bilingual collaborative relation extraction method based on a parallel corpus was proposed to resolve these problems. In a Chinese-English bilingual parallel corpus, the English relation extraction classifier was trained on dependency syntactic features obtained with mature English syntactic analysis tools, and the Chinese relation extraction classifier was trained on n-gram features, which are suitable for Chinese; together they constituted a bilingual view. Finally, based on the annotated and mapped parallel corpus, the high-reliability training samples of each classifier were added to the other for bilingual collaborative training, and a Chinese relation extraction classification model with better performance was obtained. Experimental results on a Chinese test corpus show that the proposed method improves on the weakly supervised Chinese relation extraction method, increasing the F value by 3.9 percentage points.
    Word semantic similarity computation based on integrating HowNet and search engines
    ZHANG Shuowang, OUYANG Chunping, YANG Xiaohua, LIU Yongbin, LIU Zhiming
    2017, 37(4):  1056-1060.  DOI: 10.11772/j.issn.1001-9081.2017.04.1056
    To address the mismatch between the word semantic descriptions in "HowNet" and people's subjective cognition of vocabulary, and to make full use of rich network knowledge, a word semantic similarity calculation method combining "HowNet" and search engines was proposed. Firstly, considering the inclusion relation between a word and its sememes, preliminary semantic similarity results were obtained by using an improved concept similarity calculation method. Then further semantic similarity results were obtained by using a double correlation detection algorithm and the pointwise mutual information method based on search engines. Finally, a fitting function was designed, its weights were calculated by batch gradient descent, and the similarity results of the first two steps were fused. The experimental results show that, compared with methods based only on "HowNet" or only on search engines, the Spearman coefficient and Pearson coefficient of the fusion method are both improved by 5%. Meanwhile, the degree of match between the semantic description of a specific word and the subjective cognition of vocabulary is improved. This shows that integrating network knowledge into concept similarity calculation is effective for computing Chinese word semantic similarity.
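The search-engine step above relies on pointwise mutual information (PMI). A minimal sketch of the standard PMI formula estimated from page hit counts follows; the hit counts and the total page count are hypothetical numbers, and the paper's double correlation detection and fusion steps are not reproduced.

```python
import math

def pmi(hits_x, hits_y, hits_xy, total_pages):
    """Pointwise mutual information log2(p(x, y) / (p(x) * p(y))).

    Probabilities are estimated as hit counts divided by total_pages,
    which simplifies to hits_xy * total_pages / (hits_x * hits_y).
    """
    if hits_xy == 0:
        return float("-inf")  # the two words never co-occur
    return math.log2(hits_xy * total_pages / (hits_x * hits_y))

# Hypothetical hit counts for two words and for their joint query.
score = pmi(hits_x=1000, hits_y=2000, hits_xy=500, total_pages=1_000_000)
```

A positive score means the two words co-occur more often than independence would predict; a score of zero corresponds to independence.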
    Cross-media retrieval based on latent semantic topic reinforce
    HUANG Yu, ZHANG Hong
    2017, 37(4):  1061-1064.  DOI: 10.11772/j.issn.1001-9081.2017.04.1061
    Cross-media retrieval is an important and challenging problem in the multimedia area: a common semantic topic is expressed differently across modalities, and traditional cross-media retrieval methods usually neglect to explore the intrinsic semantic information of different modalities in a collaborative manner. To address this problem, a Latent Semantic Topic Reinforce cross-media retrieval (LSTR) method was proposed. Firstly, text semantics were represented with Latent Dirichlet Allocation (LDA) and the corresponding images were represented with the Bag of Words (BoW) model. Secondly, multiclass logistic regression was used to classify both texts and images, and the posterior probability under the learned classifiers was used to indicate the latent semantic topics of images and texts. Finally, the learned posterior probability of the texts was used to regularize their image counterparts to reinforce the image semantic topics, which greatly improved the semantic similarity between them. On the Wikipedia dataset, the mean Average Precision (mAP) of retrieving text with image and retrieving image with text is 57.0%, which is 35.1%, 34.8% and 32.1% higher than that of the Canonical Correlation Analysis (CCA), Semantic Matching (SM) and Semantic Correlation Matching (SCM) methods respectively. Experimental results show that the proposed method can effectively improve the average precision of cross-media retrieval.
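The mAP figure reported above is a standard ranking metric. The sketch below shows one common way to compute it from ranked relevance lists; it assumes every relevant item appears somewhere in the ranked list, and the example rankings are made up for illustration.

```python
def average_precision(ranked_relevance):
    """Average precision for one query.

    ranked_relevance lists, in rank order, whether each retrieved item is
    relevant (truthy) or not. Precision is accumulated at each relevant hit.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(queries):
    """Mean of the per-query average precision values."""
    return sum(average_precision(q) for q in queries) / len(queries)

# Two hypothetical queries with their ranked relevance judgments.
queries = [[1, 0, 1], [0, 1]]
map_score = mean_average_precision(queries)
```

For cross-media retrieval, the same function would be applied separately to image-to-text and text-to-image queries.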
    Video shot recommendation model based on emotion analysis using time-sync comments
    DENG Yang, ZHANG Chenxi, LI Jiangfeng
    2017, 37(4):  1065-1070.  DOI: 10.11772/j.issn.1001-9081.2017.04.1065
    To solve the problem that traditional video emotion analysis methods do not work effectively and produce results that are hard to explain, a video shot emotion analysis approach based on time-sync comments was proposed as a basis for recommending video shots. First, a formal description of video shot recommendation based on emotion analysis was given. Then, after the time-sync comments were classified with the Latent Dirichlet Allocation (LDA) topic model, the emotion vectors of the words in the time-sync comments were evaluated. Meanwhile, the emotional relationships among video shots were analyzed for video shot recommendation. The recommendation precision of the proposed method was 28.9% higher than that of the method based on Term Frequency-Inverse Document Frequency (TF-IDF), and 43.8% higher than that of the traditional LDA model. The experimental results show that the proposed model is effective in analyzing the complex emotions of different kinds of text information.
    Incremental learning algorithm based on graph regularized non-negative matrix factorization with sparseness constraints
    WANG Jintao, CAO Yudong, SUN Fuming
    2017, 37(4):  1071-1074.  DOI: 10.11772/j.issn.1001-9081.2017.04.1071
    Focusing on the issues that the sparseness of the data obtained after Non-negative Matrix Factorization (NMF) is reduced and that the computing scale increases rapidly as the number of training samples grows, an incremental learning algorithm based on graph regularized non-negative matrix factorization with sparseness constraints was proposed. It not only considered the geometric structure in the data representation, but also introduced sparseness constraints on the coefficient matrix and combined them with incremental learning. By reusing the results of the previous factorization in the iterative computation with sparseness constraints and graph regularization, the computation cost was reduced and the sparseness of the data after factorization was greatly improved. Experiments on both the ORL and PIE face recognition databases demonstrate the effectiveness of the proposed method.
    Construction method of mobile application similarity matrix based on latent Dirichlet allocation topic model
    CHU Zheng, YU Jiong, WANG Jiayu, WANG Yuefei
    2017, 37(4):  1075-1082.  DOI: 10.11772/j.issn.1001-9081.2017.04.1075
    With the rapid development of the mobile Internet, it has become urgent to extract effective description information from a large number of mobile applications and then provide effective and accurate recommendation strategies for mobile users. At present, recommendation strategies are relatively traditional and mostly recommend applications according to a single attribute, such as downloads, application name or application category. To resolve the problem that the granularity of recommended applications is too coarse and the recommendation is not accurate, a mobile application similarity matrix construction method based on Latent Dirichlet Allocation (LDA) was proposed. Starting from the application labels, a topic model distribution matrix of mobile applications was constructed, which was then used to construct the mobile application similarity matrix. Meanwhile, a method for converting the mobile application similarity matrix into a viable storage structure was also proposed. Extensive experiments demonstrate the feasibility of the proposed method: the application similarity obtained by the proposed method is 130 percent higher than that obtained by the existing 360 application market. The proposed method solves the problem that the recommendation granularity is too coarse in mobile application recommendation, so that the recommendation results are more accurate.
    Opinion formation model of social network based on node intimacy and influence
    ZHANG Yanan, SUN Shibao, ZHANG Jingshan, YIN Lihang, YAN Xiaolong
    2017, 37(4):  1083-1087.  DOI: 10.11772/j.issn.1001-9081.2017.04.1083
    Aiming at the universality of individual interaction and the heterogeneity of individual social influence in opinion spreading, an opinion formation model of social networks was proposed on the basis of the Hegselmann-Krause model. By introducing the concepts of intimacy between individuals, interpersonal similarity and interaction strength, the individual interaction set was extended, the influence weight was reasonably quantified, and a more realistic opinion interaction rule was built. Through a series of simulation experiments, the effects of the main model parameters on opinion evolution were analyzed. The simulation results indicate that group opinions can converge and form a consensus under different confidence thresholds, and that the larger the confidence threshold is, the shorter the convergence time is. When the confidence threshold is 0.2, the convergence time is only 10. Meanwhile, extending the interaction set and increasing the strength of interpersonal similarity promote consensus formation. Besides, when the clustering coefficient and the average degree of the scale-free network are higher, group opinions are more likely to converge. The results help in understanding the dynamic process of opinion formation, and can guide social managers in decision-making and analysis.
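For orientation, the classic Hegselmann-Krause bounded-confidence update that the model above builds on can be sketched as follows. This is the plain baseline with equal weights on a fully connected population; the intimacy, similarity and interaction-strength extensions of the paper are not reproduced, and the function names, opinions and thresholds are illustrative assumptions.

```python
def hk_step(opinions, epsilon):
    """One synchronous Hegselmann-Krause update.

    Each agent adopts the mean opinion of all agents (itself included)
    whose opinion lies within the confidence threshold epsilon of its own.
    """
    updated = []
    for x in opinions:
        close = [y for y in opinions if abs(x - y) <= epsilon]
        updated.append(sum(close) / len(close))
    return updated

def run_hk(opinions, epsilon, max_steps=100, tol=1e-9):
    """Iterate until no opinion changes by more than tol; return (opinions, steps)."""
    for step in range(1, max_steps + 1):
        updated = hk_step(opinions, epsilon)
        if max(abs(a - b) for a, b in zip(updated, opinions)) < tol:
            return updated, step
        opinions = updated
    return opinions, max_steps

start = [0.0, 0.1, 0.2, 0.8, 0.9]
clustered, _ = run_hk(start, epsilon=0.25)  # small threshold: separate groups persist
consensus, _ = run_hk(start, epsilon=1.0)   # large threshold: everyone interacts
```

With the small threshold the two groups never see each other and stay apart, while the large threshold lets all agents average together, which matches the qualitative role of the confidence threshold described in the abstract.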
    Cuckoo search algorithm for multi-objective optimization based on chaos cloud model
    MA Yiyuan, SONG Weiping, NING Aiping, NIU Haifan
    2017, 37(4):  1088-1092.  DOI: 10.11772/j.issn.1001-9081.2017.04.1088
    Concerning the problems that the Cuckoo Search algorithm for Multi-objective Optimization (MOCS) converges slowly in late iterations and easily falls into local optima, a new MOCS based on Chaos Cloud Model (CCMMOCS) was proposed. In the evolutionary process, chaos theory was used to optimize the positions of general nests in order to avoid falling into local optima; then the cloud model was used to optimize the positions of some better nests to improve the accuracy; finally the better of the two resulting values was chosen as the best value for optimization. Simulation experiments on five general test functions, evaluated by error estimate and diversity index, show that CCMMOCS performs much better than MOCS, the Particle Swarm Optimization algorithm for Multi-objective Optimization (MOPSO) and NSGA-Ⅱ. Its Pareto fronts are closer to the ideal curve than those of the other algorithms, and their distribution is more uniform.
    Cross-population differential evolution algorithm based on opposition-based learning
    ZHANG Bin, LI Yanhui, GUO Hao
    2017, 37(4):  1093-1099.  DOI: 10.11772/j.issn.1001-9081.2017.04.1093
    Aiming at the low optimization accuracy and slow convergence of the traditional Differential Evolution (DE) algorithm, a Cross-Population Differential Evolution algorithm based on Opposition-based Learning (OLCPDE) was proposed by using a chaos dispersion strategy, an opposition-based optimization strategy and a multigroup parallel mechanism. The chaos dispersion strategy was used to generate the initial population, then the population was divided into an elite sub-group and a general sub-group, and a standard differential evolution strategy and a differential evolution strategy with Opposition-Based Learning (OBL) were applied to the two sub-groups respectively. Meanwhile, a cross-population differential evolution strategy was applied to further improve the accuracy and enhance population diversity for unimodal functions. The sub-groups were handled through these three strategies to achieve co-evolution. Over 30 independent runs, the proposed algorithm stably converged to the global optimal solution on 11 of 12 standard test functions, which is superior to the other algorithms compared. The results indicate that the proposed algorithm not only has high convergence precision but also effectively avoids being trapped in local optima.
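The core idea of Opposition-Based Learning (OBL) mentioned above is to evaluate the "opposite" of a candidate, x' = a + b - x for bounds [a, b], and keep whichever is fitter. The sketch below shows only this generic step for a minimization problem; the bounds, the sphere fitness function and the selection rule are assumptions for illustration, and the chaos initialization and cross-population strategies of OLCPDE are not reproduced.

```python
def opposite(x, lower, upper):
    """Opposite point of x within per-dimension bounds: lo + hi - xi."""
    return [lo + hi - xi for xi, lo, hi in zip(x, lower, upper)]

def obl_select(population, lower, upper, fitness):
    """Keep the fitter of each individual and its opposite (minimization).

    Ties keep the original individual.
    """
    selected = []
    for individual in population:
        candidate = opposite(individual, lower, upper)
        if fitness(candidate) < fitness(individual):
            selected.append(candidate)
        else:
            selected.append(individual)
    return selected

def sphere(x):
    """Hypothetical test function: sum of squares, minimized at the origin."""
    return sum(v * v for v in x)

lower, upper = [-1, -1], [5, 5]
population = [[4, 4], [1, 1]]
improved = obl_select(population, lower, upper, sphere)
```

Here the first individual is replaced by its opposite because the opposite lies closer to the minimum, while the second is kept.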
    Fast ensemble method for strong classifiers based on instance
    XU Yewang, WANG Yongli, ZHAO Zhongwen
    2017, 37(4):  1100-1104.  DOI: 10.11772/j.issn.1001-9081.2017.04.1100
    Focusing on the issue that an ensemble classifier based on weak classifiers needs to sacrifice a lot of training time to obtain high precision, an instance-based ensemble method of strong classifiers named Fast Strong-classifiers Ensemble (FSE) was proposed. Firstly, an evaluation method was used to eliminate substandard classifiers and order the remaining classifiers by accuracy and diversity, to obtain a set of classifiers with the highest precision and maximal difference. Secondly, the FSE algorithm was used to break the existing sample distribution and re-sample, making the classifiers pay more attention to learning the difficult samples. Finally, the ensemble classifier was completed by determining the weight of each classifier simultaneously. The experiments were conducted on a UCI dataset and a customized dataset. The accuracy of Boosting reached 90.2% and 90.4% on the two datasets respectively, and the accuracy of FSE reached 95.6% and 93.9%. When reaching the same accuracy, the training time of the ensemble classifier with FSE was 75% and 80% shorter than that of the ensemble classifier with Boosting. The theoretical analysis and simulation results show that the FSE ensemble model can effectively improve the recognition accuracy and shorten the training time.
    Meta-learning based optimization algorithm selection framework and its empirical study
    CUI Jianshuang, LIU Xiaochan, YANG Meihua, LI Wenyan
    2017, 37(4):  1105-1110.  DOI: 10.11772/j.issn.1001-9081.2017.04.1105
    The goal of algorithm selection is to automatically select the most suitable algorithm for the current problem from a batch of available algorithms. For this purpose, an intelligent recommendation framework based on a meta-learning approach was presented. The automatic selection procedure for Genetic Algorithm (GA), Particle Swarm Optimization (PSO) and Simulated Annealing (SA) was designed according to this framework by using the Multi-mode Resource-Constrained Project Scheduling Problem (MRCPSP) as the validation dataset. Three hundred and seventy-eight instances of MRCPSP were randomly picked from the Project Scheduling Problem Library (PSPLib), the inherent and statistical features of each instance were extracted and used as the metadata, and then the prediction meta-model for new instances was obtained by using the Feed-forward Neural Network (FNN) algorithm. The empirical results demonstrate that the hit rate reaches 95% at most, and the average hit rate is 85%, when choosing one algorithm from two; the best hit rate reaches 92% and 80% respectively when choosing one algorithm from three. The proposed intelligent recommendation framework is successful and the automatic selection of optimization algorithms is feasible.
    Automatic bird vocalization identification based on Mel-subband parameterized feature
    ZHANG Saihua, ZHAO Zhao, XU Zhiyong, ZHANG Yi
    2017, 37(4):  1111-1115.  DOI: 10.11772/j.issn.1001-9081.2017.04.1111
    Aiming at vocalization-based bird species classification in natural acoustic environments, an automatic bird vocalization identification method was proposed based on a new Mel-subband parameterized feature. The field recordings were first divided into consecutive frames, and the distribution of the log-energies of those frames was estimated using a Gaussian Mixture Model (GMM) with two mixtures. The frames with high likelihood were selected to compose the initial candidate acoustic events. Afterwards, a Mel band-pass filter-bank was applied to the spectrogram of each event. Then, the output of each subband, i.e. a time series containing time-varying band-limited energy, was parameterized by an AutoRegressive (AR) model, which resulted in a parameterized feature set consisting of all model coefficients for each bird acoustic event. Finally, a Support Vector Machine (SVM) classifier was used to identify bird vocalizations. The experimental results on real field recordings containing vocalizations of eleven bird species demonstrate that the precision, recall and F1-measure of the proposed method are all not less than 89%, which indicates that the proposed method considerably outperforms the state-of-the-art texture-feature-based method and is more suitable for automatic data analysis in continuous monitoring of songbirds in natural environments.
    Path planning algorithm of tractor-trailer mobile robots system based on path-following control method
    FANG Xiaobo, QIAN Hong, LIU Zhenming, MENG Dezhuang
    2017, 37(4):  1116-1121.  DOI: 10.11772/j.issn.1001-9081.2017.04.1116
    Concerning the low accuracy, poor stability and poor security of path planning algorithms for tractor-trailer mobile robot systems, a path planning algorithm based on the path-following method was proposed. On the basis of the Rapidly-exploring Random Tree (RRT) method and the path-following equations, the path accuracy was improved by automatically fitting spline curves and by tracking and generating the path between nodes; an angle constraint between systems and a node hitting mechanism were added to the algorithm to improve the stability of the algorithm and the security of the results. In addition, an optimization algorithm based on a greedy strategy was added to optimize the results. The simulation results indicate that, compared with the basic RRT algorithm, the path calculated by the improved algorithm is closer to the actual trajectory, and its success rate and security are better than those of the original algorithm, so it can meet the requirements of quick design and real-time systems.
    Improved extended Kalman filter for attitude estimation of quadrotor
    WANG Long, ZHANG Zheng, WANG Li
    2017, 37(4):  1122-1128.  DOI: 10.11772/j.issn.1001-9081.2017.04.1122
    In order to improve the rapidity and tracking accuracy of Extended Kalman Filter (EKF), an improved EKF for attitude estimation of quadrotor was proposed by introducing a dynamic step gradient descent algorithm with acceleration restraint. Gradient descent algorithm was used to carry out nonlinear observation in the Kalman measurement update, eliminate the linearity error caused by the linearization of the standard extended Kalman algorithm and improve the accuracy and rapidity of the algorithm. The gradient step of gradient descent algorithm was dynamically processed to be proportional to the angular velocity of the quadrotor, thus enhancing the dynamic performance of the quadrotor. The motion acceleration generated during strong maneuverability was restrained to remove the adverse effect to attitude calculation and improve tracking accuracy of quadrotor's attitude estimation. To verify the feasibility and effectiveness of proposed algorithm, a quadrotor experimental platform was set up based on STM32 microcontroller. The experimental results show that the proposed algorithm has higher estimation accuracy, better dynamic performance and anti-interference characteristics under strong maneuverability and high-speed motion, and can ensure the stable flight of the quadrotor.
    Benchmarks construction and evaluation for resource leak in Android apps
    LIU Jierui, WU Xueqing, YAN Jun, YANG Hongli
    2017, 37(4):  1129-1134.  DOI: 10.11772/j.issn.1001-9081.2017.04.1129
    The Android system is becoming the most popular mobile operating system because of its openness. However, this openness also brings some problems, and resource leak is one of the common ones. Because resource leaks exist in the Android system and no benchmark has been provided for this specific issue, a benchmark named ResLeakBench for the resource leak problem was proposed. First, the official Android reference and a large number of real apps were studied, and the operations on resources and their common application scenarios were generalized. Second, 35 self-designed test apps were put into the benchmark according to the collected information; in addition, to ensure the practicality of the benchmark, 35 real apps related to resources were added to it. Finally, to evaluate ResLeakBench, the resource leak analysis tool Relda2 and the resource leak fixing tool RelFix were tested on the benchmark, and some shortcomings of Relda2 and RelFix were found. The experimental results show that ResLeakBench is a practical benchmark.
    Feature selection model for harmfulness prediction of clone code
    WANG Huan, ZHANG Liping, YAN Sheng, LIU Dongsheng
    2017, 37(4):  1135-1142.  DOI: 10.11772/j.issn.1001-9081.2017.04.1135
    To solve the problem of irrelevant and redundant features in harmfulness prediction of clone code, a combination model for harmfulness feature selection of clone code was proposed based on relevance and influence. Firstly, the feature data were preliminarily ranked by correlation using the information gain ratio, then the features with high correlation were preserved and the irrelevant features were removed to reduce the feature search space. Next, the optimal feature subset was determined by using the wrapper sequential floating forward selection algorithm combined with six kinds of classifiers, including Naive Bayes. Finally, the different feature selection methods were analyzed, and the feature data were analyzed, filtered and optimized by exploiting the advantages of the various methods under different selection criteria. Experimental results show that the prediction accuracy is increased by 15.2-34 percentage points after feature selection; compared with other feature selection methods, the F1-measure of this method is increased by 1.1-10.1 percentage points, and the AUC measure is increased by 0.7-22.1 percentage points. As a result, this method can greatly improve the accuracy of the harmfulness prediction model.
    Mutation test method for browser compatibility of JavaScript
    CHENG Yong, QIN Dan, YANG Guang
    2017, 37(4):  1143-1148.  DOI: 10.11772/j.issn.1001-9081.2017.04.1143
    Since research on testing technology for JavaScript browser compatibility problems is insufficient, eighteen mutation operators were designed based on the mutation testing method and an analysis of the compatibility of JavaScript in Web applications across major browsers, and an automated testing tool named Compatibility Mutator was implemented. Compatibility Mutator analyzes JavaScript syntax with an Abstract Syntax Tree (AST) and calls various browsers through Selenium WebDriver to run mutation testing automatically and concurrently. The experiments on 7 widely-used JavaScript frameworks showed that the proposed mutation operators could generate a certain number of mutants; the mutation scores obtained from mutation testing on jQuery and YUI were 43.06% and 7.69% respectively. Experimental results show that the proposed operators can effectively trigger compatibility issues and effectively evaluate the completeness of a test suite in finding browser compatibility issues.
    Garbage collection algorithm for NAND flash memory based on logical region heat
    LEI Bingbing, YAN Hua
    2017, 37(4):  1149-1152.  DOI: 10.11772/j.issn.1001-9081.2017.04.1149
    To solve the problems of low collection performance, poor wear leveling and high memory overhead in existing NAND flash memory garbage collection algorithms, a new garbage collection algorithm based on logical region heat was proposed. The heat calculation formula was redefined, and NAND memory with continuous logical addresses was defined as a heat range whose heat replaced the heat of individual logical pages; then the data with different heat were separated into the corresponding flash blocks with different erase counts. The cold and hot data were effectively separated, and memory space was also saved. Meanwhile, a new collection cost function was constructed to improve the collection efficiency and the wear leveling effect. The experimental results showed that, compared with the excellent File-aware Garbage Collection (FaGC) algorithm, the total number of erase operations was reduced by 11%, the total number of copy operations was reduced by 13%, the maximum difference of erase counts was reduced by 42%, and the memory consumption was reduced by 75%. Therefore, with the proposed algorithm the available flash memory space can be increased, the read and write performance of flash memory can be improved, and the flash memory life can be extended.
    General record-replay method based on data interception and cheating injection
    YAO Xiaoqiang, LIU Changyun, GUO Xiangke
    2017, 37(4):  1153-1156.  DOI: 10.11772/j.issn.1001-9081.2017.04.1153
    To address the problems of traditional data record-replay methods, such as dependence on the packet format, the need for close cooperation with the controlled application, and low transmission efficiency, a new record-replay method based on data interception and cheating injection was proposed. Firstly, network data packets were automatically intercepted through the service provider interface technique of Winsock 2. Secondly, data sharing and high-speed data access were achieved by using the memory-mapped file technique. Finally, the saved data packets were injected into the user program through data read operations triggered by fake messages. Practical application shows that the new method is suitable for distributed simulation and simulated training systems because it avoids network packet transmission, needs no cooperation with the controlled application, is independent of the packet format, and replays smoothly at ten times the speed.
    Eventual consistency platform of distributed system based on message communication
    XU Jin, HUANG Bo, FENG Jiong
    2017, 37(4):  1157-1163.  DOI: 10.11772/j.issn.1001-9081.2017.04.1157
    Asynchronous message communication is a common strategy for meeting the performance and throughput requirements of distributed systems. However, this strategy cannot solve the consistency problem of a distributed system. To solve this problem, a consistency guarantee platform was proposed. Firstly, the system ensured idempotency and strong consistency between business data and message production/consumption records. Secondly, a message monitoring strategy was established: according to the monitoring rules and the production/consumption records, it could be decided whether a message was correct or whether a compensation/idempotent operation was needed, so as to realize eventual consistency of the distributed system based on message communication. Lastly, Separation of Concerns (SoC) and horizontal segmentation methods were adopted in the design and realization of the platform. Experiments and analyses show that this distributed message communication performs better than the asynchronous communication. The platform can detect and handle inconsistencies in a timely manner and thus achieve eventual consistency of the whole system. In addition, the platform design can easily be adopted by multiple business systems, which means that the platform is not only high-performing but also economical.
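The idempotency requirement above means that redelivering the same message must not change business data twice. A minimal in-memory sketch of an idempotent consumer follows; the class name, message fields and the `balance` business value are hypothetical, and in a real system the consumption record and the business update would have to be committed in the same transaction to obtain the strong consistency the abstract describes.

```python
class IdempotentConsumer:
    """Applies each message at most once by recording consumed message ids."""

    def __init__(self):
        self.consumed_ids = set()  # stand-in for the consumption record table
        self.balance = 0           # stand-in for business data

    def handle(self, message):
        """Apply the message; return False if it was already consumed."""
        message_id = message["id"]
        if message_id in self.consumed_ids:
            return False  # duplicate delivery: skip the business update
        # In a real system these two writes belong to one transaction.
        self.balance += message["amount"]
        self.consumed_ids.add(message_id)
        return True

consumer = IdempotentConsumer()
first = consumer.handle({"id": "m-1", "amount": 100})
retry = consumer.handle({"id": "m-1", "amount": 100})  # redelivery or compensation
other = consumer.handle({"id": "m-2", "amount": 50})
```

Because duplicates are ignored, a monitor can safely resend any message whose consumption record is missing.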
    Domain adaptation image classification based on target local-neighbor geometrical information
    TANG Song, CHEN Lijuan, CHEN Zhixian, YE Mao
    2017, 37(4):  1164-1168.  DOI: 10.11772/j.issn.1001-9081.2017.04.1164
    Abstract ( )   PDF (799KB) ( )
    References | Related Articles | Metrics
    In many real engineering applications, the distribution of the training scenarios (source domain) differs from that of the testing scenarios (target domain), so the classification performance decreases sharply when a classifier trained in the source domain is applied directly to the target domain. At present, most existing domain adaptation methods are based on probabilistic inference. For the problem of domain adaptation image classification, an unsupervised method based on collaborative representation was proposed from the viewpoint of image representation. Firstly, all of the source samples were taken as the dictionary. Secondly, the three samples in the target domain closest to the target sample were exploited to robustly represent the local-neighbor geometrical information. Thirdly, the target sample was encoded by combining the dictionary and the local-neighbor information. Finally, the classification was completed by using the nearest classifier. Since the collaborative representations gain stronger robustness and discriminative ability by absorbing the target local-neighbor information, the classification method based on the new representations achieves better classification performance. The experimental results on the domain adaptation dataset confirm the effectiveness of the proposed method.
    Self-adaptive group based sparse representation for image inpainting
    LIN Jinyong, DENG Dexiang, YAN Jia, LIN Xiaoying
    2017, 37(4):  1169-1173.  DOI: 10.11772/j.issn.1001-9081.2017.04.1169
    Abstract ( )   PDF (827KB) ( )
    References | Related Articles | Metrics
    Focusing on the problems of object structure discontinuity and poor texture detail that occur in image inpainting, an inpainting algorithm based on self-adaptive groups was proposed. Unlike traditional methods, which use a single image block or a fixed number of image blocks as the repair unit, the proposed algorithm adaptively selects a different number of similar image blocks according to the characteristics of the texture area to construct a self-adaptive group. A self-adaptive dictionary and a sparse representation model were established in the domain of the self-adaptive group. Finally, the target cost function was solved by Split Bregman Iteration. The experimental results show that, compared with the patch-based inpainting algorithm and the Group-based Sparse Representation (GSR) algorithm, the Peak Signal-to-Noise Ratio (PSNR) and the Structural SIMilarity (SSIM) index are improved by 0.94-4.34 dB and 0.0069-0.0345 respectively; meanwhile, the proposed approach achieves inpainting speed-ups of 2.51 and 3.32 times respectively.
    Parallel convolutional neural network for super-resolution reconstruction
    OUYANG Ning, ZENG Mengping, LIN Leping
    2017, 37(4):  1174-1178.  DOI: 10.11772/j.issn.1001-9081.2017.04.1174
    Abstract ( )   PDF (843KB) ( )
    References | Related Articles | Metrics
    To extract more effective features and speed up the convergence of model training, a super-resolution reconstruction algorithm based on a parallel convolutional neural network was proposed. The network consists of two different network structures: one is a simple residual network structure, whose residual mapping is easier to optimize than the original mapping; the other is a convolutional neural network with nonlinear mapping, which increases the non-linearity of the network. Because of the complexity of the parallel network structure, convergence speed is the key issue. To address this problem, a Local Response Normalization (LRN) layer was added to the convolution layers to simplify the model parameters and enhance the feature fitting ability, thus accelerating the convergence. Experimental results show that, compared with algorithms based on deep convolutional neural networks, the proposed method accelerates the convergence, improves the visual quality, and increases the Peak Signal-to-Noise Ratio (PSNR) by at least 0.2 dB.
    Deep natural language description method for video based on multi-feature fusion
    LIANG Rui, ZHU Qingxin, LIAO Shujiao, NIU Xinzheng
    2017, 37(4):  1179-1184.  DOI: 10.11772/j.issn.1001-9081.2017.04.1179
    Abstract ( )   PDF (999KB) ( )
    References | Related Articles | Metrics
    To address the low accuracy of automatic video labelling and description by computers, a deep natural language description method for video based on multi-feature fusion was proposed. The spatial features, motion features and video features of the video frame sequence were extracted and fused to train a natural language description model based on Long Short-Term Memory (LSTM). Several natural language description models were trained on different combinations of early-fused features, and late fusion was performed at test time: one of the models was selected to predict the possible outputs under the current inputs, the probabilities of these outputs were recomputed with the other models, a weighted sum of these probabilities was computed, and the output with the highest probability was used as the next output. The feature fusion methods of the proposed method include early fusion, such as feature concatenation and weighted summing of different features after alignment, and late fusion, such as weighted fusion of the output probabilities of models based on different features and fine-tuning a generated LSTM model with early-fused features. Comparison experiments on the Microsoft Video Description (MSVD) dataset indicate that fusing different kinds of features can raise the evaluation score, whereas fusing features of the same kind cannot achieve a higher score than the best single feature, and fine-tuning a pre-trained model with other features performs poorly. Among the feature combinations tested, the description generated by combining early fusion and late fusion achieves a METEOR score of 0.302, which is 1.34% higher than the highest score that could be found; this indicates that the method can improve the accuracy of automatic video description.
    Learning based kernel image differential filter for face recognition
    FANG Yiguang, LIU Wu, ZHANG Ji, ZHANG Lingchen, YUAN Meigui, QU Lei
    2017, 37(4):  1185-1188.  DOI: 10.11772/j.issn.1001-9081.2017.04.1185
    Abstract ( )   PDF (767KB) ( )
    References | Related Articles | Metrics
    For face recognition applications, a learning based kernel image differential filter was proposed. Firstly, instead of being designed in a handcrafted or analytical way, the new image filter was learned dynamically from the training data. By integrating the idea of Linear Discriminant Analysis (LDA) into filter learning, the intra-class difference of the filtered images was attenuated and the inter-class difference was amplified. Secondly, the second-order derivative operator and the kernel trick were introduced to better extract image detail information and to cope with the nonlinear feature space problem. As a result, the filter is adaptive and a more discriminative feature description can be obtained. The proposed algorithm was evaluated on the AR and ORL face databases and compared with the linearly learned image filter named IFL, a kernel image filter without differential information, and a kernel image filter considering only first-order differential information. The experimental results validate the effectiveness of the proposed method.
    Scale-adaptive face tracking algorithm based on graph cuts theory
    HU Zhangfang, QIN Yanghong
    2017, 37(4):  1189-1192.  DOI: 10.11772/j.issn.1001-9081.2017.04.1189
    Abstract ( )   PDF (665KB) ( )
    References | Related Articles | Metrics
    Aiming at the problem of excessive size change when the tracking window is enlarged by the traditional Continuously Adaptive MeanShift (Camshift) algorithm in face tracking, an adaptive-window face tracking method for Camshift based on graph cuts theory was proposed. Firstly, a graph cut area was created from the Camshift iteration result of every frame by using graph cuts theory, and the skin lump was found by using a Gaussian mixture model as the weights of the graph cuts, so that the tracking window could be updated by the skin lump. Then the real size of the target was obtained by computing the size of the skin lump, and whether the target needed to be re-tracked was determined by comparing the size of the skin lump in the tracking window with that in the previous frame. Finally, the skin lump in the last frame was used as the tracking target of the next frame. The experimental results demonstrate that the proposed method based on graph cuts avoids interference from other skin-color targets in the background, effectively reflects the real change in face size when the human body moves rapidly, and prevents the Camshift algorithm from losing the tracking target and falling into a local optimum; it also shows good usability and robustness.
    MRI image registration based on adaptive tangent space
    LIU Wei, CHEN Leiting
    2017, 37(4):  1193-1197.  DOI: 10.11772/j.issn.1001-9081.2017.04.1193
    Abstract ( )   PDF (775KB) ( )
    References | Related Articles | Metrics
    A diffeomorphism is a differentiable transformation with smooth and invertible properties, which leads to topology preservation between anatomic individuals while avoiding physically implausible phenomena during MRI image registration. In order to yield a more plausible diffeomorphism for spatial transformation, the nonlinear structure of high-dimensional data was considered, and an MRI image registration method using manifold learning based on adaptive tangent space was put forward. Firstly, Symmetric Positive Definite (SPD) covariance matrices were constructed from the voxels of an MRI image, which then formed a Lie group manifold. Secondly, the tangent space on the Lie group was used to locally approximate the nonlinear structure of the Lie group manifold. Thirdly, the local linear approximation was adaptively optimized by selecting appropriate neighborhoods for each sample voxel, so that the linearization degree of the tangent space was improved, the local nonlinear structure of the manifold was well preserved, and the optimal diffeomorphism could be obtained. Numerical comparative experiments were conducted on both synthetic data and clinical data. Experimental results show that, compared with the existing algorithm, the proposed algorithm obtains a higher degree of topology preservation on a dense high-dimensional deformation field and ultimately improves the registration accuracy.
    Integrated indoor positioning algorithm based on D-S evidence theory
    WANG Xuqiao, WANG Jinkun
    2017, 37(4):  1198-1201.  DOI: 10.11772/j.issn.1001-9081.2017.04.1198
    Abstract ( )   PDF (762KB) ( )
    References | Related Articles | Metrics
    An integrated Wireless Fidelity / Inertial Measurement Unit (WiFi/IMU) positioning algorithm based on Dempster-Shafer (D-S) evidence theory was proposed for Location Based Service (LBS) in large indoor areas without beacon deployment. Firstly, the transmission model of the signal strength of a single Access Point (AP) was established, and a Kalman filter was used to denoise the Received Signal Strength Indication (RSSI). Secondly, D-S evidence theory was applied to fuse the data acquired in real time from multiple sources, including the WiFi signal strength, the yaw and the accelerations on all axes; the fingerprint blocks with high confidence were then selected. Finally, the Weighted K-Nearest Neighbor (WKNN) method was used to estimate the terminal position. Numerical simulations on a unit area show that the maximum error is 2.36 m and the mean error is 1.27 m, which demonstrates the viability and effectiveness of the proposed algorithm; the cumulative error probability is 88.20% when the distance is no greater than the typical numerical value, which is superior to the 70.82% of C-Support Vector Regression (C-SVR) and the 67.85% of Pedestrian Dead Reckoning (PDR). Furthermore, experiments over the whole area of a real environment also show that the proposed algorithm adapts well to the environment.
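The data fusion step in this abstract relies on D-S evidence theory, whose core operation is Dempster's rule of combination. The sketch below implements that standard rule only; the abstract does not say how the paper derives mass functions from WiFi and IMU readings, so the two example mass functions over hypothetical fingerprint blocks "A" and "B" are invented for illustration.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses (a focal element)
    to its mass; the masses of one function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            intersection = a & b
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b   # mass assigned to contradictory evidence
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalise so the remaining masses sum to 1.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


if __name__ == "__main__":
    A, B = frozenset({"A"}), frozenset({"B"})
    AB = A | B   # "either block": ignorance
    wifi_evidence = {A: 0.6, AB: 0.4}   # illustrative values only
    imu_evidence = {B: 0.5, AB: 0.5}    # illustrative values only
    fused = dempster_combine(wifi_evidence, imu_evidence)
    for focal, mass in fused.items():
        print(sorted(focal), round(mass, 4))
```

In a pipeline like the one described, the block with the highest fused mass would be a natural candidate for the "high confidence" fingerprint blocks passed on to WKNN, though that selection rule is an assumption here.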
    Application of improved grey wolf optimizer algorithm in soil moisture monitoring and forecasting system
    LI Ning, LI Gang, DENG Zhongliang
    2017, 37(4):  1202-1206.  DOI: 10.11772/j.issn.1001-9081.2017.04.1202
    Abstract ( )   PDF (783KB) ( )
    References | Related Articles | Metrics
    Focusing on the high cost, high susceptibility to damage and low prediction accuracy of soil moisture monitoring and forecasting systems, a soil moisture monitoring system based on a non-fixed wireless sensor network and a neural network optimized by an improved grey wolf algorithm was designed and implemented. In the proposed system, a non-fixed, plug-in Bluetooth sensor network was used to collect moisture data, and a high-precision multi-source location fusion method was used for wide-area outdoor positioning. In terms of algorithms, because the Grey Wolf Optimizer (GWO) algorithm easily falls into local optima in its later iterations, an improved GWO algorithm based on a rearward explorer mechanism was proposed. Firstly, according to the fitness values of the population, an explorer type was added to the original individual types of the algorithm. Secondly, the search period of the population was divided into three parts: an active exploration period, a cycle exploration period and a population regression period. Finally, a dedicated location updating strategy was applied to the explorers in each period, which made the algorithm more random in the early stage and kept it updating in the middle and late stages, thus strengthening its ability to avoid local optima. The algorithm was tested on standard functions and applied to optimize the neural network prediction model of the soil moisture system. Based on the datasets obtained from experimental plot No. 2 in a city, the experimental results show that the relative error decreases by about 4 percentage points compared with the direct neural network prediction model, and by about 1 to 2 percentage points compared with the traditional GWO algorithm and Particle Swarm Optimization (PSO). The proposed algorithm has a smaller error and a better ability to avoid local optima, and improves the prediction quality of soil moisture.
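For context, the baseline that this abstract improves on can be sketched as follows. This is the standard Grey Wolf Optimizer update (three leaders, coefficient `a` decaying linearly from 2 to 0), not the paper's rearward explorer variant, whose update rules are not given in the abstract. The function name `gwo`, its parameters and the `sphere` test function are choices made for this illustration.

```python
import random


def sphere(x):
    """Standard test function with its minimum 0 at the origin."""
    return sum(v * v for v in x)


def gwo(f, dim, bounds, n_wolves=20, n_iter=200, seed=0):
    """Minimise f with the standard Grey Wolf Optimizer; returns (position, value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best_pos, best_val = None, float("inf")
    for t in range(n_iter + 1):
        ranked = sorted(wolves, key=f)
        if f(ranked[0]) < best_val:               # remember the best wolf seen so far
            best_pos, best_val = list(ranked[0]), f(ranked[0])
        if t == n_iter:
            break
        # Copy the alpha, beta and delta wolves before positions are overwritten.
        leaders = [list(w) for w in ranked[:3]]
        a = 2.0 * (1.0 - t / n_iter)              # exploration factor, 2 -> 0
        for w in wolves:
            for j in range(dim):
                total = 0.0
                for leader in leaders:
                    A = a * (2.0 * rng.random() - 1.0)
                    C = 2.0 * rng.random()
                    D = abs(C * leader[j] - w[j])
                    total += leader[j] - A * D
                # New coordinate: mean of the three leader-guided moves, kept in bounds.
                w[j] = min(hi, max(lo, total / 3.0))
    return best_pos, best_val


if __name__ == "__main__":
    position, value = gwo(sphere, dim=3, bounds=(-10.0, 10.0))
    print(position, value)
```

As `a` shrinks, every wolf is pulled toward the three current leaders, which is the late-iteration behaviour the abstract identifies as the cause of premature convergence.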
    Rolling bearing fault diagnosis based on visual heterogeneous feature fusion
    YANG Hongbai, ZHANG Hongli, LIU Shulin
    2017, 37(4):  1207-1211.  DOI: 10.11772/j.issn.1001-9081.2017.04.1207
    Abstract ( )   PDF (821KB) ( )
    References | Related Articles | Metrics
    Aiming at the shortcomings of large feature set dimensionality, data redundancy and low fault recognition rate in existing fault diagnosis methods based on a simple combination of multiple classes of features, a fault diagnosis method based on heterogeneous feature selection and fusion was proposed. The clustering characteristics of the feature data were analyzed according to the contours of the data of each class of features; the redundant feature dimensions that are weakly clustered and not useful for fault classification were removed, and only the feature dimensions with strong clustering characteristics were retained for fault recognition. In the bearing fault diagnosis experiment, time-domain statistics and wavelet packet energy of the fault signals were optimally selected and merged, and a Back Propagation (BP) neural network was used for fault pattern recognition. The fault recognition rate reached 100%, which is significantly higher than that of the fault diagnosis method without feature selection and fusion. Experimental results show that the proposed method is easy to implement and can significantly improve the fault recognition rate.
    Speech enhancement algorithm based on improved variable-step LMS algorithm in cochlear implant
    XU Wenchao, WANG Guangyan, CHEN Lei
    2017, 37(4):  1212-1216.  DOI: 10.11772/j.issn.1001-9081.2017.04.1212
    Abstract ( )   PDF (799KB) ( )
    References | Related Articles | Metrics
    In order to improve the quality of the speech signal and the adaptability of cochlear implants under strong background noise, an improved method based on the combination of spectral subtraction and a variable-step Least Mean Square error (LMS) adaptive filtering algorithm was proposed, and a speech enhancement hardware system for cochlear implants was constructed with this method. To address slow convergence and large steady-state error, the squared term of the output error was used to adjust the step size of the variable-step LMS adaptive filtering algorithm; in addition, a combination of fixed and changing step values was considered, which improved the adaptability and the quality of the speech signal. The speech enhancement hardware system was composed of a TMS320VC5416 and the audio codec chip TLV320AIC23B; high-speed acquisition and real-time processing of voice data between the TMS320VC5416 and the TLV320AIC23B were realized through the Multi-channel Buffered Serial Port (McBSP) and the Serial Peripheral Interface (SPI). The Matlab simulation and test results show that the proposed method performs well in eliminating noise: the Signal-to-Noise Ratio (SNR) can be increased by about 10 dB in the case of low input SNR, and the Perceptual Evaluation of Speech Quality (PESQ) score can also be greatly enhanced. The quality of the voice signal is improved effectively, and the system based on the proposed algorithm has stable performance, which further improves the clarity and intelligibility of voice in cochlear implants.
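The idea of steering the LMS step size with the squared output error can be sketched as follows. The abstract does not give the paper's exact update rule, so this illustration substitutes a well-known rule of the same family, mu(n+1) = alpha * mu(n) + gamma * e(n)^2 clipped to [mu_min, mu_max]; the function name `vss_lms` and all parameter values are assumptions, and the spectral subtraction stage is not shown.

```python
import random


def vss_lms(x, d, n_taps, mu0=0.1, alpha=0.97, gamma=0.05, mu_min=0.02, mu_max=0.5):
    """Variable-step LMS filter; returns (final weights, error sequence).

    x: input samples, d: desired samples, n_taps: filter length.
    The step grows when the squared error is large (fast convergence)
    and decays toward mu_min when it is small (low steady-state error).
    """
    w = [0.0] * n_taps
    mu = mu0
    errors = []
    for n in range(len(x)):
        # Most recent n_taps inputs, newest first, zero-padded at the start.
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(n_taps)]
        y = sum(wk * uk for wk, uk in zip(w, u))      # filter output
        e = d[n] - y                                  # output error
        w = [wk + mu * e * uk for wk, uk in zip(w, u)]
        mu = min(mu_max, max(mu_min, alpha * mu + gamma * e * e))
        errors.append(e)
    return w, errors


if __name__ == "__main__":
    # Identify a hypothetical 2-tap system from noise-free data.
    rng = random.Random(1)
    x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
    h = [0.5, -0.3]
    d = [h[0] * x[n] + (h[1] * x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
    weights, _ = vss_lms(x, d, n_taps=2)
    print(weights)
```

The clipping bounds play the role of the fixed step values mentioned in the abstract: `mu_max` keeps the filter stable while the error is large, and `mu_min` keeps it adapting once the error is small.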
Honorary Editor-in-Chief: ZHANG Jingzhong
Editor-in-Chief: XU Zongben
Associate Editor: SHEN Hengtao XIA Zhaohui
Domestic Post Distribution Code: 62-110
Foreign Distribution Code: M4616
No. 9, 4th Section of South Renmin Road, Chengdu 610041, China
Tel: 028-85224283-803
Website: www.joca.cn
E-mail: bjb@joca.cn