Table of Contents

    10 February 2020, Volume 40 Issue 2
    DPCS 2019
    Efficient storage scheme for deadline aware distributed matrix multiplication
    Yongzhu ZHAO, Weidong LI, Bin TANG, Feng MEI, Wenda LU
    2020, 40(2):  311-315.  DOI: 10.11772/j.issn.1001-9081.2019091640

    Distributed matrix multiplication is a fundamental operation in many distributed machine learning and scientific computing applications, but its performance is greatly affected by the stragglers that commonly exist in such systems. Recently, researchers have proposed a fountain-code based coded matrix multiplication method, which can effectively mitigate the effect of stragglers by fully exploiting their partial results. However, it does not consider the storage cost of worker nodes. By considering the tradeoff between storage cost and computation completion time, the computational-deadline-aware storage optimization problem for heterogeneous worker nodes was first formulated. Then, through theoretical analysis, a solution based on expectation approximation was presented, and the problem was transformed by relaxation into a convex optimization problem for efficient solution. Simulation results show that, while ensuring a high task success rate, the storage overhead of the proposed scheme decreases rapidly as the task deadline is relaxed, and the scheme can greatly reduce the storage overhead introduced by encoding. In other words, the proposed scheme can significantly reduce the extra storage overhead while guaranteeing with high probability that the whole computation finishes before the deadline.

    High performance key-value storage system based on remote direct memory access
    Cheng WANG, Baoliu YE, Feng MEI, Wenda LU
    2020, 40(2):  316-320.  DOI: 10.11772/j.issn.1001-9081.2019091635

    With the continuous growth of data volume and system scale, network communication has become a performance bottleneck of key-value storage systems. Meanwhile, the Remote Direct Memory Access (RDMA) technique supports high-bandwidth, low-latency data transmission, which provides a new idea for designing key-value storage systems. Based on the RDMA technique in high-performance networks, a key-value storage system named Chequer with high performance and low CPU overhead was designed and implemented. By exploiting the characteristics of RDMA primitives, the basic operation workflow of the key-value storage system was redesigned. A linear-probing based shared hash table was designed to reduce the number of client read rounds by solving the problem of client cache invalidation and increasing the hash hit rate, which further improves the performance of the system. The Chequer system was implemented on a small-scale cluster, and its performance was demonstrated by experiments.
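The linear-probing idea behind the shared hash table can be sketched as follows. This is a minimal single-machine illustration (the class name and slot layout are hypothetical, not Chequer's actual structure): the key property is that a probe sequence is contiguous, so a remote client could fetch all candidate slots in one large read instead of one round per slot.

```python
# Minimal sketch of a linear-probing hash table (hypothetical structure;
# Chequer's RDMA-shared table is more involved).
class LinearProbingTable:
    def __init__(self, capacity=8):
        self.slots = [None] * capacity  # each slot holds (key, value) or None

    def _probe(self, key):
        # Visit slots in linear order starting from the hashed bucket.
        start = hash(key) % len(self.slots)
        for i in range(len(self.slots)):
            yield (start + i) % len(self.slots)

    def put(self, key, value):
        for idx in self._probe(key):
            if self.slots[idx] is None or self.slots[idx][0] == key:
                self.slots[idx] = (key, value)
                return idx
        raise RuntimeError("table full")

    def get(self, key):
        # The probe sequence is contiguous, so a remote client could fetch it
        # with one large read rather than one network round per slot.
        for idx in self._probe(key):
            if self.slots[idx] is None:
                return None  # an empty slot ends the probe sequence
            if self.slots[idx][0] == key:
                return self.slots[idx][1]
        return None
```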

    Incremental learning based proactive caching mechanism for RocksDB key-value system
    Keyun LUO, Baoliu YE, Bin TANG, Feng MEI, Wenda LU
    2020, 40(2):  321-327.  DOI: 10.11772/j.issn.1001-9081.2019091616

    The RocksDB key-value storage system, based on the Log-Structured Merge (LSM) tree, suffers from low read performance caused by the constraints of its hierarchical structure. One effective solution is to cache hot data proactively, but this faces two challenges: how to predict the hot data when the data distribution changes constantly, and how to integrate the proactive caching mechanism with the RocksDB storage structure. To tackle these challenges, a proactive caching framework for the RocksDB key-value system with multiple components, including data collection, system interaction and system evaluation, was built, which can cache hot data at the low levels of the LSM tree. By modeling data access patterns, an incremental-learning based prediction analysis method for hot data was designed and implemented, which can reduce the number of I/O operations on the storage medium. Experimental results show that the proposed mechanism can effectively improve the read performance of RocksDB under different dynamic workloads.

    Computing task offloading based on multi-cloudlet collaboration
    Qingyong WANG, Yingchi MAO, Yichao WANG, Longbao WANG
    2020, 40(2):  328-334.  DOI: 10.11772/j.issn.1001-9081.2019081367

    Focusing on the problems of complex offloading processes and long response times of task offloading in multi-cloudlet mode, a computing task offloading model based on multi-cloudlet collaboration was constructed, and a Weighted self-Adaptive Inertia Weight Particle Swarm Optimization (WAIW-PSO) algorithm was proposed to find the optimal offloading scheme quickly. Firstly, the task execution process of mobile terminal-cloudlet-remote cloud was modeled. Secondly, considering the competition for computing resources among multiple users, the task offloading model based on multi-cloudlet collaboration was constructed. Finally, since the complexity of solving the optimal offloading scheme exactly is excessively high, the WAIW-PSO algorithm was proposed to solve the offloading problem. Simulation results show that compared with the standard Particle Swarm Optimization (PSO) algorithm and the PSO algorithm with Decreasing Inertia Weight based on Gaussian function (GDIWPSO), the WAIW-PSO algorithm can adjust the inertia weight according to the evolutionary generation and individual fitness, and has better optimization ability and the shortest time for finding the optimal offloading scheme. Experimental results on offloading scenarios with different numbers of devices and tasks show that offloading schemes based on the WAIW-PSO algorithm can significantly shorten the total task completion time.
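An inertia weight that adapts to both the evolutionary generation and individual fitness, as the abstract describes, can be sketched as follows. The abstract does not give WAIW-PSO's exact formula, so this is only an illustrative form that mirrors the stated behavior (for a minimization problem): the weight decays over generations, and fitter particles get smaller weights so they exploit while poor particles keep exploring.

```python
def adaptive_inertia(gen, max_gen, fit, best_fit, worst_fit,
                     w_max=0.9, w_min=0.4):
    """Illustrative adaptive inertia weight (not WAIW-PSO's actual formula)."""
    # Time component: shrink linearly from w_max to w_min over the run.
    w_time = w_max - (w_max - w_min) * gen / max_gen
    # Fitness component: rank is 0 for the best particle, 1 for the worst,
    # so good particles exploit (small w) while poor ones keep exploring.
    spread = worst_fit - best_fit
    rank = 0.0 if spread == 0 else (fit - best_fit) / spread
    return w_min + (w_time - w_min) * rank
```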

    Dynamic monitoring model based on DPDK parallel communication
    Cui LI, Qingkui CHEN
    2020, 40(2):  335-341.  DOI: 10.11772/j.issn.1001-9081.2019081405

    In order to make better use of the performance of the communication system, make full use of the resources of system nodes, and improve the reliability and stability of the system, a dynamic monitoring model based on DPDK (Data Plane Development Kit) parallel communication was designed. Combining the high-speed, high-traffic and strongly real-time characteristics of DPDK and the communication system, the model was designed with multi-node backup, separation of data packets and control packets, parallel transmission and reception of data packets over multiple network ports, and parallel processing of data packets by multiple cores. The monitoring objects were analyzed, the data acquisition methods were studied, and a layer-2 communication protocol named DMPD was designed. Fine-grained monitoring was performed on the network ports, and a network port load information model was given. Besides, a more efficient and fair dynamic load balancing algorithm over multiple network ports was designed by combining a hash function, an adjustment function and dynamic load information. Experimental results show that the dynamic monitoring model can accurately detect and promptly handle abnormalities that appear in the system, and achieves load balancing across multiple network ports.
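The combination of a hash function, an adjustment function and dynamic load information can be sketched as below. This is purely illustrative (the abstract does not specify DMPD's actual hash or adjustment functions, and `pick_port`, `loads` and `threshold` are hypothetical names): a flow hashes to a default port, and the adjustment walks to the next underloaded port when that port is busy.

```python
def pick_port(flow_key, loads, threshold):
    """Hash a flow to a network port, then adjust by load (illustrative only;
    not the DMPD model's actual algorithm). loads[p] is port p's current load."""
    n = len(loads)
    start = hash(flow_key) % n          # hash function: default port choice
    for i in range(n):                  # adjustment: walk to an underloaded port
        p = (start + i) % n
        if loads[p] < threshold:
            return p
    # All ports overloaded: fall back to the least-loaded one.
    return min(range(n), key=loads.__getitem__)
```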

    Inference delay optimization of branchy neural network model based on edge computing
    Qi FAN, Zhuo LI, Xin CHEN
    2020, 40(2):  342-346.  DOI: 10.11772/j.issn.1001-9081.2019081406

    Aiming at the long inference delay of Deep Neural Network (DNN) tasks on cloud servers, a branchy neural network deployment model based on edge computing was proposed. The distributed deployment problem of DNNs in edge computing scenarios was analyzed and proved to be NP-hard. A Deployment algorithm based on Branch and Bound (DBB) was designed to select appropriate edge computing nodes to reduce inference delay. A Selection Node Exit (SNE) algorithm was also designed and implemented to select the appropriate edge computing node for each task to exit the inference. The simulation results show that, compared with the approach of deploying the neural network model on the cloud, the branchy neural network model based on edge computing reduces the inference delay by 36% on average.

    Automatic tuning of Ceph parameters based on random forest and genetic algorithm
    Yu CHEN, Yingchi MAO
    2020, 40(2):  347-351.  DOI: 10.11772/j.issn.1001-9081.2019081366

    The performance of the Ceph system is significantly affected by its configuration parameters. In the configuration optimization of a Ceph cluster, there are many kinds of configuration parameters with complex meanings, which makes fast and accurate optimization difficult. To solve this problem, a parameter tuning method based on Random Forest (RF) and Genetic Algorithm (GA) was proposed to automatically adjust the Ceph parameter configuration in order to optimize Ceph system performance. The RF algorithm was used to construct a performance prediction model for the Ceph system, and the output of the prediction model was used as the input of the GA; the parameter configuration scheme was then automatically and iteratively optimized by the GA. Simulation results show that, compared with the system with the default parameter configuration, the Ceph file system with the optimized parameter configuration improves read and write performance by about 1.4 times, and the optimization time is much lower than that of black-box parameter tuning methods.
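The overall loop, a GA searching a configuration space scored by a learned surrogate, can be sketched as follows. This is a minimal generic sketch: `surrogate` stands in for the paper's random-forest prediction model (here any callable), and the GA operators below are generic textbook ones, not necessarily the paper's.

```python
import random

def genetic_search(surrogate, bounds, pop_size=20, generations=30, seed=0):
    """Minimal GA that maximizes a surrogate's predicted performance.
    bounds is a list of (lo, hi) ranges, one per configuration parameter."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half as elites (the best is never lost).
        elite = sorted(pop, key=surrogate, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, dim) if dim > 1 else 1
            child = a[:cut] + b[cut:]          # one-point crossover
            i = rng.randrange(dim)             # mutate one gene, clipped to bounds
            lo, hi = bounds[i]
            child[i] = min(hi, max(lo, child[i] + rng.gauss(0, (hi - lo) * 0.1)))
            children.append(child)
        pop = elite + children
    return max(pop, key=surrogate)
```

The key design point the abstract relies on is that evaluating the surrogate is cheap, so the GA can afford many generations that would be impractical if each fitness evaluation required benchmarking a real Ceph cluster.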

    Hard-negative sample mining for metric learning based on linear assignment
    Taiming FU, Yan CHEN, Taoshen LI
    2020, 40(2):  352-357.  DOI: 10.11772/j.issn.1001-9081.2019081403

    Scientists identify individual whales based on the shape and distinctive marks of their tails, but recognition by human eyes and manual labeling is very cumbersome. The whale tail photo dataset has an unbalanced data distribution, and some categories have very few samples or even a single sample. Besides, the samples have small individual differences and contain unknown categories, which makes automatic labeling for whale identification by image classification difficult. To solve the problem that metric learning struggles to realize classification under this task, on the basis of the Siamese Neural Network (SNN), training batches were constructed dynamically by using the Linear Assignment Problem (LAP) algorithm in a hard-negative sample mining training process. Firstly, image feature vectors were extracted from the training samples, and the similarity metric between feature vectors was calculated. Then, LAP was used to assign sample pairs to the model, training batches were constructed dynamically according to the metric score matrix, and the hard sample pairs were targeted during training. Experimental results on a whale tail image dataset with unbalanced data distribution and the CUB-200-2011 dataset show that the proposed algorithm achieves good results in learning minority classes and classifying fine-grained images.
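The LAP step can be sketched as follows. This is a toy illustration, not the paper's pipeline: a brute-force LAP solver (fine for tiny matrices; a real system would use an efficient algorithm such as Jonker-Volgenant) applied to a similarity matrix, so that each anchor is paired with the sample the model currently confuses most while every sample is used exactly once.

```python
from itertools import permutations

def linear_assignment(cost):
    """Brute-force LAP solver for tiny cost matrices (illustration only)."""
    n = len(cost)
    best, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        c = sum(cost[i][perm[i]] for i in range(n))
        if c < best_cost:
            best, best_cost = list(perm), c
    return best

def hard_negative_pairs(similarity):
    """Pair each anchor with one confusable sample: maximizing total
    similarity equals minimizing its negation. Self-pairs should be masked
    by the caller, e.g. with a large negative similarity on the diagonal."""
    cost = [[-s for s in row] for row in similarity]
    return linear_assignment(cost)
```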

    Distributed multi-task allocation method for user area in mobile crowd sensing
    Junying HAN, Zhenyu ZHANG, Deshi KONG
    2020, 40(2):  358-362.  DOI: 10.11772/j.issn.1001-9081.2019081402

    Most Mobile Crowd Sensing (MCS) task allocation methods are designed for a single task and are difficult to apply to real-world scenarios with multiple concurrent real-time tasks. Moreover, these methods often need to obtain user locations in real time, which is not conducive to protecting participant privacy. Concerning the above problems, a distributed multi-task allocation method for user areas, named Crowd-Cluster, was proposed. Firstly, the global sensing tasks and user areas were clustered by using a greedy heuristic algorithm. Secondly, based on spatial correlation, the Q-learning algorithm was used to combine concurrent tasks into task paths. Then, each task path was dynamically priced by constructing a user intention model that satisfies the Boltzmann distribution. Finally, based on historical reputation records, participants were greedily selected to implement task allocation. Experimental results on the real dataset mobility show that Crowd-Cluster can effectively reduce the total number of participants and the total movement distance of users, and can also reduce the impact of insufficient sensing resources on task completion in low population density scenarios.

    Minimizing transmit power sum of full-duplex relay system with simultaneous wireless information and power transmission
    Yening ZHOU, Taoshen LI, Min ZENG, Nan XIAO
    2020, 40(2):  363-368.  DOI: 10.11772/j.issn.1001-9081.2019081477

    To improve the performance of wireless relay systems by adopting simultaneous information and energy transmission in wireless networks, a bidirectional-transmission full-duplex relay system with self-energy recycling, based on a wireless radio frequency network, was proposed by using Simultaneous Wireless Information and Power Transmission (SWIPT) technology. It is a new attempt to apply SWIPT in a bidirectional full-duplex relay system. The energy-constrained destination node used the energy harvested from the relay and the loop channel to send feedback information, and the logical structure of the full-duplex relay system and the physical structure of the energy-constrained destination node were given. Then, with minimization of the system transmit power sum as the optimization target, a power allocation scheme was used for information decoding and energy harvesting, and semi-definite programming, rank relaxation and Lagrange methods were used to transform the original non-convex optimization problem into a solvable convex optimization problem, whose solution was then found. The relay transmit power, transmit beamforming vector and power allocation ratio were jointly optimized. Finally, the proposed system was compared with the traditional bidirectional transmission relay system through simulation experiments. The results verify that self-energy recycling can not only eliminate self-interference but also significantly reduce the system transmit power sum, and reveal that the proposed system achieves a higher performance gain than the traditional bidirectional transmission system due to the combination of SWIPT technology and the full-duplex relay system.

    Apple price prediction method based on distributed neural network
    Bin LIU, Jinrong HE, Yuancheng LI, Hong HAN
    2020, 40(2):  369-374.  DOI: 10.11772/j.issn.1001-9081.2019081454

    Concerning the issue that traditional price prediction models for agricultural products cannot predict the market price of apples quickly and accurately under big data scenarios, an apple price prediction method based on a distributed neural network was proposed. Firstly, the factors that affect the market price of apples were studied, and the historical price of apples, historical prices of substitutes, household consumption level and oil price were selected as the inputs of the neural network. Secondly, a distributed neural network prediction model capturing the price fluctuation law was constructed to implement short-term prediction of the apple market price. Experimental results show that the proposed model has high prediction accuracy, with an average relative error of only 0.50%, which satisfies the requirements of apple market price prediction. This indicates that the distributed neural network model can reveal the fluctuation law and development trend of the apple market price through its self-learning characteristic. The proposed method not only provides a scientific basis for stabilizing the apple market order and for macroeconomic regulation of market prices, but also reduces the harm brought by price fluctuations, helping farmers avoid market risks.

    Detection method for echo hiding based on convolutional neural network framework
    Jie WANG, Rangding WANG, Diqun YAN, Yuzhen LIN
    2020, 40(2):  375-380.  DOI: 10.11772/j.issn.1001-9081.2019081400

    Echo hiding is a steganographic technique with audio as the carrier. Current steganalysis methods for echo hiding mainly use cepstral coefficients as handcrafted features for classification. However, when the echo amplitude is low, the detection performance of these traditional methods is poor. Aiming at the low echo amplitude condition, a steganalysis method for echo hiding based on Convolutional Neural Network (CNN) was proposed. Firstly, the Short-Time Fourier Transform (STFT) was used to extract the amplitude spectrum coefficient matrix as the shallow feature. Secondly, deep features were extracted from the shallow feature by the designed CNN framework, which consisted of four convolutional blocks and three fully connected layers. Finally, the classification results were output by Softmax. The proposed method was evaluated on three classic echo hiding algorithms. Experimental results indicate that the detection rates of the proposed method under low echo amplitude are 98.62%, 98.53% and 93.20% respectively. Compared with existing traditional handcrafted-feature based methods and deep learning based methods, the proposed method improves detection performance by more than 10%.
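The shallow-feature step, turning an audio signal into an amplitude spectrum matrix via STFT, can be sketched as follows. This is a toy pure-Python version (naive DFT with a Hann window; window length, hop and the absence of an FFT are all simplifications, not the paper's settings): each row of the result is one frame's magnitude spectrum, and stacking the rows gives the matrix fed to the CNN.

```python
import cmath
import math

def stft_magnitude(signal, frame_len=8, hop=4):
    """Toy STFT amplitude matrix: Hann-windowed frames, naive DFT (a real
    pipeline would use an FFT). Each row covers the non-negative bins."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Hann window reduces spectral leakage at the frame edges.
        win = [x * 0.5 * (1 - math.cos(2 * math.pi * n / (frame_len - 1)))
               for n, x in enumerate(frame)]
        row = []
        for k in range(frame_len // 2 + 1):
            s = sum(w * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n, w in enumerate(win))
            row.append(abs(s))
        frames.append(row)
    return frames
```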

    Hyperspectral band selection based on deep adversarial subspace clustering
    Meng ZENG, Bin NING, Zhihua CAI, Qiong GU
    2020, 40(2):  381-385.  DOI: 10.11772/j.issn.1001-9081.2019081385

    A HyperSpectral Image (HSI) consists of hundreds of bands with strong inter-band correlations and high redundancy, resulting in the curse of dimensionality and increased classification complexity. Therefore, a Deep Adversarial Subspace Clustering (DASC) method was used for hyperspectral band selection, and Laplacian regularization was introduced to make the network more robust, which reduces the classification complexity while ensuring classification accuracy. A self-expressive layer was introduced between the encoder and the decoder to imitate the “self-expression” property of traditional subspace clustering, making full use of spectral information and nonlinear feature transformation to obtain the relationships between bands, and solving the problem that traditional band selection methods cannot consider spectral and spatial information simultaneously. At the same time, adversarial learning was introduced to supervise the sample representations of the auto-encoder and the subspace clustering, so that the subspace clustering has better self-expression performance. In order to make the network more robust, Laplacian regularization was added to take into account the manifold structure reflecting geometric information. Experimental results on two public hyperspectral datasets show that, compared with several mainstream band selection methods, the DASC method has higher accuracy, and the band subset it selects can satisfy application requirements.

    IB-LBM parallel optimization method mixed with multiple task scheduling modes
    Zhixiang LIU, Huichao LIU, Dongmei HUANG, Liping ZHOU, Cheng SU
    2020, 40(2):  386-391.  DOI: 10.11772/j.issn.1001-9081.2019081401

    When using the Immersed Boundary-Lattice Boltzmann Method (IB-LBM) to solve a flow field, a larger and denser flow field grid is often required to obtain more accurate results, which leads to a long simulation time. In order to improve the efficiency of the simulation, according to the locality of IB-LBM computation and combined with the three task scheduling modes in OpenMP, a parallel optimization method for IB-LBM was proposed. In the parallel optimization, the three task scheduling modes were mixed to solve the load imbalance caused by any single scheduling mode. IB-LBM was structurally decomposed, the optimal scheduling mode of each structural part was tested, and the optimal scheduling combination was selected based on the experimental results; the results also show that the optimal combination differs under different thread counts. The optimization was verified by speedup: when the number of threads is small, the speedup approaches the ideal state; when the number of threads is large, although the additional overhead of creating and destroying threads affects the performance optimization, the parallel performance of the model is still greatly improved. The flow field simulation results show that the accuracy of IB-LBM in simulating fluid-solid coupling problems is not affected by the parallel optimization.

    Design and implementation of parallel genetic algorithm for cutting stock of circular parts
    Zhiyang ZENG, Yan CHEN, Ke WANG
    2020, 40(2):  392-397.  DOI: 10.11772/j.issn.1001-9081.2019081397

    For the cutting stock problem of circular parts, which widely exists in many manufacturing industries, a new parallel genetic algorithm for cutting stock, namely the Parallel Genetic Blanking Algorithm (PGBA), was proposed to maximize the material utilization within a reasonable computing time. In PGBA, the material utilization rate of the cutting plan was used as the optimization objective function, and multiple threads were used to perform genetic operations on multiple subpopulations in parallel. Firstly, a specific individual coding method was designed based on the parallel genetic algorithm, and a heuristic method was used to generate the individuals of the population, to improve the search ability and efficiency of the algorithm and avoid premature convergence. Then, an approximately optimal cutting plan was searched out by adaptive genetic operations with better performance. Finally, the effectiveness of the algorithm was verified by various experiments. The results show that, compared with a heuristic algorithm proposed in the literature, PGBA takes longer computing time but greatly improves the material utilization rate, which can effectively improve the economic benefits of enterprises.

    Optimization and parallelization of Graphlet Degree Vector method
    Xiangshuai SONG, Fuzhang YANG, Jiang XIE, Wu ZHANG
    2020, 40(2):  398-403.  DOI: 10.11772/j.issn.1001-9081.2019081387

    The Graphlet Degree Vector (GDV) is an important method for studying biological networks, and can reveal the correlation between nodes in biological networks and their local network structures. However, with the increasing number of automorphism orbits to be studied and the expanding scale of biological networks, the time complexity of the GDV method increases exponentially. To resolve this problem, based on the existing serial GDV method, a parallelization of the GDV method based on the Message Passing Interface (MPI) was realized. Furthermore, the GDV method itself was improved, and the improved method was parallelized and optimized as well. The calculation process was optimized to eliminate the double counting that occurs when searching for the automorphism orbits of different nodes, and tasks were allocated reasonably in combination with a load balancing strategy. Experimental results on simulated network data and real biological network data indicate that both the parallel GDV method and the improved parallel GDV method obtain good parallel performance; they can be widely applied to networks of different types and scales, have good scalability, and can effectively maintain high efficiency in searching for automorphism orbits in the network.

    Parallel computing of bifurcation stenosis flows of carotid artery based on lattice Boltzmann method and large eddy simulation model
    Yizhuo ZHANG, Sen GE, Liangjun WANG, Jiang XIE, Jie CAO, Wu ZHANG
    2020, 40(2):  404-409.  DOI: 10.11772/j.issn.1001-9081.2019081388

    The formation of carotid artery plaque is closely related to complex hemodynamic factors, and accurate simulation of complex flow conditions is of great significance for the clinical diagnosis of carotid artery plaque. In order to simulate the pulsating flow accurately, a Large Eddy Simulation (LES) model was combined with the Lattice Boltzmann Method (LBM) to construct an LBM-LES carotid artery simulation algorithm, and a real geometric model of carotid artery stenosis was established through medical image reconstruction software; thus, a high-resolution numerical simulation of carotid artery stenosis flows was conducted. By calculating the blood flow velocity and Wall Shear Stress (WSS), some meaningful flow results were obtained, proving the effectiveness of LBM-LES in studying blood flow in the region behind the carotid artery stenosis. Based on the OpenMP programming environment, parallel computation on a grid of the order of ten million cells was carried out on the fully interconnected fat nodes of a high-performance cluster. The results show that the LBM-LES carotid artery simulation algorithm has good parallel performance.

    CCF NDBC 2019
    Accessing optimization with multiple indexes in TiDB
    Hai LAN, Ke HAN, Li SHEN, Qiu CUI, Yuwei PENG
    2020, 40(2):  410-415.  DOI: 10.11772/j.issn.1001-9081.2019081908

    When a query condition involves multiple indexed attributes, TiDB cannot effectively use multiple indexes to generate a more efficient execution plan. After studying the existing solutions of databases such as PostgreSQL and MySQL, a new type of data access path using multiple indexes simultaneously, namely MultiIndexPath, was proposed for TiDB to solve this problem. Firstly, a possible access path named MultiIndexPath for a given query was generated, its physical plan named MultiIndexPlan was created, and the cost of this plan was calculated. Secondly, a general execution framework for MultiIndexPath was proposed in combination with the architecture and implementation of TiDB. Finally, a pipelined execution plan was proposed for conditions in conjunctive normal form. The whole work was implemented based on TiDB 3.0, and several experiments were conducted. Experimental results show that when the condition is in disjunctive normal form, the performance of the proposed scheme is improved by at least one order of magnitude compared with the original TiDB; for conditions in conjunctive normal form, the performance of the scheme is also better than that of the original TiDB.
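The core of combining two index scans can be sketched as follows. This is a generic illustration, not TiDB's implementation: each index scan yields a sorted list of row ids, an AND (conjunctive) condition merge-intersects them and an OR (disjunctive) condition merge-unions them, and only the surviving row ids are fetched from the table once.

```python
def intersect_rowids(a, b):
    """Merge-intersect two sorted row-id lists, as an index AND would."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def union_rowids(a, b):
    """Merge-union for an index OR, keeping the result sorted and unique."""
    out = []
    i = j = 0
    while i < len(a) or j < len(b):
        if j == len(b) or (i < len(a) and a[i] <= b[j]):
            v = a[i]
            i += 1
        else:
            v = b[j]
            j += 1
        if not out or out[-1] != v:  # drop duplicates across the two lists
            out.append(v)
    return out
```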

    Complexity analysis of functional query answering on big data
    Wenli WU, Guohua LIU, Junbao ZHANG
    2020, 40(2):  416-419.  DOI: 10.11772/j.issn.1001-9081.2019091618

    Functional queries are an important operation in big data applications, and the query answering problem has always been a core problem in database theory. In order to analyze the complexity of the functional query answering problem on big data, firstly, the functional query language was reduced to a known decidable language by using the mapping reduction method, which proves the computability of the functional query answering problem. Secondly, a first-order language was used to describe functional queries, and the complexity of the first-order language was analyzed. On this basis, the NC-factor reduction method was used to reduce the functional query class to the known ΠΤQ-complete class. It is proved that the functional query answering problem can be solved in NC time after PTIME (Polynomial TIME) preprocessing. It can be concluded that the functional query answering problem can be handled on big data.

    Query execution plan selection under concurrent query
    Zefeng PEI, Baoning NIU, Jinwen ZHANG, Muhammad AMJAD
    2020, 40(2):  420-425.  DOI: 10.11772/j.issn.1001-9081.2019101762

    Queries are the main workload of a database system, and their efficiency determines the performance of the database system. There are multiple execution plans for a query, and existing query optimizers can only statically select a better execution plan for a query according to the configuration parameters of the database system. There are complex and variable resource contentions between concurrent queries, and such contentions are difficult to reflect accurately through configuration parameters; besides, the efficiency of the same execution plan is not consistent in different scenarios. The selection of execution plans for concurrent queries therefore needs to consider the influence between queries, namely query interaction. Based on the above, a metric called QIs (Query Interactions) was proposed for measuring the influence of query interaction on a query under concurrent execution. For the selection of query execution plans under concurrent query, a method called TRating (Time Rating) was proposed to dynamically select the execution plan for a query. In this method, the influence of query interaction on the queries executed with different plans in a query combination was measured, and the plan least affected by query interaction was selected as the better execution plan for the query. Experimental results show that TRating can select a better execution plan for a query with an accuracy of 61%, which is 25% higher than that of the query optimizer; and the accuracy of the proposed method is as high as 69% when selecting a suboptimal execution plan for a query.

    Optimized algorithm for k-step reachability queries on directed acyclic graphs
    Ming DU, Anping YANG, Junfeng ZHOU, Ziyang CHEN, Yun YANG
    2020, 40(2):  426-433.  DOI: 10.11772/j.issn.1001-9081.2019081605

    The k-step reachability query answers whether there exists a path of length no longer than k between two nodes in a Directed Acyclic Graph (DAG). Concerning the problems of large index size and low query processing efficiency of existing approaches, a bi-directional shortest-path index based on partial nodes was proposed to improve the coverage of reachable queries, together with a set of optimization rules to reduce the index size. Then, a bi-directional reversed topological index was proposed to accelerate the answering of unreachable queries on the simplified graph. Finally, a farthest-node-first-visiting bi-directional traversal strategy was proposed to improve the efficiency of query processing. Experimental results on 21 real datasets, such as citation networks and social networks, show that compared with existing efficient approaches including PLL (Pruned Landmark Labeling) and BFSI-B (Breadth First Search Index-Bilateral), the proposed algorithm has smaller index size and higher query response speed.
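
As background for the query type studied here, a depth-bounded breadth-first search is the straightforward, index-free way to answer a k-step reachability query. The sketch below is illustrative only; it does not reflect the paper's index structures, and all names are made up.

```python
from collections import deque

def k_step_reachable(adj, u, v, k):
    """Return True if v is reachable from u via a path of length <= k.

    adj: dict mapping each node to a list of its out-neighbors (a DAG).
    A breadth-first search bounded at depth k suffices, because BFS
    visits nodes in order of shortest path length from u.
    """
    if u == v:
        return True
    frontier = deque([(u, 0)])
    seen = {u}
    while frontier:
        node, dist = frontier.popleft()
        if dist == k:
            continue  # extending this path would exceed length k
        for nxt in adj.get(node, []):
            if nxt == v:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return False

# Toy DAG: 1 -> 2 -> 3 -> 4, plus a shortcut 1 -> 3
dag = {1: [2, 3], 2: [3], 3: [4]}
```

On this toy graph, node 4 is 2-step reachable from node 1 only through the shortcut edge; index-based methods such as the one proposed avoid re-running this search per query.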

    Clustering-based hyperlink prediction
    Pengfei QI, Lihua ZHOU, Guowang DU, Hao HUANG, Tong HUANG
    2020, 40(2):  434-440.  DOI: 10.11772/j.issn.1001-9081.2019101730
    Abstract | HTML | PDF (2588KB)

    Hyperlink prediction aims to utilize the inherent properties of an observed network to reproduce the missing links in the network. Existing hyperlink prediction algorithms often make predictions based on the entire network, and link types with insufficient training samples may be missed, resulting in incomplete link types being detected. To address this problem, a clustering-based hyperlink prediction algorithm named C-CMM was proposed. Firstly, the dataset was divided into clusters, and then a model was constructed for each cluster to perform hyperlink prediction. The proposed algorithm can make full use of the information contained in the observation samples of each cluster, and widen the coverage of the prediction results. Experimental results on three real-world datasets show that the proposed algorithm outperforms many state-of-the-art link prediction algorithms in prediction accuracy and efficiency, and has more comprehensive prediction coverage.

    Representation learning for topic-attention network
    Jingfeng GUO, Hui DONG, Tingwei ZHANG, Xiao CHEN
    2020, 40(2):  441-447.  DOI: 10.11772/j.issn.1001-9081.2019081529
    Abstract | HTML | PDF (955KB)

    Concerning the problem that heterogeneous network representation learning only considers structural social relations and ignores semantics, a representation learning algorithm based on the topic-attention network was proposed by combining the social relationships between users with the users' preferences for topics. Firstly, according to the characteristics of the topic-attention network and the identical-discrepancy-contrary (determinacy and uncertainty) idea of set pair analysis theory, a transition probability model was given. Then, a random walk algorithm over the two types of nodes was proposed using the transition probability model, so as to obtain relatively high-quality random walk sequences. Finally, the embedding vector space representation of the topic-attention network was obtained by modeling the two types of nodes in the sequences. Theoretical analysis and experimental results on the Douban dataset show that the random walk algorithm combined with the transition probability model analyzes the connection relationships between nodes in the network more comprehensively. The modularity of the proposed algorithm is 0.6998 when the number of communities is 13, which is nearly 5% higher than that of the metapath2vec algorithm, and the proposed algorithm can capture more detailed information in the network.
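
The random-walk stage described above can be pictured with a minimal weighted-walk sketch. The graph, node names and transition weights below are arbitrary placeholders, not the set-pair-analysis probabilities derived in the paper.

```python
import random

def weighted_walk(graph, weights, start, length, rng=None):
    """Generate one random-walk node sequence of the given length.

    graph:   dict node -> list of neighbor nodes (users and topics mixed).
    weights: dict (node, neighbor) -> transition weight; missing pairs
             default to 1.0. Here these are illustrative numbers only.
    """
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = graph.get(cur)
        if not nbrs:
            break  # dead end: stop the walk early
        ws = [weights.get((cur, n), 1.0) for n in nbrs]
        walk.append(rng.choices(nbrs, weights=ws, k=1)[0])
    return walk

# Tiny topic-attention network: users u1, u2 and topics t1, t2
g = {"u1": ["u2", "t1"], "u2": ["u1", "t2"], "t1": ["u1"], "t2": ["u2"]}
w = {("u1", "t1"): 3.0, ("u1", "u2"): 1.0}  # u1 prefers topic t1
seq = weighted_walk(g, w, "u1", 8)
```

Sequences like `seq` are what a skip-gram-style model would then consume to produce node embeddings.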

    Activity recommendation method based on directed label graph and user feedback in event-based social network
    Xiaohuan SHAN, Zhiguo ZHANG, Baoyan SONG, Chenglin REN
    2020, 40(2):  448-453.  DOI: 10.11772/j.issn.1001-9081.2019081565
    Abstract | HTML | PDF (859KB)

    Due to the timeliness of activities in Event-Based Social Networks (EBSNs), traditional social network recommendation algorithms cannot be applied to EBSNs directly. In addition, most traditional recommendation algorithms ignore the feedback on whether previous users accepted the recommendations, which influences subsequent recommendation quality. Therefore, an activity recommendation method based on a directed label graph and user feedback in EBSN was proposed. Firstly, the EBSN was abstracted into a directed label graph, and a Directed Graph Structure Feature (DGSF) index was constructed by extracting the property feature information of nodes and edges to filter nodes and edges for the first time. The DGSF index consists of a node property feature index, a directed edge property feature index and a time feature index. Secondly, a multi-attribute candidate set filtering strategy based on the DGSF index was proposed; by using the limits on time, in-degrees and out-degrees of nodes, and label types, further pruning of the candidate sets was realized to avoid redundant computation. Thirdly, an improved UCB (Upper Confidence Bound) activity recommendation algorithm with user feedback, namely EN_UCB (Elastic Net UCB), was put forward. In EN_UCB, with the introduction of elastic net regression, the interest values of a user towards activities were calculated according to many influencing factors, the activities with high interest values were recommended to the user, and the feedback on whether the user accepted the activities was received to optimize subsequent recommendations. Experimental results show that EN_UCB achieves an acceptance rate higher than those of TS (Thompson Sampling), UCB and eGreedy, a regret rate far lower than those of TS and eGreedy, and a running time superior to those of TS, UCB and eGreedy; the larger the number of activities, the more obvious the advantages. The proposed method implements online activity recommendation in EBSN effectively.
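
For readers unfamiliar with the bandit machinery EN_UCB builds on, the classic UCB select-then-update loop looks roughly as follows. This is a generic sketch: the paper's elastic-net interest estimation is not reproduced, and the constant `c` is an illustrative choice.

```python
import math

def ucb_select(counts, rewards, t, c=2.0):
    """Pick the arm (activity) with the largest upper confidence bound.

    counts[i]:  times activity i was recommended so far
    rewards[i]: mean observed feedback (1 = accepted, 0 = rejected)
    t:          total number of recommendation rounds so far
    An arm never tried is returned immediately, so it is explored first.
    """
    best, best_score = None, -float("inf")
    for i in range(len(counts)):
        if counts[i] == 0:
            return i
        score = rewards[i] + math.sqrt(c * math.log(t) / counts[i])
        if score > best_score:
            best, best_score = i, score
    return best

def ucb_update(counts, rewards, arm, feedback):
    """Fold one accept (1) / reject (0) feedback into the running mean."""
    counts[arm] += 1
    rewards[arm] += (feedback - rewards[arm]) / counts[arm]
```

The exploration term shrinks as an activity accumulates feedback, which is how accept/reject signals steer later recommendations.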

    Recommendation method with focus on long tail items
    Jing QIN, Qingbo ZHANG, Bin WANG
    2020, 40(2):  454-458.  DOI: 10.11772/j.issn.1001-9081.2019091665
    Abstract | HTML | PDF (799KB)

    To solve the long tail problem caused by low coverage and diversity in recommendation systems, a recommendation framework for long-tail items and a recommendation algorithm named FLTI (Focusing on Long Tail Item) were proposed. The recommendation framework for long-tail items was built based on a Convolutional Neural Network (CNN) model with three layers: a data processing layer, a recommendation algorithm layer and a recommendation list generation layer. The FLTI algorithm was added to the recommendation algorithm layer of the framework; it first calculated the frequently and infrequently recommended items, and then replaced frequently recommended items with long-tail items to meet the specified proportion of long-tail items in the system. Experimental results on the Movielens 1M and BookCrossing datasets show that compared with the traditional User-based Collaborative Filtering (UserCF) algorithm, Item-based Collaborative Filtering (ItemCF) algorithm, Singular Value Decomposition (SVD) recommendation algorithm and Collaborative Denoising Auto-Encoder (CDAE) algorithm, the coverage of the FLTI algorithm is improved by up to 51%, and the diversity is improved by up to 59%.
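
The replacement step that FLTI performs on a recommendation list can be sketched as follows. The threshold, ratio and item names are hypothetical; the sketch only illustrates the idea of swapping the most frequently recommended items for long-tail substitutes until a target proportion is met.

```python
def enforce_long_tail(recs, popularity, tail_threshold, tail_ratio, candidates):
    """Replace popular items in a recommendation list until at least
    tail_ratio of the list consists of long-tail items.

    popularity:     dict item -> recommendation frequency
    tail_threshold: items at or below this frequency count as long tail
    candidates:     ranked long-tail items available as substitutes
    """
    recs = list(recs)
    need = int(len(recs) * tail_ratio)
    tail = [i for i in recs if popularity.get(i, 0) <= tail_threshold]
    subs = [c for c in candidates
            if c not in recs and popularity.get(c, 0) <= tail_threshold]
    # Replace the most frequently recommended items first
    hot = sorted((i for i in recs if i not in tail),
                 key=lambda i: popularity.get(i, 0), reverse=True)
    while len(tail) < need and hot and subs:
        victim = hot.pop(0)
        sub = subs.pop(0)
        recs[recs.index(victim)] = sub
        tail.append(sub)
    return recs

out = enforce_long_tail(
    ["a", "b", "c", "d"],
    {"a": 90, "b": 80, "c": 5, "d": 70, "x": 3, "y": 2},
    tail_threshold=10, tail_ratio=0.5, candidates=["x", "y"])
```

Here the hottest item "a" is swapped for the long-tail candidate "x", after which half the list is long tail and replacement stops.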

    Service discovery method for Internet of Things based on Biterm topic model
    Shuman WANG, Aiping LI, Liguo DUAN, Jia FU, Yongle CHEN
    2020, 40(2):  459-464.  DOI: 10.11772/j.issn.1001-9081.2019091662
    Abstract | HTML | PDF (1058KB)

    Service description texts for the Internet of Things (IoT) are short in length and sparse in text features, so directly modeling IoT services with a traditional topic model yields poor clustering effect and the best service cannot be discovered. To solve this problem, an IoT service discovery method based on the Biterm Topic Model (BTM) was proposed. Firstly, BTM was employed to mine the latent topics of existing IoT services, and the service document-topic probability distribution was derived from the global topic distribution and the topic-word distribution. Then, the K-means algorithm was used to cluster the services and return the best matching results for service requests. Experimental results show that the proposed method can improve the clustering effect of IoT services and thus obtain the matched best service. Compared with the HDP (Hierarchical Dirichlet Process) and LDA-K (Latent Dirichlet Allocation based on K-means) methods, the proposed method achieves better performance in terms of Precision and Normalized Discounted Cumulative Gain (NDCG) for best service discovery.
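
The biterms that BTM counts are simply unordered word pairs drawn from each short document, which is why it copes with sparse texts better than document-level co-occurrence models. A small sketch (the optional window parameter is an illustrative variant, and the example words are invented):

```python
from itertools import combinations

def biterms(doc_words, window=None):
    """Extract the biterms (unordered word pairs) of one short document.

    BTM models corpus-wide word co-occurrence by counting such pairs;
    for a short service description, every pair in the document (or in
    a sliding window) is taken as one biterm.
    """
    words = list(doc_words)
    if window is None or window >= len(words):
        return [tuple(sorted(p)) for p in combinations(words, 2)]
    out = []
    for i in range(len(words)):
        for j in range(i + 1, min(i + window, len(words))):
            out.append(tuple(sorted((words[i], words[j]))))
    return out

bs = biterms(["smart", "home", "light"])
```

A three-word description yields three biterms; these counts, aggregated over the corpus, are what BTM's topic inference consumes.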

    Dominant feature mining of spatial sub-prevalent co-location patterns
    Dong MA, Hongmei CHEN, Lizhen WANG, Qing XIAO
    2020, 40(2):  465-472.  DOI: 10.11772/j.issn.1001-9081.2019081900
    Abstract | HTML | PDF (1839KB)

    A spatial co-location pattern is a subset of spatial features whose instances frequently appear together in neighborhoods. Co-location pattern mining methods usually assume that spatial instances are independent of each other, adopt the participation rate, i.e. the frequency with which a feature's instances participate in pattern instances, to measure the importance of a spatial feature in a co-location pattern, and adopt the participation index, i.e. the minimal participation rate over the pattern's features, to measure the interest of a pattern. These methods neglect some important relationships between spatial features. Therefore, the co-location pattern with dominant feature was proposed to reveal the dominant relationships between spatial features. The existing method for mining co-location patterns with dominant feature is based on traditional co-location pattern mining and its clique instance model; however, the clique instance model may neglect non-clique dominant relationships between spatial features. Motivated by the above, the dominant feature mining of spatial sub-prevalent co-location patterns was studied based on the star instance model, in order to better reveal the dominant relationships between spatial features and mine more valuable co-location patterns with dominant feature. Firstly, two metrics to measure a feature's dominance were defined. Secondly, an effective algorithm for mining co-location patterns with dominant feature was designed. Finally, experimental results on both synthetic and real datasets show that the proposed mining algorithm is efficient and the co-location pattern with dominant feature is practical.
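
The participation rate and participation index mentioned above can be computed directly from a pattern's table of row instances. A small worked sketch with made-up feature and instance names:

```python
def participation_index(pattern_instances, feature_instance_counts):
    """Compute the participation rate per feature and the participation index.

    pattern_instances:       list of dicts mapping feature -> instance id,
                             one dict per row (pattern) instance
    feature_instance_counts: dict feature -> total number of its instances
    PR(f)  = |distinct instances of f appearing in the pattern| / |f|
    PI(pattern) = min over the pattern's features of PR(f)
    """
    distinct = {}
    for row in pattern_instances:
        for feat, inst in row.items():
            distinct.setdefault(feat, set()).add(inst)
    pr = {f: len(s) / feature_instance_counts[f] for f, s in distinct.items()}
    return pr, min(pr.values())

# Pattern {A, B}: A has 4 instances overall, B has 3
rows = [{"A": "A1", "B": "B1"}, {"A": "A1", "B": "B2"}, {"A": "A2", "B": "B3"}]
pr, pi = participation_index(rows, {"A": 4, "B": 3})
```

Here PR(A) = 2/4 and PR(B) = 3/3, so the participation index is 0.5; it is exactly this minimum-based measure whose feature asymmetry motivates dominance metrics.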

    Target-dependent method for authorship attribution
    Yang LI, Wei ZHANG, Chen PENG
    2020, 40(2):  473-478.  DOI: 10.11772/j.issn.1001-9081.2019101768
    Abstract | HTML | PDF (650KB)

    Authorship attribution is the task of deciding who wrote a particular document. However, traditional methods for authorship attribution are target-independent, considering no constraint when predicting authorship, which is inconsistent with actual problems. To address this issue, a Target-Dependent method for Authorship Attribution (TDAA) was proposed. Firstly, the product ID corresponding to a user review was chosen as the constraint information. Secondly, Bidirectional Encoder Representations from Transformers (BERT) was used to extract pre-trained review text features to make the text modeling process more universal. Thirdly, a Convolutional Neural Network (CNN) was used to extract deep features of the text. Finally, two fusion methods were proposed to fuse the two kinds of information. Experimental results on the Amazon Movie_and_TV and CDs_and_Vinyl_5 datasets show that the proposed method can increase the accuracy by 4%-5% compared with the comparison methods.

    Preventing location disclosure attacks through generating dummy trajectories
    Xiangyu LIU, Jinmei CHEN, Xiufeng XIA, Manish Singh, Chuanyu ZONG, Rui ZHU
    2020, 40(2):  479-485.  DOI: 10.11772/j.issn.1001-9081.2019081612
    Abstract | HTML | PDF (836KB)

    In order to solve the problem of trajectory privacy leakage caused by the collection of large amounts of trajectory information of moving objects, a dummy trajectory-based trajectory privacy protection algorithm was proposed. In this algorithm, considering the user's locations under disclosure, a heuristic rule based on a comprehensive measure of trajectory similarity and location diversity was designed to select dummy trajectories, so that the generated dummy trajectories can effectively hide the real trajectory and sensitive locations. Besides, a trajectory directed graph strategy and a grid-based map strategy were proposed to optimize the execution efficiency of the algorithm. Experimental results on real trajectory datasets demonstrate that the proposed algorithm can effectively protect the real trajectory with high data utility.

    Image reconstruction based on gradient projection for sparse representation and complex wavelet
    Yanyan GAO, Li LI, Jing ZHANG, Yingqian JIA
    2020, 40(2):  486-490.  DOI: 10.11772/j.issn.1001-9081.2019101719
    Abstract | HTML | PDF (680KB)

    Compressed sensing mainly consists of random projection and reconstruction. Because of the low convergence speed of iterative shrinkage algorithms and the lack of directionality in the traditional 2-dimensional wavelet transform, random projection was implemented by using the Permute Discrete Cosine Transform (PDCT), and gradient projection was used for reconstruction. Based on simplifying the computational complexity, the transform coefficients in the dual-tree complex wavelet domain were improved by iteration, and finally the reconstructed image was obtained by the inverse transform. In the experiments, the reconstruction results of the DT CWT (Dual-Tree Complex Wavelet Transform) and the bi-orthogonal wavelet were compared under the same reconstruction algorithm; the former is better than the latter in image detail and smoothness, with a Peak Signal-to-Noise Ratio (PSNR) higher by 1.5 dB. In the same sparse domain, gradient projection converges faster than the iterative shrinkage algorithm; and with the same sparse domain and random projection, PDCT achieves a slightly higher PSNR than the structural random matrix.
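
The PSNR figure used in these comparisons follows the standard definition PSNR = 10 log10(peak^2 / MSE). A plain reference computation on flat pixel lists (the sample values are arbitrary):

```python
import math

def psnr(original, reconstructed, peak=255.0):
    """Peak Signal-to-Noise Ratio between two equally sized images,
    given here as flat lists of pixel values."""
    assert len(original) == len(reconstructed)
    mse = sum((a - b) ** 2
              for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

val = psnr([50, 60, 70, 80], [52, 58, 70, 80])
```

For this toy pair the MSE is 2, giving roughly 45.1 dB; a 1.5 dB gap, as reported between the two wavelets, corresponds to a noticeably lower reconstruction error.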

    CCF Bigdata 2019
    Personalized privacy protection method for data with multiple numerical sensitive attributes
    Meishu ZHANG, Yabin XU
    2020, 40(2):  491-496.  DOI: 10.11772/j.issn.1001-9081.2019091639
    Abstract | HTML | PDF (588KB)

    The existing privacy protection methods for data with multiple numerical sensitive attributes not only cause a large loss of information about quasi-identifier attributes, but also fail to satisfy users' personalized need for ranking the importance of numerical sensitive attributes. To solve these problems, a personalized privacy protection method based on clustering and weighted Multi-Sensitive Bucketization (MSB) was proposed. Firstly, according to the similarity of quasi-identifiers, the dataset was divided into several subsets with similar quasi-identifier attribute values. Then, considering the different sensitivities of users to sensitive attributes, the sensitivity and the bucket capacity of multi-dimensional buckets were used to calculate the weighted selectivity and to construct the weighted multi-dimensional buckets. Finally, the data were grouped and anonymized accordingly. Eight attributes in UCI's standard Adult dataset were selected for experiments, and the proposed method was compared with MNSACM and WMNSAPM. Experimental results show that the proposed method generally performs better and is significantly superior to the comparison methods in reducing information loss and running time, improving the data quality and operating efficiency.

    Multi-user sharing ORAM scheme based on attribute encryption
    Wei FU, Chenyang GU, Qiang GAO
    2020, 40(2):  497-502.  DOI: 10.11772/j.issn.1001-9081.2019091634
    Abstract | HTML | PDF (550KB)

    Oblivious Random Access Machine (ORAM) is one of the key technologies for protecting the privacy of user access behaviors. However, existing ORAM schemes mainly focus on single-user access requirements and cannot support data sharing between multiple users. Combining the Ring ORAM scheme with Attribute-Based Encryption (ABE), a multi-user sharing ORAM scheme based on attribute encryption, namely ABE-M-ORAM, was designed and implemented. Attribute encryption was adopted to achieve fine-grained access control, which can not only protect the security of user access behaviors but also realize convenient data sharing between different users. Theoretical analysis and simulation experiments verify the high security, practicability and good access performance of the proposed scheme.

    Multi-label feature selection algorithm based on conditional mutual information of expert feature
    Yusheng CHENG, Fan SONG, Yibin WANG, Kun QIAN
    2020, 40(2):  503-509.  DOI: 10.11772/j.issn.1001-9081.2019091626
    Abstract | HTML | PDF (818KB)

    Feature selection plays an important role in the classification accuracy and generalization performance of classifiers. Existing multi-label feature selection algorithms mainly use the maximum-relevance minimum-redundancy criterion to select features from the whole feature set without considering expert features, so they suffer from long running time and high complexity. In real life, however, experts can directly determine the overall prediction direction based on a few key features; paying attention to and extracting this information will inevitably reduce the computation time of feature selection and may even improve classifier performance. Based on this, a multi-label feature selection algorithm based on the conditional mutual information of expert features was proposed. Firstly, the expert features were combined with the remaining features, and then the conditional mutual information was used to obtain a feature sequence ordered from strong to weak relevance to the label set. Finally, subspaces were divided to remove redundant features. The proposed algorithm was compared experimentally on 7 multi-label datasets. Experimental results show that the proposed algorithm has certain advantages over the other feature selection algorithms, and statistical hypothesis testing and stability analysis further illustrate its effectiveness and rationality.
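
Conditional mutual information, the ranking criterion used here, can be estimated from co-occurrence counts with the plug-in estimator I(X;Y|Z) = Σ p(x,y,z) log2[ p(z)p(x,y,z) / (p(x,z)p(y,z)) ]. A small sketch on toy discrete data (not the paper's datasets):

```python
from collections import Counter
from math import log2

def conditional_mutual_information(samples):
    """Estimate I(X;Y|Z) from a list of (x, y, z) samples using
    empirical (plug-in) probabilities."""
    n = len(samples)
    pxyz = Counter(samples)
    pxz = Counter((x, z) for x, _, z in samples)
    pyz = Counter((y, z) for _, y, z in samples)
    pz = Counter(z for _, _, z in samples)
    cmi = 0.0
    for (x, y, z), c in pxyz.items():
        # counts substitute for probabilities: the 1/n factors cancel
        cmi += (c / n) * log2((pz[z] * c) / (pxz[(x, z)] * pyz[(y, z)]))
    return cmi

# X fully determines Y at every value of Z -> one full bit of information
data = [(0, 0, 0), (1, 1, 0), (0, 0, 1), (1, 1, 1)]
val = conditional_mutual_information(data)
```

With a candidate feature as X, the label as Y and an expert feature as Z, large values indicate a feature that adds information beyond what the expert feature already provides.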

    Strategy with low redundant computation for reachability query preserving graph compression
    Danfeng ZHAO, Junchen LIN, Wei SONG, Jian WANG, Dongmei HUANG
    2020, 40(2):  510-517.  DOI: 10.11772/j.issn.1001-9081.2019091666
    Abstract | HTML | PDF (634KB)

    Since some computations in the reachability Query Preserving Graph Compression (QPGC) algorithm are redundant, a high-performance compression strategy was proposed. In the stage of solving the ancestor and descendant sets of vertices, an algorithm named TSB (Topological Sorting Based algorithm for solving ancestor and descendant sets) was proposed for common graph data: firstly, the vertices of the graph were topologically sorted; then the vertex sets were solved in forward or backward order of the topological sequence, avoiding the redundant computation caused by an ambiguous solution order. For graph data whose longest path is short, an algorithm based on graph aggregation operations, namely AGGB (AGGregation Based algorithm for solving ancestor and descendant sets), was proposed, so that the vertex sets could be solved in a fixed number of aggregation operations. In the stage of solving reachability equivalence classes, a Piecewise Statistical Pruning (PSP) algorithm was proposed: piecewise statistics of the ancestor and descendant sets were obtained and compared to achieve coarse matching, pruning off some unnecessary fine matches. Experimental results show that compared with the QPGC algorithm: in the stage of solving the ancestor and descendant sets, the TSB and AGGB algorithms improve the performance by 94.22% and 90.00% on average respectively on different datasets; in the stage of solving the reachability equivalence classes, the PSP algorithm improves the performance by more than 70% on most datasets. As the dataset grows, using TSB and AGGB together with PSP improves the performance by nearly 28 times. Theoretical analysis and simulation results show that the proposed strategy has less redundant computation and faster compression speed than QPGC.
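
The idea behind TSB, that a fixed topological order makes each successor's set final exactly when it is needed, can be sketched as follows. This is a simplification: the paper's optimization rules and the symmetric ancestor-set pass are omitted, and the graph is a toy example.

```python
def descendant_sets(adj):
    """Compute every node's descendant set in one pass.

    Processing nodes in reverse topological order guarantees that all
    successors' sets are complete before a node is visited, so no set
    is ever recomputed.
    """
    nodes = set(adj) | {v for vs in adj.values() for v in vs}
    # Kahn's algorithm for a topological order
    indeg = {u: 0 for u in nodes}
    for u in adj:
        for v in adj[u]:
            indeg[v] += 1
    order = []
    queue = [u for u in nodes if indeg[u] == 0]
    while queue:
        u = queue.pop()
        order.append(u)
        for v in adj.get(u, []):
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    desc = {u: set() for u in nodes}
    for u in reversed(order):          # reverse topological order
        for v in adj.get(u, []):
            desc[u].add(v)
            desc[u] |= desc[v]         # v's set is already complete
    return desc

d = descendant_sets({1: [2, 3], 2: [4], 3: [4]})
```

Node 1 inherits {4} from both children without ever traversing past them twice, which is the redundancy an unordered solution order would reintroduce.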

    Distributed rough set attribute reduction algorithm under Spark
    Xiajie ZHANG, Jinghua ZHU, Yang CHEN
    2020, 40(2):  518-523.  DOI: 10.11772/j.issn.1001-9081.2019091642
    Abstract | HTML | PDF (560KB)

    Attribute reduction (feature selection) is an important part of data preprocessing. Most attribute reduction methods use attribute dependence as the criterion for filtering attribute subsets. A Fast Dependence Calculation (FDC) method was designed to compute the dependence by directly searching for the objects belonging to the relative positive region, without computing the relative positive region in advance, which gives it a significant speed advantage over traditional methods. In addition, the Whale Optimization Algorithm (WOA) was improved to make it effective for rough set attribute reduction. Combining these two methods, a distributed rough set attribute reduction algorithm based on Spark, named SP-WOFRST, was proposed, and it was compared with SP-RST, another Spark-based rough set attribute reduction algorithm, on two large synthetic datasets. Experimental results show that the proposed SP-WOFRST algorithm is superior to SP-RST in accuracy and speed.
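
The attribute dependence that FDC accelerates is the classical rough-set measure gamma_C(D) = |POS_C(D)| / |U|. Below is a plain, non-accelerated reference computation on an invented decision table, not the paper's FDC method itself:

```python
from collections import defaultdict

def dependence(universe, condition_attrs, decision_attr):
    """Attribute dependence gamma = |POS_C(D)| / |U| for a decision table.

    universe: list of dicts (objects). An object belongs to the positive
    region when every object sharing its condition-attribute values also
    shares its decision value (i.e. its equivalence block is consistent).
    """
    blocks = defaultdict(list)
    for obj in universe:
        key = tuple(obj[a] for a in condition_attrs)
        blocks[key].append(obj)
    pos = 0
    for objs in blocks.values():
        decisions = {o[decision_attr] for o in objs}
        if len(decisions) == 1:       # consistent block -> positive region
            pos += len(objs)
    return pos / len(universe)

table = [
    {"a": 0, "b": 0, "d": "yes"},
    {"a": 0, "b": 0, "d": "yes"},
    {"a": 0, "b": 1, "d": "no"},
    {"a": 1, "b": 1, "d": "yes"},
    {"a": 1, "b": 1, "d": "no"},
]
gamma = dependence(table, ["a", "b"], "d")
```

Dropping attribute "b" here collapses the dependence to 0, which is exactly the signal a reduction search uses to decide that "b" cannot be removed.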

    Analysis of international influence of news media for major social security emergencies
    Chen CHEN, Shaowu ZHANG, Liang YANG, Dongyu ZHANG, Hongfei LIN
    2020, 40(2):  524-529.  DOI: 10.11772/j.issn.1001-9081.2019091629
    Abstract | HTML | PDF (1388KB)

    Public opinions on major social security emergencies in the era of big data are mainly spread through the media, yet most existing research fails to consider the special group of news media and their influence in specific kinds of events. To study this problem, a method to evaluate influence by integrating the network structure and the behavioral relationships between users was proposed, and the Xinjiang and Paris violent terrorist events were taken as examples to calculate, on the Twitter platform, the international influence of news media of different countries on such events. This evaluation method can better capture the influence of news media at the event level. Experimental results of calculating the influence of news media in the Xinjiang and Paris violent terrorist events show that the influence of news media of different countries differs between the two events, which indicates that these two events of the same type have different scopes of influence, and also reflects the differences in the political positions of different countries.

    Item-based unified recommendation model
    Kai DENG, Jiajin HUANG, Jin QIN
    2020, 40(2):  530-534.  DOI: 10.11772/j.issn.1001-9081.2019101791
    Abstract | HTML | PDF (565KB)

    The modeling of user-item interaction patterns is an important task in personalized recommendation. Many recommendation systems are based on the assumption that the relationship between users and items is linear, and ignore the complexity and non-linearity of real interactions with historical items; as a result, they cannot capture the complex decision-making process of users. Therefore, the item similarity factor model, an expressive solution for top-N recommendation, was combined with the multi-layer perceptron approach to effectively model the higher-order relationships between items and capture more complex user decisions. The effect of the combination was verified on the MovieLens, Foursquare and ratings_Digital_Music datasets; compared with benchmark methods such as MLP (Multi-Layer Perceptron), the Factored Item Similarity Model (FISM), DeepICF (Deep Item-based Collaborative Filtering) and ItemKNN (Item-based K-Nearest Neighbors), the results demonstrate that the proposed method brings a significant improvement in recommendation performance.

    Alarm text named entity recognition based on BERT
    Yue WANG, Mengxuan WANG, Sheng ZHANG, Wen DU
    2020, 40(2):  535-540.  DOI: 10.11772/j.issn.1001-9081.2019101717
    Abstract | HTML | PDF (642KB)

    Aiming at the problem that key entity information in the police field is difficult to recognize, a neural network model based on BERT (Bidirectional Encoder Representations from Transformers), namely BERT-BiLSTM-Attention-CRF, was proposed to recognize and extract the related named entities, and corresponding entity annotation specifications were designed for different case types. In the model, BERT pre-trained word vectors were used to replace the word vectors trained by traditional methods such as Skip-gram and Continuous Bag of Words (CBOW), improving the representation ability of the word vectors and solving the problem of word boundary division in Chinese corpora trained with character vectors; an attention mechanism was used to improve the architecture of the classical Named Entity Recognition (NER) model BiLSTM-CRF. The BERT-BiLSTM-Attention-CRF model achieves an accuracy of 91% on the test set, which is 7% higher than that of the CRF++ baseline and 4% higher than that of the BiLSTM-CRF model, and the F1 values of the entities are all higher than 0.87.
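
Entity annotation specifications for NER models like this one typically follow a BIO-style scheme over characters or tokens. The sketch below shows generic BIO labeling on a toy example; the entity types, spans and text are invented and do not reflect the paper's actual police-case specification.

```python
def bio_tags(tokens, spans):
    """Produce BIO labels for a token sequence.

    spans: list of (start, end, type) with end exclusive.
    B- marks the first token of an entity, I- the rest, O everything else;
    this is the label format a BiLSTM-CRF layer is trained to emit.
    """
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = "B-" + etype
        for i in range(start + 1, end):
            tags[i] = "I-" + etype
    return tags

# Toy character-level example: a person name span and a location span
labels = bio_tags(["王", "某", "在", "北", "京"], [(0, 2, "PER"), (3, 5, "LOC")])
```

Character-level BIO labels are what make Chinese word-boundary ambiguity tractable for the CRF decoding layer.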

    Fast file access system for NVM storage systems
    Qingjian HE, Tao CAI, Jie WANG, Dejiao NIU
    2020, 40(2):  541-546.  DOI: 10.11772/j.issn.1001-9081.2019091655
    Abstract | HTML | PDF (602KB)

    NVM (Non-Volatile Memory) storage systems have the potential to provide high throughput, with near-memory read and write speeds, byte addressability, and support for multi-way forwarding. However, the existing system software stack is not designed for NVM, so it contains many factors that affect system access performance. Through analysis, it is found that the lock mechanism of the file system has large overhead, which makes concurrent data access a difficult problem in multi-core environments. To alleviate these problems, a lock-free file reading and writing mechanism as well as a byte-based read/write interface were designed. By eliminating the file-based lock mechanism, the coarse-grained access control was changed, and self-managed requests were used to improve the concurrency of processes. When designing the new file access interface that exploits byte addressing, the read-write asymmetry as well as the different characteristics of read and write operations of NVM storage devices were considered. These designs reduce the overhead of the software stack and make full use of NVM features to provide a highly concurrent, high-throughput and durable storage system. Finally, the FPMRW prototype system was implemented based on the open-source NVM simulator PMEM, and the universal test tool Filebench was used to test and analyze FPMRW. The results show that FPMRW can improve the system throughput by 3%-40% compared with EXT+PMEM and XFS+PMEM.

    Design and implementation of cloud native massive data storage system based on Kubernetes
    Fuxin LIU, Jingwei LI, Yihong WANG, Lin LI
    2020, 40(2):  547-552.  DOI: 10.11772/j.issn.1001-9081.2019101732
    Abstract | HTML | PDF (560KB)

    Aiming at the sharp increase of data on the cloud caused by the development and popularization of cloud native technology, as well as the bottlenecks of the technology in performance and stability, a Haystack-based storage system was proposed. With optimizations in service discovery, automatic fault tolerance and caching mechanism, the system is more suitable for cloud native business and meets the growing, high-frequency file storage and read/write requirements of the data acquisition, storage and analysis industries. The object storage model used by the system supports massive file storage with high-frequency reads and writes; a simple and unified application interface is provided for business using the storage system; a file caching strategy is applied to improve resource utilization; and the rich automated tool chain of Kubernetes is adopted to make this storage system easier to deploy, easier to expand and more stable than other storage systems. Experimental results indicate that the proposed storage system achieves a certain improvement in performance and stability compared with current mainstream object storage and file systems for large-scale fragmented data storage with more reads than writes.

    Traffic image semantic retrieval method based on specific object self-recognition
    Yi ZHAO, Xing DUAN, Shiyi XIE, Chunlin LIANG
    2020, 40(2):  553-560.  DOI: 10.11772/j.issn.1001-9081.2019101795
    Abstract | HTML | PDF (1320KB)

    In order to retrieve images of traffic violations from massive road traffic images, a semantic retrieval method based on specific object self-recognition was proposed. Firstly, a road traffic domain ontology and road traffic rule descriptions were established by experts in the traffic domain. Secondly, traffic image features were extracted by a Convolutional Neural Network (CNN), and combined with an image feature classification strategy based on the proposed improved Support Vector Machine based Decision Tree (SVM-DT) algorithm, the specific objects and the spatial position relationships between objects in a traffic image were automatically recognized and mapped into association relationships (rule instances) between the corresponding ontology instances and their objects. Finally, the image semantic retrieval results were obtained by reasoning over the ontology instances and rule instances. Experimental results show that the proposed method has higher accuracy, recall and retrieval efficiency than keyword-based and ontology-based traffic image semantic retrieval methods.

    Retrieval method of pulmonary nodule images based on multi-scale convolution feature fusion
    Junhua GU, Feng WANG, Yongjun QI, Zheran SUN, Zepei TIAN, Yajuan ZHANG
    2020, 40(2):  561-565.  DOI: 10.11772/j.issn.1001-9081.2019091641

    In order to address the difficulty of feature extraction and the low retrieval accuracy in pulmonary nodule image retrieval, a deep network model named LMSCRnet was proposed to extract image features. Firstly, a feature fusion method based on convolutions with filters of different scales was adopted to solve the difficulty of capturing local features caused by the varying sizes of pulmonary nodules. Then, the SE-ResNeXt block was introduced to obtain higher-level semantic features and reduce network degradation. Finally, the high-level semantic feature representation of the pulmonary nodule image was obtained. To meet the needs of massive-data retrieval tasks in practice, the distance calculation and sorting processes were deployed on the Spark distributed platform. Experimental results show that the feature extraction method based on LMSCRnet extracts high-level semantic information of images better, achieves 84.48% accuracy on the preprocessed LIDC pulmonary nodule dataset, and attains higher retrieval precision than other retrieval methods. Meanwhile, completing similarity matching and sorting on the Spark distributed platform enables the retrieval method to meet the requirements of massive-data retrieval tasks.
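    The multi-scale fusion idea can be illustrated on a toy 1D signal: filters of different widths are run over the same input and a pooled summary of each response is concatenated into one feature vector. This is only a schematic analogue of LMSCRnet's 2D multi-scale convolution branches, with hand-picked kernels:

```python
def conv1d(signal, kernel):
    """Valid-mode 1D convolution (no padding, stride 1)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def multi_scale_features(signal, kernels):
    """Apply filters of different sizes to the same input and concatenate
    a global max-pooled summary of each response map."""
    return [max(conv1d(signal, k)) for k in kernels]
```

    A small nodule excites the narrow filter, a large one the wide filter; concatenation keeps both cues in the fused descriptor.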

    Digital camouflage generation method based on cycle-consistent adversarial network
    Xu TENG, Hui ZHANG, Chunming YANG, Xujian ZHAO, Bo LI
    2020, 40(2):  566-570.  DOI: 10.11772/j.issn.1001-9081.2019091625

    Traditional digital camouflage generation methods cannot generate camouflages in real time according to the background. To cope with this problem, a digital camouflage generation method based on cycle-consistent adversarial network was proposed. Firstly, image features were extracted by a densely connected convolutional network, and the learned digital camouflage features were mapped into the background image. Secondly, a color retention loss was added to improve the quality of the generated camouflages, ensuring that they remain consistent with the surrounding background colors. Finally, a self-normalizing neural network was added to the discriminator to improve the robustness of the model against noise. Given the lack of objective evaluation criteria for digital camouflages, an edge detection algorithm and the Structural SIMilarity (SSIM) algorithm were used to evaluate the camouflage effect of the generated results. Experimental results show that the SSIM score of the digital camouflage generated by the proposed method on self-made datasets is reduced by more than 30% compared with existing algorithms, verifying the effectiveness of the proposed method in the digital camouflage generation task.
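    The SSIM part of the evaluation can be sketched as a single-window (global-statistics) SSIM between two equal-length grayscale pixel sequences, with the standard constants; the paper presumably uses the usual windowed variant over full images:

```python
def ssim(x, y, c1=6.5025, c2=58.5225):
    """Global SSIM over two pixel sequences.  The constants are the
    conventional (K1*L)**2 and (K2*L)**2 with K1=0.01, K2=0.03, L=255."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

    Identical inputs score 1; structurally dissimilar inputs score lower, which is how the structural difference between a camouflage patch and its background can be quantified.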

    Data science and technology
    Optimization of multidimensional index query mechanism based on HBase
    Jiangfeng XU, Yulong TAN
    2020, 40(2):  571-577.  DOI: 10.11772/j.issn.1001-9081.2019081462

    Key value stores are designed to extract value from very large amounts of data while remaining highly available, fault-tolerant and scalable, providing a much-needed infrastructure for Location-Based Services (LBS). However, complex queries on multidimensional data cannot be processed efficiently, because a key value store does not provide a way to access multiple attributes. Since HBase, as a key value store, cannot effectively handle multidimensional data, a unified indexing framework named New-grid was proposed. In an improved P-grid overlay network, a group of nodes was organized to provide efficient data distribution, fault tolerance and multidimensional query processing. For indexing purposes, the storage locality given by Hilbert space-filling curves was used to manage multidimensional data in the key value store effectively. Meanwhile, the HBase underlying storage was used to manage the data, and algorithms for range query and K-Nearest Neighbors (KNN) query were given to eliminate the overhead of maintaining separate index tables. Extensive experiments were conducted on Amazon EC2 with cluster sizes of 4, 8 and 16 ordinary nodes. Experimental results show that New-grid outperforms MD-HBase and MapReduce.
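    The ingredient behind the locality claim is the Hilbert space-filling curve: a cell's (x, y) coordinates are folded into a single one-dimensional key such that nearby cells usually receive nearby keys, so range and KNN scans touch contiguous HBase key ranges. A standard sketch of the mapping (textbook algorithm, not code from the paper):

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n*n grid (n a power of two) to its position
    along the Hilbert curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:               # rotate the quadrant so sub-curves connect
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d
```

    A row key built from this index lets a 2D rectangle query be answered by a few contiguous HBase scans instead of a full table scan, which is exactly the property that removes the need for separate index tables.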

    Trajectory similarity measurement method based on area division
    Yike LYU, Kai XU, Zhenqiang HUANG
    2020, 40(2):  578-583.  DOI: 10.11772/j.issn.1001-9081.2019071249

    In the era of big data, applications of spatial-temporal trajectory data are increasing and these data contain a large amount of information, so trajectory similarity measurement plays a pivotal role as a key step in trajectory mining. However, traditional trajectory similarity measurement methods suffer from high time complexity and from inaccuracy caused by matching on trajectory points. To solve these problems, a Triangle Division (TD) trajectory similarity measurement method based on trajectory areas was proposed for trajectories without road network structure. By setting up “pointers” connecting the points of two trajectories to construct non-overlapping triangle areas, the areas were accumulated to compute the trajectory similarity, and whether two trajectories are similar was decided by thresholds set for different application scenarios. Experimental results show that, compared with traditional point-based spatial trajectory similarity measures such as the Longest Common Subsequence (LCSS) and the Fréchet distance, the proposed method improves the recognition accuracy, reduces the time complexity by nearly 90%, and better adapts to trajectory similarity measurement with unevenly distributed trajectory points.
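    The triangle-area idea can be sketched under a simplifying assumption of aligned, equal-length trajectories: each pair of consecutive points on the two trajectories spans two non-overlapping triangles whose areas are accumulated, and zero total area means identical trajectories. The paper's "pointer" pairing may differ in detail:

```python
def triangle_area(p, q, r):
    # half the absolute cross product of (q - p) and (r - p)
    return abs((q[0] - p[0]) * (r[1] - p[1])
               - (q[1] - p[1]) * (r[0] - p[0])) / 2

def td_distance(traj_a, traj_b):
    """Accumulated area of the triangles spanned between two aligned
    trajectories; 0 means identical, larger means less similar."""
    area = 0.0
    for i in range(len(traj_a) - 1):
        a0, a1 = traj_a[i], traj_a[i + 1]
        b0, b1 = traj_b[i], traj_b[i + 1]
        area += triangle_area(a0, a1, b0) + triangle_area(a1, b1, b0)
    return area
```

    One linear pass over the point pairs gives the measure, which is where the reported time-complexity reduction against LCSS-style dynamic programming comes from.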

    Frontier & interdisciplinary applications
    Optimization model of hospital emergency resource redundancy configuration under emergencies
    Zhiyuan WAN, Qinming LIU, Chunming YE, Wenyi LIU
    2020, 40(2):  584-588.  DOI: 10.11772/j.issn.1001-9081.2019071235

    Before an emergency occurs, hospitals need to maintain a certain amount of emergency resource redundancy. Aiming at the configuration optimization of hospital emergency resource redundancy under emergencies, firstly, based on utility theory, the emergency resource redundancy was defined and classified by analyzing its utility, and a utility function conforming to the law of diminishing marginal utility was determined. Secondly, a redundancy configuration model of hospital emergency resources maximizing the total utility was established, with the upper limit of emergency resource storage and the lower limit of emergency rationality as the constraints. Finally, a combination of particle swarm optimization and sequential quadratic programming was used to solve the model. Through case analysis, four optimization schemes for the hospital's emergency resource redundancy were obtained, and the degree to which different hospital emergency levels demand emergency resource redundancy was summarized. The research shows that with the proposed optimization model, emergency rescue in hospitals under emergencies can be carried out effectively, and the utilization efficiency of hospital emergency resources can be improved.
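    For separable concave (diminishing-marginal-utility) utilities, the flavor of the model can be illustrated with a greedy unit-by-unit allocation: each unit of redundancy goes to the resource whose marginal utility is currently highest. The weighted-log utility and the resource names below are illustrative assumptions, and the paper itself solves its constrained model with particle swarm optimization plus sequential quadratic programming:

```python
import math

def allocate(resources, budget):
    """Greedy allocation of `budget` redundancy units across resources.
    Each resource r has utility w_r * log(1 + x_r); with such concave
    utilities the greedy marginal rule is optimal for this separable
    budgeted problem."""
    alloc = {r: 0 for r in resources}
    for _ in range(budget):
        # marginal utility of one more unit for resource r
        best = max(resources, key=lambda r: resources[r] *
                   (math.log(2 + alloc[r]) - math.log(1 + alloc[r])))
        alloc[best] += 1
    return alloc
```

    The diminishing marginal term makes the allocation spread out instead of piling everything on the highest-weight resource.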

    Speech deception detection algorithm based on denoising autoencoder and long short-term memory network
    Hongliang FU, Peizhi LEI
    2020, 40(2):  589-594.  DOI: 10.11772/j.issn.1001-9081.2019071183

    In order to further improve the performance of speech deception detection, a speech deception detection algorithm based on Denoising AutoEncoder (DAE) and Long Short-Term Memory (LSTM) network was proposed. Firstly, a parallel structure of DAE and LSTM, namely PDL (Parallel connection of DAE and LSTM), was constructed. Then, hand-crafted features extracted from the speech were fed into the DAE to obtain more robust features, while the Mel spectrums extracted after windowing and framing the speech were input into the LSTM frame by frame for frame-level deep feature learning. Finally, the two types of features were merged by a fully connected layer with batch normalization, and a softmax classifier was used for deception recognition. Experimental results on the CSC (Columbia-SRI-Colorado) corpus and a self-built corpus show that the recognition accuracy of classification with the fused features reaches 65.18% and 68.04% respectively, up to 5.56% and 7.22% higher than those of other algorithms, indicating that the proposed algorithm can effectively improve the accuracy of deception recognition.

    Student grade prediction method based on knowledge graph and collaborative filtering
    Xi CHEN, Guang MEI, Jinjin ZHANG, Weisheng XU
    2020, 40(2):  595-601.  DOI: 10.11772/j.issn.1001-9081.2019071222

    Focusing on student grade prediction in the undergraduate teaching of higher education, a prediction algorithm based on a course Knowledge Graph (KG) was proposed. Firstly, a course KG representing course information was constructed. Then, neighbor-based methods and KG representation learning-based methods were used to calculate the knowledge-level similarity of courses based on the KG, and these knowledge similarities among courses were integrated into the traditional Collaborative Filtering (CF) grade prediction framework. Finally, the performance of the KG-fused algorithm and common prediction algorithms under different data sparsities was compared experimentally. Experimental results show that in data-sparse scenarios, compared with the traditional CF algorithm, the neighbor-based algorithm reduces the Root Mean Square Error (RMSE) by about 11% and the Mean Absolute Error (MAE) by about 9%, while the KG representation learning-based algorithm reduces the RMSE by about 17.55% and the MAE by about 11.40%. The results indicate that CF using the KG can significantly reduce the prediction error, which proves that the KG can serve as an information supplement when historical data are lacking, thus helping CF obtain better prediction results.
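    The fusion of KG similarity into CF can be sketched as a similarity-weighted average: the predicted grade in a target course is a weighted mean of the student's grades in taken courses, weighted by knowledge overlap read off the KG. Jaccard overlap of knowledge points here stands in for the paper's neighbor-based or embedding-based similarity, and the course names are invented:

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def predict_grade(student_grades, target, knowledge):
    """Weighted average of the student's past grades, weighted by the
    knowledge-level similarity between each taken course and `target`."""
    num = den = 0.0
    for course, grade in student_grades.items():
        w = jaccard(knowledge[course], knowledge[target])
        num += w * grade
        den += w
    return num / den if den else None

knowledge = {"calculus1": {"limits", "derivatives"},
             "calculus2": {"derivatives", "integrals"},
             "history":   {"dates"}}
print(predict_grade({"calculus1": 90, "history": 60}, "calculus2", knowledge))
```

    Because the weights come from the KG rather than from co-rating histories, a prediction is possible even when no other student has taken both courses, which is the information-supplement role the abstract describes.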

    Design and implementation of intelligent flow field pathfinding algorithm for real-time strategy game
    Tian LI, Shumei ZHANG, Junli ZHAO
    2020, 40(2):  602-607.  DOI: 10.11772/j.issn.1001-9081.2019071158

    To solve the problems of excessively long pathfinding time as well as collision and blocking during movement in real-time strategy games, a combined improved flow field pathfinding algorithm was proposed. Firstly, a red-black tree was used to store data to speed up data access. Secondly, by using a penalty function, the calculation of the integration field cost was simplified by transforming the nonlinear partial differential equation problem into a linear unconstrained problem. Finally, a pre-adjacency node was introduced into the generation of flow directions. Compared with the unimproved flow field pathfinding algorithm, the improved algorithm reduces the path calculation time by 20%, with the average moving time stable at 20 s. Experimental results show that the improved flow field pathfinding algorithm can effectively shorten the pathfinding time, increase the moving speed of agents and improve the level of game artificial intelligence in real-time strategy games.
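    The two phases of flow field pathfinding, building an integration (cost) field outward from the goal and then pointing each cell at its cheapest neighbor, can be sketched on a uniform-cost grid. The paper's refinements (red-black tree storage, penalty-function cost, pre-adjacency nodes) are omitted here:

```python
from collections import deque

def flow_field(grid, goal):
    """grid: 0 = passable, 1 = wall.  Returns the integration field and a
    per-cell flow direction; every agent then just follows the arrows."""
    rows, cols = len(grid), len(grid[0])
    INF = float("inf")
    cost = [[INF] * cols for _ in range(rows)]
    cost[goal[0]][goal[1]] = 0
    queue = deque([goal])
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    while queue:                              # BFS = uniform-cost integration field
        r, c = queue.popleft()
        for dr, dc in steps:
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and cost[nr][nc] == INF):
                cost[nr][nc] = cost[r][c] + 1
                queue.append((nr, nc))
    flow = {}                                 # each cell points downhill in cost
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 0 and cost[r][c] not in (0, INF):
                flow[(r, c)] = min(
                    ((r + dr, c + dc) for dr, dc in steps
                     if 0 <= r + dr < rows and 0 <= c + dc < cols),
                    key=lambda p: cost[p[0]][p[1]])
    return cost, flow
```

    One field serves any number of agents heading to the same goal, which is why the technique suits crowds of units in real-time strategy games.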

    Auxiliary diagnosis method of myocardial infarction based on fusion of statistical features and entropy features
    Zhizhong WANG, Longlong QIAN, Chuang HAN, Li SHI
    2020, 40(2):  608-615.  DOI: 10.11772/j.issn.1001-9081.2019071172

    Aiming at the low practicability and accuracy of clinical myocardial infarction diagnosis, an auxiliary diagnosis method based on 12-lead ElectroCardioGram (ECG) signals was proposed. Firstly, denoising and data augmentation were performed on the 12-lead ECG signals. Secondly, for each lead, statistical features including the standard deviation, kurtosis coefficient and skewness coefficient were extracted to reflect the morphology of the ECG signal, while entropy features including Shannon entropy, sample entropy, fuzzy entropy, approximate entropy and permutation entropy were extracted to characterize the time- and frequency-domain complexity, the probability of new pattern generation, and the regularity and unpredictability of the ECG time series, as well as to detect small changes in the ECG signal. Thirdly, the statistical features and entropy features were fused. Finally, based on the random forest algorithm, the performance of the method was analyzed and verified in both intra-patient and inter-patient modes, with cross-validation used to avoid over-fitting. Experimental results show that the accuracy and F1 value of the proposed method are 99.98% and 99.99% respectively in the intra-patient mode, and 94.56% and 97.05% respectively in the inter-patient mode; moreover, compared with detection methods based on single-lead ECG, myocardial infarction detection with 12-lead ECG better matches doctors' clinical diagnostic practice.
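    Two of the per-lead features are easy to sketch in full: the statistical trio (standard deviation, skewness, kurtosis) and the Shannon entropy of a sample histogram; the remaining entropies (sample, fuzzy, approximate, permutation) plug into the same one-scalar-per-lead pattern. The bin count and formulas below are textbook choices, not taken from the paper:

```python
import math
from collections import Counter

def statistical_features(signal):
    """Standard deviation, skewness and kurtosis of one lead's samples
    (population formulas); these describe the signal's morphology."""
    n = len(signal)
    mean = sum(signal) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in signal) / n)
    skew = sum(((v - mean) / std) ** 3 for v in signal) / n
    kurt = sum(((v - mean) / std) ** 4 for v in signal) / n
    return std, skew, kurt

def shannon_entropy(signal, bins=16):
    """Shannon entropy (bits) of a histogram of the samples."""
    lo, hi = min(signal), max(signal)
    width = (hi - lo) / bins or 1.0        # constant signal: one bin
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in signal)
    n = len(signal)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

    Computing both groups per lead and concatenating them across the 12 leads yields the fused feature vector that the random forest consumes.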

    Motor imagery EEG feature extraction method based on multi-feature fusion
    Fei LUO, Pengfei LIU, Yuan LUO, Simeng ZHU
    2020, 40(2):  616-620.  DOI: 10.11772/j.issn.1001-9081.2019071167

    To solve the problems of low recognition rate and poor adaptability of single features, a feature extraction method named Hilbert-CSP-Huang Transform (HCHT) was proposed based on the Hilbert-Huang Transform (HHT) and Common Spatial Pattern (CSP). Firstly, the Intrinsic Mode Functions (IMF) were obtained by Empirical Mode Decomposition (EMD) of the original ElectroEncephaloGram (EEG) signals, and the IMF components were merged into a new signal matrix. Secondly, time-frequency domain features were obtained by Hilbert spectrum analysis. Thirdly, the time-frequency domain features were extended into time-frequency-space features by further CSP decomposition of the constructed signal matrix. Finally, the feature set was classified by a Support Vector Machine (SVM). Experiments on the BCI Competition II dataset show that, compared with methods based on HHT time-frequency features and CSP spatial features, the proposed method improves the recognition accuracy by 7.5, 10.3 and 9.2 percentage points respectively with a smaller standard deviation. Online experiments on an intelligent wheelchair platform show that HCHT can effectively improve the recognition accuracy and robustness.

Honorary Editor-in-Chief: ZHANG Jingzhong
Editor-in-Chief: XU Zongben
Associate Editor: SHEN Hengtao XIA Zhaohui
Domestic Post Distribution Code: 62-110
Foreign Distribution Code: M4616
No. 9, 4th Section of South Renmin Road, Chengdu 610041, China
Tel: 028-85224283-803
Website: www.joca.cn
E-mail: bjb@joca.cn