Outlier detection algorithm based on graph random walk
DU Xusheng, YU Jiong, YE Lele, CHEN Jiaying
Journal of Computer Applications    2020, 40 (5): 1322-1328.   DOI: 10.11772/j.issn.1001-9081.2019101708

Outlier detection algorithms are widely used in fields such as network intrusion detection and medical aided diagnosis. Local Distance-Based Outlier Factor (LDOF), Cohesiveness-Based Outlier Factor (CBOF) and Local Outlier Factor (LOF) are classic outlier detection algorithms, but they suffer from long execution time and low detection rates on large-scale and high-dimensional datasets. To address these problems, an outlier detection algorithm Based on Graph Random Walk (BGRW) was proposed. Firstly, the number of iterations, the damping factor and the outlier degree of every object in the dataset were initialized. Then, the transition probability of the walker between objects was derived from the Euclidean distance between the objects, and the outlier degree of every object was computed iteratively. Finally, the objects with the highest outlier degrees were output as outliers. On UCI (University of California, Irvine) real-world datasets and synthetic datasets with complex distributions, BGRW was compared with the LDOF, CBOF and LOF algorithms in terms of detection rate, execution time and false positive rate. The experimental results show that BGRW reduces execution time and false positive rate while achieving a higher detection rate.
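The iteration described above can be sketched as follows. This is a minimal illustration, assuming transition probabilities inversely proportional to Euclidean distance (so the walker favours dense regions) and a PageRank-style damped update; the paper's exact formulas may differ.

```python
import math

def bgrw_outlier_scores(points, damping=0.85, iters=50):
    """BGRW-style sketch: run a damped random walk whose transition
    probability is assumed inversely proportional to Euclidean distance,
    then treat objects with the lowest stationary visit probability as
    the most outlying."""
    n = len(points)
    # inverse-distance transition weights (self-transitions excluded)
    w = [[0.0 if i == j else 1.0 / (math.dist(points[i], points[j]) + 1e-9)
          for j in range(n)] for i in range(n)]
    trans = [[w[i][j] / sum(w[i]) for j in range(n)] for i in range(n)]
    score = [1.0 / n] * n  # uniform initialisation
    for _ in range(iters):
        score = [(1 - damping) / n +
                 damping * sum(score[i] * trans[i][j] for i in range(n))
                 for j in range(n)]
    # low visit probability -> high outlier degree
    return [-s for s in score]
```

On a toy dataset with one isolated point, the isolated point receives the lowest visit probability and hence the highest outlier degree.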

Task scheduling strategy based on data stream classification in Heron
ZHANG Yitian, YU Jiong, LU Liang, LI Ziyang
Journal of Computer Applications    2019, 39 (4): 1106-1116.   DOI: 10.11772/j.issn.1001-9081.2018081848
In Heron, a new platform for big data stream processing, the round-robin algorithm is used for task scheduling by default; it considers neither the runtime state of the topology nor the impact that different communication modes among task instances have on Heron's performance. To solve this problem, a task scheduling strategy based on Data Stream Classification in Heron (DSC-Heron) was proposed, consisting of a data stream classification algorithm, a data stream cluster allocation algorithm and a data stream classification scheduling algorithm. Firstly, the instance allocation model of Heron was established to clarify the differences in communication overhead among the different communication modes of the task instances. Secondly, data streams were classified according to the real-time data stream size between task instances, based on the data stream classification model of Heron. Finally, the packing plan of Heron was constructed using interrelated high-frequency data streams as the basic scheduling units, converting as many inter-node data streams as possible into intra-node ones to minimize communication cost. After running the SentenceWordCount, WordCount and FileWordCount topologies in a Heron cluster with 9 nodes, the results show that compared with the default Heron scheduling strategy, DSC-Heron achieves improvements of 8.35%, 7.07% and 6.83% in system complete latency, inter-node communication overhead and system throughput respectively; in terms of load balancing, the standard deviations of CPU usage and memory usage of the worker nodes are decreased by 41.44% and 41.23% respectively. All experimental results show that DSC-Heron can effectively improve the performance of the topologies, with the most significant optimization effect on the FileWordCount topology, which is closest to a real application scenario.
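The classification step can be illustrated with a small sketch: streams at or above a tuple-rate threshold are treated as high-frequency, and interrelated high-frequency streams are merged into basic scheduling units with a union-find pass. The threshold rule and the grouping details here are illustrative assumptions, not the paper's exact algorithm.

```python
def classify_and_group(stream_rates, threshold):
    """Classify streams by real-time rate, then merge interrelated
    high-frequency streams (those sharing a task instance) into the
    scheduling units a packing plan would co-locate."""
    parent = {}

    def find(x):  # union-find with path halving
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    high = {pair: r for pair, r in stream_rates.items() if r >= threshold}
    for a, b in high:
        parent[find(a)] = find(b)  # union endpoints of high-frequency streams
    units = {}
    for a, b in high:
        units.setdefault(find(a), set()).update((a, b))
    return list(units.values())
```

For example, with high-frequency streams A-B, B-C and D-E, the resulting scheduling units are {A, B, C} and {D, E}.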
SQL energy consumption perception model for database load based on SSD
LI Shu, YU Jiong, GUO Binglei, PU Yonglin, YANG Dexian, LIU Su
Journal of Computer Applications    2019, 39 (1): 205-212.   DOI: 10.11772/j.issn.1001-9081.2018051055
Given the energy consumption and severe environmental problems brought by big data, building an energy-efficient green database system has become a key requirement and an important challenge. To address the fact that traditional database systems focus mainly on performance and lack energy consumption perception and optimization, an energy consumption perception model based on database workload was proposed and applied to a database system based on Solid-State Drive (SSD). Firstly, the consumption of the major system resources (CPU, SSD) during database workload execution was quantified as time overhead and power overhead. Based on the basic I/O types of SSD database workloads, a time overhead model and a power overhead model were built, and an energy consumption perception model with a unified resource unit was implemented. Then, multivariable linear regression was used to solve the model, and the energy estimation accuracy of the model for different I/O types of database workload was verified in both exclusive and competitive environments. Finally, the experimental results were analyzed and the factors that affect model accuracy were discussed. The experimental results show that the model accuracy is relatively high: under the ideal condition that the DBMS monopolizes system resources, the average error is 5.15% and the absolute error is no more than 9.8%; although the accuracy in a competitive environment is reduced, the average error is still less than 12.21%. The model can effectively support building an energy-aware green database system.
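The regression step can be sketched as follows: fit power = a·cpu + b·ssd_io + c by least squares over measured samples. The feature choice is a simplification of the paper's resource-unit model; the normal equations are solved with plain Gaussian elimination so the sketch needs no external libraries.

```python
def fit_power_model(samples):
    """Least-squares fit of P = a*cpu + b*ssd_io + c from
    (cpu_util, ssd_io_rate, measured_power) samples, a sketch of the
    multivariable linear regression described in the abstract."""
    rows = [(c, s, 1.0) for c, s, _ in samples]
    y = [p for _, _, p in samples]
    # normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(xtx[r][col]))
        xtx[col], xtx[piv] = xtx[piv], xtx[col]
        xty[col], xty[piv] = xty[piv], xty[col]
        for r in range(col + 1, 3):
            f = xtx[r][col] / xtx[col][col]
            for c2 in range(col, 3):
                xtx[r][c2] -= f * xtx[col][c2]
            xty[r] -= f * xty[col]
    beta = [0.0] * 3
    for r in (2, 1, 0):  # back substitution
        beta[r] = (xty[r] - sum(xtx[r][c2] * beta[c2]
                                for c2 in range(r + 1, 3))) / xtx[r][r]
    return beta  # (a, b, c)
```

Given samples generated by a known linear law, the fit recovers the coefficients exactly (up to floating-point error).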
Dynamic task dispatching strategy for stream processing based on flow network
LI Ziyang, YU Jiong, BIAN Chen, LU Liang, PU Yonglin
Journal of Computer Applications    2018, 38 (9): 2560-2567.   DOI: 10.11772/j.issn.1001-9081.2017122910
Concerning the problem that a sharp increase of the data input rate raises computing latency and thus harms the real-time performance of big data stream processing platforms, a dynamic dispatching strategy based on flow network was proposed and applied to the data stream processing platform Apache Flink. Firstly, a Directed Acyclic Graph (DAG) was transformed into a flow network by defining the capacity and flow of every edge, and a capacity detection algorithm was used to ascertain the capacity value of every edge. Secondly, a maximum flow algorithm was used to obtain the improved network and the optimization path, in order to increase the throughput of the cluster when the data input rate rises; the feasibility of the algorithm was shown by evaluating its time and space complexity. Finally, the influence of an important parameter on algorithm execution was discussed, and recommended parameter values for different types of jobs were obtained by experiments. The experimental results show that the throughput improvement of the strategy is higher than 16.12% during phases of increasing data input rate in different types of benchmarks, compared with the original dispatching strategy of Apache Flink, so the dynamic dispatching strategy efficiently increases the throughput of the cluster under the constraint of task latency.
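The maximum-flow step can be sketched with the standard Edmonds-Karp algorithm on an adjacency-matrix flow network; how the job DAG is annotated with capacities is the strategy's own contribution, so the graph layout below is purely illustrative.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow: repeatedly find a shortest augmenting
    path by BFS in the residual graph and push the bottleneck flow."""
    n = len(capacity)
    flow = [[0] * n for _ in range(n)]
    total = 0
    while True:
        parent = [-1] * n
        parent[source] = source
        q = deque([source])
        while q and parent[sink] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and capacity[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[sink] == -1:      # no augmenting path left
            return total, flow
        v, bottleneck = sink, float("inf")
        while v != source:          # find bottleneck along the path
            u = parent[v]
            bottleneck = min(bottleneck, capacity[u][v] - flow[u][v])
            v = u
        v = sink
        while v != source:          # augment, updating residual flow
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
            v = u
        total += bottleneck
```

On a small four-node network the algorithm returns the max-flow value, which equals the capacity of the minimum cut.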
Android malware detection based on texture fingerprint and malware activity vector space
LUO Shiqi, TIAN Shengwei, YU Long, YU Jiong, SUN Hua
Journal of Computer Applications    2018, 38 (4): 1058-1063.   DOI: 10.11772/j.issn.1001-9081.2017102499
To improve the accuracy and automation of malware recognition, an Android malware analysis and detection method based on deep learning was proposed. Firstly, a malware texture fingerprint was proposed to reflect the content similarity of malicious code binary files, and a malware activity vector space covering 33 types of activities was selected to reflect the potential dynamic behavior of malicious code. Then, to improve classification accuracy, an AutoEncoder (AE) and a Softmax classifier were trained on the combination of these features. Test results on different data samples show that the average classification accuracy of the proposed method reaches 94.9% when using a Stacked AutoEncoder (SAE), which is 1.1 percentage points higher than that of a Support Vector Machine (SVM). The proposed method can effectively improve the accuracy of malicious code recognition.
Task scheduling algorithm based on weight in Storm
LU Liang, YU Jiong, BIAN Chen, YING Changtian, SHI Kangli, PU Yonglin
Journal of Computer Applications    2018, 38 (3): 699-706.   DOI: 10.11772/j.issn.1001-9081.2017082125
Apache Storm, a typical platform for big data stream computing, uses a round-robin scheduling algorithm as its default scheduler, which ignores the fact that differences in computational and communication cost are ubiquitous among the tasks and data streams of a topology; optimization is therefore needed in terms of load balance and communication cost. To solve this problem, a Task Scheduling Algorithm based on Weight in Storm (TSAW-Storm) was proposed. In the algorithm, CPU occupation was taken as the weight of a task in a specific topology, and the tuple rate between a pair of tasks was taken as the weight of a data stream. Tasks were then assigned gradually to the most suitable worker node by maximizing the gain in the weight of data streams, converting as many inter-node data streams as possible into intra-node ones while ensuring load balance, in order to reduce network overhead. Experimental results show that TSAW-Storm can reduce latency and inter-node tuple rate by about 30.0% and 32.9% respectively, and the standard deviation of the CPU load of worker nodes is only 25.8% of that of the Storm default scheduling algorithm, in a WordCount benchmark with 8 worker nodes. In a further experiment against the online scheduler, TSAW-Storm reduced latency, inter-node tuple rate and the standard deviation of CPU load by about 7.76%, 11.8% and 5.93% respectively, with only a small execution overhead. Therefore, the proposed algorithm can reduce communication cost and improve load balance effectively, contributing to the efficient operation of Apache Storm.
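The co-location idea can be sketched with a greedy pass: process data streams in descending tuple-rate order and try to place each stream's two tasks on the same node, subject to a CPU-balance cap. The cap rule and tie-breaking below are illustrative assumptions, not the paper's exact algorithm.

```python
def schedule_tasks(task_cpu, stream_weight, n_nodes):
    """Greedy weight-based placement sketch: co-locate the endpoints of
    heavy data streams where the load-balance cap allows, otherwise fall
    back to the least-loaded node."""
    cap = sum(task_cpu.values()) / n_nodes * 1.5  # hypothetical balance cap
    load = [0.0] * n_nodes
    placement = {}
    for (a, b), _ in sorted(stream_weight.items(), key=lambda kv: -kv[1]):
        for t in (a, b):
            if t in placement:
                continue
            partner = b if t == a else a
            pn = placement.get(partner)
            # prefer the partner's node if it still has headroom
            if pn is not None and load[pn] + task_cpu[t] <= cap:
                node = pn
            else:
                node = min(range(n_nodes), key=lambda i: load[i])
            placement[t] = node
            load[node] += task_cpu[t]
    return placement
```

With two heavy streams A-B and C-D on two nodes, each pair lands on its own node, so both heavy streams become intra-node.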
Task scheduling strategy based on topology structure in Storm
LIU Su, YU Jiong, LU Liang, LI Ziyang
Journal of Computer Applications    2018, 38 (12): 3481-3489.   DOI: 10.11772/j.issn.1001-9081.2018040741
In order to solve the problems of high communication cost and unbalanced load in the default round-robin scheduling strategy of the Storm stream computing platform, a Task Scheduling Strategy based on Topology Structure (TS2) in Storm was proposed. Firstly, worker nodes with sufficient available Central Processing Unit (CPU) resources were selected, and only one process was allocated to each worker node, eliminating inter-process communication within nodes and optimizing process deployment. Then, the topology structure was analyzed, the component with the largest degree in the topology was found, and the threads of that component were assigned the highest priority. Finally, under the constraint of the maximum number of threads a node can carry, associated tasks were deployed to the same node as far as possible to reduce inter-node communication cost, improve cluster load balance and optimize thread deployment. The experimental results show that, in terms of system latency, the average optimization rate of TS2 is 16.91% and 5.69% compared with the Storm default scheduling strategy and the offline scheduling strategy respectively, which effectively improves the real-time performance of the system. Additionally, compared with the Storm default scheduling strategy, TS2 reduces inter-node communication cost by 15.75% and improves average throughput by 14.21%.
Performance optimization of ItemBased recommendation algorithm based on Spark
LIAO Bin, ZHANG Tao, GUO Binglei, YU Jiong, ZHANG Xuguang, LIU Yan
Journal of Computer Applications    2017, 37 (7): 1900-1905.   DOI: 10.11772/j.issn.1001-9081.2017.07.1900
Under MapReduce, complex data mining algorithms typically require multiple cooperating MapReduce jobs to complete a task; however, redundant disk reads and writes as well as repeated resource requests among these jobs seriously degrade algorithm performance. To improve the computational efficiency of the ItemBased recommendation algorithm, the performance issues of the ItemBased collaborative filtering algorithm on the MapReduce platform were first analyzed. Then, the execution efficiency of the algorithm was improved by exploiting Spark's advantages in iterative and in-memory computing, and the ItemBased collaborative filtering algorithm was implemented on the Spark platform. The experimental results show that, with cluster sizes of 10 and 20 nodes, the running time of the algorithm on Spark is only 25.6% and 30.8% of that on MapReduce; overall, the algorithm's computational efficiency on the Spark platform improves by more than a factor of 3 compared with the MapReduce platform.
Energy-efficient strategy for threshold control in big data stream computing environment
PU Yonglin, YU Jiong, WANG Yuefei, LU Liang, LIAO Bin, HOU Dongxue
Journal of Computer Applications    2017, 37 (6): 1580-1586.   DOI: 10.11772/j.issn.1001-9081.2017.06.1580
In the field of real-time big data analysis and computing, the importance of stream computing keeps growing, and so does the energy consumed by processing data on stream computing platforms. To address this problem, an Energy-efficient Strategy for Threshold Control (ESTC) was proposed, which changes how nodes process data in stream computing. First, the threshold of each worker node was determined according to differences in system load. Second, based on the node threshold, the system data stream was sampled to determine the physical voltage to apply under different data processing situations. Finally, the system power was determined according to the different physical voltages. Experimental results and theoretical analysis show that, in a stream computing cluster of 20 ordinary PCs, a system based on ESTC saves about 35.2% more energy than the original system. In addition, the performance-to-energy ratio under ESTC is 0.0803 tuple/(s·J), versus 0.0698 tuple/(s·J) for the original system. Therefore, ESTC can effectively reduce energy consumption without degrading system performance.
Construction method of mobile application similarity matrix based on latent Dirichlet allocation topic model
CHU Zheng, YU Jiong, WANG Jiayu, WANG Yuefei
Journal of Computer Applications    2017, 37 (4): 1075-1082.   DOI: 10.11772/j.issn.1001-9081.2017.04.1075
With the rapid development of the mobile Internet, it has become urgent to extract effective descriptive information from the large number of mobile applications and then provide effective, accurate recommendation strategies for mobile users. Current recommendation strategies are relatively traditional and mostly recommend applications according to a single attribute, such as downloads, application name or application category. To resolve the problems that the granularity of recommended applications is too coarse and the recommendations are inaccurate, a mobile application similarity matrix construction method based on Latent Dirichlet Allocation (LDA) was proposed. Starting from application labels, a topic distribution matrix of mobile applications was constructed and then used to build the mobile application similarity matrix. A method for converting the mobile application similarity matrix into a viable storage structure was also proposed. Extensive experiments demonstrate the feasibility of the proposed method: the application similarity it achieves is 130% higher than that of the existing 360 application market. The proposed method addresses the overly coarse recommendation granularity in mobile application recommendation, making the recommendation results more accurate.
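The construction step can be sketched as follows: given each application's LDA topic distribution (one row of topic probabilities per app), build the app-by-app similarity matrix. Cosine similarity is an assumed choice here; the abstract does not name the measure.

```python
import math

def topic_similarity_matrix(theta):
    """Build a mobile-application similarity matrix from LDA topic
    distributions: theta[i] is app i's topic-probability vector."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv) if nu and nv else 0.0
    n = len(theta)
    return [[cos(theta[i], theta[j]) for j in range(n)] for i in range(n)]
```

Apps sharing the same dominant topics score near 1, apps with disjoint topics score near 0.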
Partitioning and mapping algorithm for in-memory computing framework based on iterative filling
BIAN Chen, YU Jiong, XIU Weirong, YING Changtian, QIAN Yurong
Journal of Computer Applications    2017, 37 (3): 647-653.   DOI: 10.11772/j.issn.1001-9081.2017.03.647
Focusing on the issue that Spark's single Hash/Range partitioning strategy usually results in unbalanced data load at the Reduce phase and sharply increases job duration, an Iterative Filling data Partitioning and Mapping algorithm (IFPM) was proposed. First, based on an analysis of Spark's job execution scheme, a job efficiency model and a partition mapping model were established, and definitions of job execution timespan and allocation skew degree were given. Then, the Extendible Partitioning Algorithm (EPA) and the Iterative Mapping Algorithm (IMA) were proposed, which reserve part of the data into an extended region via a one-to-many partition function at the Map phase. Data in the extended region is mapped by extra iterative allocation until an approximate data distribution is obtained, and an adaptive mapping function, aware of the data sizes already computed at the Reduce phase, revises the unbalanced data load of the original region allocation. Experimental results demonstrate that, for any data distribution, IFPM improves the rationality of data load allocation from the Map phase to the Reduce phase and optimizes the job efficiency of the in-memory computing framework.
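The iterative-filling idea can be sketched as follows: a first pass hash-partitions most records but defers skewed keys into an "extended region", whose records are then assigned iteratively to the currently lightest reducers. This is a simplification; whole keys are deferred here rather than split by the paper's one-to-many partition function, and the skew threshold is an assumption.

```python
def iterative_fill_partition(key_counts, n_reducers, extend_ratio=0.2):
    """IFPM-flavoured sketch: hash-partition light keys, defer skewed
    keys, then iteratively fill the least-loaded reducer with them."""
    total = sum(key_counts.values())
    load = [0] * n_reducers
    extended = []
    for key, cnt in key_counts.items():
        if cnt >= extend_ratio * total:
            extended.append((key, cnt))          # defer skewed keys
        else:
            load[hash(key) % n_reducers] += cnt  # normal hash partition
    for key, cnt in sorted(extended, key=lambda kv: -kv[1]):
        tgt = min(range(n_reducers), key=lambda r: load[r])
        load[tgt] += cnt                         # fill the lightest reducer
    return load
```

Deferring the heavy key keeps it off an already-loaded hash bucket, which is the essence of revising the original region allocation.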
Video recommendation algorithm based on clustering and hierarchical model
JIN Liang, YU Jiong, YANG Xingyao, LU Liang, WANG Yuefei, GUO Binglei, LIAO Bin
Journal of Computer Applications    2017, 37 (10): 2828-2833.   DOI: 10.11772/j.issn.1001-9081.2017.10.2828
Concerning the data sparseness, cold start and low user experience problems of recommendation systems, a video recommendation algorithm based on clustering and a hierarchical model was proposed to improve recommendation performance and user experience. Focusing on the user, similar users were obtained by Affinity Propagation (AP) clustering, then the online video history of similar users was collected and a candidate set of videos was generated. Secondly, the user's preference degree for a video was calculated and mapped into the tag weights of the video. Finally, a recommendation list of videos was generated by using an analytic hierarchy model to rank user preference for the videos. The experimental results on the MovieLens Latest Dataset and a YouTube video review text dataset show that the proposed algorithm performs well in terms of Root-Mean-Square Error (RMSE) and recommendation accuracy.
Dynamic data stream load balancing strategy based on load awareness
LI Ziyang, YU Jiong, BIAN Chen, WANG Yuefei, LU Liang
Journal of Computer Applications    2017, 37 (10): 2760-2766.   DOI: 10.11772/j.issn.1001-9081.2017.10.2760
Concerning the problems of unbalanced load and incomplete node evaluation in big data stream processing platforms, a dynamic load balancing strategy based on a load awareness algorithm was proposed and applied to the data stream processing platform Apache Flink. Firstly, the computational delay of the nodes was obtained by depth-first search over the Directed Acyclic Graph (DAG) and used as the basis for evaluating node performance, from which the load balancing strategy was created. Secondly, load migration for data streams was implemented on top of a data block management strategy, and both global and local load optimization were realized through feedback. Finally, the feasibility of the algorithm was shown by evaluating its time and space complexity, and the influence of important parameters on algorithm execution was discussed. The experimental results show that the proposed algorithm increases task execution efficiency by optimizing load sharing between nodes, shortening task execution time by 6.51% on average compared with the traditional load balancing strategy of Apache Flink.
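The depth-first evaluation step can be sketched as follows. Here a node's computational delay is taken as its own delay plus the largest delay of any downstream path in the DAG; the precise metric the strategy accumulates may differ.

```python
def critical_delay(dag, delay, node):
    """Depth-first sketch: return node's delay plus the maximum delay of
    any path through its successors in the DAG. Plain recursion; a memo
    table would be added for large graphs."""
    best = 0
    for nxt in dag.get(node, ()):
        best = max(best, critical_delay(dag, delay, nxt))
    return delay[node] + best
```

For a diamond-shaped DAG the result follows the slower branch, which is the branch a balancing strategy would target first.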
Coordinator selection strategy based on RAMCloud
WANG Yuefei, YU Jiong, LU Liang
Journal of Computer Applications    2016, 36 (9): 2402-2408.   DOI: 10.11772/j.issn.1001-9081.2016.09.2402
Focusing on the issue that ZooKeeper cannot meet RAMCloud's requirements of low latency and fast recovery, a Coordinator Election Strategy (CES) based on RAMCloud was proposed. First, according to the network environment of RAMCloud and factors of the coordinator itself, the performance indexes of a coordinator were divided into two categories, individual indexes and coordinator indexes, and models were built for each. Next, the operation of RAMCloud was divided into an error-free running period and a data recovery period, a fitness function was built for each, and the two functions were merged into a total fitness function according to their time ratio. Finally, based on the fitness values of the RAMCloud Backup Coordinators (RBCs), a new operator combining randomness with the ability to select an ideal target was proposed: CES first eliminates poorly performing RBCs by screening, and as the range of choice narrows, selects the final RBC from the collection of ideal coordinators by roulette wheel. The experimental results show that, compared with other RBCs in the NS2 simulation environment, the coordinator selected by CES decreases latency by 19.35%; compared with ZooKeeper in the RAMCloud environment, it reduces recovery time by 10.02%. In practical RAMCloud deployments, the proposed CES can choose a coordinator with better performance, meeting the demands of low latency and fast recovery.
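The two-stage operator can be sketched as screening followed by roulette-wheel selection: drop low-fitness backup coordinators, then pick among the survivors with probability proportional to fitness. The screening rule (keep the top_k by fitness) is an illustrative assumption.

```python
import random

def roulette_select(candidates, fitness, top_k):
    """CES-style sketch: screen out low-fitness candidates, then do a
    roulette-wheel draw weighted by fitness among the survivors."""
    pool = sorted(candidates, key=fitness, reverse=True)[:top_k]
    weights = [fitness(c) for c in pool]
    r = random.uniform(0, sum(weights))
    for c, w in zip(pool, weights):
        r -= w
        if r <= 0:
            return c
    return pool[-1]  # numerical edge case
```

Over repeated draws, a candidate with ten times the fitness of another is selected roughly ten times as often, while screened-out candidates are never selected.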
Parallel access strategy for big data objects based on RAMCloud
CHU Zheng, YU Jiong, LU Liang, YING Changtian, BIAN Chen, WANG Yuefei
Journal of Computer Applications    2016, 36 (6): 1526-1532.   DOI: 10.11772/j.issn.1001-9081.2016.06.1526
RAMCloud only supports storing small objects of no more than 1 MB, so an object larger than 1 MB cannot be stored in a RAMCloud cluster. To resolve this storage limitation, a parallel access strategy for big data objects based on RAMCloud was proposed. Firstly, a big data object is divided into several small data objects within 1 MB, and a data summary is created at the client. The divided small data objects are then stored in the RAMCloud cluster through the parallel access strategy. On reading, the data summary is read first, then the small data objects are read in parallel from the RAMCloud cluster according to the summary and merged back into the big data object. The experimental results show that the proposed parallel access strategy achieves storage times of 16 to 18 μs and read times of 6 to 7 μs without altering the RAMCloud cluster architecture. Under the InfiniBand network framework, the speedup of the proposed parallel strategy increases almost linearly, enabling big data objects to be accessed rapidly and efficiently at the microsecond level, just like small data objects.
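The split/summary/merge flow can be sketched as follows; the chunk-key scheme and summary layout are hypothetical, and the real strategy issues the chunk reads and writes in parallel.

```python
CHUNK = 1 << 20  # RAMCloud's 1 MB small-object limit

def split_object(key, data):
    """Write path sketch: divide a big object into <=1 MB chunks plus a
    summary object listing the chunk keys and total size."""
    chunks = {f"{key}#{i}": data[off:off + CHUNK]
              for i, off in enumerate(range(0, len(data), CHUNK))}
    summary = {
        "key": key,
        "chunks": sorted(chunks, key=lambda k: int(k.rsplit("#", 1)[1])),
        "size": len(data),
    }
    return summary, chunks

def merge_object(summary, chunks):
    """Read path sketch: fetch the chunks named in the summary (in
    parallel in the real strategy) and concatenate them in order."""
    return b"".join(chunks[k] for k in summary["chunks"])
```

A round trip through split and merge reconstructs the original object byte for byte.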
Strategy for object index based on RAMCloud
WANG Yuefei, YU Jiong, LU Liang
Journal of Computer Applications    2016, 36 (5): 1222-1227.   DOI: 10.11772/j.issn.1001-9081.2016.05.1222
To improve memory utilization, RAMCloud relocates objects in memory, which causes hash-based object localization to fail and lowers the efficiency of data search. Moreover, because the needed data cannot be located quickly during data recovery, the segments returned from each backup cannot be organized well. To address these problems, the RAMCloud Global Key (RGK) and a binary index tree were proposed. An RGK consists of three parts: one locating the master, one locating the segment, and one locating the object. The first two parts constitute the Coordinator Index Key (CIK), so that during recovery the Coordinator Index Tree (CIT) can locate the master of each segment. The last two parts constitute the Master Index Key (MIK), with which the Master Index Tree (MIT) can obtain objects quickly even after data has been moved in memory. Compared with a traditional RAMCloud cluster, the time to obtain objects is clearly reduced as data throughput increases, and both the idle time of the coordinator and the log recombination time decline. The experimental results show that the global key, supported by the binary index tree, can reduce both object retrieval time and recovery time.
Adaptive multi-resource scheduling dominant resource fairness algorithm for Mesos in heterogeneous clusters
KE Zunwang, YU Jiong, LIAO Bin
Journal of Computer Applications    2016, 36 (5): 1216-1221.   DOI: 10.11772/j.issn.1001-9081.2016.05.1216
The fairness of multi-resource allocation is one of the most important indicators of a resource scheduling subsystem. Dominant Resource Fairness (DRF), a general resource allocation algorithm for multi-resource scenarios, may be unfair in heterogeneous cluster environments. Based on a study of the DRF multi-resource fair allocation algorithm under the Mesos framework, the meDRF allocation algorithm was designed and implemented, which factors server performance into the allocation: the machine performance scores of computing nodes become a dominant factor in the DRF share calculation, so that computing tasks have an equal chance of obtaining high-quality and poor computing resources. Experiments were conducted using K-means, Bayes and PageRank jobs under Hadoop. The experimental results show that, compared with the DRF allocation algorithm, meDRF yields fairer resource allocation, offers better allocation stability, and effectively improves the utilization of system resources.
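The baseline DRF mechanism that meDRF extends can be sketched with progressive filling: each round, the framework with the smallest dominant share (its maximum resource share across resource types) receives one more task. meDRF would additionally weight these shares by per-node performance scores; that weighting is omitted here.

```python
def drf_allocate(capacity, demands, rounds):
    """Progressive-filling sketch of Dominant Resource Fairness:
    capacity is the cluster's resource vector, demands maps each
    framework to its per-task demand vector."""
    usage = {f: [0.0] * len(capacity) for f in demands}
    total = [0.0] * len(capacity)
    tasks = {f: 0 for f in demands}
    for _ in range(rounds):
        def dom(f):  # dominant share = max over resources of used/capacity
            return max(u / c for u, c in zip(usage[f], capacity))
        # only frameworks whose next task still fits in the cluster
        fit = [f for f in demands
               if all(t + d <= c for t, d, c in zip(total, demands[f], capacity))]
        if not fit:
            break  # cluster saturated
        f = min(fit, key=dom)
        usage[f] = [u + d for u, d in zip(usage[f], demands[f])]
        total = [t + d for t, d in zip(total, demands[f])]
        tasks[f] += 1
    return tasks
```

On the classic DRF example (9 CPUs and 18 GB, framework A demanding (1 CPU, 4 GB) per task and B demanding (3 CPUs, 1 GB)), progressive filling yields 3 tasks for A and 2 for B, equalizing dominant shares at 2/3.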
Link prediction algorithm based on node importance in complex networks
CHEN Jiaying, YU Jiong, YANG Xingyao, BIAN Chen
Journal of Computer Applications    2016, 36 (12): 3251-3255.   DOI: 10.11772/j.issn.1001-9081.2016.12.3251
Enhancing prediction accuracy is one of the fundamental problems in link prediction research on complex networks. Existing node similarity-based prediction indexes do not make full use of the importance of nodes in the network. To solve this problem, a link prediction algorithm based on node importance was proposed. Node degree centrality, closeness centrality and betweenness centrality were incorporated into the similarity indexes of local similarity-based link prediction, namely Common Neighbor (CN), Adamic-Adar (AA) and Resource Allocation (RA), yielding importance-aware CN, AA and RA indexes for calculating node similarity. Simulation experiments were carried out on four real-world networks, with the Area Under the receiver operating characteristic Curve (AUC) as the standard measure of link prediction accuracy. The experimental results show that the proposed algorithm achieves higher link prediction accuracy on all four datasets than comparison algorithms such as CN, outperforming traditional link prediction algorithms and producing more accurate predictions on complex networks.
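The importance-weighted indexes can be sketched as follows: each shared neighbour contributes its importance instead of a flat count. Using degree centrality alone is a simplification here; the paper also folds in closeness and betweenness centrality, and its exact combination rule may differ.

```python
import math

def weighted_cn(adj, importance, x, y):
    """Common-Neighbours index weighted by node importance: each shared
    neighbour z of x and y contributes importance[z] instead of 1."""
    return sum(importance[z] for z in adj[x] & adj[y])

def weighted_aa(adj, importance, x, y):
    """Adamic-Adar variant: the usual 1/log(k_z) damping, scaled by the
    shared neighbour's importance (neighbours of degree 1 are skipped)."""
    return sum(importance[z] / math.log(len(adj[z]))
               for z in adj[x] & adj[y] if len(adj[z]) > 1)
```

With degree centrality as the importance score, a shared hub neighbour now contributes more to the similarity of a candidate pair than a shared low-degree neighbour.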
Energy-efficient strategy of distributed file system based on data block clustering storage
WANG Zhengying, YU Jiong, YING Changtian, LU Liang
Journal of Computer Applications    2015, 35 (2): 378-382.   DOI: 10.11772/j.issn.1001-9081.2015.02.0378

Concerning the low server utilization and complicated energy management caused by the random block placement strategy in distributed file systems, an access-feature vector was built for each data block to characterize its access behavior, and the K-means algorithm was adopted to cluster the blocks accordingly; the datanodes were then divided into multiple regions to store the blocks of different clusters. When the system load is low, data blocks are dynamically reorganized according to the clustering results, and unneeded datanodes can sleep to reduce energy consumption. The flexible setting of inter-cluster distance parameters makes the strategy suitable for scenarios with different requirements on energy consumption and utilization. Compared with hot-cold zoning strategies, mathematical analysis and experimental results show that the proposed method has higher energy-saving efficiency, reducing energy consumption by 35% to 38%.
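The clustering step can be sketched with plain K-means over the per-block access-feature vectors; the feature layout and initialisation below are illustrative assumptions.

```python
def kmeans(vectors, k, iters=20):
    """Plain K-means over per-block access-feature vectors, grouping
    blocks with similar access behaviour so each cluster can be stored
    in its own datanode region. Returns the cluster index per block."""
    centers = [list(v) for v in vectors[:k]]  # simple initialisation
    assign = [0] * len(vectors)
    for _ in range(iters):
        # assignment step: nearest center by squared Euclidean distance
        for i, v in enumerate(vectors):
            assign[i] = min(range(k),
                            key=lambda c: sum((a - b) ** 2
                                              for a, b in zip(v, centers[c])))
        # update step: move each center to the mean of its members
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign
```

Blocks with similar access vectors end up in the same cluster, so cold clusters can be packed onto datanodes that are allowed to sleep.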

Data migration model based on RAMCloud hierarchical storage architecture
GUO Gang, YU Jiong, LU Liang, YING Changtian, YIN Lutong
Journal of Computer Applications    2015, 35 (12): 3392-3397.   DOI: 10.11772/j.issn.1001-9081.2015.12.3392
To achieve efficient storage of and access to huge amounts of online data, a Migration Model based on Data Significance (MMDS) was proposed under the hierarchical storage architecture of RAMCloud. Firstly, the intrinsic importance of each data item was calculated from factors such as its size, time importance and total amount of user access. Secondly, the potential value of the data was evaluated using user similarity and the importance ranking of the PageRank algorithm, as used in recommendation systems; the importance of the data was determined by its intrinsic importance and potential value together. A data migration mechanism was then designed based on data importance. The experimental results show that the proposed model can identify the importance of data and place data hierarchically, improving the data access hit rate of the storage system compared with the Least Recently Used (LRU), Least Frequently Used (LFU) and Migration Strategy based on Data Value (MSDV) algorithms. The proposed model alleviates part of the storage pressure and improves data access performance.
Reference | Related Articles | Metrics
Dynamic power consumption profiling and modeling by structured query language
GUO Binglei, YU Jiong, LIAO Bin, YANG Dexian
Journal of Computer Applications    2015, 35 (12): 3362-3367.   DOI: 10.11772/j.issn.1001-9081.2015.12.3362
Abstract581)      PDF (923KB)(340)       Save
In order to build an energy-saving green database, a dynamic power consumption model was proposed that takes the Structured Query Language (SQL) statement as the smallest unit of resource (Central Processing Unit (CPU), disk) consumption. The model profiles dynamic power consumption by mapping the resource consumption of the main hardware components (CPU, disk) to power consumption. Key parameters of the model were fitted by multiple linear regression, so that the dynamic system power could be estimated in real time and a unit-unified dynamic power consumption model could be built. The experimental results show that, compared with a model based on the total number of tuples, the total number of CPU instructions better reflects CPU power consumption. The average relative error of the model is less than 6% and its absolute error is less than 9% when the DataBase Management System (DBMS) monopolizes system resources in a static environment. The proposed dynamic power consumption model is therefore well suited to building an energy-saving green database.
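The parameter-fitting step is ordinary multiple linear regression. A minimal sketch with NumPy, using synthetic counters and made-up coefficients (2 W per unit of CPU instructions, 4 W per MB of disk I/O, 1 W base) purely to show the least-squares fit:

```python
import numpy as np

# Synthetic samples: (CPU instructions x 1e9, disk I/O in MB) per SQL
# statement, and the corresponding "measured" dynamic power in watts.
X = np.array([[1.0, 0.5], [2.0, 0.2], [3.0, 1.0], [4.0, 0.8], [5.0, 1.5]])
y = 2.0 * X[:, 0] + 4.0 * X[:, 1] + 1.0  # illustrative ground truth

# Fit power = a * cpu + b * io + c by ordinary least squares.
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
a, b, c = coef
```

Once fitted, the coefficients let the system estimate dynamic power in real time from the hardware counters alone, without a power meter.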
Reference | Related Articles | Metrics
Video recommendation algorithm fusing comment analysis and latent factor model
YIN Lutong, YU Jiong, LU Liang, YING Changtian, GUO Gang
Journal of Computer Applications    2015, 35 (11): 3247-3251.   DOI: 10.11772/j.issn.1001-9081.2015.11.3247
Abstract510)      PDF (790KB)(662)       Save
Video recommendation is still confronted with many challenges, such as the lack of metadata for online videos and the difficulty of extracting features directly from multimedia data. Therefore, a Video Recommendation algorithm Fusing Comment analysis and Latent factor model (VRFCL) was proposed. Starting with video comments, it first analyzes the sentiment orientation of user comments on multiple videos and obtains numeric values representing each user's attitude towards the corresponding video. It then constructs a virtual rating matrix from these values, which makes up for data sparsity to some extent. Considering the diversity and high dimensionality of online videos, and in order to dig deeper into users' latent interests, it adopts a Latent Factor Model (LFM) to categorize online videos; LFM adds a latent category feature to the traditional dual user-item relationship of recommender systems. A series of experiments on YouTube review data demonstrate the effectiveness of the VRFCL algorithm.
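The latent-factor step can be sketched as a tiny matrix factorization trained by stochastic gradient descent over the virtual rating matrix. The matrix values (sentiment mapped to a 1-5 scale, 0 meaning no comment), the rank, and all hyperparameters are illustrative assumptions:

```python
import random

def factorize(R, k=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Tiny latent-factor model (SGD with L2 regularization) over a
    virtual rating matrix; 0 marks a missing rating."""
    rng = random.Random(seed)
    n, m = len(R), len(R[0])
    P = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in range(n)]  # user factors
    Q = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in range(m)]  # video factors
    for _ in range(steps):
        for u in range(n):
            for i in range(m):
                if R[u][i] == 0:
                    continue
                pred = sum(P[u][f] * Q[i][f] for f in range(k))
                e = R[u][i] - pred
                for f in range(k):
                    P[u][f] += lr * (e * Q[i][f] - reg * P[u][f])
                    Q[i][f] += lr * (e * P[u][f] - reg * Q[i][f])
    return P, Q

# Hypothetical sentiment-derived virtual ratings for 3 users x 3 videos.
R = [[5, 4, 0], [4, 5, 1], [1, 0, 5]]
P, Q = factorize(R)

# Reconstruction error on the observed entries.
obs = [(u, i) for u in range(3) for i in range(3) if R[u][i]]
rmse = (sum((R[u][i] - sum(P[u][f] * Q[i][f] for f in range(2))) ** 2
            for u, i in obs) / len(obs)) ** 0.5
```

The unobserved entries of P·Q then serve as predicted attitudes towards uncommented videos.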
Reference | Related Articles | Metrics
Energy-efficient strategy for disks in RAMCloud
LU Liang YU Jiong YING Changtian WANG Zhengying LIU Jiankuang
Journal of Computer Applications    2014, 34 (9): 2518-2522.   DOI: 10.11772/j.issn.1001-9081.2014.09.2518
Abstract196)      PDF (777KB)(412)       Save

The emergence of RAMCloud has improved the user experience of Online Data-Intensive (OLDI) applications. However, its energy consumption is higher than that of traditional cloud data centers. An energy-efficient strategy for disks under this architecture was put forward to solve this problem. Firstly, the fitness function and roulette wheel selection of the genetic algorithm were introduced to choose energy-saving disks for persistent data backup; secondly, a reasonable buffer size was chosen to extend the average continuous idle time of the disks, so that some of them could be put into standby during idle periods. The simulation results show that the proposed strategy saves about 12.69% energy in a given RAMCloud system with 50 servers. Buffer size affects both the energy-saving effect and data availability, so it must be weighed carefully.
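Roulette wheel selection picks each disk with probability proportional to its fitness. A minimal sketch; the disk names and fitness values are invented for illustration:

```python
import random

def roulette_select(disks, fitness, rng=random.Random(42)):
    """Roulette-wheel selection: draw a disk with probability
    proportional to its (assumed) energy-saving fitness."""
    total = sum(fitness[d] for d in disks)
    r = rng.uniform(0, total)
    acc = 0.0
    for d in disks:
        acc += fitness[d]
        if acc >= r:
            return d
    return disks[-1]  # guard against floating-point round-off

fitness = {"disk0": 0.1, "disk1": 0.6, "disk2": 0.3}  # higher = more energy saved
picks = [roulette_select(list(fitness), fitness) for _ in range(3000)]
share = picks.count("disk1") / len(picks)
```

Over many draws, the fittest disk is selected roughly in proportion to its fitness (about 60% of the time here), which biases backup placement towards energy-saving disks without starving the others.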

Reference | Related Articles | Metrics
Task scheduling and resource selection algorithm with data-dependent constraints
LIAO Bin YU Jiong ZHANG Tao YANG Xingyao
Journal of Computer Applications    2014, 34 (8): 2260-2266.   DOI: 10.11772/j.issn.1001-9081.2014.08.2260
Abstract324)      PDF (1100KB)(490)       Save

Tasks in big data environments, such as MapReduce tasks, usually carry data-dependent constraints. The resource selection strategy in distributed storage systems tends to choose the data block nearest to the requestor, ignoring the load state of the server's resources such as CPU, disk I/O and network. Based on the cluster structure, file division mechanism and block storage mechanism of distributed storage systems, the cluster-node matrix, CPU load matrix, disk I/O load matrix, network load matrix, file-division-block matrix, data block storage matrix and node-status data block storage matrix were defined; these matrices model the relationship between a task and its data constraints. An Optimal Resource Selection algorithm with Data-Dependent Constraints (ORS2DC) was then proposed, in which the task scheduling node is responsible for maintaining base data, while MapReduce tasks and data-block read tasks use different selection strategies under their respective resource constraints. The experimental results show that the proposed algorithm selects higher-quality resources for tasks and improves task completion quality while reducing the NameNode's load, thereby lowering the probability of a single point of failure.
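The core selection idea can be sketched as picking, among the nodes holding a replica of the requested block, the one with the smallest weighted load across the three load matrices. The node names, utilization values and weights are illustrative assumptions:

```python
# Toy per-node rows of the CPU, disk I/O and network load matrices,
# each utilization in [0, 1]. Values are invented for illustration.
cpu  = {"n1": 0.9, "n2": 0.3, "n3": 0.5}
disk = {"n1": 0.2, "n2": 0.8, "n3": 0.3}
net  = {"n1": 0.4, "n2": 0.4, "n3": 0.2}

def pick_replica(holders, w=(0.4, 0.4, 0.2)):
    """Among the nodes that hold a replica of the requested block,
    choose the one with the smallest weighted load, instead of
    simply the nearest one."""
    load = lambda n: w[0] * cpu[n] + w[1] * disk[n] + w[2] * net[n]
    return min(holders, key=load)

best = pick_replica(["n1", "n2", "n3"])
```

Here n1 is CPU-bound and n2 is disk-bound, so the moderately loaded n3 wins even if it is not the closest replica holder.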

Reference | Related Articles | Metrics
Energy-efficient strategy for dynamic management of cloud storage replica based on user visiting characteristic
WANG Zhengying YU Jiong YING Changtian LU Liang BAN Aiqin
Journal of Computer Applications    2014, 34 (8): 2256-2259.   DOI: 10.11772/j.issn.1001-9081.2014.08.2256
Abstract341)      PDF (793KB)(568)       Save

To address the low server utilization and serious energy waste in cloud computing environments, an energy-efficient strategy for the dynamic management of cloud storage replicas based on user visiting characteristics was put forward. The study of user visiting characteristics was transformed into the calculation of the visiting temperature of each Block, and a DataNode actively applies for dormancy according to the global visiting temperature so as to save energy. The dormancy application and dormancy verification algorithms were given in detail, and the handling of visits arriving during DataNode dormancy was described explicitly. The experimental results show that with this strategy 29%-42% of DataNodes can sleep, energy consumption is reduced by 31%, and the server response time remains acceptable. The performance analysis shows that the proposed strategy can effectively reduce energy consumption while guaranteeing data availability.
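One plausible reading of "visiting temperature" is a decayed count of recent accesses, with a node applying for dormancy once every Block it hosts has cooled below a threshold. The decay form, half-life and threshold below are assumptions, not the paper's definition:

```python
def block_temperature(access_times, now, half_life=3600.0):
    """Visiting temperature of a Block: each past access contributes
    a weight that halves every half_life seconds (assumed model)."""
    return sum(0.5 ** ((now - t) / half_life) for t in access_times)

def may_sleep(node_blocks, now, threshold=0.5):
    # A DataNode applies for dormancy only when every Block it hosts is cold.
    return all(block_temperature(ts, now) < threshold for ts in node_blocks)

now = 100000.0
hot_node  = [[now - 10, now - 60], [now - 7200]]    # recently accessed Blocks
cold_node = [[now - 36000], [now - 72000]]          # long-idle Blocks
```

The dormancy-verification step of the paper would then confirm, before sleeping, that the replicas on the candidate node remain reachable elsewhere.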

Reference | Related Articles | Metrics
Reliability-aware workflow scheduling strategy on cloud computing platform
YAN Ge YU Jiong YANG Xingyao
Journal of Computer Applications    2014, 34 (3): 673-677.   DOI: 10.11772/j.issn.1001-9081.2014.03.0673
Abstract603)      PDF (737KB)(590)       Save

Through the analysis of reliability problems in existing workflow scheduling algorithms, a reliability-aware workflow scheduling strategy was proposed to address the tendency of some algorithms to improve the reliability of the entire workflow at the cost of efficiency or money. Combining the reliability of tasks in the workflow with the duplication ideology, and taking full account of the priorities among tasks, the strategy lessens the failure rate during transmission and meanwhile shortens transmission time, so it not only enhances overall reliability but also reduces makespan. Experiments with different numbers of tasks and different Communication to Computation Ratios (CCR) show that the reliability of the cloud workflow under this strategy is better than that of the Heterogeneous Earliest-Finish-Time (HEFT) algorithm and its improved variant SHEFTEX, and that the proposed algorithm also outperforms HEFT in completion time.

Related Articles | Metrics
Optimal storing strategy based on small files in RAMCloud
YING Changtian YU Jiong LU Liang LIU Jiankuang
Journal of Computer Applications    2014, 34 (11): 3104-3108.   DOI: 10.11772/j.issn.1001-9081.2014.11.3104
Abstract317)      PDF (782KB)(667)       Save

RAMCloud stores data in a log-segment structure. When a large number of small files are stored in RAMCloud, each small file occupies a whole segment, which leads to heavy fragmentation inside the segments and low memory utilization. To solve this small-file problem, a storage optimization strategy based on file classification was proposed. Small files were first classified into three categories: structurally related, logically related and independent files. Before uploading, a merging algorithm and a grouping algorithm were applied to these categories respectively. The experimental results demonstrate that, compared with non-optimized RAMCloud, the proposed strategy improves memory utilization.
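The merging idea amounts to bin-packing related small files into shared segments instead of giving each file its own segment. A first-fit-decreasing sketch; the segment size matches RAMCloud's 8 MB segments, but the packing policy and file sizes are illustrative:

```python
SEGMENT = 8 * 1024 * 1024  # RAMCloud segment size: 8 MB

def merge_small_files(files, capacity=SEGMENT):
    """First-fit-decreasing packing of small files (name, size) of one
    class into shared segments; an assumed stand-in for the paper's
    merging algorithm."""
    segments = []  # each entry: [used_bytes, [file names]]
    for name, size in sorted(files, key=lambda f: -f[1]):
        for seg in segments:
            if seg[0] + size <= capacity:   # fits in an existing segment
                seg[0] += size
                seg[1].append(name)
                break
        else:                               # no segment fits: open a new one
            segments.append([size, [name]])
    return segments

files = [("a", 3 << 20), ("b", 6 << 20), ("c", 2 << 20), ("d", 5 << 20)]
segs = merge_small_files(files)
```

Here 16 MB of small files occupy two full segments instead of the four segments that one-file-per-segment storage would need, which is exactly the utilization gain the abstract reports.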

Reference | Related Articles | Metrics
Collaborative filtering model combining users' and items' predictions
YANG Xingyao YU Jiong TURGUN Ibrahim LIAO Bin
Journal of Computer Applications    2013, 33 (12): 3354-3358.  
Abstract964)      PDF (792KB)(1009)       Save
Concerning the poor recommendation quality of traditional user-based and item-based collaborative filtering models, a new collaborative filtering model combining users' and items' predictions was proposed. Firstly, considering both users and items, it dynamically optimizes a well-performing similarity model. Secondly, it constructs neighbor sets for the target objects by selecting similar users and items according to the similarity values, and then obtains user-based and item-based prediction results from the corresponding prediction functions. Finally, it produces the final prediction by using an adaptive balance factor to coordinate the two prediction results. Comparative experiments under different evaluation criteria show that, compared with typical collaborative filtering models such as RSCF, HCFR and UNCF, the proposed model not only achieves better prediction accuracy for items, but also performs well in the precision and recall of recommendations.
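The final combination step can be sketched as a convex blend of the two predictions, with the balance factor fitted adaptively, here by a simple grid search over held-out ratings. The held-out triples and the grid-search fitting are illustrative assumptions, not the paper's procedure:

```python
def combined_prediction(user_pred, item_pred, lam):
    """Final rating: balance factor lam in [0, 1] weighs the
    user-based against the item-based prediction."""
    return lam * user_pred + (1 - lam) * item_pred

def fit_balance_factor(held_out, grid=21):
    """Pick lam minimizing squared error on held-out triples
    (user_pred, item_pred, true_rating); a hypothetical adaptive scheme."""
    best, best_err = 0.0, float("inf")
    for i in range(grid):
        lam = i / (grid - 1)
        err = sum((combined_prediction(u, v, lam) - r) ** 2
                  for u, v, r in held_out)
        if err < best_err:
            best, best_err = lam, err
    return best

# Synthetic held-out data generated with a true balance of 0.6.
held_out = [(4.0, 3.0, 3.6), (2.0, 4.0, 2.8), (5.0, 3.5, 4.4)]
lam = fit_balance_factor(held_out)
```

A lam near 1 means the user-based view dominates for this dataset; a lam near 0 favors the item-based view.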
Related Articles | Metrics
Energy saving and load balance strategy in cloud computing
QIAN Yurong YU Jiong WANG Weiyuan SHUN Hua LIAO Bin YANG Xingyao
Journal of Computer Applications    2013, 33 (12): 3326-3330.  
Abstract746)      PDF (867KB)(787)       Save
An adaptive Virtual Machine (VM) dynamic migration strategy for soft energy saving was put forward to optimize energy consumption and load balance in cloud computing. The strategy adopts Dynamic Voltage Frequency Scaling (DVFS) as the static energy-aware technique to achieve sub-optimal static energy saving, and uses online VM migration to achieve adaptive dynamic soft energy saving on the cloud platform. The two energy-saving strategies were simulated and compared on the CloudSim platform, with data tested on the PlanetLab platform. The results show that: firstly, the adaptive combination of soft and hard energy-saving strategies can save about 96% energy; secondly, the DVFS+MAD_MMT strategy, which uses the Median Absolute Deviation (MAD) to determine whether a host is overloaded and chooses the VM to migrate based on its Minimum Migration Time (MMT), can save about 87.15% energy under low load with PlanetLab cloudlets compared with the experimental environment; finally, a safety threshold of 2.5 in the MAD_MMT algorithm uses energy efficiently and achieves adaptive load balancing through dynamic VM migration.
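The MAD overload test and the MMT selection can be sketched as follows. The threshold form (current utilization above 1 - safety x MAD), the utilization history and the VM sizes are illustrative; only the 2.5 safety factor comes from the abstract:

```python
def mad(xs):
    """Median Absolute Deviation of a host's CPU-utilization history
    (upper median; fine for the odd-length sample used here)."""
    med = sorted(xs)[len(xs) // 2]
    return sorted(abs(x - med) for x in xs)[len(xs) // 2]

def overloaded(util_history, current, safety=2.5):
    # Host counts as overloaded when current utilization exceeds the
    # adaptive threshold 1 - safety * MAD (assumed threshold form).
    return current > 1.0 - safety * mad(util_history)

def pick_vm(vms, bandwidth):
    # MMT: migrate the VM with minimum migration time = RAM / bandwidth.
    return min(vms, key=lambda v: v[1] / bandwidth)  # v = (name, ram_mb)

history = [0.52, 0.50, 0.55, 0.48, 0.51]
vms = [("vm1", 4096), ("vm2", 1024), ("vm3", 2048)]
```

A stable history gives a small MAD, hence a high threshold; volatile hosts get a lower threshold and shed VMs earlier, which is what makes the policy adaptive.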
Related Articles | Metrics
Collaborative filtering recommendation models considering item attributes
YANG Xingyao YU Jiong Turgun IBRAHIM QIAN Yurong SHUN Hua
Journal of Computer Applications    2013, 33 (11): 3062-3066.  
Abstract1102)      PDF (1027KB)(825)       Save
Traditional User-based Collaborative Filtering (UCF) models do not fully consider the attributes of items when measuring the similarity between users. In view of this drawback, two collaborative filtering recommendation models considering item attributes were proposed. The models first optimize the rating-based similarity between users, and then aggregate the numbers of ratings that users give to items of each attribute to obtain an optimized attribute-based similarity between users. Finally, the models coordinate the two similarity measurements through a self-adaptive balance factor to complete the item prediction and recommendation process. The experimental results demonstrate that the proposed models not only incur reasonable time costs on different datasets, but also yield excellent improvements in rating prediction accuracy, with an average improvement of 5%, which confirms that the models effectively improve the accuracy of user similarity measurement.
Related Articles | Metrics