Project Articles

    CCF Bigdata 2020

    Hybrid ant colony optimization algorithm with brain storm optimization
    LI Mengmeng, QIN Wei, LIU Yi, DIAO Xingchun
    Journal of Computer Applications    2021, 41 (8): 2412-2417.   DOI: 10.11772/j.issn.1001-9081.2020101562

    Feature selection can effectively improve the performance of data classification. To further improve the ability of Ant Colony Optimization (ACO) to solve feature selection problems, a hybrid Ant colony optimization with Brain storm Optimization (ABO) algorithm was proposed. In the algorithm, an information-exchange archive was used to maintain historically better solutions, and a longest-time-first method based on a relaxation factor was adopted to update the archive dynamically. When the global optimal solution of ACO had not been updated for several consecutive iterations, a route-idea transformation operator based on the Fuch chaotic map was used to transform the route solutions in the archive into idea solutions. With the obtained solutions as the initial population, Brain Storm Optimization (BSO) was adopted to search for better solutions in a wider space. On six typical binary datasets, experiments were conducted to analyze the parameter sensitivity of the proposed algorithm, and the algorithm was compared with three typical evolutionary algorithms: the Hybrid Firefly and Particle Swarm Optimization (HFPSO) algorithm, Particle Swarm Optimization and Gravitational Search Algorithm (PSOGSA), and Genetic Algorithm (GA). Experimental results show that, compared with these algorithms, the proposed algorithm improves classification accuracy by at least 2.88% to 5.35% and F1-measure by at least 0.02 to 0.05, verifying its effectiveness and superiority.
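    As a rough illustration of the route-idea transformation (not the paper's exact operator), the sketch below maps a binary ACO route to a continuous BSO idea vector and perturbs it with a few iterations of the Fuch chaotic map; the map form x_{n+1} = cos(1/x_n^2), the base offsets and the perturbation scale are assumptions.

```python
import numpy as np

def fuch_map(x):
    # Fuch chaotic map, commonly written as x_{n+1} = cos(1 / x_n^2); x must be nonzero
    return np.cos(1.0 / (x * x))

def route_to_idea(route, lo=0.25, hi=0.75, steps=3, scale=0.1, seed=None):
    """Map a binary ACO route (1 = feature selected) to a continuous BSO idea vector."""
    rng = np.random.default_rng(seed)
    idea = np.where(np.asarray(route) == 1, hi, lo).astype(float)
    chaos = rng.uniform(0.1, 0.9, size=idea.shape)   # nonzero seeds for the map
    for _ in range(steps):
        chaos = fuch_map(chaos)                      # chaotic diversity
    # thresholding the result at 0.5 recovers a binary route again
    return np.clip(idea + scale * chaos, 0.0, 1.0)

# a 6-feature route with features 0, 2 and 5 selected
print(route_to_idea([1, 0, 1, 0, 0, 1], seed=42))
```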

    Remote sensing image dehazing method based on cascaded generative adversarial network
    SUN Xiao, XU Jindong
    Journal of Computer Applications    2021, 41 (8): 2440-2444.   DOI: 10.11772/j.issn.1001-9081.2020101563
    Dehazing algorithms trained on paired images are difficult to apply to remote sensing images, where paired training samples are insufficient, and the resulting models generalize poorly. Therefore, a remote sensing image dehazing method based on cascaded Generative Adversarial Networks (GAN) was proposed. To address the lack of paired remote sensing datasets, a U-Net GAN (UGAN) that learns haze generation and a Pixel Attention GAN (PAGAN) that learns dehazing were proposed. In the proposed method, UGAN learned to add haze to haze-free remote sensing images while retaining image details, using unpaired sets of clear and hazy images, and was then used to guide PAGAN in learning to dehaze such images correctly. To reduce the discrepancy between the synthetic hazy remote sensing images and the dehazed remote sensing images, a self-attention mechanism was added to PAGAN: the generator produced high-resolution detail features by using cues from all feature locations in the low-resolution image, and the discriminator checked whether detail features in distant parts of the image were consistent with each other. Compared with dehazing methods such as Feature Fusion Attention Network (FFANet), Gated Context Aggregation Network (GCANet) and Dark Channel Prior (DCP), this cascaded GAN method does not require a large amount of paired data for repeated network training. Experimental results show that the method removes haze and thin cloud effectively, and outperforms the comparison methods in both visual effect and quantitative indices.
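    The self-attention block added to PAGAN can be sketched along the lines of the widely used SAGAN formulation, where every spatial position attends to every other position; the channel reduction factor (8) and the learned residual scale gamma below are common defaults assumed here, not the paper's exact module.

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """SAGAN-style self-attention over feature maps (illustrative sketch)."""
    def __init__(self, channels):                     # channels must be >= 8 here
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key   = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))     # residual scale, learned from 0

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)  # B x HW x C/8
        k = self.key(x).flatten(2)                    # B x C/8 x HW
        attn = torch.softmax(q @ k, dim=-1)           # B x HW x HW, all-pairs attention
        v = self.value(x).flatten(2)                  # B x C x HW
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                   # detail cues from all locations

print(SelfAttention(64)(torch.randn(1, 64, 16, 16)).shape)
```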
    Matching method for academic expertise of research project peer review experts
    WANG Zisen, LIANG Ying, LIU Zhengjun, XIE Xiaojie, ZHANG Wei, SHI Hongzhou
    Journal of Computer Applications    2021, 41 (8): 2418-2426.   DOI: 10.11772/j.issn.1001-9081.2020101564
    Most existing expert recommendation processes rely on manual matching, which yields low recommendation accuracy because it cannot fully capture the semantic association between a project's subject and experts' research interests. To solve this problem, a matching method for the academic expertise of project peer review experts was proposed. In this method, an academic network was constructed to establish connections between academic entities, and meta-paths were designed to capture the semantic associations between different nodes in the network. Using a random walk strategy, node sequences capturing the co-occurrence association between project subjects and expert research interests were obtained, and through training a network representation learning model, vector representations encoding the semantic association of project subjects and expert research interests were learned. On this basis, semantic similarity was calculated layer by layer according to the hierarchical structure of the project subject tree, realizing multi-granularity matching of peer review academic expertise. Experimental results on crawled datasets of HowNet and Wanfang papers, an expert review dataset, and the Baidu Baike word vector dataset show that this method can enhance the semantic association between project subjects and expert research interests, and can be effectively applied to matching the academic expertise of project review experts.
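    A minimal sketch of the layer-by-layer similarity over the project subject tree, assuming learned vectors are already available: each level of the subject path is matched against the expert's interest vectors, with deeper (finer-grained) levels weighted more. The level weights and the max over interests are illustrative choices, not the paper's.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def hierarchical_match(subject_path, expert_interest_vecs, embed,
                       level_weights=(0.2, 0.3, 0.5)):
    """Score one expert against a project subject path, level by level."""
    score = 0.0
    # subject_path runs from root to leaf, e.g. discipline -> area -> topic
    for subject, w in zip(subject_path, level_weights):
        sv = embed[subject]
        # best-matching expert interest at this granularity
        score += w * max(cosine(sv, ev) for ev in expert_interest_vecs)
    return score
```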
    Time-incorporated point-of-interest collaborative recommendation algorithm
    BAO Xuan, CHEN Hongmei, XIAO Qing
    Journal of Computer Applications    2021, 41 (8): 2406-2411.   DOI: 10.11772/j.issn.1001-9081.2020101565
    Point-Of-Interest (POI) recommendation, one of the important location-based services, aims to recommend places that users have not visited but may be interested in. Time is an important factor in POI recommendation, but it is not well considered in existing POI recommendation models. Therefore, the Time-incorporated User-based Collaborative Filtering POI recommendation (TUCF) algorithm was proposed to improve recommendation performance by considering the time factor. Firstly, users' check-in data from a Location-Based Social Network (LBSN) were analyzed to explore the temporal relationships of users' check-ins. Then, these temporal relationships were used to smooth the check-in data, thereby incorporating the time factor and alleviating data sparsity. Finally, following the user-based collaborative filtering method, different POIs were recommended to users at different times. Experimental results on real check-in datasets show that, compared with the User-based collaborative filtering (U) algorithm, TUCF increased precision and recall by 63% and 69% respectively; compared with the U with Temporal preference with smoothing Enhancement (UTE) algorithm, TUCF increased precision and recall by 8% and 12% respectively. TUCF also reduced the Mean Absolute Error (MAE) by 1.4% and 0.5% compared with U and UTE respectively.
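    The smoothing step can be illustrated with a small sketch in which each hour of a user's check-in counts borrows weight from nearby hours. The circular Gaussian kernel below is an assumption for illustration; the paper derives its smoothing from the observed temporal relations between check-ins.

```python
import numpy as np

def smooth_checkins(C, sigma=1.5):
    """Temporally smooth hourly check-in counts (illustrative sketch).

    C is a (num_pois, 24) array of a user's check-in counts per hour; each
    hour borrows weight from nearby hours through a circular Gaussian kernel,
    which eases the sparsity of raw check-ins.
    """
    hours = np.arange(24)
    diff = np.abs(hours[:, None] - hours[None, :])
    d = np.minimum(diff, 24 - diff)          # hour 23 is adjacent to hour 0
    K = np.exp(-d ** 2 / (2 * sigma ** 2))
    K /= K.sum(axis=1, keepdims=True)        # rows sum to 1
    return C @ K.T                           # smoothed counts, same shape as C

# toy usage: one POI visited only at hour 8 spreads mass to nearby hours
print(smooth_checkins(np.eye(1, 24, 8)))
```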
    Low-latency cluster scheduling framework for large-scale short-time tasks
    ZHAO Quan, TANG Xiaochun, ZHU Ziyu, MAO Anqi, LI Zhanhuai
    Journal of Computer Applications    2021, 41 (8): 2396-2405.   DOI: 10.11772/j.issn.1001-9081.2020101566
    Tasks with short duration and high concurrency are common in large-scale data analysis environments, and scheduling such concurrent jobs under low-latency requirements is a hot research topic. In existing cluster resource management frameworks, centralized schedulers cannot meet low-latency requirements because the master node becomes a bottleneck, while some distributed schedulers achieve low-latency task scheduling but fall short in optimal resource allocation and suffer from resource allocation conflicts. Considering the needs of large-scale real-time jobs, a distributed cluster resource scheduling framework was designed and implemented to meet the low-latency requirements of large-scale data processing. Firstly, a two-stage scheduling framework and an optimized two-stage multi-path scheduling framework were proposed. Secondly, to address resource conflicts in two-stage multi-path scheduling, a task transfer mechanism based on load balancing was proposed to resolve load imbalance among computing nodes. Finally, the task scheduling framework for large-scale clusters was simulated and verified using real workloads and a simulated scheduler. On the real workloads, the scheduling delay of the proposed framework stays within 12% of that of ideal scheduling; in the simulated environment, the framework reduces the delay of short tasks by more than 40% compared with a centralized scheduler.
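    The load-balancing task transfer could be sketched as follows: nodes whose queues exceed a threshold above the mean queue length hand tasks to the currently least-loaded nodes. The queue-length trigger and transfer policy are illustrative assumptions; the paper's mechanism also resolves conflicts arising from two-stage multi-path placement.

```python
def rebalance(queues, threshold=1.5):
    """Transfer queued tasks from overloaded to underloaded nodes (sketch).

    `queues` maps node id -> list of pending tasks. A node is a donor when
    its queue exceeds `threshold` times the mean queue length; it sheds
    tasks to the currently least-loaded node until it reaches the mean.
    """
    mean = sum(len(q) for q in queues.values()) / len(queues)
    donors = [n for n, q in queues.items() if len(q) > threshold * mean]
    for n in donors:
        while len(queues[n]) > mean:
            target = min(queues, key=lambda m: len(queues[m]))
            if target == n or len(queues[target]) >= mean:
                break
            queues[target].append(queues[n].pop())
    return queues

# toy usage: node "a" is overloaded and sheds work to "b" and "c"
print(rebalance({"a": list(range(8)), "b": [0], "c": []}))
```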
    Hybrid aerial image segmentation algorithm based on multi-region feature fusion for natural scene
    YANG Rui, QIAN Xiaojun, SUN Zhenqiang, XU Zhen
    Journal of Computer Applications    2021, 41 (8): 2445-2452.   DOI: 10.11772/j.issn.1001-9081.2020101567
    Hybrid image segmentation algorithms have two components, and both have weaknesses: initial segmentation often fails to produce over-segmented region sets with a low mis-segmentation rate, while region merging commonly lacks a label selection mechanism for merged regions, and its method of determining when to stop merging does not meet scenario requirements. To solve these problems, a Multi-level Region Information fusion based Hybrid image Segmentation algorithm (MRIHS) was proposed. Firstly, an improved Markov model was used to smooth superpixel blocks into initial segmentation regions. Then, after measuring the similarity of the initial segmentation regions and selecting the region pairs to be merged, the designed region label selection mechanism was used to choose the labels of the merged regions. Finally, an optimal merging state was defined to determine when region merging stops. To verify the performance of MRIHS, it was compared with the Multi-dimensional Feature fusion based Hybrid image Segmentation algorithm (MFHS), the Improved FCM image segmentation algorithm based on Region Merging (IFRM), the Inter-segment and Boundary Homogeneities based Hybrid image Segmentation algorithm (IBHHS), and the Multi-dimensional Color transform and Consensus based Hybrid image Segmentation algorithm (MCCHS) on the Visual Object Classes (VOC) dataset, the Cambridge-driving labeled Video database (CamVid), and a self-built river and lake inspection (rli) dataset. The results show that on the VOC and rli datasets, the Boundary Recall (BR), Achievable Segmentation Accuracy (ASA), recall and dice of MRIHS increase by at least 0.43, 0.35, 0.41 and 0.84 percentage points respectively, and its Under-segmentation Error (UE) decreases by at least 0.65 percentage points, compared with the other algorithms; on the CamVid dataset, the recall and dice of MRIHS improve by at least 1.11 and 2.48 percentage points respectively.
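    The merging stage can be caricatured by a greedy loop: repeatedly merge the most similar region pair, let the merged region keep the label of its larger member, and stop once no pair is similar enough. The size-based label rule and the threshold stop are simplified stand-ins for the paper's label selection mechanism and optimal merging state.

```python
import itertools

def greedy_merge(regions, sim, tau):
    """Greedy region merging with a size-based label rule (illustrative sketch).

    `regions` maps an integer label to a set of pixel ids, and `sim(a, b)`
    scores the similarity of two pixel sets (e.g. from color statistics).
    """
    while len(regions) > 1:
        s, a, b = max((sim(regions[a], regions[b]), a, b)
                      for a, b in itertools.combinations(regions, 2))
        if s < tau:                                   # stand-in stopping rule
            break
        keep, drop = (a, b) if len(regions[a]) >= len(regions[b]) else (b, a)
        regions[keep] |= regions.pop(drop)            # larger region keeps its label
    return regions
```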
    Case reading comprehension method combining syntactic guidance and character attention mechanism
    HE Zhenghai, XIAN Yantuan, WANG Meng, YU Zhengtao
    Journal of Computer Applications    2021, 41 (8): 2427-2431.   DOI: 10.11772/j.issn.1001-9081.2020101568
    Case reading comprehension, the application of machine reading comprehension in the judicial field, is an important application of judicial intelligence: a computer reads judgment documents and answers related questions about them. The current mainstream approach to machine reading comprehension uses deep learning models to encode text words and obtain vector representations of the text. The core problems in building such models are how to obtain the semantic representation of the text and how to match questions with their context. Considering that syntactic information helps a model learn sentence skeleton information and that Chinese characters carry latent semantic information, a case reading comprehension method integrating syntactic guidance with a character attention mechanism was proposed; by fusing syntactic information and Chinese character information, the model's ability to encode case text was improved. Experimental results on the reading comprehension dataset of the Law Research Cup 2019 show that, compared with the baseline model, the proposed method increases the Exact Match (EM) value by 0.816 and the F1 value by 1.809%.
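    The character attention idea can be sketched as follows: a word vector is fused with an attention-pooled summary of its characters' embeddings, letting the latent semantics of individual Chinese characters inform the word representation. The dimensions and the concatenate-then-project fusion are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CharAttentionFusion(nn.Module):
    """Fuse a word vector with an attention-pooled vector of its characters (sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)        # scores each character embedding
        self.fuse  = nn.Linear(2 * dim, dim)  # projects [word; char summary] back to dim

    def forward(self, word_vec, char_vecs):
        # word_vec: (B, D); char_vecs: (B, L, D), the word's character embeddings
        w = torch.softmax(self.score(char_vecs).squeeze(-1), dim=-1)   # (B, L)
        char_summary = (w.unsqueeze(-1) * char_vecs).sum(dim=1)        # (B, D)
        return torch.tanh(self.fuse(torch.cat([word_vec, char_summary], dim=-1)))

m = CharAttentionFusion(8)
print(m(torch.randn(2, 8), torch.randn(2, 4, 8)).shape)  # torch.Size([2, 8])
```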
    Analysis of hypernetwork characteristics in Tang poems and Song lyrics
    WANG Gaojie, YE Zhonglin, ZHAO Haixing, ZHU Yu, MENG Lei
    Journal of Computer Applications    2021, 41 (8): 2432-2439.   DOI: 10.11772/j.issn.1001-9081.2020101569
    There are many studies of Tang poems and Song lyrics from a literary perspective, but few that apply hypergraph-based hypernetwork methods, and the existing ones are limited to studying character and word frequencies. Analyzing Tang poems and Song lyrics with hypernetwork data analysis helps explore breadth unreachable from the traditional literary perspective, and helps discover the laws of word composition in these works as well as the historical backgrounds they reflect. Therefore, based on two ancient text corpora, Quan Tang Shi and Quan Song Ci, hypernetworks of Tang poems and Song lyrics were established respectively. In constructing the hypernetworks, each Tang poem or Song lyric was taken as a hyperedge, and the Chinese characters in it as the nodes within that hyperedge. Then, topological indices and network characteristics of the two hypernetworks, such as node hyperdegree, node hyperdegree distribution, hyperedge node degree, and hyperedge node degree distribution, were analyzed experimentally, in order to uncover the character use, word use and aesthetic tendencies of Tang dynasty poets and Song dynasty lyricists. Finally, work hypernetworks were constructed from the poems and lyrics of Li Bai, Du Fu, Su Shi and Xin Qiji, and the relevant network parameters were calculated. The analysis results show a great difference between the maximum and minimum hyperdegrees of the two hypernetworks, with hyperdegree distributions approximating power-law distributions, indicating that both hypernetworks are scale-free. The hyperedge node degrees also show clear distribution characteristics: they mostly fall between 20 and 100 in the Tang poem hypernetwork and between 30 and 130 in the Song lyric hypernetwork. Moreover, the work hypernetworks have small average path lengths and large clustering coefficients, reflecting their small-world characteristics.
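    Under the construction described above (each poem a hyperedge, its distinct characters the nodes), the two basic indices are straightforward to compute, as in this sketch:

```python
from collections import Counter

def hypernetwork_stats(poems):
    """Node hyperdegree and hyperedge node degree for a poem hypernetwork.

    Hyperdegree of a character = number of poems (hyperedges) containing it;
    hyperedge node degree = number of distinct characters in one poem.
    """
    hyperdegree = Counter()
    edge_degrees = []
    for poem in poems:
        chars = set(poem)                 # distinct characters in this hyperedge
        edge_degrees.append(len(chars))
        hyperdegree.update(chars)
    return hyperdegree, edge_degrees

# toy usage with one line by Li Bai
hd, ed = hypernetwork_stats(["床前明月光疑是地上霜"])
print(ed, hd.most_common(3))
```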
    Review of spatio-temporal trajectory sequence pattern mining methods
    KANG Jun, HUANG Shan, DUAN Zongtao, LI Yixiu
    Journal of Computer Applications    2021, 41 (8): 2379-2385.   DOI: 10.11772/j.issn.1001-9081.2020101571
    With the rapid development of global positioning and mobile communication technologies, massive amounts of trajectory data have emerged. These data faithfully reflect the movement patterns and behavioral characteristics of moving objects in the spatio-temporal environment, and contain a wealth of information of important application value for fields such as urban planning, traffic management, service recommendation, and location prediction. Applying spatio-temporal trajectory data in these fields usually requires sequence pattern mining of the data. Spatio-temporal trajectory sequence pattern mining aims to find frequently occurring sequence patterns in a spatio-temporal trajectory dataset, such as location patterns (frequent trajectories, hot spots), periodic activity patterns, and semantic behavior patterns, thereby uncovering information hidden in the spatio-temporal data. The research progress of spatio-temporal trajectory sequence pattern mining in recent years was summarized. Firstly, the data characteristics and applications of spatio-temporal trajectory sequences were introduced. Then, the mining process of spatio-temporal trajectory patterns was described, and the state of research in this field was surveyed from the perspectives of mining location patterns, periodic patterns and semantic patterns from spatio-temporal trajectory sequences. Finally, the problems of current spatio-temporal trajectory sequence pattern mining methods were discussed, and the future development trends of these methods were projected.
    Multi-granularity temporal structure representation based outlier detection method for prediction of oil reservoir
    MENG Fan, CHEN Guang, WANG Yong, GAO Yang, GAO Dequn, JIA Wenlong
    Journal of Computer Applications    2021, 41 (8): 2453-2459.   DOI: 10.11772/j.issn.1001-9081.2020101867
    Traditional oil reservoir prediction methods combine seismic attributes, generated as seismic waves pass through strata, with geological drilling data, and make a comprehensive judgment using conventional geophysical methods. However, such methods are costly to study and interpret, and their accuracy depends strongly on expert prior knowledge. To address these issues, based on seismic data from the Subei Basin of Jiangsu Oilfield, and considering the sparseness and randomness of oil-labeled samples, a multi-granularity temporal structure representation based outlier detection algorithm was proposed to perform prediction on post-stack seismic trace data. Firstly, multi-granularity temporal structures were extracted from single seismic trace data to form independent feature representations. Secondly, based on the extracted multi-granularity temporal structure representations, feature fusion was carried out to form a fused representation of the seismic trace data. Finally, a cost-sensitive method was used for joint training and judgment on the fused features, producing oil reservoir predictions for the seismic data. Experiments and simulations of the proposed algorithm were performed on real seismic data from Jiangsu Oilfield. Experimental results show that the proposed algorithm improves the Area Under Curve (AUC) by 10% compared with both the Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) algorithms.
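    A hand-crafted stand-in for the multi-granularity idea: slice one seismic trace into windows of several sizes and concatenate simple per-window statistics. The window sizes and statistics are assumptions for illustration; the paper's representations are learned, not hand-crafted.

```python
import numpy as np

def multi_granularity_features(trace, windows=(8, 32, 128)):
    """Concatenate per-window statistics of one trace at several granularities."""
    trace = np.asarray(trace, dtype=float)
    feats = []
    for w in windows:
        n = len(trace) // w                     # number of complete windows
        segs = trace[: n * w].reshape(n, w)
        feats.append(segs.mean(axis=1))         # coarse amplitude level
        feats.append(segs.std(axis=1))          # local variability
    return np.concatenate(feats)

print(multi_granularity_features(np.sin(np.linspace(0, 50, 1024))).shape)
```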
    Algorithm for mining top-k high utility itemsets with negative items
    SUN Rui, HAN Meng, ZHANG Chunyan, SHEN Mingyao, DU Shiyu
    Journal of Computer Applications    2021, 41 (8): 2386-2395.   DOI: 10.11772/j.issn.1001-9081.2020101561
    Mining High Utility Itemsets (HUI) with negative items is one of the emerging itemset mining tasks. To mine a result set of HUIs with negative items that meets user needs, a Top-k High utility itemsets with Negative items (THN) mining algorithm was proposed. To improve the temporal and spatial performance of THN, a strategy that automatically raises the minimum utility threshold was proposed, and the pattern growth method was used for depth-first search; the search space was pruned using the redefined subtree utility and the redefined local utility; transaction merging and dataset projection were employed to avoid scanning the database multiple times; and to speed up utility counting, utility arrays were used to compute itemset utilities. Experimental results show that the memory usage of THN is about 1/60 of that of the HUINIV (High Utility Itemsets with Negative Item Values)-Mine algorithm and about 1/2 of that of the FHN (Faster High utility itemset miner with Negative unit profits) algorithm; THN takes about 1/10 of the runtime of FHN; and THN performs better on dense datasets.
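    Itemset utility with negative unit profits can be illustrated with a naive scan (the paper instead uses utility arrays, transaction merging and dataset projection to avoid repeated scans):

```python
def itemset_utility(transactions, itemset):
    """Sum an itemset's utility over all transactions containing it (naive sketch).

    Each transaction maps item -> utility (quantity x unit profit), and a
    unit profit, hence a utility, may be negative.
    """
    return sum(sum(t[i] for i in itemset)
               for t in transactions
               if all(i in t for i in itemset))

# toy database where item "c" has a negative unit profit
db = [{"a": 5, "b": 3, "c": -2}, {"a": 4, "c": -1}, {"b": 6}]
print(itemset_utility(db, {"a", "c"}))  # (5 - 2) + (4 - 1) = 6
```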