Enterprise ESG indicator prediction model based on richness coordination technology
Yan LI, Guanhua YE, Yawen LI, Meiyu LIANG
Journal of Computer Applications    2025, 45 (2): 670-676.   DOI: 10.11772/j.issn.1001-9081.2024030262

The Environmental, Social, and Governance (ESG) indicator is a critical measure of enterprise sustainability. Existing ESG assessment systems face challenges such as narrow coverage, strong subjectivity, and poor timeliness, so there is an urgent need for prediction models that can forecast ESG indicators accurately from enterprise data. Addressing the inconsistent information richness among ESG-related features in enterprise data, a prediction model RCT (Richness Coordination Transformer) was proposed for enterprise ESG indicator prediction based on richness coordination technology. In this model, an auto-encoder was used in the upstream richness coordination module to coordinate features with heterogeneous information richness, thereby enhancing the ESG indicator prediction performance of the downstream module. Experimental results on real datasets demonstrate that, on various prediction indicators, the RCT model outperforms multiple models including Temporal Convolutional Network (TCN), Long Short-Term Memory (LSTM) network, Self-Attention Model (Transformer), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM). The above verifies the effectiveness and superiority of the RCT model in ESG indicator prediction.
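
A minimal sketch of the upstream coordination idea described above, assuming a PyTorch auto-encoder whose latent code feeds the downstream predictor; module and variable names are hypothetical, not from the paper:

    import torch
    import torch.nn as nn

    class RichnessCoordinator(nn.Module):
        # Auto-encoder that projects a feature group into a shared latent
        # space; the reconstruction loss pushes the latent code to retain
        # the group's information, so groups of unequal richness become
        # comparable before prediction.
        def __init__(self, in_dim, latent_dim):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                         nn.Linear(64, latent_dim))
            self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                         nn.Linear(64, in_dim))

        def forward(self, x):
            z = self.encoder(x)
            return z, self.decoder(z)

    x = torch.randn(32, 10)                        # 32 enterprises, 10 raw features
    coord = RichnessCoordinator(in_dim=10, latent_dim=8)
    z, x_hat = coord(x)                            # z feeds the downstream module
    recon_loss = nn.functional.mse_loss(x_hat, x)  # trained jointly with prediction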

Contrastive knowledge distillation method for object detection
Sheng YANG, Yan LI
Journal of Computer Applications    2025, 45 (2): 354-361.   DOI: 10.11772/j.issn.1001-9081.2024020212

Knowledge distillation is one of the most effective model compression methods in tasks such as image classification, but its application to complex tasks such as object detection remains limited. Existing knowledge distillation methods mainly construct information graphs to filter noise from foreground or background regions in the features extracted by teachers and students, and then minimize the mean square error loss between features. However, the objective functions of these methods are difficult to optimize further and use only the teacher's supervision signal, so students receive no targeted information about incorrect knowledge. On this basis, a Contrastive Knowledge Distillation (CKD) method for object detection was proposed, which redesigned the distillation framework and loss function: besides the teacher's supervision signal, constructed negative samples were used to provide guidance information for knowledge distillation, allowing students to acquire the teacher's knowledge while gaining more knowledge through self-learning. Experimental results on the Pascal VOC and COCO2014 datasets with the GFocal (Generalized Focal loss) and YOLOv5 models show that, compared with the baseline, CKD improves the mean Average Precision (mAP) and the AP50 (Average Precision@0.50) by 5.6 percentage points each with the GFocal model on Pascal VOC, and improves mAP by 1.1 percentage points and AP50 by 1.7 percentage points with the YOLOv5 model on COCO2014.
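
A minimal sketch of a contrastive distillation loss of the kind described, assuming an InfoNCE form in PyTorch where the matching teacher feature is the positive and constructed negatives supply the "incorrect knowledge" signal; shapes and names are hypothetical:

    import torch
    import torch.nn.functional as F

    def contrastive_kd_loss(student, teacher, negatives, tau=0.1):
        # student, teacher: (N, D); negatives: (N, K, D).
        # Each student feature is pulled toward its teacher feature and
        # pushed away from the K constructed negative samples.
        s = F.normalize(student, dim=-1)
        t = F.normalize(teacher, dim=-1)
        n = F.normalize(negatives, dim=-1)
        pos = (s * t).sum(-1, keepdim=True)             # (N, 1) similarities
        neg = torch.einsum('nd,nkd->nk', s, n)          # (N, K) similarities
        logits = torch.cat([pos, neg], dim=1) / tau
        labels = torch.zeros(len(s), dtype=torch.long)  # positive sits at index 0
        return F.cross_entropy(logits, labels)

    loss = contrastive_kd_loss(torch.randn(8, 256), torch.randn(8, 256),
                               torch.randn(8, 16, 256))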

Port traffic flow prediction based on knowledge graph and spatio-temporal diffusion graph convolutional network
Guixiang XUE, Hui WANG, Weifeng ZHOU, Yu LIU, Yan LI
Journal of Computer Applications    2024, 44 (9): 2952-2957.   DOI: 10.11772/j.issn.1001-9081.2023081100

Accurate prediction of port traffic flow is challenging due to its stochastic uncertainty and time-unsteady characteristics. To improve prediction accuracy, a port traffic flow prediction model based on knowledge graph and spatio-temporal diffusion graph convolutional network, named KG-DGCN-GRU, was proposed, taking into account external disturbances such as meteorological conditions and the opening and closing status of the port-adjacent highway. The factors related to the port traffic network were represented by a knowledge graph, the semantic information of various external factors was learned from the port knowledge graph using a knowledge representation method, and Diffusion Graph Convolutional Network (DGCN) and Gated Recurrent Unit (GRU) were used to effectively extract the spatio-temporal dependency features of port traffic flow. Experimental results on the Tianjin Port traffic dataset show that KG-DGCN-GRU effectively improves prediction accuracy through the knowledge graph and diffusion graph convolutional network: compared with Temporal Graph Convolutional Network (T-GCN) and Diffusion Convolutional Recurrent Neural Network (DCRNN) under single-step prediction (15 min), the Root Mean Squared Error (RMSE) is reduced by 4.85% and 7.04%, and the Mean Absolute Error (MAE) is reduced by 5.80% and 8.17%, respectively.
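
A minimal sketch of one diffusion graph convolution step of the kind DGCN uses, assuming a K-step bidirectional random walk over the traffic graph (NumPy; all shapes and values hypothetical):

    import numpy as np

    def diffusion_graph_conv(X, A, weights):
        # X: (N, F) node features; A: (N, N) weighted adjacency;
        # weights: list of K pairs (W_fwd, W_bwd), each (F, F_out).
        P_fwd = A / A.sum(axis=1, keepdims=True)      # forward transitions
        P_bwd = A.T / A.T.sum(axis=1, keepdims=True)  # backward transitions
        S_f, S_b, out = X, X, 0.0
        for W_f, W_b in weights:                      # diffusion steps k = 0..K-1
            out = out + S_f @ W_f + S_b @ W_b
            S_f, S_b = P_fwd @ S_f, P_bwd @ S_b
        return np.tanh(out)

    A = np.array([[0., 1., 1.], [1., 0., 0.], [1., 0., 0.]])  # 3 road nodes
    X = np.random.rand(3, 4)
    W = [(np.random.rand(4, 8), np.random.rand(4, 8)) for _ in range(2)]
    H = diffusion_graph_conv(X, A, W)   # fed into the GRU at each time step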

Improved adaptive large neighborhood search algorithm for multi-depot vehicle routing problem with time window
Yan LI, Dazhi PAN, Siqing ZHENG
Journal of Computer Applications    2024, 44 (6): 1897-1904.   DOI: 10.11772/j.issn.1001-9081.2023060760

Aiming at the Multi-Depot Vehicle Routing Problem with Time Window (MDVRPTW), an Improved Adaptive Large Neighborhood Search algorithm (IALNS) was proposed. Firstly, a path segmentation algorithm was improved in the stage of constructing the initial solution. Then, in the optimization stage, the designed removal and repair heuristic operators competed with each other: a scoring mechanism was introduced for the operators, and the heuristic operator was selected by roulette wheel. At the same time, the iteration cycle was divided into segments and the operator weight information was dynamically adjusted in each segment, effectively preventing the algorithm from falling into local optima. Finally, a simulated annealing mechanism was adopted as the acceptance criterion for solutions. The relevant parameters of IALNS were determined by experiments on the Cordeau benchmark instances, and the results of the proposed algorithm were compared with other representative results in this field. The experimental results show that the solution error between IALNS and the Variable Neighborhood Search (VNS) algorithm does not exceed 0.8%, with IALNS even better in some cases; compared with the multi-phase improved shuffled frog leaping algorithm, the average running time of the proposed algorithm is reduced by 12.8%, and the runtime is shorter for most instances. These results verify that IALNS is an effective algorithm for solving MDVRPTW.
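
A minimal sketch of the adaptive operator machinery described above: roulette-wheel selection over adaptive weights, segment-wise weight updates, and simulated-annealing acceptance (pure Python; parameter names and values hypothetical):

    import math, random

    def select_operator(ops, weights):
        # Roulette wheel: operators are drawn in proportion to their weights.
        r, acc = random.uniform(0, sum(weights)), 0.0
        for op, w in zip(ops, weights):
            acc += w
            if r <= acc:
                return op
        return ops[-1]

    def update_weights(weights, scores, uses, rho=0.5):
        # End-of-segment update: blend historical weight with segment score,
        # so operators that recently produced good solutions are favored.
        return [(1 - rho) * w + rho * s / max(u, 1)
                for w, s, u in zip(weights, scores, uses)]

    def accept(new_cost, cur_cost, T):
        # Simulated-annealing criterion: always accept improvements, accept
        # worse solutions with probability exp(-delta / T).
        return new_cost < cur_cost or random.random() < math.exp((cur_cost - new_cost) / T)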

Top-k high average utility sequential pattern mining algorithm under one-off condition
Keshuai YANG, Youxi WU, Meng GENG, Jingyu LIU, Yan LI
Journal of Computer Applications    2024, 44 (2): 477-484.   DOI: 10.11772/j.issn.1001-9081.2023030268

To address the issue that traditional Sequential Pattern Mining (SPM) does not consider pattern repetition and ignores the effects of utility (unit price or profit) and pattern length on user interest, a Top-k One-off high average Utility sequential Pattern mining (TOUP) algorithm was proposed. The TOUP algorithm mainly includes two core steps: average utility calculation and candidate pattern generation. Firstly, a CSP (Calculation Support of Pattern) algorithm based on the occurrence positions of each item and the item repetition relation array was proposed to calculate pattern support, thereby achieving rapid calculation of the average utility of patterns. Secondly, candidate patterns were generated by itemset extension and sequence extension, and a maximum average utility upper bound was proposed; based on this upper bound, candidate patterns were pruned effectively. Experimental results on five real datasets and one synthetic dataset show that, compared with the TOUP-dfs and HAOP-ms algorithms, TOUP reduces the number of candidate patterns by 38.5% to 99.8% and 0.9% to 77.6%, respectively, and decreases the running time by 33.6% to 97.1% and 57.9% to 97.2%, respectively. TOUP therefore performs better and can mine patterns of interest to users more efficiently.
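
As a hedged illustration of the two quantities the algorithm is built around (not the paper's CSP procedure): support under the one-off condition and the average utility it induces, in plain Python with hypothetical data:

    def oneoff_support(seq, pattern):
        # Count occurrences under the one-off condition: each sequence
        # position may be consumed by at most one occurrence (greedy
        # left-most matching).
        used, count = set(), 0
        while True:
            match, i = [], 0
            for j, item in enumerate(seq):
                if i < len(pattern) and j not in used and item == pattern[i]:
                    match.append(j)
                    i += 1
            if i < len(pattern):
                return count
            used.update(match)
            count += 1

    def average_utility(pattern, support, utility):
        # Average utility = total utility of all occurrences / pattern length.
        return support * sum(utility[item] for item in pattern) / len(pattern)

    seq = list("abcabcab")
    sup = oneoff_support(seq, list("ab"))     # 3 disjoint occurrences
    print(average_utility(list("ab"), sup, {"a": 2, "b": 5, "c": 1}))  # 10.5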

Lightweight fall detection algorithm framework based on RPEpose and XJ-GCN
Ruiyan LIANG, Hui YANG
Journal of Computer Applications    2024, 44 (11): 3639-3646.   DOI: 10.11772/j.issn.1001-9081.2023101379

Traditional joint keypoint detection models based on the Vision Transformer (ViT) usually adopt 2D sine position embedding, which is prone to losing key two-dimensional shape information in the image, leading to a decrease in accuracy. For behavior classification, the traditional Spatio-Temporal Graph Convolutional Network (ST-GCN) suffers from the lack of correlation between non-physically connected joints under its uni-labeling partitioning strategy. To address the above problems, a lightweight real-time fall detection framework was designed to detect fall behavior quickly and accurately. The framework contains a joint keypoint detection model RPEpose (Relative Position Encoding pose estimation) and a behavior classification model XJ-GCN (Cross-Joint attention Graph Convolutional Network). On the one hand, relative position encoding was adopted by the RPEpose model to overcome the position insensitivity of the original position encoding and improve the performance of the ViT architecture in joint keypoint detection. On the other hand, an X-Joint (Cross-Joint) attention mechanism was proposed: after reconstructing the partitioning strategy into the XJL (X-Joint Labeling) partitioning strategy, the dependencies between all joint connections were modeled to obtain the potential correlation between joint connections, with excellent classification performance and few parameters. Experimental results indicate that, on the COCO 2017 validation set, the RPEpose model requires only 8.2 GFLOPs (Giga FLOating Point operations) of computational overhead while achieving a test Average Precision (AP) of 74.3% for images at a resolution of 256×192; on the NTU RGB+D dataset, the Top-1 accuracy under the Cross Subject (X-Sub) partitioning standard is 89.6%, and the proposed RPEpose+XJ-GCN framework achieves a prediction accuracy of 87.2% at a processing speed of 30 frame/s, verifying its real-time performance and accuracy.
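
A minimal sketch of attention with a learned relative position bias, the general mechanism this kind of encoding belongs to: the bias is looked up by the 2D offset between token positions and added to the attention logits (PyTorch; grid size and names hypothetical, not RPEpose's exact formulation):

    import torch

    def attention_with_relative_bias(q, k, v, bias_table, rel_index):
        # q, k, v: (N, D); bias_table: (num_offsets,); rel_index: (N, N).
        # The per-offset bias keeps 2D shape information that fixed sine
        # embeddings tend to lose.
        logits = (q @ k.T) / q.shape[-1] ** 0.5
        logits = logits + bias_table[rel_index]
        return torch.softmax(logits, dim=-1) @ v

    q, k, v = (torch.randn(4, 16) for _ in range(3))   # 2x2 patch grid
    bias_table = torch.zeros(9)                        # offsets in {-1,0,1}^2
    ys, xs = torch.meshgrid(torch.arange(2), torch.arange(2), indexing='ij')
    pos = torch.stack([ys.flatten(), xs.flatten()], 1)
    rel = pos[:, None, :] - pos[None, :, :] + 1        # shift offsets to 0..2
    rel_index = rel[..., 0] * 3 + rel[..., 1]          # (N, N) table indices
    out = attention_with_relative_bias(q, k, v, bias_table, rel_index)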

Prediction of taxi demands between urban regions by fusing origin-destination spatial-temporal correlation
Yuan WEI, Yan LIN, Shengnan GUO, Youfang LIN, Huaiyu WAN
Journal of Computer Applications    2023, 43 (7): 2100-2106.   DOI: 10.11772/j.issn.1001-9081.2022091364

Accurate prediction of taxi demands between urban regions can provide decision support for taxi guidance and scheduling as well as passenger travel recommendation, so as to optimize the relation between taxi supply and demand. However, most existing models only model and predict the taxi demand within a region, giving insufficient consideration to the spatial-temporal correlation between regions and paying little attention to the finer-grained demand prediction between regions. To solve these problems, a prediction model for taxi demands between urban regions, Origin-Destination fusion with Spatial-Temporal Network (ODSTN), was proposed. In this model, complex spatial-temporal correlations between regions were captured from two spatial dimensions (regions and region pairs) and three temporal dimensions (recent, daily and weekly periods) using graph convolution and attention mechanisms, and a new path perception fusion mechanism was designed to combine the multi-angle features and finally realize taxi demand prediction between urban regions. Experiments were carried out on two real taxi order datasets from Chengdu and Manhattan. The results show that the Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE) of the ODSTN model are 0.8971, 3.5274 and 50.6556% on the two datasets respectively versus 0.5896, 1.1638 and 61.0794%, indicating that ODSTN has high accuracy in taxi demand prediction tasks.

Multi-channel pathological image segmentation with gated axial self-attention
Zhi CHEN, Xin LI, Liyan LIN, Jing ZHONG, Peng SHI
Journal of Computer Applications    2023, 43 (4): 1269-1277.   DOI: 10.11772/j.issn.1001-9081.2022030333

In Hematoxylin-Eosin (HE)-stained pathological images, the uneven distribution of cell staining and the diversity of tissue morphologies pose great challenges to automated segmentation. Traditional convolutions cannot capture the correlations between pixels in a large neighborhood, making it difficult to improve segmentation performance further. Therefore, a Multi-Channel Segmentation Network with gated axial self-attention (MCSegNet) was proposed to achieve accurate segmentation of nuclei in pathological images. A dual-encoder and decoder structure was adopted, in which the axial self-attention encoding channel captured global features, while the convolutional encoding channel based on residual structures obtained local fine features. Feature fusion at the end of the encoding channels enhanced the feature representation, providing a good information base for the decoder, in which segmentation results were gradually generated by cascading multiple upsampling modules. In addition, an improved hybrid loss function was used to effectively alleviate the common problem of sample imbalance in pathological images. Experimental results on the MoNuSeg2020 public dataset show that MCSegNet is 2.66 and 2.77 percentage points higher than U-Net in terms of F1-score and Intersection over Union (IoU) respectively, effectively improving pathological image segmentation and the reliability of clinical diagnosis.
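
A minimal sketch of an imbalance-aware hybrid loss of the kind mentioned, assuming a weighted sum of binary cross-entropy and Dice loss (PyTorch; the weighting is hypothetical, not the paper's exact formulation):

    import torch
    import torch.nn.functional as F

    def hybrid_loss(pred, target, alpha=0.5, eps=1e-6):
        # pred: (N, 1, H, W) logits; target: same shape, binary nucleus mask.
        # The Dice term counteracts foreground/background imbalance that a
        # plain cross-entropy loss handles poorly.
        bce = F.binary_cross_entropy_with_logits(pred, target)
        p = torch.sigmoid(pred)
        inter = (p * target).sum()
        dice = 1 - (2 * inter + eps) / (p.sum() + target.sum() + eps)
        return alpha * bce + (1 - alpha) * dice

    pred = torch.randn(2, 1, 64, 64)
    target = (torch.rand(2, 1, 64, 64) > 0.9).float()  # sparse foreground
    loss = hybrid_loss(pred, target)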

Repair method for process models with concurrent structures based on token replay
Erjing BAI, Xiaoyan LI, Yuyue DU
Journal of Computer Applications    2023, 43 (2): 499-506.   DOI: 10.11772/j.issn.1001-9081.2021122154

Process mining can build a process model from the event logs generated by an enterprise information management system. When the actual business process changes, deviations arise between the process model and the event logs, and the process model needs to be repaired. For process models with concurrent structures, the precision of some existing repair methods is reduced by the addition of self-loops and invisible transitions. Therefore, a method for repairing process models with concurrent structures was proposed on the basis of logical Petri nets and token replay. Firstly, according to the relationship between the input-output places of the sub-model and the event logs, the insertion position of the sub-model was determined. Then, the deviation positions were determined by a token replay method. Finally, a method was designed to repair the process models based on logical Petri nets. The correctness and effectiveness of this method were verified by simulations on the ProM platform, and the proposed method was compared with Fahland's and other methods. The results show that the precision of this method is about 85%, which is 17 and 11 percentage points higher than those of Fahland's and Goldratt's methods respectively. In terms of simplicity, the proposed method adds no self-loops or invisible transitions, while Fahland's and Goldratt's methods add both. The fitness of all three methods is above 0.9, with Goldratt's method slightly lower. The above verifies that the model repaired by the proposed method has higher fitness and precision.
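
A minimal sketch of token replay for locating deviations, assuming a Petri net stored as a transition-to-places map (plain Python; the net encoding is hypothetical, and logical Petri net specifics are omitted):

    def token_replay(net, marking, trace):
        # net: {transition: (input_places, output_places)};
        # marking: {place: token_count}. A deviation is recorded whenever a
        # transition in the trace must fire without its input tokens.
        deviations = []
        for t in trace:
            ins, outs = net[t]
            for p in ins:
                if marking.get(p, 0) == 0:
                    deviations.append((t, p))     # deviation position
                else:
                    marking[p] -= 1
            for p in outs:
                marking[p] = marking.get(p, 0) + 1
        return deviations

    net = {"a": (["start"], ["p1"]), "b": (["p1"], ["p2"]), "c": (["p2"], ["end"])}
    print(token_replay(net, {"start": 1}, ["a", "c", "b"]))  # [('c', 'p2')]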

Multi-objective optimization model for unmanned aerial vehicles trajectory based on decomposition and trajectory search
Junyan LIU, Feibo JIANG, Yubo PENG, Li DONG
Journal of Computer Applications    2023, 43 (12): 3806-3815.   DOI: 10.11772/j.issn.1001-9081.2022121882

Traditional Deep Learning (DL)-based multi-objective solvers suffer from low model utilization and easily fall into local optima. Aiming at these problems, a Multi-objective Optimization model for Unmanned aerial vehicle Trajectory based on Decomposition and Trajectory search (DTMO-UT) was proposed, consisting of an encoding part and a decoding part. First, the encoding part contained a Device encoder (Dencoder) and a Weight encoder (Wencoder), which extracted the state information of Internet of Things (IoT) devices and the features of the weight vectors. The scalar optimization sub-problems decomposed from the Multi-objective Optimization Problem (MOP) were represented by the weight vectors, so the MOP could be solved by solving all the sub-problems. The Wencoder encoded all sub-problems, which improved the utilization of the model. Then, the decoding part, containing a Trajectory decoder (Tdecoder), decoded the encoded features to generate Pareto optimal solutions. Finally, to alleviate the tendency of the greedy strategy to fall into local optima, trajectory search technology was added to the trajectory decoder: multiple candidate trajectories were generated and the one with the best scalar value was selected as the Pareto optimal solution. In this way, the exploration ability of the trajectory decoder was enhanced during trajectory planning, and a better-quality Pareto set was found. Simulation results show that, compared with mainstream DL MOP solvers, with 98.93% fewer model parameters, the proposed model reduces the distribution metric of MOP solutions by 0.076%, improves the ductility of the solutions by 0.014% and increases the overall performance by 1.23%, showing the strong practical trajectory planning ability of the DTMO-UT model.
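
A minimal sketch of decomposition plus trajectory search, assuming Tchebycheff scalarization: each weight vector defines one sub-problem, and among several candidate trajectories the one with the best scalar value is kept (NumPy; objective values hypothetical):

    import numpy as np

    def tchebycheff(f, weight, ideal):
        # Scalar value of objective vector f under one weight vector; solving
        # one such sub-problem per weight vector covers the whole MOP.
        return np.max(weight * np.abs(f - ideal))

    weights = np.array([[0.2, 0.8], [0.5, 0.5], [0.8, 0.2]])   # 3 sub-problems
    ideal = np.zeros(2)
    # candidate trajectories, e.g. (energy, latency) of each decoded path
    candidates = [np.array([3.0, 1.5]), np.array([2.4, 2.0]), np.array([3.6, 1.1])]
    for w in weights:
        best = min(candidates, key=lambda f: tchebycheff(f, w, ideal))
        print(w, best)    # the kept Pareto candidate for this sub-problem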

Contrast order-preserving pattern mining algorithm
Yufei MENG, Youxi WU, Zhen WANG, Yan LI
Journal of Computer Applications    2023, 43 (12): 3740-3746.   DOI: 10.11772/j.issn.1001-9081.2022121828

Aiming at the problem that existing contrast sequential pattern mining methods mainly focus on character sequence datasets and are difficult to apply to time series datasets, a new Contrast Order-preserving Pattern Mining (COPM) algorithm was proposed. Firstly, in the candidate pattern generation stage, a pattern fusion strategy was used to reduce the number of candidate patterns. Then, in the pattern support calculation stage, the support of a super-pattern was calculated from the matching results of its sub-patterns. Finally, a dynamic minimum support threshold pruning strategy was designed to further prune the candidate patterns effectively. Experimental results show that on six real time series datasets, the memory consumption of the COPM algorithm is at least 52.1% lower than that of COPM-o (COPM-original), 36.8% lower than that of COPM-e (COPM-enumeration), and 63.6% lower than that of COPM-p (COPM-prune). At the same time, the running time of the COPM algorithm is at least 30.3% lower than that of COPM-o, 8.8% lower than that of COPM-e and 41.2% lower than that of COPM-p. Therefore, the COPM algorithm is superior to the COPM-o, COPM-e and COPM-p algorithms, and the experimental results verify that it can effectively mine contrast order-preserving patterns to find the differences between classes of time series datasets.

Attribute reduction algorithm based on cluster granulation and divergence among clusters
Yan LI, Bin FAN, Jie GUO
Journal of Computer Applications    2022, 42 (9): 2701-2712.   DOI: 10.11772/j.issn.1001-9081.2021081371

Attribute reduction is a hot research topic in rough set theory. Most attribute reduction algorithms for continuous data are based on dominance relations or neighborhood relations. However, the attributes of continuous datasets do not necessarily have dominance relations; and although attribute reduction algorithms based on neighborhood relations can adjust the granulation degree through the neighborhood radius, it is difficult to unify the radii because attributes differ in dimension and radius parameters take continuous values, making the whole parameter granulation process computationally expensive. To solve this problem, a multi-granularity attribute reduction strategy based on cluster granulation was proposed. Firstly, similar samples were grouped by clustering, and the concepts of approximate set, relative positive region and positive region reduction based on clustering were proposed. Secondly, according to JS (Jensen-Shannon) divergence theory, the difference in data distribution of each attribute among clusters was measured, and representative features were selected to distinguish different clusters. Finally, an attribute reduction algorithm was designed using a discernibility matrix. In the proposed algorithm, attributes are not required to have ordered relations; unlike the neighborhood radius, the clustering parameter is discrete, and the dataset can be divided into different granulation degrees by adjusting this parameter. Experimental results on UCI and Kent Ridge datasets show that this algorithm can deal with continuous data directly, and that by adjusting the parameter discretely within a small range, it removes redundant features while maintaining or even improving classification accuracy.
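
A minimal sketch of scoring one attribute by its distribution divergence across clusters, assuming histogram estimates and the average pairwise JS divergence (SciPy's jensenshannon returns the square root of the divergence, hence the squaring; the bin count is hypothetical):

    import numpy as np
    from scipy.spatial.distance import jensenshannon

    def attribute_divergence(values, labels, bins=10):
        # Histogram the attribute inside each cluster, then average the
        # pairwise JS divergences; high scores mark attributes that tell
        # the clusters apart and are kept as representative features.
        edges = np.histogram_bin_edges(values, bins=bins)
        hists = [np.histogram(values[labels == c], bins=edges)[0] + 1e-9
                 for c in np.unique(labels)]
        hists = [h / h.sum() for h in hists]
        m = len(hists)
        pairs = [(i, j) for i in range(m) for j in range(i + 1, m)]
        return np.mean([jensenshannon(hists[i], hists[j]) ** 2 for i, j in pairs])

    values = np.random.rand(300)                 # one continuous attribute
    labels = np.random.randint(0, 3, 300)        # cluster assignments
    print(attribute_divergence(values, labels))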

Facial expression recognition algorithm based on combination of improved convolutional neural network and support vector machine
Guifang QIAO, Shouming HOU, Yanyan LIU
Journal of Computer Applications    2022, 42 (4): 1253-1259.   DOI: 10.11772/j.issn.1001-9081.2021071270

In view of the problems of current Convolutional Neural Networks (CNNs) that use end-layer features for facial expression recognition, such as complex model structure, too many parameters and unsatisfactory recognition accuracy, an optimization algorithm based on the combination of an improved CNN and a Support Vector Machine (SVM) was proposed. First, the network was designed with the idea of successive convolutions to obtain more nonlinear activations. Then, an adaptive Global Average Pooling (GAP) layer was used to replace the fully connected layer of a traditional CNN, reducing the network parameters. Finally, to improve the generalization ability of the model, an SVM classifier was used instead of the traditional Softmax function for expression recognition. Experimental results show that the proposed algorithm achieves 73.4% and 98.06% recognition accuracy on the Fer2013 and CK+ datasets, 2.2 percentage points higher than the traditional LeNet-5 algorithm on Fer2013. Moreover, the network model has a simple structure, few parameters and good robustness.
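
A minimal sketch of the final two stages, assuming CNN feature maps reduced by global average pooling and classified by an SVM in place of Softmax (NumPy + scikit-learn; shapes and class count hypothetical):

    import numpy as np
    from sklearn.svm import SVC

    def global_average_pool(feature_maps):
        # (N, C, H, W) -> (N, C): each channel's map collapses to its mean,
        # removing the fully connected layer's parameters.
        return feature_maps.mean(axis=(2, 3))

    feats = global_average_pool(np.random.rand(100, 32, 6, 6))  # stand-in CNN output
    labels = np.random.randint(0, 7, size=100)                  # 7 expression classes
    clf = SVC(kernel='rbf').fit(feats, labels)                  # SVM replaces Softmax
    print(clf.predict(feats[:5]))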

Fast failure recovery method based on local redundant hybrid code
Jingyu LIU, Qiuxia NIU, Xiaoyan LI, Qiaoshuo SHI, Youxi WU
Journal of Computer Applications    2022, 42 (4): 1244-1252.   DOI: 10.11772/j.issn.1001-9081.2021111917

The parity blocks of a Maximum-Distance-Separable (MDS) code are all global parity blocks; the length of the reconstruction chain increases with the expansion of the storage system, and reconstruction performance gradually decreases. Aiming at these problems, a new type of Non-Maximum-Distance-Separable (Non-MDS) code, the local redundant hybrid code Code-LM(sc), was proposed. Firstly, two types of local parity blocks, the horizontal parity block in the strip-set and the horizontal-diagonal parity block, were added to strip-sets to reduce the length of the reconstruction chain, and the parity layout of the local redundant hybrid code was designed. Then, four reconstruction formulas for lost data blocks were designed according to the generation rules of the parity blocks and the common blocks shared by the reconstruction chains of different data blocks. Finally, double-disk failures were divided into three situations depending on the distances of the strip-sets where the failed disks are located, and the corresponding reconstruction methods were designed. Theoretical analysis and experimental results show that, at the same storage scale, compared with RDP (Row-Diagonal Parity), the reconstruction time of Code-LM(sc) for single-disk and double-disk failures is reduced by 84% and 77% respectively; compared with V2-Code, it is reduced by 67% and 73% respectively. Therefore, the local redundant hybrid code supports fast recovery from failed disks and improves the reliability of the storage system.
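
A minimal sketch of the XOR arithmetic every such parity scheme rests on: a lost block is rebuilt from the surviving members of one (here artificially short) parity chain; the chain layout is hypothetical, not Code-LM(sc)'s. Shorter local chains mean fewer surviving blocks must be read, which is why adding local parity blocks speeds reconstruction.

    from functools import reduce

    def xor_blocks(blocks):
        # Bitwise XOR across equal-length byte blocks.
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

    data = [b'\x01\x02', b'\x0f\x10', b'\xa0\x0b']   # one parity chain
    parity = xor_blocks(data)                        # stored parity block
    rebuilt = xor_blocks([parity, data[1], data[2]]) # recover lost data[0]
    assert rebuilt == data[0]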

Semantic segmentation of RGB-D indoor scenes based on attention mechanism and pyramid fusion
Na YU, Yan LIU, Xiongju WEI, Yuan WAN
Journal of Computer Applications    2022, 42 (3): 844-853.   DOI: 10.11772/j.issn.1001-9081.2021030392

Aiming at the ineffective fusion of multi-modal features in RGB-D indoor scene semantic segmentation, a network named APFNet (Attention mechanism and Pyramid Fusion Network) was proposed, in which an attention mechanism fusion module and a pyramid fusion module were designed. To fully exploit the complementarity of RGB and depth features, the attention allocation weights of the two kinds of features were extracted by the attention mechanism fusion module, making the network focus on the multi-modal feature domain with more information content. Local and global information were fused by the pyramid fusion module with pyramid features at four different scales, extracting scene context and improving the segmentation accuracy of object edges and small-scale objects. By integrating these two fusion modules into a three-branch encoder-decoder network, an end-to-end output was realized. Comparative experiments were conducted with state-of-the-art methods, such as the multi-level RGB-D residual feature Fusion network (RDF-152), the Attention Complementary features Network (ACNet) and the Spatial information Guided convolution Network (SGNet), on the SUN RGB-D and NYU Depth v2 datasets. Compared with the best-performing method RDF-152, when the layer number of the encoder network was reduced from 152 to 50, the Pixel Accuracy (PA), Mean Pixel Accuracy (MPA) and Mean Intersection over Union (MIoU) of APFNet increased by 0.4, 1.1 and 3.2 percentage points respectively. The semantic segmentation accuracies for small-scale objects such as pillows and photos and for large-scale objects such as boards and ceilings increased by 0.9 to 3.4 and 12.4 to 18 percentage points respectively. The results show that the proposed APFNet has advantages in the semantic segmentation of indoor scenes.
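
A minimal sketch of channel-attention fusion for two modalities, in the spirit of the attention mechanism fusion module (PyTorch; layer sizes and names hypothetical):

    import torch
    import torch.nn as nn

    class AttentionFusion(nn.Module):
        # Produces per-channel weights for the RGB and depth branches from
        # their pooled statistics, so the fused feature leans on whichever
        # modality is more informative per channel.
        def __init__(self, channels):
            super().__init__()
            self.fc = nn.Sequential(nn.Linear(2 * channels, channels), nn.ReLU(),
                                    nn.Linear(channels, 2 * channels), nn.Sigmoid())

        def forward(self, rgb, depth):               # both (N, C, H, W)
            gap = torch.cat([rgb.mean((2, 3)), depth.mean((2, 3))], dim=1)
            w_rgb, w_d = self.fc(gap).chunk(2, dim=1)
            return rgb * w_rgb[..., None, None] + depth * w_d[..., None, None]

    fuse = AttentionFusion(channels=64)
    out = fuse(torch.randn(2, 64, 30, 40), torch.randn(2, 64, 30, 40))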

Voting instance selection algorithm based on learning to hash
Yajie HUANG, Junhai ZHAI, Xiang ZHOU, Yan LI
Journal of Computer Applications    2022, 42 (2): 389-394.   DOI: 10.11772/j.issn.1001-9081.2021071188

With the massive growth of data, how to store and use data has become a hot issue in academic research and industrial applications. As one way of addressing these problems, instance selection effectively reduces the difficulty of follow-up work by selecting representative instances from the original data according to established rules. Therefore, a voting instance selection algorithm based on learning to hash was proposed. Firstly, Principal Component Analysis (PCA) was used to map high-dimensional data to a low-dimensional space. Secondly, the k-means algorithm was iterated in combination with vector quantization, and the hash codes of the cluster centers were used to represent the data. After that, the coded data were randomly sampled in proportion, and the final instances were selected by voting over several independent runs of the algorithm. Compared with the Condensed Nearest Neighbor (CNN) algorithm and LSH-IS-F (Instance Selection algorithm by Hashing with two passes), a linear-complexity instance selection algorithm for big data, the proposed algorithm improves the compression ratio by an average of 19%. The idea of the algorithm is simple and easy to implement, and the compression ratio can be controlled automatically by adjusting the parameters. Experimental results on 7 datasets show that the proposed algorithm has a great advantage over random hashing in compression ratio and running time with similar test accuracy.
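
A minimal sketch of the pipeline, assuming PCA projection, k-means codes as the learned hash, per-code proportional sampling, and voting across rounds (scikit-learn; all ratios and thresholds hypothetical, and simplified relative to the paper's procedure):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    def vote_select(X, n_codes=16, runs=5, ratio=0.2, min_votes=3, seed=0):
        rng = np.random.default_rng(seed)
        Z = PCA(n_components=min(8, X.shape[1])).fit_transform(X)
        codes = KMeans(n_clusters=n_codes, n_init=10,
                       random_state=seed).fit_predict(Z)
        votes = np.zeros(len(X), dtype=int)
        for _ in range(runs):                        # independent sampling rounds
            for c in np.unique(codes):
                idx = np.flatnonzero(codes == c)
                take = max(1, int(ratio * len(idx)))
                votes[rng.choice(idx, take, replace=False)] += 1
        return np.flatnonzero(votes >= min_votes)    # voted-in instances

    X = np.random.rand(200, 16)
    kept = vote_select(X)          # indices of the selected instances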

Feature construction and preliminary analysis of uncertainty for meta-learning
Yan LI, Jie GUO, Bin FAN
Journal of Computer Applications    2022, 42 (2): 343-348.   DOI: 10.11772/j.issn.1001-9081.2021071198

Meta-learning applies machine learning methods (meta-algorithms) to seek the mapping between the features of a problem (meta-features) and the relative performance measures of algorithms, thereby forming meta-knowledge; how to construct and extract meta-features is an important research topic. Concerning the problem that most meta-features used in existing research are statistical features of the data, uncertainty modeling was proposed and the impact of uncertainty on the learning system was studied. Based on the inconsistency of data, the complexity of the boundary, the uncertainty of model output, linear separability, the degree of attribute overlap, and the uncertainty of the feature space, six kinds of uncertainty meta-features were established for data or models, measuring the uncertainty of the learning problem itself from different perspectives, with specific definitions given. The correlations between these meta-features were analyzed on artificial and real datasets of a large number of classification problems, and multiple classification algorithms such as K-Nearest Neighbor (KNN) were used for a preliminary analysis of the correlation between meta-features and test accuracy. Results show that the average degree of correlation is about 0.8, indicating that these meta-features have a significant impact on learning performance.
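
As one concrete instance of such a meta-feature (illustrative only, not the paper's exact definition): boundary complexity measured as the fraction of instances whose nearest neighbor carries a different label.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def boundary_complexity(X, y):
        # Nearest neighbor of each point, excluding the point itself; label
        # disagreement along the boundary signals an uncertain, complex
        # decision boundary.
        nn_idx = NearestNeighbors(n_neighbors=2).fit(X).kneighbors(
            X, return_distance=False)[:, 1]
        return float(np.mean(y[nn_idx] != y))

    X = np.random.rand(300, 4)
    y = (X[:, 0] + 0.1 * np.random.randn(300) > 0.5).astype(int)
    print(boundary_complexity(X, y))   # higher = noisier boundary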

Dynamic adjusting threshold algorithm for virtual machine migration
ZHAO Chun, YAN Lianshan, CUI Yunhe, XING Huanlai, FENG Bin
Journal of Computer Applications    2017, 37 (9): 2547-2550.   DOI: 10.11772/j.issn.1001-9081.2017.09.2547
Aiming at optimizing server energy consumption in data centers and choosing a reasonable time to migrate Virtual Machines (VMs), a VM migration algorithm based on Dynamic Adjusting Threshold (DAT) was proposed. Firstly, the migration threshold was dynamically adjusted by analyzing historical load data acquired from Physical Machines (PMs); then the time for migrating VMs was decided by a delay trigger mechanism and PM load trend prediction. The algorithm was tested on a laboratory datacenter platform. Experimental results indicate that, compared with the static threshold method, the proposed algorithm shuts down more PMs and lowers the energy consumption of the data center. The VM migration algorithm based on DAT can migrate VMs dynamically according to the variation of PM load, improving resource utilization and VM migration efficiency while reducing the energy consumption of the data center.
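
A minimal sketch of the two decisions the algorithm combines, assuming a threshold adapted to each PM's load history and a delay trigger that ignores short spikes (pure Python; k and delay are hypothetical parameters):

    def dynamic_threshold(history, k=1.5):
        # Threshold follows the PM's own load distribution (mean + k * std)
        # instead of one static value for every PM.
        mean = sum(history) / len(history)
        std = (sum((x - mean) ** 2 for x in history) / len(history)) ** 0.5
        return mean + k * std

    def should_migrate(history, delay=3, k=1.5):
        # Delay trigger: migrate only if the load stays above the adaptive
        # threshold for `delay` consecutive samples.
        thr = dynamic_threshold(history[:-delay], k)
        return all(x > thr for x in history[-delay:])

    load = [0.42, 0.45, 0.41, 0.44, 0.43, 0.40, 0.81, 0.85, 0.83]
    print(should_migrate(load))   # True: sustained overload, not a spike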
Data combination method based on structure's granulation
YAN Lin, LIU Tao, YAN Shuo, LI Feng, RUAN Ning
Journal of Computer Applications    2015, 35 (2): 358-363.   DOI: 10.11772/j.issn.1001-9081.2015.02.0358

In order to study the data combination problems occurring in real life, different kinds of data information were combined together, leading to a structure called the associated-combinatorial structure, constituted by a data set, an associated relation and a partition. The aim was to use this structure to establish a method of data combination. To this end, the associated-combinatorial structure was transformed into a granulation structure by granulating the associated relation; in this process, data combinations were completed in accordance with the data classifications. Moreover, because an associated-combinatorial structure or a granulation structure can be represented by an associated matrix, the transformation from one structure to another was characterized by algebraic calculations determined by matrix transformations. Therefore, the research not only involved theoretical analysis of data combination, but also established a data processing method based on matrix transformations. Accordingly, a computer program with linear complexity was developed according to the data combination method. Experimental results prove that the program is accurate and fast.

Analysis on distinguishing product reviews based on top-k emerging patterns
LIU Lu, WANG Yining, DUAN Lei, NUMMENMAA Jyrki, YAN Li, TANG Changjie
Journal of Computer Applications    2015, 35 (10): 2727-2732.   DOI: 10.11772/j.issn.1001-9081.2015.10.2727
With the development of e-commerce, online shopping websites provide reviews to help customers make the best choices. However, the number of reviews is huge, and review content is typically redundant and non-standard, so it is difficult for users to go through all reviews in a short time and find the distinguishing characteristics of a product. To resolve this problem, a method to mine top-k emerging patterns was proposed and applied to mining reviews of different products. Based on the proposed method, a prototype called ReviewScope was designed and implemented. ReviewScope can find significant comments on certain goods as a basis for decisions, and provides visualized results. A case study on a real-world dataset from JD.com demonstrates that ReviewScope is effective, flexible and user-friendly.
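
A minimal sketch of the core ranking criterion behind emerging patterns: the growth rate of a pattern's support from one product's reviews to another's, with the k best kept (plain Python; supports and phrases hypothetical):

    def growth_rate(sup_target, sup_other, eps=1e-9):
        # Emerging patterns are those whose support grows most sharply
        # from the contrast dataset to the target dataset.
        return sup_target / (sup_other + eps)

    def top_k_emerging(patterns, k=3):
        # patterns: {pattern: (support_in_target, support_in_contrast)}.
        scored = {p: growth_rate(s1, s2) for p, (s1, s2) in patterns.items()}
        return sorted(scored, key=scored.get, reverse=True)[:k]

    reviews = {"battery lasts": (0.30, 0.05), "screen cracks": (0.08, 0.07),
               "fast shipping": (0.20, 0.18), "great camera": (0.25, 0.04)}
    print(top_k_emerging(reviews))   # distinguishing characteristics first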
PM2.5 concentration prediction model of least squares support vector machine based on feature vector
LI Long, MA Lei, HE Jianfeng, SHAO Dangguo, YI Sanli, XIANG Yan, LIU Lifang
Journal of Computer Applications    2014, 34 (8): 2212-2216.   DOI: 10.11772/j.issn.1001-9081.2014.08.2212

To address the problem of Fine Particulate Matter (PM2.5) concentration prediction, a PM2.5 concentration prediction model was proposed. First, a comprehensive meteorological index was introduced to jointly consider the factors of wind, humidity and temperature; then a feature vector was constructed by combining the actual concentrations of SO2, NO2, CO and PM10; finally, a Least Squares Support Vector Machine (LS-SVM) prediction model was built based on the feature vector and PM2.5 concentration data. Experimental results on 2013 data from the environmental monitoring centers of city A and city B show that forecast accuracy is improved after introducing the comprehensive meteorological index, with error reduced by nearly 30%. The proposed model predicts PM2.5 concentration more accurately and has high generalization ability. Furthermore, the relationship between PM2.5 concentration and the hospitalization rate and hospital outpatient visits was analyzed, and a high correlation was found between them.
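
A minimal sketch of LS-SVM regression on such feature vectors: unlike a standard SVM, the dual reduces to one linear system (NumPy; RBF kernel, with gamma and sigma as hypothetical hyperparameters):

    import numpy as np

    def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
        # Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y].
        sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        K = np.exp(-sq / (2 * sigma ** 2))
        n = len(y)
        A = np.block([[np.zeros((1, 1)), np.ones((1, n))],
                      [np.ones((n, 1)), K + np.eye(n) / gamma]])
        sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
        return sol[1:], sol[0]                     # alpha, b

    def lssvm_predict(X_train, alpha, b, X_new, sigma=1.0):
        sq = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * sigma ** 2)) @ alpha + b

    X = np.random.rand(40, 5)                      # stand-in feature vectors
    y = X.sum(axis=1) + 0.1 * np.random.randn(40)  # stand-in PM2.5 values
    alpha, b = lssvm_fit(X, y)
    print(lssvm_predict(X, alpha, b, X[:3]))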

Design of live video streaming, recording and storage system based on Flex, Red5 and MongoDB
ZHEN Jingjing, YE Yan, LIU Taijun, DAI Cheng, WANG Honglai
Journal of Computer Applications    2014, 34 (2): 589-592.  
In order to improve the situation that network video does not play smoothly during live streaming or on-demand playback, and to find a storage strategy for massive video data, an overall design scheme of a real-time live video streaming, recording and storage system was presented. The open-source streaming media server Red5 and the Rich Internet Application technology Flex were utilized to achieve live video streaming and recording, and the recorded video data were stored in the open-source NoSQL database MongoDB. Experimental results illustrate that the platform meets the requirements of multi-user access and data storage.
High-speed data acquisition and transmission system for low-energy X-ray industrial CT
YANG Lei, GAO Fuqiang, LI Ling, CHEN Yan, LI Ren
Journal of Computer Applications    2014, 34 (11): 3361-3364.   DOI: 10.11772/j.issn.1001-9081.2014.11.3361

To meet the application demands of high-speed scanning and massive data transmission in low-energy X-ray industrial Computed Tomography (CT), a high-speed data acquisition and transmission system for low-energy X-ray industrial CT was designed. The X-CARD 0.2-256G of DT company was selected as the detector. To accommodate high-speed analog-to-digital conversion, a high-speed time-division multiplexing circuit was combined with ping-pong operation for the data cache, and a gigabit Ethernet interface was designed with a Field Programmable Gate Array (FPGA) as the master chip, so as to meet the requirements of high-speed transmission of multi-channel data. Experimental results show that the acquisition speed of the system reaches 1 MHz, the transmission speed reaches 926 Mb/s, and the dynamic range is greater than 5000. The system can effectively shorten the scanning time of low-energy X-ray detection and meet the data transmission requirements of more channels.

Query algorithm based on mesh structure in large-scale smart grid
WANG Yan, HAO Xiuping, SONG Baoyan, LI Xuecheng, XING Zengwei
Journal of Computer Applications    2014, 34 (11): 3126-3130.   DOI: 10.11772/j.issn.1001-9081.2014.11.3126

Currently, queries in transmission line monitoring systems in the smart grid are mostly global queries over the whole Wireless Sensor Network (WSN), which cannot satisfy flexible and efficient query requirements over arbitrary areas. The layout and query characteristics of the network were analyzed in detail, and a query algorithm based on mesh structure in large-scale smart grid, named MSQuery, was proposed. The algorithm aggregated the data of query nodes within different grids into one or more logical query trees, and built an optimized path for collecting query results through a merging strategy on the logical query trees. Experiments were conducted among MSQuery, RSA, which uses a routing structure for querying, and SkySensor, which uses a cluster structure for querying. Simulation results show that MSQuery can quickly return query results within the query window, reduce communication cost, and save the energy of sensor nodes.

Nonlinear modeling of power amplifier based on improved radial basis function networks
LI Ling, LIU Taijun, YE Yan, LIN Wentao
Journal of Computer Applications    2014, 34 (10): 2904-2907.   DOI: 10.11772/j.issn.1001-9081.2014.10.2904

Aiming at the nonlinear modeling of Power Amplifiers (PAs), an improved Radial Basis Function Neural Network (RBFNN) model was proposed. Firstly, time-delayed cross terms and output feedback were added to the input. The parameters (weights and centers) of the proposed model were extracted using the Orthogonal Least Squares (OLS) algorithm. Then, a Doherty PA was trained and validated successfully with a 15 MHz three-carrier Wideband Code Division Multiple Access (WCDMA) signal, and the Normalized Mean Square Error (NMSE) reaches -45 dB. Finally, an inverse class-F power amplifier was used to test the universality of the model. Simulation results show that the model fits the characteristics of power amplifiers more faithfully.

Algorithm of optimal surface deployment in wireless sensor networks
LI Yingfang, YAN Li, YANG Bo
Journal of Computer Applications    2013, 33 (10): 2730-2733.  
Node deployment is a basic problem in sensor networks, directly related to the performance of the entire network. Most existing research on sensor network node deployment addresses two-dimensional planes or three-dimensional space, with very little on the three-dimensional surface deployment scenario. An algorithm for optimal surface deployment in wireless sensor networks was therefore proposed. First, a mathematical model of the three-dimensional surface was constructed by mathematical or differential-geometric methods; then the surface was partitioned by a centroidal Voronoi subdivision, and an error function was proposed to evaluate the quality of a deployment. Comparisons with other surface deployment methods show that the performance of the proposed algorithm is superior.
Digital watermarking protocol based on El Gamal algorithm
YAN Lixia, XIAO Mingbo
Journal of Computer Applications    2013, 33 (09): 2529-2531.   DOI: 10.11772/j.issn.1001-9081.2013.09.2529
In light of the drawbacks of current digital watermarking protocols, such as requiring frequent involvement of buyers, assuming that buyers have knowledge of signatures or watermarks, and not considering appropriate usage control of digital products, a secure, practical and extensible watermarking protocol was proposed by utilizing the homomorphic, commutative El Gamal encryption algorithm and a machine-fingerprint-based copyright control scheme. Besides the basic functions of a digital watermarking protocol, this protocol also considers the interests of both buyer and seller to some extent, and improves the user's experience with a transaction model similar to the traditional one.
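
A minimal sketch of the multiplicative homomorphism of El Gamal that such protocols exploit: multiplying two ciphertexts component-wise yields a ciphertext of the product, so a watermark can be embedded under encryption (toy parameters; real deployments use large primes):

    import random

    p, g = 467, 2                      # toy group parameters
    x = random.randrange(2, p - 1)     # private key
    h = pow(g, x, p)                   # public key

    def enc(m):
        r = random.randrange(2, p - 1)
        return (pow(g, r, p), m * pow(h, r, p) % p)

    def dec(c):
        a, b = c
        return b * pow(a, p - 1 - x, p) % p    # a^(p-1-x) = a^(-x) mod p

    c1, c2 = enc(12), enc(34)
    prod = (c1[0] * c2[0] % p, c1[1] * c2[1] % p)  # E(m1) * E(m2)
    assert dec(prod) == 12 * 34 % p                # decrypts to m1 * m2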
Optimization algorithm for I-V curve fitting of solar cell
HU Keman, HU Haiyan, LIU Guiguo
Journal of Computer Applications    2013, 33 (05): 1481-1484.   DOI: 10.3724/SP.J.1087.2013.01481
A new optimization algorithm, GA-AFSA, was proposed by integrating the Genetic Algorithm (GA) and the Artificial Fish Swarm Algorithm (AFSA) to fit the mathematical model of the I-V curve of solar cells. It maintains the global optimization advantage of GA and the quick convergence of AFSA while overcoming GA's slow convergence and AFSA's aimless stepping. By fitting the five important parameters of the I-V curve, namely the photo-generated current of the solar cell, the diode quality factor, the series resistance, the reverse saturation current and the shunt resistance, GA-AFSA achieves a great improvement. Compared with existing algorithms, the new one has higher precision and faster convergence.
Detection and defense scheme for selective forwarding attacks in wireless sensor network
FU Xiang-yan, LI Ping, WU Jia-ying
Journal of Computer Applications    2012, 32 (10): 2711-2715.   DOI: 10.3724/SP.J.1087.2012.02711
To improve the detection rate of malicious nodes and the defensive ability of the system against selective forwarding attacks in Wireless Sensor Networks (WSNs), a detection method based on an optimal random routing algorithm and neighbor node monitoring was proposed. The method creates the forwarding path by introducing parameters such as distance and trust degree, and uses a node monitoring scheme to detect and defend against malicious nodes during route discovery and selection. Simulations were completed in the MATLAB environment and performance was compared with other methods. Analysis and simulation results show that this method is effective in detecting selective forwarding attacks and can ensure reliable packet delivery to the destination using relatively little energy.
Data storage method supporting large-scale smart grid
SONG Bao-yan, ZHANG Hong-mei, WANG Yan, LI Qiong
Journal of Computer Applications    2012, 32 (09): 2496-2499.   DOI: 10.3724/SP.J.1087.2012.02496
Concerning the massive, real-time and dynamic nature of monitoring data in a large-scale smart grid, a new data-centric storage approach supporting large-scale smart grids was proposed, a hierarchical extension scheme for storing massive dynamic data. Firstly, an extended Hash coding method adjusted the number of storage nodes dynamically to avoid data loss from sudden or frequent events and increase system availability. Then, a multi-threshold leveling method was used to distribute data to multiple storage nodes, avoiding hotspot storage problems and achieving load balance. Simulation results show that this method satisfies the need for massive data storage, obtains better load balance, lowers total energy consumption and extends the life cycle of the whole network.