Enterprise ESG indicator prediction model based on richness coordination technology
Yan LI, Guanhua YE, Yawen LI, Meiyu LIANG
Journal of Computer Applications    2025, 45 (2): 670-676.   DOI: 10.11772/j.issn.1001-9081.2024030262

The Environmental, Social, and Governance (ESG) indicator is a critical measure of enterprise sustainability. Existing ESG assessment systems face challenges such as narrow coverage, strong subjectivity, and poor timeliness, so there is an urgent need for prediction models that can forecast ESG indicators accurately from enterprise data. Addressing the inconsistent information richness among ESG-related features in enterprise data, a prediction model RCT (Richness Coordination Transformer) was proposed for enterprise ESG indicator prediction based on richness coordination technology. In this model, an auto-encoder was used in the upstream richness coordination module to coordinate features with heterogeneous information richness, thereby enhancing the ESG indicator prediction performance of the downstream module. Experimental results on real datasets demonstrate that, on various prediction indicators, the RCT model outperforms multiple models including Temporal Convolutional Network (TCN), Long Short-Term Memory (LSTM) network, Self-Attention Model (Transformer), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM). The above verifies the effectiveness and superiority of the RCT model in ESG indicator prediction.
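
A minimal sketch of the upstream coordination idea described above, assuming a PyTorch auto-encoder whose latent code feeds the downstream predictor; module and variable names are hypothetical, not from the paper:

    import torch
    import torch.nn as nn

    class RichnessCoordinator(nn.Module):
        # Auto-encoder that projects a feature group into a shared latent
        # space; the reconstruction loss pushes the latent code to retain
        # the group's information, so groups of unequal richness become
        # comparable before prediction.
        def __init__(self, in_dim, latent_dim):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                         nn.Linear(64, latent_dim))
            self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                         nn.Linear(64, in_dim))

        def forward(self, x):
            z = self.encoder(x)
            return z, self.decoder(z)

    x = torch.randn(32, 10)                        # 32 enterprises, 10 raw features
    coord = RichnessCoordinator(in_dim=10, latent_dim=8)
    z, x_hat = coord(x)                            # z feeds the downstream module
    recon_loss = nn.functional.mse_loss(x_hat, x)  # trained jointly with prediction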

Contrastive knowledge distillation method for object detection
Sheng YANG, Yan LI
Journal of Computer Applications    2025, 45 (2): 354-361.   DOI: 10.11772/j.issn.1001-9081.2024020212

Knowledge distillation is one of the most effective model compression methods in tasks such as image classification, but its application to complex tasks such as object detection remains limited. Existing knowledge distillation methods mainly construct information graphs to filter noise from foreground or background regions in the features extracted by teachers and students, and then minimize the mean square error loss between features. However, the objective functions of these methods are difficult to optimize further and use only the teacher's supervision signal, so students receive no targeted information about incorrect knowledge. On this basis, a Contrastive Knowledge Distillation (CKD) method for object detection was proposed, which redesigned the distillation framework and loss function: besides the teacher's supervision signal, constructed negative samples were used to provide guidance information for knowledge distillation, allowing students to acquire the teacher's knowledge while gaining more knowledge through self-learning. Experimental results on the Pascal VOC and COCO2014 datasets with the GFocal (Generalized Focal loss) and YOLOv5 models show that, compared with the baseline, CKD improves the mean Average Precision (mAP) and the AP50 (Average Precision@0.50) by 5.6 percentage points each with the GFocal model on Pascal VOC, and improves mAP by 1.1 percentage points and AP50 by 1.7 percentage points with the YOLOv5 model on COCO2014.
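
A minimal sketch of a contrastive distillation loss of the kind described, assuming an InfoNCE form in PyTorch where the matching teacher feature is the positive and constructed negatives supply the "incorrect knowledge" signal; shapes and names are hypothetical:

    import torch
    import torch.nn.functional as F

    def contrastive_kd_loss(student, teacher, negatives, tau=0.1):
        # student, teacher: (N, D); negatives: (N, K, D).
        # Each student feature is pulled toward its teacher feature and
        # pushed away from the K constructed negative samples.
        s = F.normalize(student, dim=-1)
        t = F.normalize(teacher, dim=-1)
        n = F.normalize(negatives, dim=-1)
        pos = (s * t).sum(-1, keepdim=True)             # (N, 1) similarities
        neg = torch.einsum('nd,nkd->nk', s, n)          # (N, K) similarities
        logits = torch.cat([pos, neg], dim=1) / tau
        labels = torch.zeros(len(s), dtype=torch.long)  # positive sits at index 0
        return F.cross_entropy(logits, labels)

    loss = contrastive_kd_loss(torch.randn(8, 256), torch.randn(8, 256),
                               torch.randn(8, 16, 256))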

Port traffic flow prediction based on knowledge graph and spatio-temporal diffusion graph convolutional network
Guixiang XUE, Hui WANG, Weifeng ZHOU, Yu LIU, Yan LI
Journal of Computer Applications    2024, 44 (9): 2952-2957.   DOI: 10.11772/j.issn.1001-9081.2023081100

Accurate prediction of port traffic flow is challenging due to its stochastic uncertainty and time-unsteady characteristics. To improve prediction accuracy, a port traffic flow prediction model based on knowledge graph and spatio-temporal diffusion graph convolutional network, named KG-DGCN-GRU, was proposed, taking into account external disturbances such as meteorological conditions and the opening and closing status of the port-adjacent highway. The factors related to the port traffic network were represented by a knowledge graph, the semantic information of various external factors was learned from the port knowledge graph using a knowledge representation method, and Diffusion Graph Convolutional Network (DGCN) and Gated Recurrent Unit (GRU) were used to effectively extract the spatio-temporal dependency features of port traffic flow. Experimental results on the Tianjin Port traffic dataset show that KG-DGCN-GRU effectively improves prediction accuracy through the knowledge graph and diffusion graph convolutional network: compared with Temporal Graph Convolutional Network (T-GCN) and Diffusion Convolutional Recurrent Neural Network (DCRNN) under single-step prediction (15 min), the Root Mean Squared Error (RMSE) is reduced by 4.85% and 7.04%, and the Mean Absolute Error (MAE) is reduced by 5.80% and 8.17%, respectively.
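
A minimal sketch of one diffusion graph convolution step of the kind DGCN uses, assuming a K-step bidirectional random walk over the traffic graph (NumPy; all shapes and values hypothetical):

    import numpy as np

    def diffusion_graph_conv(X, A, weights):
        # X: (N, F) node features; A: (N, N) weighted adjacency;
        # weights: list of K pairs (W_fwd, W_bwd), each (F, F_out).
        P_fwd = A / A.sum(axis=1, keepdims=True)      # forward transitions
        P_bwd = A.T / A.T.sum(axis=1, keepdims=True)  # backward transitions
        S_f, S_b, out = X, X, 0.0
        for W_f, W_b in weights:                      # diffusion steps k = 0..K-1
            out = out + S_f @ W_f + S_b @ W_b
            S_f, S_b = P_fwd @ S_f, P_bwd @ S_b
        return np.tanh(out)

    A = np.array([[0., 1., 1.], [1., 0., 0.], [1., 0., 0.]])  # 3 road nodes
    X = np.random.rand(3, 4)
    W = [(np.random.rand(4, 8), np.random.rand(4, 8)) for _ in range(2)]
    H = diffusion_graph_conv(X, A, W)   # fed into the GRU at each time step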

Improved adaptive large neighborhood search algorithm for multi-depot vehicle routing problem with time window
Yan LI, Dazhi PAN, Siqing ZHENG
Journal of Computer Applications    2024, 44 (6): 1897-1904.   DOI: 10.11772/j.issn.1001-9081.2023060760

Aiming at the Multi-Depot Vehicle Routing Problem with Time Window (MDVRPTW), an Improved Adaptive Large Neighborhood Search algorithm (IALNS) was proposed. Firstly, a path segmentation algorithm was improved in the stage of constructing the initial solution. Then, in the optimization stage, the designed removal and repair heuristic operators competed with each other: a scoring mechanism was introduced for the operators, and the heuristic operator was selected by roulette wheel. At the same time, the iteration cycle was divided into segments and the operator weight information was dynamically adjusted in each segment, effectively preventing the algorithm from falling into local optima. Finally, a simulated annealing mechanism was adopted as the acceptance criterion for solutions. The relevant parameters of IALNS were determined by experiments on the Cordeau benchmark instances, and the results of the proposed algorithm were compared with other representative results in this field. The experimental results show that the solution error between IALNS and the Variable Neighborhood Search (VNS) algorithm does not exceed 0.8%, with IALNS even better in some cases; compared with the multi-phase improved shuffled frog leaping algorithm, the average running time of the proposed algorithm is reduced by 12.8%, and the runtime is shorter for most instances. These results verify that IALNS is an effective algorithm for solving MDVRPTW.
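
A minimal sketch of the adaptive operator machinery described above: roulette-wheel selection over adaptive weights, segment-wise weight updates, and simulated-annealing acceptance (pure Python; parameter names and values hypothetical):

    import math, random

    def select_operator(ops, weights):
        # Roulette wheel: operators are drawn in proportion to their weights.
        r, acc = random.uniform(0, sum(weights)), 0.0
        for op, w in zip(ops, weights):
            acc += w
            if r <= acc:
                return op
        return ops[-1]

    def update_weights(weights, scores, uses, rho=0.5):
        # End-of-segment update: blend historical weight with segment score,
        # so operators that recently produced good solutions are favored.
        return [(1 - rho) * w + rho * s / max(u, 1)
                for w, s, u in zip(weights, scores, uses)]

    def accept(new_cost, cur_cost, T):
        # Simulated-annealing criterion: always accept improvements, accept
        # worse solutions with probability exp(-delta / T).
        return new_cost < cur_cost or random.random() < math.exp((cur_cost - new_cost) / T)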

Top-k high average utility sequential pattern mining algorithm under one-off condition
Keshuai YANG, Youxi WU, Meng GENG, Jingyu LIU, Yan LI
Journal of Computer Applications    2024, 44 (2): 477-484.   DOI: 10.11772/j.issn.1001-9081.2023030268

To address the issue that traditional Sequential Pattern Mining (SPM) does not consider pattern repetition and ignores the effects of utility (unit price or profit) and pattern length on user interest, a Top-k One-off high average Utility sequential Pattern mining (TOUP) algorithm was proposed. The TOUP algorithm mainly includes two core steps: average utility calculation and candidate pattern generation. Firstly, a CSP (Calculation Support of Pattern) algorithm based on the occurrence positions of each item and the item repetition relation array was proposed to calculate pattern support, thereby achieving rapid calculation of the average utility of patterns. Secondly, candidate patterns were generated by itemset extension and sequence extension, and a maximum average utility upper bound was proposed; based on this upper bound, candidate patterns were pruned effectively. Experimental results on five real datasets and one synthetic dataset show that, compared with the TOUP-dfs and HAOP-ms algorithms, TOUP reduces the number of candidate patterns by 38.5% to 99.8% and 0.9% to 77.6%, respectively, and decreases the running time by 33.6% to 97.1% and 57.9% to 97.2%, respectively. TOUP therefore performs better and can mine patterns of interest to users more efficiently.
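
As a hedged illustration of the two quantities the algorithm is built around (not the paper's CSP procedure): support under the one-off condition and the average utility it induces, in plain Python with hypothetical data:

    def oneoff_support(seq, pattern):
        # Count occurrences under the one-off condition: each sequence
        # position may be consumed by at most one occurrence (greedy
        # left-most matching).
        used, count = set(), 0
        while True:
            match, i = [], 0
            for j, item in enumerate(seq):
                if i < len(pattern) and j not in used and item == pattern[i]:
                    match.append(j)
                    i += 1
            if i < len(pattern):
                return count
            used.update(match)
            count += 1

    def average_utility(pattern, support, utility):
        # Average utility = total utility of all occurrences / pattern length.
        return support * sum(utility[item] for item in pattern) / len(pattern)

    seq = list("abcabcab")
    sup = oneoff_support(seq, list("ab"))     # 3 disjoint occurrences
    print(average_utility(list("ab"), sup, {"a": 2, "b": 5, "c": 1}))  # 10.5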

Lightweight fall detection algorithm framework based on RPEpose and XJ-GCN
Ruiyan LIANG, Hui YANG
Journal of Computer Applications    2024, 44 (11): 3639-3646.   DOI: 10.11772/j.issn.1001-9081.2023101379

Traditional joint keypoint detection models based on the Vision Transformer (ViT) usually adopt 2D sine position embedding, which is prone to losing key two-dimensional shape information in the image, leading to a decrease in accuracy. For behavior classification, the traditional Spatio-Temporal Graph Convolutional Network (ST-GCN) suffers from the lack of correlation between non-physically connected joints under its uni-labeling partitioning strategy. To address the above problems, a lightweight real-time fall detection framework was designed to detect fall behavior quickly and accurately. The framework contains a joint keypoint detection model RPEpose (Relative Position Encoding pose estimation) and a behavior classification model XJ-GCN (Cross-Joint attention Graph Convolutional Network). On the one hand, relative position encoding was adopted by the RPEpose model to overcome the position insensitivity of the original position encoding and improve the performance of the ViT architecture in joint keypoint detection. On the other hand, an X-Joint (Cross-Joint) attention mechanism was proposed: after reconstructing the partitioning strategy into the XJL (X-Joint Labeling) partitioning strategy, the dependencies between all joint connections were modeled to obtain the potential correlation between joint connections, with excellent classification performance and few parameters. Experimental results indicate that, on the COCO 2017 validation set, the RPEpose model requires only 8.2 GFLOPs (Giga FLOating Point operations) of computational overhead while achieving a test Average Precision (AP) of 74.3% for images at a resolution of 256×192; on the NTU RGB+D dataset, the Top-1 accuracy under the Cross Subject (X-Sub) partitioning standard is 89.6%, and the proposed RPEpose+XJ-GCN framework achieves a prediction accuracy of 87.2% at a processing speed of 30 frame/s, verifying its real-time performance and accuracy.
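
A minimal sketch of attention with a learned relative position bias, the general mechanism this kind of encoding belongs to: the bias is looked up by the 2D offset between token positions and added to the attention logits (PyTorch; grid size and names hypothetical, not RPEpose's exact formulation):

    import torch

    def attention_with_relative_bias(q, k, v, bias_table, rel_index):
        # q, k, v: (N, D); bias_table: (num_offsets,); rel_index: (N, N).
        # The per-offset bias keeps 2D shape information that fixed sine
        # embeddings tend to lose.
        logits = (q @ k.T) / q.shape[-1] ** 0.5
        logits = logits + bias_table[rel_index]
        return torch.softmax(logits, dim=-1) @ v

    q, k, v = (torch.randn(4, 16) for _ in range(3))   # 2x2 patch grid
    bias_table = torch.zeros(9)                        # offsets in {-1,0,1}^2
    ys, xs = torch.meshgrid(torch.arange(2), torch.arange(2), indexing='ij')
    pos = torch.stack([ys.flatten(), xs.flatten()], 1)
    rel = pos[:, None, :] - pos[None, :, :] + 1        # shift offsets to 0..2
    rel_index = rel[..., 0] * 3 + rel[..., 1]          # (N, N) table indices
    out = attention_with_relative_bias(q, k, v, bias_table, rel_index)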

Prediction of taxi demands between urban regions by fusing origin-destination spatial-temporal correlation
Yuan WEI, Yan LIN, Shengnan GUO, Youfang LIN, Huaiyu WAN
Journal of Computer Applications    2023, 43 (7): 2100-2106.   DOI: 10.11772/j.issn.1001-9081.2022091364

Accurate prediction of taxi demands between urban regions can provide decision support for taxi guidance and scheduling as well as passenger travel recommendation, so as to optimize the relation between taxi supply and demand. However, most existing models only model and predict the taxi demand within a region, giving insufficient consideration to the spatial-temporal correlation between regions and paying little attention to the finer-grained demand prediction between regions. To solve these problems, a prediction model for taxi demands between urban regions, Origin-Destination fusion with Spatial-Temporal Network (ODSTN), was proposed. In this model, complex spatial-temporal correlations between regions were captured from two spatial dimensions (regions and region pairs) and three temporal dimensions (recent, daily and weekly periods) using graph convolution and attention mechanisms, and a new path perception fusion mechanism was designed to combine the multi-angle features and finally realize taxi demand prediction between urban regions. Experiments were carried out on two real taxi order datasets from Chengdu and Manhattan. The results show that the Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE) of the ODSTN model are 0.8971, 3.5274 and 50.6556% on the two datasets respectively versus 0.5896, 1.1638 and 61.0794%, indicating that ODSTN has high accuracy in taxi demand prediction tasks.

Multi-channel pathological image segmentation with gated axial self-attention
Zhi CHEN, Xin LI, Liyan LIN, Jing ZHONG, Peng SHI
Journal of Computer Applications    2023, 43 (4): 1269-1277.   DOI: 10.11772/j.issn.1001-9081.2022030333

In Hematoxylin-Eosin (HE)-stained pathological images, the uneven distribution of cell staining and the diversity of tissue morphologies pose great challenges to automated segmentation. Traditional convolutions cannot capture the correlations between pixels in a large neighborhood, making it difficult to improve segmentation performance further. Therefore, a Multi-Channel Segmentation Network with gated axial self-attention (MCSegNet) was proposed to achieve accurate segmentation of nuclei in pathological images. A dual-encoder and decoder structure was adopted, in which the axial self-attention encoding channel captured global features, while the convolutional encoding channel based on residual structures obtained local fine features. Feature fusion at the end of the encoding channels enhanced the feature representation, providing a good information base for the decoder, in which segmentation results were gradually generated by cascading multiple upsampling modules. In addition, an improved hybrid loss function was used to effectively alleviate the common problem of sample imbalance in pathological images. Experimental results on the MoNuSeg2020 public dataset show that MCSegNet is 2.66 and 2.77 percentage points higher than U-Net in terms of F1-score and Intersection over Union (IoU) respectively, effectively improving pathological image segmentation and the reliability of clinical diagnosis.
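
A minimal sketch of an imbalance-aware hybrid loss of the kind mentioned, assuming a weighted sum of binary cross-entropy and Dice loss (PyTorch; the weighting is hypothetical, not the paper's exact formulation):

    import torch
    import torch.nn.functional as F

    def hybrid_loss(pred, target, alpha=0.5, eps=1e-6):
        # pred: (N, 1, H, W) logits; target: same shape, binary nucleus mask.
        # The Dice term counteracts foreground/background imbalance that a
        # plain cross-entropy loss handles poorly.
        bce = F.binary_cross_entropy_with_logits(pred, target)
        p = torch.sigmoid(pred)
        inter = (p * target).sum()
        dice = 1 - (2 * inter + eps) / (p.sum() + target.sum() + eps)
        return alpha * bce + (1 - alpha) * dice

    pred = torch.randn(2, 1, 64, 64)
    target = (torch.rand(2, 1, 64, 64) > 0.9).float()  # sparse foreground
    loss = hybrid_loss(pred, target)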

Repair method for process models with concurrent structures based on token replay
Erjing BAI, Xiaoyan LI, Yuyue DU
Journal of Computer Applications    2023, 43 (2): 499-506.   DOI: 10.11772/j.issn.1001-9081.2021122154

Process mining can build a process model from the event logs generated by an enterprise information management system. When the actual business process changes, deviations arise between the process model and the event logs, and the process model needs to be repaired. For process models with concurrent structures, the precision of some existing repair methods is reduced by the addition of self-loops and invisible transitions. Therefore, a method for repairing process models with concurrent structures was proposed on the basis of logical Petri nets and token replay. Firstly, according to the relationship between the input-output places of the sub-model and the event logs, the insertion position of the sub-model was determined. Then, the deviation positions were determined by a token replay method. Finally, a method was designed to repair the process models based on logical Petri nets. The correctness and effectiveness of this method were verified by simulations on the ProM platform, and the proposed method was compared with Fahland's and other methods. The results show that the precision of this method is about 85%, which is 17 and 11 percentage points higher than those of Fahland's and Goldratt's methods respectively. In terms of simplicity, the proposed method adds no self-loops or invisible transitions, while Fahland's and Goldratt's methods add both. The fitness of all three methods is above 0.9, with Goldratt's method slightly lower. The above verifies that the model repaired by the proposed method has higher fitness and precision.
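
A minimal sketch of token replay for locating deviations, assuming a Petri net stored as a transition-to-places map (plain Python; the net encoding is hypothetical, and logical Petri net specifics are omitted):

    def token_replay(net, marking, trace):
        # net: {transition: (input_places, output_places)};
        # marking: {place: token_count}. A deviation is recorded whenever a
        # transition in the trace must fire without its input tokens.
        deviations = []
        for t in trace:
            ins, outs = net[t]
            for p in ins:
                if marking.get(p, 0) == 0:
                    deviations.append((t, p))     # deviation position
                else:
                    marking[p] -= 1
            for p in outs:
                marking[p] = marking.get(p, 0) + 1
        return deviations

    net = {"a": (["start"], ["p1"]), "b": (["p1"], ["p2"]), "c": (["p2"], ["end"])}
    print(token_replay(net, {"start": 1}, ["a", "c", "b"]))  # [('c', 'p2')]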

Multi-objective optimization model for unmanned aerial vehicles trajectory based on decomposition and trajectory search
Junyan LIU, Feibo JIANG, Yubo PENG, Li DONG
Journal of Computer Applications    2023, 43 (12): 3806-3815.   DOI: 10.11772/j.issn.1001-9081.2022121882

Traditional Deep Learning (DL)-based multi-objective solvers suffer from low model utilization and easily fall into local optima. Aiming at these problems, a Multi-objective Optimization model for Unmanned aerial vehicle Trajectory based on Decomposition and Trajectory search (DTMO-UT) was proposed, consisting of an encoding part and a decoding part. First, the encoding part contained a Device encoder (Dencoder) and a Weight encoder (Wencoder), which extracted the state information of Internet of Things (IoT) devices and the features of the weight vectors. The scalar optimization sub-problems decomposed from the Multi-objective Optimization Problem (MOP) were represented by the weight vectors, so the MOP could be solved by solving all the sub-problems. The Wencoder encoded all sub-problems, which improved the utilization of the model. Then, the decoding part, containing a Trajectory decoder (Tdecoder), decoded the encoded features to generate Pareto optimal solutions. Finally, to alleviate the tendency of the greedy strategy to fall into local optima, trajectory search technology was added to the trajectory decoder: multiple candidate trajectories were generated and the one with the best scalar value was selected as the Pareto optimal solution. In this way, the exploration ability of the trajectory decoder was enhanced during trajectory planning, and a better-quality Pareto set was found. Simulation results show that, compared with mainstream DL MOP solvers, with 98.93% fewer model parameters, the proposed model reduces the distribution metric of MOP solutions by 0.076%, improves the ductility of the solutions by 0.014% and increases the overall performance by 1.23%, showing the strong practical trajectory planning ability of the DTMO-UT model.
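
A minimal sketch of decomposition plus trajectory search, assuming Tchebycheff scalarization: each weight vector defines one sub-problem, and among several candidate trajectories the one with the best scalar value is kept (NumPy; objective values hypothetical):

    import numpy as np

    def tchebycheff(f, weight, ideal):
        # Scalar value of objective vector f under one weight vector; solving
        # one such sub-problem per weight vector covers the whole MOP.
        return np.max(weight * np.abs(f - ideal))

    weights = np.array([[0.2, 0.8], [0.5, 0.5], [0.8, 0.2]])   # 3 sub-problems
    ideal = np.zeros(2)
    # candidate trajectories, e.g. (energy, latency) of each decoded path
    candidates = [np.array([3.0, 1.5]), np.array([2.4, 2.0]), np.array([3.6, 1.1])]
    for w in weights:
        best = min(candidates, key=lambda f: tchebycheff(f, w, ideal))
        print(w, best)    # the kept Pareto candidate for this sub-problem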

Contrast order-preserving pattern mining algorithm
Yufei MENG, Youxi WU, Zhen WANG, Yan LI
Journal of Computer Applications    2023, 43 (12): 3740-3746.   DOI: 10.11772/j.issn.1001-9081.2022121828

Aiming at the problem that existing contrast sequential pattern mining methods mainly focus on character sequence datasets and are difficult to apply to time series datasets, a new Contrast Order-preserving Pattern Mining (COPM) algorithm was proposed. Firstly, in the candidate pattern generation stage, a pattern fusion strategy was used to reduce the number of candidate patterns. Then, in the pattern support calculation stage, the support of a super-pattern was calculated from the matching results of its sub-patterns. Finally, a dynamic minimum support threshold pruning strategy was designed to further prune the candidate patterns effectively. Experimental results show that on six real time series datasets, the memory consumption of the COPM algorithm is at least 52.1% lower than that of COPM-o (COPM-original), 36.8% lower than that of COPM-e (COPM-enumeration), and 63.6% lower than that of COPM-p (COPM-prune). At the same time, the running time of the COPM algorithm is at least 30.3% lower than that of COPM-o, 8.8% lower than that of COPM-e and 41.2% lower than that of COPM-p. Therefore, the COPM algorithm is superior to the COPM-o, COPM-e and COPM-p algorithms, and the experimental results verify that it can effectively mine contrast order-preserving patterns to find the differences between classes of time series datasets.

Attribute reduction algorithm based on cluster granulation and divergence among clusters
Yan LI, Bin FAN, Jie GUO
Journal of Computer Applications    2022, 42 (9): 2701-2712.   DOI: 10.11772/j.issn.1001-9081.2021081371

Attribute reduction is a hot research topic in rough set theory. Most attribute reduction algorithms for continuous data are based on dominance relations or neighborhood relations. However, the attributes of continuous datasets do not necessarily have dominance relations; and although attribute reduction algorithms based on neighborhood relations can adjust the granulation degree through the neighborhood radius, it is difficult to unify the radii because attributes differ in dimension and radius parameters take continuous values, making the whole parameter granulation process computationally expensive. To solve this problem, a multi-granularity attribute reduction strategy based on cluster granulation was proposed. Firstly, similar samples were grouped by clustering, and the concepts of approximate set, relative positive region and positive region reduction based on clustering were proposed. Secondly, according to JS (Jensen-Shannon) divergence theory, the difference in data distribution of each attribute among clusters was measured, and representative features were selected to distinguish different clusters. Finally, an attribute reduction algorithm was designed using a discernibility matrix. In the proposed algorithm, attributes are not required to have ordered relations; unlike the neighborhood radius, the clustering parameter is discrete, and the dataset can be divided into different granulation degrees by adjusting this parameter. Experimental results on UCI and Kent Ridge datasets show that this algorithm can deal with continuous data directly, and that by adjusting the parameter discretely within a small range, it removes redundant features while maintaining or even improving classification accuracy.
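
A minimal sketch of scoring one attribute by its distribution divergence across clusters, assuming histogram estimates and the average pairwise JS divergence (SciPy's jensenshannon returns the square root of the divergence, hence the squaring; the bin count is hypothetical):

    import numpy as np
    from scipy.spatial.distance import jensenshannon

    def attribute_divergence(values, labels, bins=10):
        # Histogram the attribute inside each cluster, then average the
        # pairwise JS divergences; high scores mark attributes that tell
        # the clusters apart and are kept as representative features.
        edges = np.histogram_bin_edges(values, bins=bins)
        hists = [np.histogram(values[labels == c], bins=edges)[0] + 1e-9
                 for c in np.unique(labels)]
        hists = [h / h.sum() for h in hists]
        m = len(hists)
        pairs = [(i, j) for i in range(m) for j in range(i + 1, m)]
        return np.mean([jensenshannon(hists[i], hists[j]) ** 2 for i, j in pairs])

    values = np.random.rand(300)                 # one continuous attribute
    labels = np.random.randint(0, 3, 300)        # cluster assignments
    print(attribute_divergence(values, labels))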

Facial expression recognition algorithm based on combination of improved convolutional neural network and support vector machine
Guifang QIAO, Shouming HOU, Yanyan LIU
Journal of Computer Applications    2022, 42 (4): 1253-1259.   DOI: 10.11772/j.issn.1001-9081.2021071270

In view of the problems of current Convolutional Neural Networks (CNNs) that use end-layer features for facial expression recognition, such as complex model structure, too many parameters and unsatisfactory recognition accuracy, an optimization algorithm based on the combination of an improved CNN and a Support Vector Machine (SVM) was proposed. First, the network was designed with the idea of successive convolutions to obtain more nonlinear activations. Then, an adaptive Global Average Pooling (GAP) layer was used to replace the fully connected layer of a traditional CNN, reducing the network parameters. Finally, to improve the generalization ability of the model, an SVM classifier was used instead of the traditional Softmax function for expression recognition. Experimental results show that the proposed algorithm achieves 73.4% and 98.06% recognition accuracy on the Fer2013 and CK+ datasets, 2.2 percentage points higher than the traditional LeNet-5 algorithm on Fer2013. Moreover, the network model has a simple structure, few parameters and good robustness.
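
A minimal sketch of the final two stages, assuming CNN feature maps reduced by global average pooling and classified by an SVM in place of Softmax (NumPy + scikit-learn; shapes and class count hypothetical):

    import numpy as np
    from sklearn.svm import SVC

    def global_average_pool(feature_maps):
        # (N, C, H, W) -> (N, C): each channel's map collapses to its mean,
        # removing the fully connected layer's parameters.
        return feature_maps.mean(axis=(2, 3))

    feats = global_average_pool(np.random.rand(100, 32, 6, 6))  # stand-in CNN output
    labels = np.random.randint(0, 7, size=100)                  # 7 expression classes
    clf = SVC(kernel='rbf').fit(feats, labels)                  # SVM replaces Softmax
    print(clf.predict(feats[:5]))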

Fast failure recovery method based on local redundant hybrid code
Jingyu LIU, Qiuxia NIU, Xiaoyan LI, Qiaoshuo SHI, Youxi WU
Journal of Computer Applications    2022, 42 (4): 1244-1252.   DOI: 10.11772/j.issn.1001-9081.2021111917

The parity blocks of a Maximum-Distance-Separable (MDS) code are all global parity blocks; the length of the reconstruction chain increases with the expansion of the storage system, and reconstruction performance gradually decreases. Aiming at these problems, a new type of Non-Maximum-Distance-Separable (Non-MDS) code, the local redundant hybrid code Code-LM(sc), was proposed. Firstly, two types of local parity blocks, the horizontal parity block in the strip-set and the horizontal-diagonal parity block, were added to strip-sets to reduce the length of the reconstruction chain, and the parity layout of the local redundant hybrid code was designed. Then, four reconstruction formulas for lost data blocks were designed according to the generation rules of the parity blocks and the common blocks shared by the reconstruction chains of different data blocks. Finally, double-disk failures were divided into three situations depending on the distances of the strip-sets where the failed disks are located, and the corresponding reconstruction methods were designed. Theoretical analysis and experimental results show that, at the same storage scale, compared with RDP (Row-Diagonal Parity), the reconstruction time of Code-LM(sc) for single-disk and double-disk failures is reduced by 84% and 77% respectively; compared with V2-Code, it is reduced by 67% and 73% respectively. Therefore, the local redundant hybrid code supports fast recovery from failed disks and improves the reliability of the storage system.
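
A minimal sketch of the XOR arithmetic every such parity scheme rests on: a lost block is rebuilt from the surviving members of one (here artificially short) parity chain; the chain layout is hypothetical, not Code-LM(sc)'s. Shorter local chains mean fewer surviving blocks must be read, which is why adding local parity blocks speeds reconstruction.

    from functools import reduce

    def xor_blocks(blocks):
        # Bitwise XOR across equal-length byte blocks.
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

    data = [b'\x01\x02', b'\x0f\x10', b'\xa0\x0b']   # one parity chain
    parity = xor_blocks(data)                        # stored parity block
    rebuilt = xor_blocks([parity, data[1], data[2]]) # recover lost data[0]
    assert rebuilt == data[0]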

Semantic segmentation of RGB-D indoor scenes based on attention mechanism and pyramid fusion
Na YU, Yan LIU, Xiongju WEI, Yuan WAN
Journal of Computer Applications    2022, 42 (3): 844-853.   DOI: 10.11772/j.issn.1001-9081.2021030392

Aiming at the ineffective fusion of multi-modal features in RGB-D indoor scene semantic segmentation, a network named APFNet (Attention mechanism and Pyramid Fusion Network) was proposed, in which an attention mechanism fusion module and a pyramid fusion module were designed. To fully exploit the complementarity of RGB and depth features, the attention allocation weights of the two kinds of features were extracted by the attention mechanism fusion module, making the network focus on the multi-modal feature domain with more information content. Local and global information were fused by the pyramid fusion module with pyramid features at four different scales, extracting scene context and improving the segmentation accuracy of object edges and small-scale objects. By integrating these two fusion modules into a three-branch encoder-decoder network, an end-to-end output was realized. Comparative experiments were conducted with state-of-the-art methods, such as the multi-level RGB-D residual feature Fusion network (RDF-152), the Attention Complementary features Network (ACNet) and the Spatial information Guided convolution Network (SGNet), on the SUN RGB-D and NYU Depth v2 datasets. Compared with the best-performing method RDF-152, when the layer number of the encoder network was reduced from 152 to 50, the Pixel Accuracy (PA), Mean Pixel Accuracy (MPA) and Mean Intersection over Union (MIoU) of APFNet increased by 0.4, 1.1 and 3.2 percentage points respectively. The semantic segmentation accuracies for small-scale objects such as pillows and photos and for large-scale objects such as boards and ceilings increased by 0.9 to 3.4 and 12.4 to 18 percentage points respectively. The results show that the proposed APFNet has advantages in the semantic segmentation of indoor scenes.
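
A minimal sketch of channel-attention fusion for two modalities, in the spirit of the attention mechanism fusion module (PyTorch; layer sizes and names hypothetical):

    import torch
    import torch.nn as nn

    class AttentionFusion(nn.Module):
        # Produces per-channel weights for the RGB and depth branches from
        # their pooled statistics, so the fused feature leans on whichever
        # modality is more informative per channel.
        def __init__(self, channels):
            super().__init__()
            self.fc = nn.Sequential(nn.Linear(2 * channels, channels), nn.ReLU(),
                                    nn.Linear(channels, 2 * channels), nn.Sigmoid())

        def forward(self, rgb, depth):               # both (N, C, H, W)
            gap = torch.cat([rgb.mean((2, 3)), depth.mean((2, 3))], dim=1)
            w_rgb, w_d = self.fc(gap).chunk(2, dim=1)
            return rgb * w_rgb[..., None, None] + depth * w_d[..., None, None]

    fuse = AttentionFusion(channels=64)
    out = fuse(torch.randn(2, 64, 30, 40), torch.randn(2, 64, 30, 40))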

Voting instance selection algorithm based on learning to hash
Yajie HUANG, Junhai ZHAI, Xiang ZHOU, Yan LI
Journal of Computer Applications    2022, 42 (2): 389-394.   DOI: 10.11772/j.issn.1001-9081.2021071188

With the massive growth of data, how to store and use data has become a hot issue in academic research and industrial applications. As one way of addressing these problems, instance selection effectively reduces the difficulty of follow-up work by selecting representative instances from the original data according to established rules. Therefore, a voting instance selection algorithm based on learning to hash was proposed. Firstly, Principal Component Analysis (PCA) was used to map high-dimensional data to a low-dimensional space. Secondly, the k-means algorithm was iterated in combination with vector quantization, and the hash codes of the cluster centers were used to represent the data. After that, the coded data were randomly sampled in proportion, and the final instances were selected by voting over several independent runs of the algorithm. Compared with the Condensed Nearest Neighbor (CNN) algorithm and LSH-IS-F (Instance Selection algorithm by Hashing with two passes), a linear-complexity instance selection algorithm for big data, the proposed algorithm improves the compression ratio by an average of 19%. The idea of the algorithm is simple and easy to implement, and the compression ratio can be controlled automatically by adjusting the parameters. Experimental results on 7 datasets show that the proposed algorithm has a great advantage over random hashing in compression ratio and running time with similar test accuracy.
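
A minimal sketch of the pipeline, assuming PCA projection, k-means codes as the learned hash, per-code proportional sampling, and voting across rounds (scikit-learn; all ratios and thresholds hypothetical, and simplified relative to the paper's procedure):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    def vote_select(X, n_codes=16, runs=5, ratio=0.2, min_votes=3, seed=0):
        rng = np.random.default_rng(seed)
        Z = PCA(n_components=min(8, X.shape[1])).fit_transform(X)
        codes = KMeans(n_clusters=n_codes, n_init=10,
                       random_state=seed).fit_predict(Z)
        votes = np.zeros(len(X), dtype=int)
        for _ in range(runs):                        # independent sampling rounds
            for c in np.unique(codes):
                idx = np.flatnonzero(codes == c)
                take = max(1, int(ratio * len(idx)))
                votes[rng.choice(idx, take, replace=False)] += 1
        return np.flatnonzero(votes >= min_votes)    # voted-in instances

    X = np.random.rand(200, 16)
    kept = vote_select(X)          # indices of the selected instances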

Feature construction and preliminary analysis of uncertainty for meta-learning
Yan LI, Jie GUO, Bin FAN
Journal of Computer Applications    2022, 42 (2): 343-348.   DOI: 10.11772/j.issn.1001-9081.2021071198

Meta-learning applies machine learning methods (meta-algorithms) to seek the mapping between the features of a problem (meta-features) and the relative performance measures of algorithms, thereby forming meta-knowledge; how to construct and extract meta-features is an important research topic. Concerning the problem that most meta-features used in existing research are statistical features of the data, uncertainty modeling was proposed and the impact of uncertainty on the learning system was studied. Based on the inconsistency of data, the complexity of the boundary, the uncertainty of model output, linear separability, the degree of attribute overlap, and the uncertainty of the feature space, six kinds of uncertainty meta-features were established for data or models, measuring the uncertainty of the learning problem itself from different perspectives, with specific definitions given. The correlations between these meta-features were analyzed on artificial and real datasets of a large number of classification problems, and multiple classification algorithms such as K-Nearest Neighbor (KNN) were used for a preliminary analysis of the correlation between meta-features and test accuracy. Results show that the average degree of correlation is about 0.8, indicating that these meta-features have a significant impact on learning performance.
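
As one concrete instance of such a meta-feature (illustrative only, not the paper's exact definition): boundary complexity measured as the fraction of instances whose nearest neighbor carries a different label.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def boundary_complexity(X, y):
        # Nearest neighbor of each point, excluding the point itself; label
        # disagreement along the boundary signals an uncertain, complex
        # decision boundary.
        nn_idx = NearestNeighbors(n_neighbors=2).fit(X).kneighbors(
            X, return_distance=False)[:, 1]
        return float(np.mean(y[nn_idx] != y))

    X = np.random.rand(300, 4)
    y = (X[:, 0] + 0.1 * np.random.randn(300) > 0.5).astype(int)
    print(boundary_complexity(X, y))   # higher = noisier boundary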

Dynamic adjusting threshold algorithm for virtual machine migration
ZHAO Chun, YAN Lianshan, CUI Yunhe, XING Huanlai, FENG Bin
Journal of Computer Applications    2017, 37 (9): 2547-2550.   DOI: 10.11772/j.issn.1001-9081.2017.09.2547
Aiming at optimizing server energy consumption in data centers and choosing a reasonable time to migrate Virtual Machines (VMs), a VM migration algorithm based on Dynamic Adjusting Threshold (DAT) was proposed. Firstly, the migration threshold was dynamically adjusted by analyzing historical load data acquired from Physical Machines (PMs); then the time for migrating VMs was decided by a delay trigger mechanism and PM load trend prediction. The algorithm was tested on a laboratory datacenter platform. Experimental results indicate that, compared with the static threshold method, the proposed algorithm shuts down more PMs and lowers the energy consumption of the data center. The VM migration algorithm based on DAT can migrate VMs dynamically according to the variation of PM load, improving resource utilization and VM migration efficiency while reducing the energy consumption of the data center.
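
A minimal sketch of the two decisions the algorithm combines, assuming a threshold adapted to each PM's load history and a delay trigger that ignores short spikes (pure Python; k and delay are hypothetical parameters):

    def dynamic_threshold(history, k=1.5):
        # Threshold follows the PM's own load distribution (mean + k * std)
        # instead of one static value for every PM.
        mean = sum(history) / len(history)
        std = (sum((x - mean) ** 2 for x in history) / len(history)) ** 0.5
        return mean + k * std

    def should_migrate(history, delay=3, k=1.5):
        # Delay trigger: migrate only if the load stays above the adaptive
        # threshold for `delay` consecutive samples.
        thr = dynamic_threshold(history[:-delay], k)
        return all(x > thr for x in history[-delay:])

    load = [0.42, 0.45, 0.41, 0.44, 0.43, 0.40, 0.81, 0.85, 0.83]
    print(should_migrate(load))   # True: sustained overload, not a spike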
Data combination method based on structure's granulation
YAN Lin, LIU Tao, YAN Shuo, LI Feng, RUAN Ning
Journal of Computer Applications    2015, 35 (2): 358-363.   DOI: 10.11772/j.issn.1001-9081.2015.02.0358

In order to study the data combination problems occurring in real life, different kinds of data information were combined together, leading to a structure called the associated-combinatorial structure, constituted by a data set, an associated relation and a partition. The aim was to use this structure to establish a method of data combination. To this end, the associated-combinatorial structure was transformed into a granulation structure by granulating the associated relation; in this process, data combinations were completed in accordance with the data classifications. Moreover, because an associated-combinatorial structure or a granulation structure can be represented by an associated matrix, the transformation from one structure to another was characterized by algebraic calculations determined by matrix transformations. Therefore, the research not only involved theoretical analysis of data combination, but also established a data processing method based on matrix transformations. Accordingly, a computer program with linear complexity was developed according to the data combination method. Experimental results prove that the program is accurate and fast.

Analysis on distinguishing product reviews based on top-k emerging patterns
LIU Lu, WANG Yining, DUAN Lei, NUMMENMAA Jyrki, YAN Li, TANG Changjie
Journal of Computer Applications    2015, 35 (10): 2727-2732.   DOI: 10.11772/j.issn.1001-9081.2015.10.2727
With the development of e-commerce, online shopping websites provide reviews to help customers make the best choices. However, the number of reviews is huge, and review content is typically redundant and non-standard, so it is difficult for users to go through all reviews in a short time and find the distinguishing characteristics of a product. To resolve this problem, a method to mine top-k emerging patterns was proposed and applied to mining reviews of different products. Based on the proposed method, a prototype called ReviewScope was designed and implemented. ReviewScope can find significant comments on certain goods as a basis for decisions, and provides visualized results. A case study on a real-world dataset from JD.com demonstrates that ReviewScope is effective, flexible and user-friendly.
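
A minimal sketch of the core ranking criterion behind emerging patterns: the growth rate of a pattern's support from one product's reviews to another's, with the k best kept (plain Python; supports and phrases hypothetical):

    def growth_rate(sup_target, sup_other, eps=1e-9):
        # Emerging patterns are those whose support grows most sharply
        # from the contrast dataset to the target dataset.
        return sup_target / (sup_other + eps)

    def top_k_emerging(patterns, k=3):
        # patterns: {pattern: (support_in_target, support_in_contrast)}.
        scored = {p: growth_rate(s1, s2) for p, (s1, s2) in patterns.items()}
        return sorted(scored, key=scored.get, reverse=True)[:k]

    reviews = {"battery lasts": (0.30, 0.05), "screen cracks": (0.08, 0.07),
               "fast shipping": (0.20, 0.18), "great camera": (0.25, 0.04)}
    print(top_k_emerging(reviews))   # distinguishing characteristics first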
PM2.5 concentration prediction model of least squares support vector machine based on feature vector
LI Long, MA Lei, HE Jianfeng, SHAO Dangguo, YI Sanli, XIANG Yan, LIU Lifang
Journal of Computer Applications    2014, 34 (8): 2212-2216.   DOI: 10.11772/j.issn.1001-9081.2014.08.2212

To address the problem of Fine Particulate Matter (PM2.5) concentration prediction, a PM2.5 concentration prediction model was proposed. First, a comprehensive meteorological index was introduced to jointly consider the factors of wind, humidity and temperature; then a feature vector was constructed by combining the actual concentrations of SO2, NO2, CO and PM10; finally, a Least Squares Support Vector Machine (LS-SVM) prediction model was built based on the feature vector and PM2.5 concentration data. Experimental results on 2013 data from the environmental monitoring centers of city A and city B show that forecast accuracy is improved after introducing the comprehensive meteorological index, with error reduced by nearly 30%. The proposed model predicts PM2.5 concentration more accurately and has high generalization ability. Furthermore, the relationship between PM2.5 concentration and the hospitalization rate and hospital outpatient visits was analyzed, and a high correlation was found between them.
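
A minimal sketch of LS-SVM regression on such feature vectors: unlike a standard SVM, the dual reduces to one linear system (NumPy; RBF kernel, with gamma and sigma as hypothetical hyperparameters):

    import numpy as np

    def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
        # Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y].
        sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        K = np.exp(-sq / (2 * sigma ** 2))
        n = len(y)
        A = np.block([[np.zeros((1, 1)), np.ones((1, n))],
                      [np.ones((n, 1)), K + np.eye(n) / gamma]])
        sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
        return sol[1:], sol[0]                     # alpha, b

    def lssvm_predict(X_train, alpha, b, X_new, sigma=1.0):
        sq = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * sigma ** 2)) @ alpha + b

    X = np.random.rand(40, 5)                      # stand-in feature vectors
    y = X.sum(axis=1) + 0.1 * np.random.randn(40)  # stand-in PM2.5 values
    alpha, b = lssvm_fit(X, y)
    print(lssvm_predict(X, alpha, b, X[:3]))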

Design of live video streaming, recording and storage system based on Flex, Red5 and MongoDB
ZHEN Jingjing, YE Yan, LIU Taijun, DAI Cheng, WANG Honglai
Journal of Computer Applications    2014, 34 (2): 589-592.  
In order to improve the situation that network video does not play smoothly during live streaming or on-demand playback, and to find a storage strategy for massive video data, an overall design scheme of a real-time live video streaming, recording and storage system was presented. The open-source streaming media server Red5 and the Rich Internet Application technology Flex were utilized to achieve live video streaming and recording, and the recorded video data were stored in the open-source NoSQL database MongoDB. Experimental results illustrate that the platform meets the requirements of multi-user access and data storage.
High-speed data acquisition and transmission system for low-energy X-ray industrial CT
YANG Lei, GAO Fuqiang, LI Ling, CHEN Yan, LI Ren
Journal of Computer Applications    2014, 34 (11): 3361-3364.   DOI: 10.11772/j.issn.1001-9081.2014.11.3361

To meet the application demands of high-speed scanning and massive data transmission in low-energy X-ray industrial Computed Tomography (CT), a high-speed data acquisition and transmission system for low-energy X-ray industrial CT was designed. The X-CARD 0.2-256G of DT company was selected as the detector. To accommodate high-speed analog-to-digital conversion, a high-speed time-division multiplexing circuit was combined with ping-pong operation for the data cache, and a gigabit Ethernet interface was designed with a Field Programmable Gate Array (FPGA) as the master chip, so as to meet the requirements of high-speed transmission of multi-channel data. Experimental results show that the acquisition speed of the system reaches 1 MHz, the transmission speed reaches 926 Mb/s, and the dynamic range is greater than 5000. The system can effectively shorten the scanning time of low-energy X-ray detection and meet the data transmission requirements of more channels.

Query algorithm based on mesh structure in large-scale smart grid
WANG Yan, HAO Xiuping, SONG Baoyan, LI Xuecheng, XING Zengwei
Journal of Computer Applications    2014, 34 (11): 3126-3130.   DOI: 10.11772/j.issn.1001-9081.2014.11.3126

Currently, queries in transmission line monitoring systems in the smart grid are mostly global queries over the whole Wireless Sensor Network (WSN), which cannot satisfy flexible and efficient query requirements over arbitrary areas. The layout and query characteristics of the network were analyzed in detail, and a query algorithm based on mesh structure in large-scale smart grid, named MSQuery, was proposed. The algorithm aggregated the data of query nodes within different grids into one or more logical query trees, and built an optimized path for collecting query results through a merging strategy on the logical query trees. Experiments were conducted among MSQuery, RSA, which uses a routing structure for querying, and SkySensor, which uses a cluster structure for querying. Simulation results show that MSQuery can quickly return query results within the query window, reduce communication cost, and save the energy of sensor nodes.

Nonlinear modeling of power amplifier based on improved radial basis function networks
LI Ling, LIU Taijun, YE Yan, LIN Wentao
Journal of Computer Applications    2014, 34 (10): 2904-2907.   DOI: 10.11772/j.issn.1001-9081.2014.10.2904

Aiming at the nonlinear modeling of Power Amplifiers (PAs), an improved Radial Basis Function Neural Network (RBFNN) model was proposed. Firstly, time-delayed cross terms and output feedback were added to the input. The parameters (weights and centers) of the proposed model were extracted using the Orthogonal Least Squares (OLS) algorithm. Then, a Doherty PA was trained and validated successfully with a 15 MHz three-carrier Wideband Code Division Multiple Access (WCDMA) signal, and the Normalized Mean Square Error (NMSE) reaches -45 dB. Finally, an inverse class-F power amplifier was used to test the universality of the model. Simulation results show that the model fits the characteristics of power amplifiers more faithfully.

Algorithm of optimal surface deployment in wireless sensor networks
LI Yingfang, YAN Li, YANG Bo
Journal of Computer Applications    2013, 33 (10): 2730-2733.  
Node deployment is a basic problem in sensor networks, directly related to the performance of the entire network. Most existing research on sensor network node deployment addresses two-dimensional planes or three-dimensional space, with very little on the three-dimensional surface deployment scenario. An algorithm for optimal surface deployment in wireless sensor networks was therefore proposed. First, a mathematical model of the three-dimensional surface was constructed by mathematical or differential-geometric methods; then the surface was partitioned by a centroidal Voronoi subdivision, and an error function was proposed to evaluate the quality of a deployment. Comparisons with other surface deployment methods show that the performance of the proposed algorithm is superior.
Digital watermarking protocol based on El Gamal algorithm
YAN Lixia, XIAO Mingbo
Journal of Computer Applications    2013, 33 (09): 2529-2531.   DOI: 10.11772/j.issn.1001-9081.2013.09.2529
In light of the drawbacks of current digital watermarking protocols, such as requiring frequent involvement of buyers, assuming that buyers have knowledge of signatures or watermarks, and not considering appropriate usage control of digital products, a secure, practical and extensible watermarking protocol was proposed by utilizing the homomorphic, commutative El Gamal encryption algorithm and a machine-fingerprint-based copyright control scheme. Besides the basic functions of a digital watermarking protocol, this protocol also considers the interests of both buyer and seller to some extent, and improves the user's experience with a transaction model similar to the traditional one.
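
A minimal sketch of the multiplicative homomorphism of El Gamal that such protocols exploit: multiplying two ciphertexts component-wise yields a ciphertext of the product, so a watermark can be embedded under encryption (toy parameters; real deployments use large primes):

    import random

    p, g = 467, 2                      # toy group parameters
    x = random.randrange(2, p - 1)     # private key
    h = pow(g, x, p)                   # public key

    def enc(m):
        r = random.randrange(2, p - 1)
        return (pow(g, r, p), m * pow(h, r, p) % p)

    def dec(c):
        a, b = c
        return b * pow(a, p - 1 - x, p) % p    # a^(p-1-x) = a^(-x) mod p

    c1, c2 = enc(12), enc(34)
    prod = (c1[0] * c2[0] % p, c1[1] * c2[1] % p)  # E(m1) * E(m2)
    assert dec(prod) == 12 * 34 % p                # decrypts to m1 * m2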
Optimization algorithm for I-V curve fitting of solar cell
HU Keman, HU Haiyan, LIU Guiguo
Journal of Computer Applications    2013, 33 (05): 1481-1484.   DOI: 10.3724/SP.J.1087.2013.01481
A new optimization algorithm, GA-AFSA, was proposed by integrating the Genetic Algorithm (GA) and the Artificial Fish Swarm Algorithm (AFSA) to fit the mathematical model of the I-V curve of solar cells. It maintains the global optimization advantage of GA and the quick convergence of AFSA while overcoming GA's slow convergence and AFSA's aimless stepping. By fitting the five important parameters of the I-V curve, namely the photo-generated current of the solar cell, the diode quality factor, the series resistance, the reverse saturation current and the shunt resistance, GA-AFSA achieves a great improvement. Compared with existing algorithms, the new one has higher precision and faster convergence.
Detection and defense scheme for selective forwarding attacks in wireless sensor network
FU Xiang-yan, LI Ping, WU Jia-ying
Journal of Computer Applications    2012, 32 (10): 2711-2715.   DOI: 10.3724/SP.J.1087.2012.02711
To improve the detection rate of malicious nodes and the defensive ability of the system against selective forwarding attacks in Wireless Sensor Networks (WSNs), a detection method based on an optimal random routing algorithm and neighbor node monitoring was proposed. The method creates the forwarding path by introducing parameters such as distance and trust degree, and uses a node monitoring scheme to detect and defend against malicious nodes during route discovery and selection. Simulations were completed in the MATLAB environment and performance was compared with other methods. Analysis and simulation results show that this method is effective in detecting selective forwarding attacks and can ensure reliable packet delivery to the destination using relatively little energy.
Data storage method supporting large-scale smart grid
SONG Bao-yan, ZHANG Hong-mei, WANG Yan, LI Qiong
Journal of Computer Applications    2012, 32 (09): 2496-2499.   DOI: 10.3724/SP.J.1087.2012.02496
Concerning the massive, real-time and dynamic nature of monitoring data in a large-scale smart grid, a new data-centric storage approach supporting large-scale smart grids was proposed, a hierarchical extension scheme for storing massive dynamic data. Firstly, an extended Hash coding method adjusted the number of storage nodes dynamically to avoid data loss from sudden or frequent events and increase system availability. Then, a multi-threshold leveling method was used to distribute data to multiple storage nodes, avoiding hotspot storage problems and achieving load balance. Simulation results show that this method satisfies the need for massive data storage, obtains better load balance, lowers total energy consumption and extends the life cycle of the whole network.