Data science and technology

Visual analysis of multivariate spatio-temporal data for origin-destination flow

Siyi ZHOU, Tianrui LI

Journal of Computer Applications 2024, 44 (2): 452-459. DOI: 10.11772/j.issn.1001-9081.2023020178

An Integrated Circuit (IC) card records a resident’s travel and thereby reflects the resident’s Origin-Destination (OD) information. However, because OD flow data is large in scale, directly visualizing its spatial distribution easily causes visual clutter. Moreover, multivariate data is difficult to combine with flow data because it contains many different types of data. To reduce the visual occlusion caused by directly visualizing the spatial distribution of large-scale OD data, a flow clustering method based on Orthogonal Nonnegative Matrix Factorization (ONMF) was proposed. The OD data was clustered before being visualized, so that unnecessary occlusion was reduced. To address the difficulty of combining and analyzing multivariate spatio-temporal data of multiple types, a multivariate time series data view for bus stops was designed. Bus stop flow and four types of multivariate data (air quality, air temperature, relative humidity, and rainfall) were encoded on the same time series, which improved the space utilization of the view and allowed the variables to be compared and analyzed together. To assist users in exploration and analysis, an interactive visual analysis system was developed based on OD flow and multivariate data, and a variety of interactive operations were designed to improve the efficiency of user exploration. Finally, based on the Singapore IC card dataset, the proposed clustering method was evaluated in terms of clustering effect and running time. In the comparison experiments, with the silhouette coefficient used to evaluate the clustering effect, the silhouette coefficient of the proposed method is 0.028 higher than that of the original method and 0.253 higher than that of the K-means clustering method. The running time comparison shows that its running time is 254 seconds less than that of the ONMFS (Orthogonal NMF through Subspace exploration) method with better clustering effect. The effectiveness of the system was verified by case analysis and system function comparison.
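
The silhouette coefficient used above to compare clustering methods can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `silhouette_score`, the Euclidean distance, and the toy points are hypothetical and are not the paper's implementation or data.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient of a labeled point set (Euclidean distance)."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")

    def mean_distance(i, members):
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # common convention: a singleton cluster contributes 0
        a = mean_distance(i, own)  # cohesion: mean distance inside own cluster
        b = min(  # separation: mean distance to the nearest other cluster
            mean_distance(i, members)
            for other, members in clusters.items()
            if other != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical toy data: two well-separated groups of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels follow the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups
```

Values close to 1 indicate compact, well-separated clusters, so a difference such as the 0.028 or 0.253 reported above is a difference on this scale of -1 to 1.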

Two-stage recommendation algorithm of Siamese graph convolutional neural network

Zhiwen JING, Yujia ZHANG, Boting SUN, Hao GUO

Journal of Computer Applications 2024, 44 (2): 469-476. DOI: 10.11772/j.issn.1001-9081.2023020180

To solve the problem that the two-tower neural network in recommendation systems has difficulty learning both the interaction information between the user side and the item side and the graph connection information, a new algorithm, TSN (Two-stage Siamese graph convolutional Neural network recommendation algorithm), was proposed. First, a heterogeneous graph based on user behavior was built. Then, a graph convolutional Siamese network was designed between the two towers, so as to achieve information interaction while learning the connection information of the heterogeneous graph. Finally, through a specially structured two-stage information sharing mechanism, the neural networks on the user side and the item side could transmit information dynamically and bidirectionally during training, and neural network cascading was effectively avoided. In comparative experiments on the MovieLens and Douban movie datasets, the NDCG@10, NDCG@50, and NDCG@100 of the proposed algorithm are 11.39% to 23.98% higher than those of the best baseline algorithm DAT (Dual Augmented Two-tower model for online large-scale recommendation). The results show that the proposed algorithm alleviates the lack of information interaction in the two-tower neural network and significantly improves recommendation performance compared with previous algorithms.
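
For readers unfamiliar with the NDCG@k metric reported above, the following is a minimal sketch of how it is commonly computed with binary relevance. The function names and the toy item identifiers are hypothetical, and the paper may use a different relevance definition.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain of the first k gains (rank 1 has discount 1)."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains[:k]))


def ndcg_at_k(ranked_items, relevant_items, k):
    """NDCG@k with binary relevance: 1 if an item is relevant, else 0."""
    gains = [1.0 if item in relevant_items else 0.0 for item in ranked_items]
    # The ideal ranking places all relevant items first.
    ideal = [1.0] * min(len(relevant_items), k)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical example: the single relevant item is ranked first or second.
top_hit = ndcg_at_k(["a", "b", "c"], {"a"}, k=3)
second_hit = ndcg_at_k(["b", "a", "c"], {"a"}, k=3)
```

Because the discount grows with rank, moving a relevant item from position 1 to position 2 lowers the score from 1 to 1/log2(3).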

Efficient similar exercise retrieval model based on unsupervised semantic hashing

Wei TONG, Liyang HE, Rui LI, Wei HUANG, Zhenya HUANG, Qi LIU

Journal of Computer Applications 2024, 44 (1): 206-216. DOI: 10.11772/j.issn.1001-9081.2023091260

Finding similar exercises aims to retrieve, from an exercise database, exercises with testing goals similar to those of a given query exercise. As online education evolves, exercise databases are growing in size, and because exercises are highly specialized, annotating their relations is not easy. Thus, online education systems require an efficient, unsupervised model for finding similar exercises. Unsupervised semantic hashing can map high-dimensional data to compact and efficient binary representations using only unsupervised signals. However, simply applying a semantic hashing model to similar exercise retrieval is inadequate, because exercise data contains rich semantic information while the representation space of a binary vector is limited. To address this issue, a similar exercise retrieval model was introduced to acquire and retain crucial information. Firstly, a crucial information acquisition module was designed to acquire critical information from exercise data, and a de-redundancy objective loss was proposed to eliminate redundant information. Secondly, a time-aware activation function was introduced to reduce coding information loss. Thirdly, to maximize the utilization of the Hamming space, a bit balance loss and a bit independence loss were introduced to optimize the distribution of binary representations during optimization. Experimental results on the MATH and HISTORY datasets demonstrate that the proposed model outperforms the state-of-the-art text semantic hashing model Deep Hash InfoMax (DHIM), with average improvements of approximately 54% and 23% respectively across three recall settings. Moreover, compared to the best-performing similar exercise retrieval model QuesCo, the proposed model demonstrates a clear advantage in search efficiency.
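
The search-efficiency advantage of binary codes comes from comparing them with cheap bit operations. The sketch below illustrates Hamming-distance retrieval and a simple bit-balance check; the exercise identifiers, the 4-bit codes, and all function names are hypothetical, and the learned hashing model itself is not reproduced.

```python
def hamming_distance(code_a, code_b):
    """Number of differing bits between two integer-encoded binary codes."""
    return bin(code_a ^ code_b).count("1")


def retrieve_similar(query_code, database, top_n=3):
    """Return the ids of the top_n codes closest to the query in Hamming space.

    database maps an exercise id to its integer-encoded binary code.
    Ties are broken by id so the result is deterministic.
    """
    ranked = sorted(
        database.items(),
        key=lambda item: (hamming_distance(query_code, item[1]), item[0]),
    )
    return [exercise_id for exercise_id, _ in ranked[:top_n]]


def bit_balance(codes, n_bits):
    """Fraction of codes with each bit set; 0.5 per bit means perfectly balanced."""
    return [sum((code >> bit) & 1 for code in codes) / len(codes) for bit in range(n_bits)]


# Hypothetical 4-bit codes for three exercises.
database = {"e1": 0b1010, "e2": 0b1011, "e3": 0b0101}
nearest = retrieve_similar(0b1010, database, top_n=2)
```

A balanced bit splits the database in half, which is one intuition for why a bit balance loss helps use the Hamming space fully.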

Social event recommendation method based on unexpectedness metric

Tao SUN, Zhangtian DUAN, Haonan ZHU, Peihao GUO, Heli SUN

Journal of Computer Applications 2024, 44 (3): 760-766. DOI: 10.11772/j.issn.1001-9081.2023030362

In Event-Based Social Networks (EBSNs), recommendation usually models user preferences from users’ historical preferences, which limits the scope and ways in which users can access new things. To address this problem, an unexpectedness metric-based social event recommendation model named UER (Unexpectedness-based Event Recommendation) was proposed. The UER model includes two sub-models, Base and Unexpected. Firstly, based on the interaction sequence characteristics of users, events, and users’ historical events, the Base sub-model used the attention mechanism to measure the weights of events in users’ historical preferences, and then predicted the probabilities of users participating in events. Secondly, the Unexpected sub-model extracted multiple interest representations of the user through the self-attention mechanism, and used them to calculate the user’s own unexpectedness and the unexpectedness value of a candidate event to the user, so as to measure the unexpectedness of the recommended event. Experimental results on the Meetup-California dataset show that compared with Deep Interest Network (DIN) and Personalized Unexpected Recommender System (PURS), the recommendation Hit Ratio (HR) of the UER model is increased by 22.9% and 30.3%, the Normalized Discounted Cumulative Gain (NDCG) is increased by 27.5% and 42.3%, and the unexpectedness of recommended events is increased by 54.5% and 21.4%, respectively. On the Meetup-NewYork dataset, the HR of the UER model is increased by 18.2% and 21.8%, the NDCG is increased by 26.9% and 32.0%, and the unexpectedness of recommended events is increased by 52.6% and 20.8%, respectively.

Motif detection algorithm in multiplex networks

Shuhong XUE, Biao FENG, Hailong YU, Li WANG, Yunyun YANG

Journal of Computer Applications 2024, 44 (3): 752-759. DOI: 10.11772/j.issn.1001-9081.2023030300

Multiplex networks vividly describe the interactions between entities in complex systems, and motifs frequently appear in networks as a higher-order structure. Compared with single-layer motifs, multiplex motifs are more numerous, more diverse in type, and more complicated in structure. Given the current lack of a complete detection algorithm for multiplex motifs, a Fast Algorithm for Multiplex Motif Detection (FAMMD) suitable for multiplex networks was proposed. Firstly, an improved ESU (Enumerate SUbgraphs) algorithm was used to enumerate multiplex subgraphs. Then, a method combining layer markers and binary strings was used to accelerate isomorphism detection, and a null model that preserved degree sequences and inter-layer dependencies was constructed for multiplex subgraph testing. Finally, motif detection was performed on two-layer real networks. Multiplex motifs exhibited a closely connected triple mode, and they were more homogeneous in social networks while more complementary in transportation networks. Experimental results show that the proposed method can accurately and quickly detect multiplex motifs that reflect the structural characteristics of the network and conform to the actual situation.
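
As background for the enumeration step, the sketch below implements the classic single-layer ESU procedure, which lists every connected k-node subgraph exactly once. It is not the improved multiplex variant described in the abstract; the function name, the adjacency-dictionary input, and the toy graph are assumptions for illustration.

```python
def enumerate_connected_subgraphs(adjacency, k):
    """Classic ESU: vertex sets of all connected k-node subgraphs, each once.

    adjacency maps each vertex to the set of its neighbors (undirected graph).
    Vertex labels must be mutually comparable, because ESU only extends a
    subgraph with vertices whose label is larger than the root's label.
    """
    found = []

    def extend(sub, ext, root):
        if len(sub) == k:
            found.append(frozenset(sub))
            return
        # Vertices already in the subgraph or adjacent to it.
        closed = set(sub)
        for u in sub:
            closed |= adjacency[u]
        ext = set(ext)
        while ext:
            w = ext.pop()
            # Exclusive neighbors of w: larger than the root and not yet
            # reachable from the current subgraph.
            exclusive = {u for u in adjacency[w] if u > root and u not in closed}
            extend(sub | {w}, ext | exclusive, root)

    for v in adjacency:
        extend({v}, {u for u in adjacency[v] if u > v}, v)
    return found


# Hypothetical toy graph: a triangle 1-2-3 with a pendant vertex 4 attached to 3.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
triples = enumerate_connected_subgraphs(graph, 3)
```

In a motif pipeline, each enumerated vertex set would next be mapped to an isomorphism class and its frequency compared against a null model.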

Early classification model of multivariate time series based on orthogonal locality preserving projection and cost optimization

Zixuan YUAN, Xiaoqing WENG, Ningzhen GE

Journal of Computer Applications 2024, 44 (6): 1832-1841. DOI: 10.11772/j.issn.1001-9081.2023060761

Early Time Series Classification (ETSC) has two conflicting goals: earliness and accuracy. Early classification always comes at the expense of accuracy. Existing optimization-based early classification methods for Multivariate Time Series (MTS) consider the costs of misclassification and delayed decision-making in the cost function, but ignore the influence of the local structure between samples in the MTS dataset on classification performance. To solve this problem, an early classification model of MTS based on Orthogonal Locality Preserving Projection (OLPP) and cost Optimization for Accuracy and Earliness (OLPPMOAE) was proposed. First, MTS sample prefixes were mapped to a low-dimensional space by OLPP to preserve the local structure of the original dataset. Then, a group of Gaussian Process (GP) classifiers were trained in the low-dimensional space, and the class probabilities of the training set at each moment were generated. Finally, the Particle Swarm Optimization (PSO) algorithm was used to learn the optimal parameters of the stopping rule from these class probabilities. The experimental results on six MTS datasets show that, with essentially the same earliness, the accuracy of OLPPMOAE is significantly higher than that of the cost-based model R1_Clr (stopping Rule and Cost function with regularization terms l₁ and l₂): the average accuracy is improved by 11.33% to 15.35%, and the Harmonic Mean (HM) is improved by 4.71% to 9.01%. Therefore, the proposed model can classify MTS as early as possible with high accuracy.
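
The abstract does not define the Harmonic Mean (HM). A definition widely used in the early classification literature combines accuracy with (1 - earliness), and the sketch below assumes that definition; the function names and the sample numbers are hypothetical.

```python
def earliness(stop_times, series_length):
    """Average fraction of the series observed before a decision is made."""
    return sum(stop_times) / (len(stop_times) * series_length)


def harmonic_mean(accuracy, earliness_value):
    """Assumed definition: HM = 2 * acc * (1 - earl) / (acc + (1 - earl))."""
    timeliness = 1.0 - earliness_value
    denominator = accuracy + timeliness
    return 2.0 * accuracy * timeliness / denominator if denominator > 0 else 0.0


# Hypothetical example: decisions after 20 and 40 of 100 time steps.
earl = earliness([20, 40], series_length=100)
score = harmonic_mean(accuracy=0.8, earliness_value=earl)
```

Under this definition, HM rewards a model only when it is both accurate and early: a classifier that waits for the full series has earliness 1 and therefore HM 0.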

Fuzzy-rough set based unsupervised dynamic feature selection algorithm

Lei MA, Chuan LUO, Tianrui LI, Hongmei CHEN

Journal of Computer Applications 2023, 43 (10): 3121-3128. DOI: 10.11772/j.issn.1001-9081.2022101543

Dynamic feature selection algorithms can improve the time efficiency of processing dynamic data. Aiming at the problem that there are few unsupervised dynamic feature selection algorithms based on fuzzy-rough sets, an Unsupervised Dynamic Fuzzy-Rough set based Feature Selection (UDFRFS) algorithm was proposed for the case where features arrive in batches. First, by defining a pseudo triangular norm and a new similarity relationship, fuzzy relation values were updated on the basis of existing data to reduce unnecessary calculation. Then, by utilizing the existing feature selection results, dependencies were used to judge whether the original features needed to be recalculated, which reduced redundant feature selection work and further sped up feature selection. Experimental results show that compared to the static dependency-based unsupervised fuzzy-rough set feature selection algorithm, UDFRFS improves time efficiency by more than 90 percentage points while maintaining good classification accuracy and clustering performance.

Time series classification method based on multi-scale cross-attention fusion in time-frequency domain

Mei WANG, Xuesong SU, Jia LIU, Ruonan YIN, Shan HUANG

Journal of Computer Applications 2024, 44 (6): 1842-1847. DOI: 10.11772/j.issn.1001-9081.2023060731

To address the problem of low classification accuracy caused by insufficient interaction of latent information between time series subsequences, a time series classification method based on multi-scale cross-attention fusion in the time-frequency domain, called TFFormer (Time-Frequency Transformer), was proposed. First, the time and frequency spectra of the original time series were each divided into subsequences of the same length, and the point-value coupling problem was solved by adding positional embedding after linear projection. Then, the long-term time series dependency problem was addressed by an Improved Multi-Head self-Attention (IMHA) mechanism, which made the model focus on more important time series features. Finally, a multi-scale Cross-Modality Attention (CMA) module was proposed to enhance the interaction between the time domain and the frequency domain, so that the model could further mine the frequency information of the time series. The experimental results show that compared with Fully Convolutional Network (FCN), the classification accuracy of the proposed method on the Trace, StarLightCurves and UWaveGestureLibraryAll datasets increased by 0.3, 0.9 and 1.4 percentage points, respectively. This shows that enhancing the information interaction between the time domain and the frequency domain of the time series can improve model convergence speed and classification accuracy.

Distributed temporal index for temporal aggregation range query

Fanjun MENG, Bin HAN, Shucheng HUANG, Xiangdong MEI

Journal of Computer Applications 2024, 44 (6): 1848-1854. DOI: 10.11772/j.issn.1001-9081.2023060830

In the era of big data and cloud computing, querying and analyzing temporal big data faces many important challenges. To address issues such as poor query performance and ineffective utilization of indexes in temporal aggregation range queries, a Distributed Temporal Index (DTI) for temporal aggregation range query was proposed. Firstly, a random or round-robin strategy was used to partition the temporal data. Secondly, an intra-partition index construction algorithm based on the bit array prefix of timestamps was used to build the intra-partition index, and partition statistics including time span were recorded. Thirdly, the data partitions whose time spans overlapped with the query time interval were selected by a predicate pushdown operation, and were pre-aggregated by index scan. Finally, all pre-aggregated values obtained from each partition were merged and aggregated by time. The experimental results show that the execution time of the intra-partition index construction algorithm on data with a density of 2,400 entries per unit of time is similar to its execution time on data with a density of 0.001 entries per unit of time. Compared to ParTime, the indexed temporal aggregation range query algorithm takes at least 22% less time for each step when querying the data in the first 75% of the timeline and at least 11% less time for each step when executing selective aggregation. Therefore, the indexed algorithm is faster in most temporal aggregation range query tasks, and its intra-partition index construction algorithm is capable of solving the data sparsity problem with high efficiency.
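
The four steps above (partition, index, prune and pre-aggregate, merge) can be sketched on a single machine as follows. This is a simplified illustration under stated assumptions: a sorted list with binary search stands in for the bit-array-prefix index, records are hypothetical `(timestamp, value)` pairs, the aggregate is a sum, and all function names are invented for the example.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def partition_round_robin(records, n_partitions):
    """Step 1: distribute records over partitions in round-robin order."""
    partitions = [[] for _ in range(n_partitions)]
    for position, record in enumerate(records):
        partitions[position % n_partitions].append(record)
    return partitions


def build_partition(records):
    """Step 2: sort by timestamp (stand-in index) and record the time span."""
    ordered = sorted(records)
    span = (ordered[0][0], ordered[-1][0]) if ordered else None
    return {"records": ordered, "span": span}


def aggregate_range(partitions, start, end):
    """Steps 3 and 4: prune partitions, pre-aggregate locally, merge by time."""
    merged = defaultdict(float)
    for part in partitions:
        span = part["span"]
        if span is None or span[1] < start or span[0] > end:
            continue  # time span does not overlap the query interval
        rows = part["records"]
        low = bisect_left(rows, (start,))                # first row with ts >= start
        high = bisect_right(rows, (end, float("inf")))   # past last row with ts <= end
        local = defaultdict(float)
        for timestamp, value in rows[low:high]:
            local[timestamp] += value                    # per-partition pre-aggregation
        for timestamp, subtotal in local.items():
            merged[timestamp] += subtotal                # global merge by time
    return dict(sorted(merged.items()))


# Hypothetical records: (timestamp, value).
records = [(1, 10.0), (2, 5.0), (2, 7.0), (5, 1.0), (9, 4.0), (3, 2.0)]
parts = [build_partition(p) for p in partition_round_robin(records, 2)]
totals = aggregate_range(parts, start=2, end=5)
```

Recording the time span per partition is what allows a whole partition to be skipped without scanning it, which is the role the abstract assigns to predicate pushdown.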

Shorter long-sequence time series forecasting model

Zexin XU, Lei YANG, Kangshun LI

Journal of Computer Applications 2024, 44 (6): 1824-1831. DOI: 10.11772/j.issn.1001-9081.2023060799

Most existing studies treat short-sequence and long-sequence time series forecasting separately, which leads to poor forecasting accuracy on shorter long-sequence time series. To address this problem, a Shorter Long-sequence Time Series Forecasting Model (SLTSFM) was proposed. Firstly, a Sequence-to-Sequence (Seq2Seq) structure was constructed using a Convolutional Neural Network (CNN) and the PBUSM (Probsparse Based on Uniform Selection Mechanism) self-attention mechanism to extract the features of the long-sequence input. Secondly, a “far light, near heavy” strategy was designed to reallocate the features of each time period extracted by multiple Long Short-Term Memory (LSTM) modules, which are more capable of extracting features from short-sequence input. Finally, the reallocated features were used to enhance the extracted long-sequence input features, improving forecasting accuracy and realizing time series forecasting. Four publicly available time series datasets were used to verify the effectiveness of the proposed model. The experimental results demonstrate that, compared with the model with the second-best overall performance, Gated Recurrent Unit (GRU), the Mean Absolute Error (MAE) of SLTSFM was reduced by 61.54%, 13.48%, 0.92% and 19.58% for univariate time series forecasting, and by 17.01%, 18.13%, 3.24% and 6.73% for multivariate time series forecasting on the four datasets. This verifies that SLTSFM is effective in improving the accuracy of shorter long-sequence time series forecasting.

Academic anomaly citation group detection based on local extended community detection

Xinrui LIN, Xiaofei WANG, Yan ZHU

Journal of Computer Applications 2024, 44 (6): 1855-1861. DOI: 10.11772/j.issn.1001-9081.2023050702

Some scholars in academic social networks may form anomaly citation groups and excessively cite each other’s papers for profit. Most existing anomaly group detection algorithms separate community detection from node representation learning, which limits the performance of anomaly group detection. To deal with this issue, a Group Anomaly Detection based on Local extended community detection (GADL) algorithm was proposed. The anomaly citation features of authors were extracted by using semantic information such as the research field and title content of papers. An extension metric function based on node transition similarity, node community membership, citation anomaly and BFS (Breadth First Search) depth was defined. Anomaly community detection and anomaly node detection were combined and jointly optimized in a unified framework to obtain the best anomaly detection performance. Compared with the ALP algorithm, the proposed algorithm improved the Area Under Curve (AUC) by 6.07%, 5.35% and 3.38% respectively on the ACM, DBLP1, and DBLP2 datasets. Experimental results on real datasets show that GADL can effectively detect academic anomaly citations.
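
The AUC figures above measure how well anomaly scores rank anomalous authors above normal ones. A minimal sketch of the pairwise (Mann-Whitney) formulation follows; the function name, the scores, and the labels are hypothetical and unrelated to the paper's datasets.

```python
def area_under_curve(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical anomaly scores; label 1 marks an anomalous author.
value = area_under_curve([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

This quadratic-time form is only meant to show the definition; sorting-based implementations are preferable for large datasets.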

Recommendation method based on knowledge-awareness and cross-level contrastive learning

Jie GUO, Jiayu LIN, Zuhong LIANG, Xiaobo LUO, Haitao SUN

Journal of Computer Applications 2024, 44 (4): 1121-1127. DOI: 10.11772/j.issn.1001-9081.2023050613

As a kind of side information, Knowledge Graph (KG) can effectively improve the recommendation quality of recommendation models, but existing knowledge-aware recommendation methods based on Graph Neural Network (GNN) suffer from unbalanced utilization of node information. To address this problem, a new recommendation method based on Knowledge-awareness and Cross-level Contrastive Learning (KCCL) was proposed. To alleviate the unbalanced utilization of node information caused by sparse interaction data and a noisy knowledge graph, which deviate from the true representation of inter-node dependencies during information aggregation, a contrastive learning paradigm was introduced into the GNN-based knowledge-aware recommendation model. Firstly, the user-item interaction graph and the item knowledge graph were integrated into a heterogeneous graph, and the node representations of users and items were learned by a GNN based on the graph attention mechanism. Secondly, consistent noise was added to the information propagation and aggregation layers for data augmentation to obtain node representations at different levels, and the outermost node representation was contrasted with the innermost node representation for cross-level contrastive learning. Finally, the supervised recommendation task and the auxiliary contrastive learning task were jointly optimized to obtain the final representation of each node. Experimental results on the DBbook2014 and MovieLens-1m datasets show that compared to the second-best comparison method, the Recall@10 of KCCL is improved by 3.66% and 0.66%, respectively, and the NDCG@10 is improved by 3.57% and 3.29%, respectively, which verifies the effectiveness of KCCL.
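
The abstract does not name the contrastive objective. A common choice for contrasting two views of the same node is the InfoNCE loss, and the sketch below assumes it: for node i, the outermost-layer representation should be closer to the innermost-layer representation of the same node than to those of other nodes. The function names, the temperature value, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(outer_views, inner_views, temperature=0.2):
    """Mean InfoNCE loss; outer_views[i] and inner_views[i] describe node i."""
    loss = 0.0
    for i, anchor in enumerate(outer_views):
        logits = [cosine(anchor, other) / temperature for other in inner_views]
        peak = max(logits)  # subtract the maximum for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        loss += log_denominator - logits[i]  # -log softmax of the positive pair
    return loss / len(outer_views)


# Hypothetical two-node example: matched views versus mismatched views.
aligned = info_nce([(1.0, 0.0), (0.0, 1.0)], [(1.0, 0.0), (0.0, 1.0)])
swapped = info_nce([(1.0, 0.0), (0.0, 1.0)], [(0.0, 1.0), (1.0, 0.0)])
```

In joint training, a loss of this kind would be added to the supervised recommendation loss with a weighting coefficient, which is the usual way an auxiliary contrastive task is combined with the main task.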

Fuzzy clustering algorithm based on belief subcluster cutting

Yu DING, Hanlin ZHANG, Rong LUO, Hua MENG

Journal of Computer Applications 2024, 44 (4): 1128-1138. DOI: 10.11772/j.issn.1001-9081.2023050610

The Belief Peaks Clustering (BPC) algorithm is a new variant of the Density Peaks Clustering (DPC) algorithm based on a fuzzy perspective. It uses fuzzy mathematics to describe the distribution characteristics and correlation of data. However, the BPC algorithm mainly relies on the information of local data points when calculating belief values, instead of investigating the distribution and structure of the whole dataset. Moreover, the robustness of the original allocation strategy is weak. To solve these problems, a fuzzy Clustering algorithm based on Belief Subcluster Cutting (BSCC) was proposed by combining belief peaks and the spectral method. Firstly, the dataset was divided into many high-purity subclusters by local belief information. Then, each subcluster was regarded as a new sample, and the spectral method was used for graph-cut clustering based on the similarity relationship between subclusters, thus coupling local information and global information. Finally, the points in each subcluster were assigned to the cluster containing that subcluster to complete the final clustering. Compared with the BPC algorithm, BSCC has obvious advantages on datasets with multiple subclusters, improving ACCuracy (ACC) by 16.38 and 21.35 percentage points on the americanflag dataset and the Car dataset, respectively. Clustering experimental results on synthetic datasets and real datasets show that BSCC outperforms BPC and the other seven clustering algorithms on the three evaluation indicators of Adjusted Rand Index (ARI), Normalized Mutual Information (NMI) and ACC.

Multi-order nearest neighbor graph clustering algorithm by fusing transition probability matrix

Tongtong XU, Bin XIE, Chunhao ZHANG, Ximei ZHANG

Journal of Computer Applications 2024, 44 (5): 1527-1538. DOI: 10.11772/j.issn.1001-9081.2023050727

Clustering divides a dataset into multiple clusters based on the similarity between samples. Most existing clustering methods face two challenges. On the one hand, when defining the similarity between samples, the spatial distribution structure of the samples is often not considered, making it difficult to construct a stable similarity matrix. On the other hand, the sample graph structure constructed by graph clustering is too complex and incurs high computational costs. To solve these two problems, a Multi-order Nearest Neighbor Graph Clustering algorithm by fusing transition probability matrix (MNNGC) was proposed. Firstly, the nearest neighbor relationship and spatial distribution structure of samples were comprehensively considered, the similarity defined by shared nearest neighbors was weighted for densification, and the densified affinity matrix between nodes was obtained. Secondly, by utilizing multi-order probability transition between nodes, the correlation degrees of non-adjacent nodes were predicted, and a stable inter-node affinity matrix was obtained by fusing the multi-order transition probability matrices. Then, to further enhance the local structure of the graph, the multi-order nearest neighbor graph of nodes was reconstructed and hierarchically clustered. Finally, the edge node allocation strategy was optimized. Experimental results show that MNNGC achieves the highest Accuracy (Acc) among the compared clustering algorithms on all the synthetic datasets and 8 UCI datasets. The Acc, Adjusted Mutual Information (AMI), Adjusted Rand Index (ARI) and Fowlkes and Mallows Index (FMI) of the MNNGC algorithm are improved by 38.6, 27.2, 45.4 and 35.1 percentage points, respectively, compared with the Local Density Peaks-based Spectral Clustering (LDP-SC) algorithm.
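
The idea of predicting the relatedness of non-adjacent nodes from multi-order transitions can be sketched as follows. The abstract does not state how the orders are fused, so a plain average of the first `max_order` powers of the transition matrix is assumed here; the function names and the three-node path graph are hypothetical.

```python
def row_normalize(matrix):
    """Turn a non-negative affinity matrix into a row-stochastic transition matrix."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([value / total if total else 0.0 for value in row])
    return normalized


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, column)) for column in columns] for row in a]


def fused_transition(affinity, max_order):
    """Average of P, P^2, ..., P^max_order (the averaging is an assumption)."""
    p = row_normalize(affinity)
    power = p
    fused = [row[:] for row in p]
    for _ in range(max_order - 1):
        power = matmul(power, p)  # next-order transition probabilities
        fused = [[f + q for f, q in zip(f_row, q_row)] for f_row, q_row in zip(fused, power)]
    return [[value / max_order for value in row] for row in fused]


# Hypothetical path graph 0 - 1 - 2: nodes 0 and 2 are not adjacent.
affinity = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
fused = fused_transition(affinity, max_order=2)
```

In this toy graph the fused matrix assigns a non-zero weight to the pair (0, 2) even though the original affinity between them is zero, because node 2 is reachable from node 0 in two steps.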

Large-scale subspace clustering algorithm with local structure learning

Qize REN, Hongjie JIA, Dongyu CHEN

Journal of Computer Applications 2023, 43 (12): 3747-3754. DOI: 10.11772/j.issn.1001-9081.2022111750

Conventional large-scale subspace clustering methods ignore the local structure that prevails among the data when computing the anchor affinity matrix, and incur large errors when calculating the approximate eigenvectors of the Laplacian matrix, which is not conducive to data clustering. To address these problems, a Large-scale Subspace Clustering algorithm with Local structure learning (LLSC) was proposed. In the proposed algorithm, local structure learning was embedded into the learning of the anchor affinity matrix, so that global and local information could be used together to mine the subspace structure of the data. In addition, inspired by Nonnegative Matrix Factorization (NMF), an iterative optimization method was designed to simplify the solution of the anchor affinity matrix. Then, the mathematical relationship between the anchor affinity matrix and the Laplacian matrix was established according to the Nyström approximation method, and the calculation of the eigenvectors of the Laplacian matrix was modified to improve clustering performance. Compared to LMVSC (Large-scale Multi-View Subspace Clustering), SLSR (Scalable Least Square Regression), LSC-k (Landmark-based Spectral Clustering using k-means), and k-FSC (k-Factorization Subspace Clustering), LLSC demonstrates significant improvements on four widely used large-scale datasets. Specifically, on the Pokerhand dataset, the accuracy of LLSC is 28.18 percentage points higher than that of k-FSC. These results confirm the effectiveness of LLSC.

Top-k high average utility sequential pattern mining algorithm under one-off condition

Keshuai YANG, Youxi WU, Meng GENG, Jingyu LIU, Yan LI

Journal of Computer Applications 2024, 44 (2): 477-484. DOI: 10.11772/j.issn.1001-9081.2023030268

To address the issue that traditional Sequential Pattern Mining (SPM) does not consider pattern repetition and ignores the effects of utility (unit price or profit) and pattern length on user interest, a Top-k One-off high average Utility sequential Pattern mining (TOUP) algorithm was proposed. The TOUP algorithm includes two core steps: average utility calculation and candidate pattern generation. Firstly, a CSP (Calculation Support of Pattern) algorithm based on the occurrence position of each item and the item repetition relation array was proposed to calculate pattern support, thereby achieving rapid calculation of the average utility of patterns. Secondly, candidate patterns were generated by itemset extension and sequence extension, and a maximum average utility upper bound was proposed. Based on this upper bound, candidate patterns were pruned effectively. Experimental results on five real datasets and one synthetic dataset show that compared to the TOUP-dfs and HAOP-ms algorithms, the TOUP algorithm reduces the number of candidate patterns by 38.5% to 99.8% and 0.9% to 77.6%, respectively, and decreases the running time by 33.6% to 97.1% and 57.9% to 97.2%, respectively. Therefore, TOUP performs better and can mine patterns of interest to users more efficiently.

Survey on anomaly detection algorithms for unmanned aerial vehicle flight data

Chaoshuai QI, Wensi HE, Yi JIAO, Yinghong MA, Wei CAI, Suping REN

Journal of Computer Applications 2023, 43 (6): 1833-1841. DOI: 10.11772/j.issn.1001-9081.2022060808

This survey focuses on anomaly detection for Unmanned Aerial Vehicle (UAV) flight data in the field of UAV airborne health monitoring. Firstly, the characteristics of UAV flight data, the common types of flight data anomalies, and the corresponding requirements for anomaly detection algorithms were presented. Then, the existing research on UAV flight data anomaly detection algorithms was reviewed, and these algorithms were classified into three categories: prior-knowledge-based algorithms for qualitative anomaly detection, model-based algorithms for quantitative anomaly detection, and data-driven anomaly detection algorithms. The application scenarios, advantages and disadvantages of these algorithms were also analyzed. Finally, the current problems and challenges of UAV anomaly detection algorithms were summarized, and key development directions for the field were outlined, thereby providing reference ideas for future research.

Feature selection for imbalanced data based on neighborhood tolerance mutual information and whale optimization algorithm

Lin SUN, Jinxu HUANG, Jiucheng XU

Journal of Computer Applications 2023, 43 (6): 1842-1854. DOI: 10.11772/j.issn.1001-9081.2022050691

To address the problems that most feature selection algorithms do not fully consider the non-uniform class distribution of data, the correlation between features, and the influence of different parameters on the feature selection results, a feature selection method for imbalanced data based on neighborhood tolerance mutual information and Whale Optimization Algorithm (WOA) was proposed. Firstly, for binary and multi-class datasets in an incomplete neighborhood decision system, two kinds of feature importance for imbalanced data were defined on the basis of the upper and lower boundary regions. Then, to fully reflect the decision-making ability of features and the correlation between features, neighborhood tolerance mutual information was developed. Finally, by integrating the feature importance of imbalanced data and the neighborhood tolerance mutual information, a Feature Selection for Imbalanced Data based on Neighborhood tolerance mutual information (FSIDN) algorithm was designed, in which the optimal parameters of the feature selection algorithm were obtained by WOA, and a nonlinear convergence factor and an adaptive inertia weight were introduced to improve WOA and keep it from falling into local optima. Experimental results on 8 benchmark functions show that the improved WOA has good optimization performance, and feature selection results on 13 binary and 4 multi-class imbalanced datasets show that, compared with other related algorithms, the proposed algorithm can effectively select feature subsets with good classification performance.

Agglomerative hierarchical clustering algorithm based on hesitant fuzzy set

Wenquan LI, Yimin MAO, Xindong PENG

Journal of Computer Applications 2023, 43 (12): 3755-3763. DOI: 10.11772/j.issn.1001-9081.2023010094

To address the problems of information distortion, poor objectivity of attribute weights, and high time complexity in hesitant fuzzy clustering analysis, an Agglomerative Hierarchical Clustering algorithm based on Hesitant Fuzzy set (AHCHF) was proposed. Firstly, the average value of hesitant fuzzy elements was used to extend data objects with a small hesitancy degree. Secondly, the weights of the data objects before and after extension were calculated by using the original information entropy and the internal maximum difference, and the comprehensive attribute weight was determined according to the minimum discrimination information between the two weight vectors. Finally, with the goal of reducing the sum of weighted distances, a center point construction method with constant hesitancy degree was given. Experimental results on specific examples and synthetic datasets show that, compared with the classic Hesitant Fuzzy Hierarchical Clustering algorithm (HFHC) and the recent Fuzzy Hierarchical Clustering Algorithm (FHCA), the proposed AHCHF increases the mean Silhouette Coefficient (SC) by 23.99% and 9.28%, respectively, and shortens the running time by 27.18% and 6.40% on average, respectively. These results indicate that the proposed algorithm can effectively alleviate information distortion and poor objectivity of attribute weights, and improve clustering quality and performance.
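
The extension step described above can be illustrated with a small sketch. It assumes that a shorter hesitant fuzzy element is padded with its own mean until it matches the longer one, and that elements are then compared with the commonly used normalized Hamming distance. The function names `extend_hfe` and `hamming_distance` are hypothetical, and the paper's exact extension and distance definitions may differ.

```python
from statistics import mean


def extend_hfe(hfe, target_len):
    """Pad a hesitant fuzzy element (list of membership degrees) with its mean.

    Assumption: padding with the mean keeps the element's average unchanged,
    which is one way to avoid distorting the original information.
    """
    if not hfe:
        raise ValueError("a hesitant fuzzy element needs at least one value")
    fill = mean(hfe)
    padded = list(hfe) + [fill] * (target_len - len(hfe))
    return sorted(padded)


def hamming_distance(h1, h2):
    """Normalized Hamming distance between two elements after extension."""
    n = max(len(h1), len(h2))
    a, b = extend_hfe(h1, n), extend_hfe(h2, n)
    return sum(abs(x - y) for x, y in zip(a, b)) / n


# Example: the two-value element is extended to three values before comparison.
d = hamming_distance([0.2, 0.6], [0.3, 0.4, 0.5])
```

Padding with the mean, rather than with the minimum or maximum value, leaves the average membership of the extended element equal to that of the original.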

Directed gene regulatory network inference algorithm based on t-test and stepwise network search

Du CHEN, Yuanyuan LI, Yu CHEN

Journal of Computer Applications 2024, 44 (1): 199-205. DOI: 10.11772/j.issn.1001-9081.2023010086

To overcome the shortcoming that the Path Consensus Algorithm based on Conditional Mutual Information (PCA-CMI) cannot identify the regulation direction, and to further improve the accuracy of network inference, a Directed Network Inference algorithm enhanced by t-Test and Stepwise Regulation Search (DNI-T-SRS) was proposed. First, the upstream and downstream relationships of genes were identified by a t-test performed on the expression data under different perturbation settings. Based on these relationships, the conditional genes were selected to guide the Path Consensus (PC) algorithm and to calculate Conditional Mutual Inclusive Information (CMI2) for removing redundant regulations, yielding an algorithm named CMI2-based network inference guided by t-Test (CMI2NI-T). Then, the corresponding Michaelis-Menten differential equation model was established to fit the expression data, and the network inference result was further corrected by a stepwise network search based on the Bayesian information criterion. Numerical experiments were conducted on two benchmark networks of the DREAM6 challenge. The Area Under Curve (AUC) values of CMI2NI-T were 0.7679 and 0.9796, which were 16.23% and 11.62% higher than those of PCA-CMI. With the additional data fitting process, DNI-T-SRS achieved inference accuracies of 86.67% and 100.00%, which were 18.19% and 10.52% higher than those of PCA-CMI. The experimental results demonstrate that the proposed DNI-T-SRS can eliminate indirect regulatory relationships and preserve direct regulatory connections, which contributes to precise inference of gene regulatory networks.

Maximum cycle truss community search based on hierarchical tree index on directed graphs

Chuanyu ZONG, Chunhe ZHANG, Xiufeng XIA

Journal of Computer Applications 2024, 44 (1): 190-198. DOI: 10.11772/j.issn.1001-9081.2023010071

Community search aims to find highly cohesive connected subgraphs containing user query vertices in information networks. Cycle truss is a community search model based on the cycle triangle. However, existing index-based cycle truss community search methods suffer from large index space, low search efficiency, and low community cohesion. To address these issues, a maximum cycle truss community search method based on a hierarchical tree index was proposed. Firstly, a k-cycle truss decomposition algorithm was proposed, and two important concepts, cycle triangle connectivity and k-level equivalence, were introduced. Based on k-level equivalence, the hierarchical tree index TreeCIndex and the table index SuperTable were designed. On this basis, two efficient cycle truss community search algorithms were proposed. The proposed algorithms were compared with existing community search algorithms based on TrussIndex and EquiTruss on four real datasets. The experimental results show that the space consumption of TreeCIndex and SuperTable is at least 41.5% lower and the index construction time is 8.2% to 98.3% lower compared with TrussIndex and EquiTruss; furthermore, the efficiency of searching for maximum cycle truss communities is increased by one and two orders of magnitude.
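
The cycle triangle underlying the model can be made concrete with a short sketch. It assumes that a cycle triangle is a directed 3-cycle u -> v -> w -> u and that the cycle support of an edge is the number of such triangles containing it; both are conventions from the directed truss literature rather than details given in the abstract. The function name `cycle_support` is hypothetical, and this is only the counting step, not the paper's decomposition or index construction.

```python
def cycle_support(edges):
    """Count, for each directed edge (u, v), the vertices w with v -> w and w -> u.

    Assumes a simple directed graph without self-loops.
    """
    succ, pred = {}, {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
        pred.setdefault(v, set()).add(u)
    return {
        (u, v): len(succ.get(v, set()) & pred.get(u, set()))
        for u, v in edges
    }


# Example graph: one directed 3-cycle (1 -> 2 -> 3 -> 1) plus a dangling edge.
support = cycle_support([(1, 2), (2, 3), (3, 1), (1, 4)])
```

A truss-style decomposition would repeatedly remove edges whose support falls below a threshold and update the counts of the affected triangles.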

Identification method of influence nodes in multilayer hypernetwork based on evidence theory

Kuo TIAN, Yinghan WU, Feng HU

Journal of Computer Applications 2024, 44 (1): 182-189. DOI: 10.11772/j.issn.1001-9081.2023010021

Most research on multilayer hypernetworks focuses on topology structure, and existing influence node identification methods rely on a rather narrow set of indicators, so they cannot identify influence nodes comprehensively and accurately. To address this, an identification method of influence nodes in multilayer hypernetwork based on evidence theory was proposed. Firstly, based on the topology structure of the multilayer hypernetwork, a Multilayer Aggregation Hypernetwork (MAH) was constructed following the idea of the aggregation network. Secondly, the frame of discernment of the problem was defined based on evidence theory. Finally, the Dempster-Shafer (D-S) evidence combination method was used to fuse local, location and global indicators of the network to identify influence nodes. The proposed method was applied to a physics-computer science double-layer scientific research cooperation hypernetwork constructed from the arXiv dataset. Compared with methods such as hyperdegree centrality, K-shell and closeness centrality, the proposed method has the fastest propagation speed and reaches the steady state first in the Susceptible-Infected-Susceptible (SIS) hypernetwork propagation model based on Reactive Process (RP) and Contact Process (CP) strategies. After the top 6% of influence nodes were isolated, the average network hyperdegree, clustering coefficient and network efficiency decreased. As the proportion of isolated influence nodes increased, the growth rate of the number of network subgraphs was similar to that of the closeness centrality method. The granularity of the identification result was measured by the monotonicity index, whose value reached 0.9998, indicating that the identification result has a high degree of discrimination. The results of several experiments show that the proposed identification method of influence nodes in multilayer hypernetwork is accurate and effective.
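
The fusion step relies on Dempster's rule of combination, which can be sketched as follows. The two-hypothesis frame and the mass values are made-up illustrations, and the name `dempster_combine` is hypothetical; how the paper derives masses from the local, location and global indicators is not stated in the abstract.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions whose keys are frozenset hypotheses."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory hypotheses
    if 1.0 - conflict <= 1e-12:
        raise ValueError("total conflict: the sources cannot be combined")
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


# Illustrative frame of discernment: a node is influential or ordinary.
INFLUENTIAL = frozenset({"influential"})
ORDINARY = frozenset({"ordinary"})
UNKNOWN = INFLUENTIAL | ORDINARY  # mass left on the whole frame

local_evidence = {INFLUENTIAL: 0.6, ORDINARY: 0.1, UNKNOWN: 0.3}
global_evidence = {INFLUENTIAL: 0.5, ORDINARY: 0.2, UNKNOWN: 0.3}
fused = dempster_combine(local_evidence, global_evidence)
```

Because the rule is associative, a third indicator can be fused by calling `dempster_combine(fused, third_evidence)`.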

Group recommendation method based on implicit trust and group consensus

Tingting LI, Junfeng CHU, Yanyan WANG

Journal of Computer Applications 2024, 44 (2): 460-468. DOI: 10.11772/j.issn.1001-9081.2023030267

To address the issue that existing group recommendation methods seldom consider the implicit estimation of social relationships among group members or the use of group consensus to reduce the influence of preference conflicts, a Group Recommendation method based on implicit Trust and group Consensus (GR-TC) was proposed. The method was divided into a recommendation phase and a consensus phase. In the recommendation phase, implicit trust values were mined based on preference information and social relationships among members, and the members' individual preferences and weights as well as the initial group preferences were estimated. In the consensus phase, inconsistent members were identified by consensus measurement and identification rules, a maximum harmony optimization consensus model was built, and the group recommendation list was obtained by adjusting and updating the group preferences. Experimental results show that social relationships among members affect group recommendation results, and that a reasonable selection of implicit trust weights improves the harmony of inconsistent members. Compared with the traditional consensus feedback mechanism, the implicit trust-induced maximum harmony consensus feedback mechanism has a lower adjustment cost and less impact on inconsistent members.

Feature selection algorithm for high-dimensional data with maximum correlation and maximum difference

Shengjie MENG, Wanjun YU, Ying CHEN

Journal of Computer Applications 2024, 44 (3): 767-771. DOI: 10.11772/j.issn.1001-9081.2023030365

To address the problems of redundant information and excessively high dimension in high-dimensional data, a Maximum Correlation maximum Difference feature selection algorithm (MCD) based on the maximum correlation of information quantity was proposed. Firstly, following information theory, Mutual Information (MI) was used to measure the correlation between features and labels, and the features were sorted so that those with the largest MI were selected into the feature subset. Then, the information distance was introduced to measure the information redundancy and difference between two features, and an evaluation criterion was designed to evaluate each feature so that both the correlation between features and labels and the difference between features were maximized. Finally, a forward search strategy combined with the evaluation criterion was used to reduce the attributes and optimize the feature subset. Using 2 different classifiers, comparative experiments were carried out on 6 datasets against 5 classical algorithms such as mRMR (minimal-Redundancy-Maximal-Relevance criterion) and RReliefF, and the validity of MCD was verified by classification accuracy. Under the Support Vector Machine (SVM) classifier, the average classification accuracy increased by 5.67 to 23.80 percentage points; under the K-Nearest Neighbor (KNN) classifier, it increased by 2.69 to 25.18 percentage points. It can be seen that in the vast majority of cases, MCD can effectively remove redundant features and significantly improve classification accuracy.
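
The forward search described above can be sketched for discrete features as follows. The abstract does not give the evaluation criterion, so the scoring rule used here (relevance multiplied by the average information distance to the already selected features) is an assumption chosen only for illustration, as are the use of variation of information as the distance and the function names.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def information_distance(xs, ys):
    """Variation of information: H(X|Y) + H(Y|X) = 2H(X,Y) - H(X) - H(Y)."""
    return 2 * entropy(list(zip(xs, ys))) - entropy(xs) - entropy(ys)


def forward_select(features, labels, k):
    """Greedily pick k feature names from a dict of name -> value list."""
    selected = []
    candidates = sorted(features)  # sorted for deterministic tie-breaking

    def score(name):
        relevance = mutual_information(features[name], labels)
        if not selected:
            return relevance  # first pick: largest mutual information
        difference = sum(
            information_distance(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance * difference  # assumed criterion, not the paper's

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With this rule, an exact copy of an already selected feature scores zero because its information distance to that feature is zero, so redundant features are passed over even when they are highly relevant.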

Attribute reduction for high-dimensional data based on bi-view of similarity and difference

Yuanjiang LI, Jinsheng QUAN, Yangyi TAN, Tian YANG

Journal of Computer Applications 2023, 43 (5): 1467-1472. DOI: 10.11772/j.issn.1001-9081.2022081154

To address the curse of dimensionality caused by excessively high data dimension and redundant information, a high-dimensional Attribute Reduction algorithm based on Similarity and Difference Matrix (ARSDM) was proposed. On the basis of the discernibility matrix, the algorithm adds a similarity measure for samples in the same class to form a comprehensive evaluation of all samples. Firstly, the distances between samples under each attribute were calculated, and the similarity within the same class and the difference between different classes were obtained from these distances. Secondly, a similarity and difference matrix was established to form an evaluation of the entire dataset. Finally, attribute reduction was performed: each column of the similarity and difference matrix was summed, the feature with the largest value was selected into the reduct in turn, and the row vector of the corresponding sample pair was set to the zero vector. Experimental results show that, compared with the classical attribute reduction algorithms DMG (Discernibility Matrix based on Graph theory), FFRS (Fitting Fuzzy Rough Sets) and GBNRS (Granular Ball Neighborhood Rough Sets), the average classification accuracy of ARSDM is increased by 1.07, 6.48, and 8.92 percentage points respectively under the Classification And Regression Tree (CART) classifier, and by 1.96, 11.96, and 12.39 percentage points under the Support Vector Machine (SVM) classifier. At the same time, ARSDM outperforms GBNRS and FFRS in running efficiency. It can be seen that ARSDM can effectively remove redundant information and improve the classification accuracy.
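
The final reduction step can be sketched as a greedy loop over a precomputed matrix. The sketch assumes that rows are sample pairs, columns are attributes, entries are non-negative scores, and that "the corresponding sample pair" means every pair with a positive entry under the selected attribute. The construction of the matrix itself is not covered, and the name `greedy_reduct` is hypothetical.

```python
def greedy_reduct(matrix):
    """Return attribute indices in the order they are selected.

    matrix: list of rows (one per sample pair), each a list of non-negative
    scores (one per attribute).
    """
    rows = [list(r) for r in matrix]
    n_attrs = len(rows[0]) if rows else 0
    reduct = []
    while True:
        col_sums = [sum(r[j] for r in rows) for j in range(n_attrs)]
        best = max(range(n_attrs), key=col_sums.__getitem__, default=None)
        if best is None or col_sums[best] <= 0:
            break  # every sample pair is already covered
        reduct.append(best)
        # Pairs handled by the selected attribute no longer need evaluation.
        rows = [[0] * n_attrs if r[best] > 0 else r for r in rows]
    return reduct


# Example: four sample pairs, three attributes.
order = greedy_reduct([[1, 0, 1], [1, 1, 0], [0, 1, 0], [1, 0, 0]])
```

The loop stops once all column sums are zero, so the number of selected attributes never exceeds the number of columns.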

Semi-supervised three-way clustering ensemble based on Seeds set and pairwise constraints

Chunmao JIANG, Peng WU, Zhicong LI

Journal of Computer Applications 2023, 43 (5): 1481-1488. DOI: 10.11772/j.issn.1001-9081.2022071094

With appropriate strategies, clustering ensemble can effectively improve the stability, robustness and precision of clustering results by fusing multiple diverse base cluster members. Current research on clustering ensemble rarely uses known prior information, and it is difficult to describe the belonging relationships between objects and clusters when facing complex data. Therefore, a semi-supervised three-way clustering ensemble method was proposed on the basis of Seeds set and pairwise constraints. Firstly, based on the existing label information, a new three-way label propagation algorithm was proposed to construct the base cluster members. Secondly, a semi-supervised three-way clustering ensemble framework was designed to integrate the base cluster members into a consistent similarity matrix, and this matrix was optimized by using pairwise constraint information. Finally, three-way spectral clustering was employed as the consistency function to cluster the similarity matrix and obtain the final clustering results. Experimental results on several real datasets in UCI show that, compared with the semi-supervised clustering ensemble algorithms including Cluster-based Similarity Partitioning Algorithm (CSPA), HyperGraph Partitioning Algorithm (HGPA), Meta-CLustering Algorithm (MCLA), Label Propagation Algorithm (LPA) and Cop-Kmeans, the proposed method achieves the best results on most of the datasets in terms of Normalized Mutual Information (NMI), Adjusted Rand Index (ARI) and F-measure.
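
The idea of integrating base cluster members into a similarity matrix and then adjusting it with pairwise constraints can be sketched with an ordinary co-association matrix. This is a simplified two-way illustration, not the paper's three-way framework. The function names are hypothetical, and forcing must-link pairs to 1 and cannot-link pairs to 0 is an assumed way of applying the constraints.

```python
def co_association(base_labelings):
    """Fraction of base clusterings that put each pair of objects together."""
    m = len(base_labelings)
    n = len(base_labelings[0])
    return [
        [sum(lab[i] == lab[j] for lab in base_labelings) / m for j in range(n)]
        for i in range(n)
    ]


def apply_constraints(sim, must_link, cannot_link):
    """Overwrite similarities for constrained pairs, keeping the matrix symmetric."""
    out = [row[:] for row in sim]
    for i, j in must_link:
        out[i][j] = out[j][i] = 1.0
    for i, j in cannot_link:
        out[i][j] = out[j][i] = 0.0
    return out


# Three base clusterings of four objects, plus one constraint of each kind.
base = [[0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1]]
sim = co_association(base)
adjusted = apply_constraints(sim, must_link=[(0, 1)], cannot_link=[(1, 2)])
```

The adjusted matrix can then be handed to any similarity-based clustering method that plays the role of the consistency function.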

Community mining algorithm based on multi-relationship of nodes and its application

Lin ZHOU, Yuzhi XIAO, Peng LIU, Youpeng QIN

Journal of Computer Applications 2023, 43 (5): 1489-1496. DOI: 10.11772/j.issn.1001-9081.2022081218

To measure the similarity of multi-relational nodes and mine the community structure of networks with multi-relational nodes, a community mining algorithm based on multi-relationship of nodes, called LSL-GN, was proposed. Based on node similarity and node reachability, LHN-ISL, a similarity measurement index for multi-relational nodes, was defined to reconstruct the low-density model of the target network, and community division was completed by combining it with the GN (Girvan-Newman) algorithm. The LSL-GN algorithm was compared with several classical community mining algorithms on Modularity (Q value), Normalized Mutual Information (NMI) and Adjusted Rand Index (ARI). The results show that LSL-GN achieves the best results on all three indexes, indicating that its community division quality is better. The "User-Application" mobile roaming network model was divided by LSL-GN into community structures based on basic applications such as Ctrip, Amap and Didi Travel. These community division results can provide strategic reference information for designing personalized package services.

Spectral clustering based dynamic community discovery algorithm in social network

Yu YANG, Weiwei DUAN

Journal of Computer Applications 2023, 43 (10): 3129-3135. DOI: 10.11772/j.issn.1001-9081.2022101517

Dynamic community discovery is an important research area in Social Network Analysis (SNA). As nodes join or leave social networks, relationships between nodes are established or terminated, which changes the community structure. Static community discovery algorithms for social networks lack the essential historical information of community nodes, resulting in insufficient analysis of network structure and clustering information as well as high computational cost. To address these problems, based on a division of community network evolution events and an analysis of the major community events, a Spectral Clustering based Dynamic Community Discovery Algorithm (SC-DCDA) was proposed. Firstly, based on experimental observation, the dimensionality of high-dimensional data was reduced by spectral mapping, and the improved Fuzzy C-Means clustering (FCM) algorithm was adopted to determine the correlation between the nodes in the dynamic social network and the communities to be discovered. Secondly, the community structures were analyzed according to the evolutionary similarity matrix. Finally, real network datasets and community discovery indicators, such as modularity score and Silhouette coefficient, were used to evaluate the proposed algorithm. Experimental results show that the computational cost of SC-DCDA is reduced by 8.37% compared with traditional spectral clustering, the average modularity score of the algorithm on all datasets is 0.49, and the qualitative analysis results of other algorithm metrics are also good, indicating that the proposed algorithm performs well in information interaction, clustering effect, and accuracy.
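
The role of FCM in relating nodes to communities can be illustrated with the standard membership update. The sketch assumes the textbook FCM formula with fuzzifier `m`; the improvements made to FCM in the paper are not described in the abstract and are not reproduced here. The function name `fcm_memberships` is hypothetical, and the points stand in for nodes after spectral mapping.

```python
from math import dist


def fcm_memberships(points, centers, m=2.0):
    """Standard FCM update: u_ik = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1)).

    Returns one membership row per point, with one entry per center.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        d = [dist(p, c) for c in centers]
        if 0.0 in d:
            # The point coincides with at least one center: split membership
            # evenly among those centers to avoid division by zero.
            hits = [1.0 if x == 0.0 else 0.0 for x in d]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        memberships.append(
            [1.0 / sum((di / dj) ** exponent for dj in d) for di in d]
        )
    return memberships


# Example: three embedded nodes and two community centers on a line.
u = fcm_memberships([(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)], [(0.0, 0.0), (4.0, 0.0)])
```

Each row sums to 1, so a row can be read as the degree to which a node belongs to each candidate community.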

Collaborative filtering algorithm based on collaborative training and Boosting

Xiaohan YANG, Guosheng HAO, Xiehua ZHANG, Zihao YANG

Journal of Computer Applications 2023, 43 (10): 3136-3141. DOI: 10.11772/j.issn.1001-9081.2022101489

Collaborative Filtering (CF) algorithms can realize personalized recommendation on the basis of the similarity between items or users. However, data sparsity has always been one of the challenges faced by CF algorithms. To improve the prediction accuracy, a CF algorithm based on Collaborative Training and Boosting (CFCTB) was proposed to solve the problem of sparse user-item scores. First, two CFs were integrated into one framework by collaborative training, in which each CF added pseudo-labeled samples with high confidence to the other's training set, and Boosting-weighted training data were used to assist the collaborative training. Then, weighted integration was used to predict the final user scores, which effectively avoided the accumulation of noise generated by pseudo-labeled samples and further improved the recommendation performance. Experimental results show that the accuracy of the proposed algorithm is better than that of the single models on four open datasets. On the CiaoDVD dataset, which has the highest sparsity, the proposed algorithm reduces the Mean Absolute Error (MAE) by 4.737% compared with Global and Local Kernels for recommender systems (GLocal-K), and reduces the Root Mean Squared Error (RMSE) by 7.421% compared with the ECoRec (Ensemble of Co-trained Recommenders) algorithm. The above results verify the effectiveness of the proposed algorithm.

Multiple clustering algorithm based on dynamic weighted tensor distance

Zhuangzhuang XUE, Peng LI, Weibei FAN, Hongjun ZHANG, Fanshuo MENG

Journal of Computer Applications 2023, 43 (11): 3449-3456. DOI: 10.11772/j.issn.1001-9081.2022101626

When the Tensor-based Multiple Clustering algorithm (TMC) measures the importance of attributes, the relevance of attribute combinations within object tensors is ignored, and the selected and unselected feature spaces are incompletely separated because a fixed weight strategy is used under different feature space selections. To address these problems, a Multiple Clustering algorithm based on Dynamic Weighted Tensor Distance (DWTD-MC) was proposed. Firstly, a self-association tensor model was constructed to improve the accuracy of attribute importance measurement in each feature space. Then, a multi-view weight tensor model was built to meet the task requirements of multiple clustering analysis through a dynamic weighting strategy under different feature space selections. Finally, the dynamic weighted tensor distance was used to measure the similarity of data points and generate multiple clustering results. Simulation results on real datasets show that DWTD-MC outperforms comparative algorithms such as TMC in terms of Jaccard Index (JI), Dunn Index (DI), Davies-Bouldin index (DB) and Silhouette Coefficient (SC). It can obtain high-quality clustering results while maintaining low redundancy among them, and meets the task requirements of multiple clustering analysis.

Project Articles