Project Articles

    China Conference on Data Mining 2022 (CCDM 2022)

    Discriminative multidimensional scaling for feature learning
    Haitao TANG, Hongjun WANG, Tianrui LI
    Journal of Computer Applications    2023, 43 (5): 1323-1329.   DOI: 10.11772/j.issn.1001-9081.2022030419

    Traditional multidimensional scaling methods achieve a low-dimensional embedding that maintains the topological structure of the data points but ignores the discriminability of the embedding itself. To address this, an unsupervised discriminative feature learning method based on multidimensional scaling, named Discriminative MultiDimensional Scaling model (DMDS), was proposed to discover the cluster structure while learning the low-dimensional data representation. DMDS pulls the low-dimensional embeddings of the same cluster closer together, making the learned data representation more discriminative. Firstly, a new objective function corresponding to DMDS was designed, reflecting that the learned data representation could maintain the topology and enhance discriminability simultaneously. Secondly, the objective function was derived and solved, and a corresponding iterative optimization algorithm was designed according to the derivation. Finally, comparison experiments were carried out on twelve public datasets in terms of average accuracy and average purity of clustering. Experimental results show that DMDS outperforms the original data representation and the traditional multidimensional scaling model under the comprehensive evaluation of Friedman statistics, and that the low-dimensional embeddings learned by DMDS are more discriminative.
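
    The two goals named in this abstract, preserving topology and improving discriminability, can be illustrated with a small sketch. It is not the DMDS objective, which the abstract does not state: the raw-stress term, the within-cluster scatter term, the trade-off weight `lam` and all function names are assumptions chosen for illustration.

```python
import math


def pairwise_distances(points):
    """Return the full matrix of Euclidean distances between points."""
    return [[math.dist(p, q) for q in points] for p in points]


def stress(high, low):
    """Raw MDS stress: squared mismatch between original and embedded distances."""
    d_high = pairwise_distances(high)
    d_low = pairwise_distances(low)
    n = len(high)
    return sum(
        (d_high[i][j] - d_low[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def within_cluster_scatter(low, labels):
    """Sum of squared distances from each embedding to its cluster centroid."""
    clusters = {}
    for point, label in zip(low, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        centroid = tuple(sum(coords) / len(members) for coords in zip(*members))
        total += sum(math.dist(p, centroid) ** 2 for p in members)
    return total


def illustrative_objective(high, low, labels, lam=1.0):
    """Hypothetical combined objective: topology term plus lam times compactness term."""
    return stress(high, low) + lam * within_cluster_scatter(low, labels)


# Four 3-D points embedded in 2-D without distortion, grouped into two clusters.
high = [(0, 0, 0), (1, 0, 0), (0, 5, 0), (1, 5, 0)]
low = [(0, 0), (1, 0), (0, 5), (1, 5)]
labels = [0, 0, 1, 1]
print(illustrative_objective(high, low, labels))
```

    In this toy case the stress term is zero because the embedding preserves every pairwise distance, so the objective is driven only by how tightly each cluster is packed.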

    Improved capsule network based on multipath feature
    Qinghai XU, Shifei DING, Tongfeng SUN, Jian ZHANG, Lili GUO
    Journal of Computer Applications    2023, 43 (5): 1330-1335.   DOI: 10.11772/j.issn.1001-9081.2022030367

    Concerning the problems of poor classification of Capsule Network (CapsNet) on complex datasets and the large number of parameters in the routing process, a Capsule Network based on Multipath feature (MCNet) was proposed, including a novel capsule feature extractor and a novel capsule pooling method. By the capsule feature extractor, the features of different layers and locations were extracted in parallel from multiple paths, and then the features were encoded into capsule features containing more semantic information. In the capsule pooling method, the most active capsules at each position of the capsule feature map were selected, and the effective capsule features were represented by a small number of capsules. Comparisons were performed on four datasets (CIFAR-10, SVHN, Fashion-MNIST, MNIST) with models such as CapsNet. Experimental results show that MCNet achieves a classification accuracy of 79.27% on the CIFAR-10 dataset with 6.25×10^6 trainable parameters; compared with CapsNet, the classification accuracy of MCNet is improved by 8.7% and the number of parameters is reduced by 46.8%. MCNet can effectively improve the classification accuracy while reducing the number of trainable parameters.
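
    The pooling idea of keeping the most active capsule at each position can be sketched as follows. This is a minimal sketch, not the MCNet implementation: it assumes that activity is measured by vector length (as in the original CapsNet) and that exactly one capsule is kept per position; the data layout and function names are hypothetical.

```python
import math


def capsule_length(capsule):
    """Euclidean length of a capsule vector, used here as its activity."""
    return math.sqrt(sum(x * x for x in capsule))


def capsule_max_pool(capsule_map):
    """Keep the longest capsule at each position.

    capsule_map is a list of positions; each position holds a list of
    candidate capsule vectors (one per capsule type).
    """
    return [max(capsules, key=capsule_length) for capsules in capsule_map]


capsule_map = [
    [(0.1, 0.2), (0.6, 0.8)],  # position 0: two candidate capsules
    [(0.3, 0.4), (0.0, 0.1)],  # position 1: two candidate capsules
]
pooled = capsule_max_pool(capsule_map)
print(pooled)
```

    Pooling in this way halves the number of capsules passed on in the toy example, which is the mechanism by which later routing stages would need fewer parameters.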

    Comparison of three-way concepts under attribute clustering
    Xiaoyan ZHANG, Jiayi WANG
    Journal of Computer Applications    2023, 43 (5): 1336-1341.   DOI: 10.11772/j.issn.1001-9081.2022030399

    Three-way concept analysis is a very important topic in the field of artificial intelligence. The biggest advantage of this theory is that it can study “attributes that are commonly possessed” and “attributes that are commonly not possessed” of the objects in the formal context at the same time. It is well known that the new formal context generated by attribute clustering has a strong connection with the original formal context, and there is a close internal connection between the original three-way concepts and the new three-way concepts obtained by attribute clustering. Therefore, the comparative study and analysis of three-way concepts under attribute clustering were carried out. Firstly, the concepts of pessimistic, optimistic and general attribute clusterings were proposed on the basis of attribute clustering, and the relationship among these three concepts was studied. Moreover, the difference between the original three-way concepts and the new ones was studied by comparing the clustering process with the formal process of three-way concepts. Furthermore, two minimum constraint indexes were put forward from the perspective of object-oriented and attribute-oriented respectively, and the influence of attribute clustering on three-way concept lattice was explored. The above results further enrich the theory of three-way concept analysis and provide feasible ideas for the field of visual data processing.

    Iteratively modified robust extreme learning machine
    Xinwei LYU, Shuxia LU
    Journal of Computer Applications    2023, 43 (5): 1342-1348.   DOI: 10.11772/j.issn.1001-9081.2022030429

    Many variations of Extreme Learning Machine (ELM) aim at improving the robustness of ELMs to outliers, while the traditional Robust Extreme Learning Machine (RELM) is very sensitive to outliers. How to deal with too many extreme outliers in the data becomes the most difficult problem for constructing RELM models. For outliers with large residuals, a bounded loss function was used to eliminate the pollution of outliers to the model; to solve the problem of excessive outliers, an iterative modification technique was used to modify the data and reduce the influence caused by excessive outliers. Combining these two approaches, an Iteratively Modified RELM (IMRELM) was proposed and solved by iteration. In each iteration, the samples were reweighted to reduce the influence of outliers, and under-fitting was avoided in the process of continuous modification. IMRELM, ELM, Weighted ELM (WELM), Iteratively Re-Weighted ELM (IRWELM) and Iterative Reweighted Regularized ELM (IRRELM) were compared on synthetic datasets and real datasets with different outlier levels. On the synthetic dataset with 80% outliers, the Mean-Square Error (MSE) of IRRELM is 2.45044, and the MSE of IMRELM is 0.00079. Experimental results show that IMRELM has good prediction accuracy and robustness on data with excessive extreme outliers.
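
    The reweighting step described above can be illustrated with iteratively reweighted least squares on a straight-line fit. This is a sketch of the general idea only, not of IMRELM: the model is a line rather than an ELM, the data-modification step is omitted, and the Welsch-type weight `exp(-(r/sigma)^2)` is an assumed example of a weight induced by a bounded loss, since the abstract does not name the loss.

```python
import math


def weighted_line_fit(xs, ys, ws):
    """Closed-form weighted least squares for y = slope * x + intercept."""
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    slope = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
    intercept = (sy - slope * sx) / sw
    return slope, intercept


def robust_line_fit(xs, ys, sigma=5.0, iterations=10):
    """Refit repeatedly, down-weighting samples with large residuals."""
    ws = [1.0] * len(xs)
    for _ in range(iterations):
        slope, intercept = weighted_line_fit(xs, ys, ws)
        residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
        # Assumed Welsch-type weight: close to 1 for small residuals, close to 0 for large ones.
        ws = [math.exp(-(r / sigma) ** 2) for r in residuals]
    return weighted_line_fit(xs, ys, ws)


xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys[5] = 100  # one extreme outlier
ols_slope, ols_intercept = weighted_line_fit(xs, ys, [1.0] * len(xs))
slope, intercept = robust_line_fit(xs, ys)
print(ols_slope, ols_intercept)
print(slope, intercept)
```

    Because the weight is bounded, the outlier's influence shrinks towards zero after the first pass, and the robust fit recovers the line that generated the clean samples while the ordinary fit is pulled away from it.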

    Multi-label cross-modal hashing retrieval based on discriminative matrix factorization
    Yu TAN, Xiaoqin WANG, Rushi LAN, Zhenbing LIU, Xiaonan LUO
    Journal of Computer Applications    2023, 43 (5): 1349-1354.   DOI: 10.11772/j.issn.1001-9081.2022030424

    Existing cross-modal hashing algorithms underestimate the importance of semantic differences between different class labels and ignore the balance condition of hash vectors, which makes the learned hash codes less discriminative. In addition, some methods utilize the label information to construct a similarity matrix and treat multi-label data as single-label data when modeling, which causes large semantic loss in multi-label cross-modal retrieval. To preserve the accurate similarity relationship between heterogeneous data and the balance property of hash vectors, a novel supervised hashing algorithm, namely Discriminative Matrix Factorization Hashing (DMFH), was proposed. In this method, the Collective Matrix Factorization (CMF) of the kernelized features was used to obtain a shared latent subspace. The proportion of common labels between the data was also utilized to describe the similarity degree of the heterogeneous data. Besides, a balanced matrix was constructed from label balance information to generate hash vectors with the balance property and maximize the inter-class distances among different class labels. In comparisons with seven advanced cross-modal hashing retrieval methods on two commonly used multi-label datasets, MIRFlickr and NUS-WIDE, DMFH achieves the best mean Average Precision (mAP) on both I2T (Image to Text) and T2I (Text to Image) tasks, and the mAPs of T2I are better, indicating that DMFH can utilize the multi-label semantic information in the text modality more effectively. The validity of the constructed balanced matrix and similarity matrix is also analyzed, verifying that DMFH can maintain semantic information and similarity relations, and is effective in cross-modal hashing retrieval.
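
    Describing similarity by the proportion of common labels, rather than by a binary "shares any label" flag, can be sketched as below. The abstract does not give the exact formula, so the Jaccard ratio (shared labels divided by all labels of the pair) is an assumption, and the function name and sample labels are hypothetical.

```python
def label_similarity(labels_a, labels_b):
    """Graded similarity in [0, 1]: shared labels over all labels of the pair."""
    a, b = set(labels_a), set(labels_b)
    if not a and not b:
        return 0.0  # two unlabeled items are treated as dissimilar
    return len(a & b) / len(a | b)


image_labels = {"sky", "sea", "boat"}
text_labels = {"sky", "sea", "people"}
print(label_similarity(image_labels, text_labels))
```

    A graded value like this keeps the distinction between pairs that share one label and pairs that share most of them, which a single-label or binary similarity matrix would collapse.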

    Teaching-learning-based optimization algorithm based on cooperative mutation and Lévy flight strategy and its application
    Hao GAO, Qingke ZHANG, Xianglong BU, Junqing LI, Huaxiang ZHANG
    Journal of Computer Applications    2023, 43 (5): 1355-1364.   DOI: 10.11772/j.issn.1001-9081.2022030420

    Concerning the shortcomings of the Teaching-Learning-Based Optimization (TLBO) algorithm in dealing with optimization problems, namely unbalanced search, a tendency to fall into local optima and weak comprehensive solution performance, an improved TLBO based on equilibrium optimization and Lévy flight strategy, namely ELMTLBO (Equilibrium-Lévy-Mutation TLBO), was proposed. Firstly, an elite equilibrium guidance strategy was designed to improve the global optimization ability of the algorithm through the equilibrium guidance of multiple elite individuals in the population. Secondly, a strategy combining Lévy flight with adaptive weight was added after the learner phase of the TLBO algorithm, and the step size generated by Lévy flight was adaptively scaled by the weight, which improved the population's local optimization ability and enhanced the self-adaptability of individuals to complex environments. Finally, a mutation operator pool escape strategy was designed to improve the population diversity of the algorithm through the cooperative guidance of multiple mutation operators. To verify the effectiveness of the improvements, the comprehensive convergence performance of the ELMTLBO algorithm was compared on 15 international test functions with 7 state-of-the-art intelligent optimization algorithms such as the Dwarf Mongoose Optimization Algorithm (DMOA), as well as algorithms of the same type such as Balanced TLBO (BTLBO) and standard TLBO. The statistical experiment results show that, compared with advanced intelligent optimization algorithms and TLBO algorithm variants, the ELMTLBO algorithm can effectively balance its search ability, not only solving both unimodal and multimodal problems but also showing significant optimization ability on complex multimodal problems. It can be seen that, with the combined effect of different strategies, the ELMTLBO algorithm has outstanding comprehensive optimization performance and stable global convergence performance. In addition, the ELMTLBO algorithm was successfully applied to the Multiple Sequence Alignment (MSA) problem based on Hidden Markov Model (HMM), and the high-quality aligned sequences obtained by this algorithm can be used in disease diagnosis, gene tracing and other fields, providing good algorithmic support for the development of bioinformatics.
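
    Scaling a Lévy-flight step by an adaptive weight can be sketched as follows. The step generator uses Mantegna's algorithm, a common way to draw Lévy-distributed steps; whether ELMTLBO uses it is not stated in the abstract. The linearly decreasing weight, its bounds `w_max` and `w_min`, and the function names are assumptions made for illustration.

```python
import math
import random


def mantegna_sigma(beta):
    """Scale of the numerator Gaussian in Mantegna's algorithm for index beta."""
    numerator = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    denominator = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (numerator / denominator) ** (1 / beta)


def levy_step(beta=1.5, rng=random):
    """Draw one heavy-tailed step: mostly small moves with occasional long jumps."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def adaptive_weight(iteration, max_iterations, w_max=0.9, w_min=0.1):
    """Hypothetical weight that shrinks linearly as the search progresses."""
    return w_max - (w_max - w_min) * iteration / max_iterations


def perturb(position, iteration, max_iterations, rng=random):
    """Move each coordinate by a Lévy step scaled by the current weight."""
    w = adaptive_weight(iteration, max_iterations)
    return [x + w * levy_step(rng=rng) for x in position]


rng = random.Random(0)  # seeded generator so the run is repeatable
candidate = perturb([0.0, 0.0], iteration=10, max_iterations=100, rng=rng)
print(candidate)
```

    Early in the run the weight is large, so long jumps help the population escape local optima; later the weight shrinks and the same steps act as a fine local search.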

    J-SGPGN: paraphrase generation network based on joint learning of sequence and graph
    Zhirong HOU, Xiaodong FAN, Hua ZHANG, Xiaonan MA
    Journal of Computer Applications    2023, 43 (5): 1365-1371.   DOI: 10.11772/j.issn.1001-9081.2022040626

    Paraphrase generation is a text data augmentation method based on Natural Language Generation (NLG). Concerning the problems of repetitive generation, semantic errors and poor diversity in paraphrase generation methods based on the Sequence-to-Sequence (Seq2Seq) framework, a Paraphrase Generation Network based on Joint learning of Sequence and Graph (J-SGPGN) was proposed. Graph encoding and sequence encoding were fused in the encoder of J-SGPGN for feature enhancement, and two decoding methods, sequence generation and graph generation, were designed in the decoder of J-SGPGN for parallel decoding. Then the joint learning method was used to train the model, aiming to combine syntactic supervision with semantic supervision to simultaneously improve the accuracy and diversity of generation. Experimental results on the Quora dataset show that the generation accuracy evaluation indicator METEOR (Metric for Evaluation of Translation with Explicit ORdering) of J-SGPGN is 3.44 percentage points higher than that of RNN+GCN, the baseline model with optimal accuracy, and the generation diversity evaluation indicator Self-BLEU (Self-BiLingual Evaluation Understudy) of J-SGPGN is 12.79 percentage points lower than that of the Back-Translation guided multi-round Paraphrase Generation (BTmPG) model, the baseline model with optimal diversity. It is verified that J-SGPGN can generate paraphrase text with more accurate semantics and more diverse expressions.

    Pedestrian head tracking model based on full-body appearance features
    Guangyao ZHANG, Chunfeng SONG
    Journal of Computer Applications    2023, 43 (5): 1372-1377.   DOI: 10.11772/j.issn.1001-9081.2022030377

    The existing pedestrian multi-object tracking algorithms have the problems of undetectable pedestrians and inter-frame association confusion in dense scenes. In order to improve the precision of pedestrian tracking in dense scenes, a head tracking model based on full-body appearance features was proposed, namely HT-FF (Head Tracking with Full-body Features). Firstly, the head detector was used to replace the full-body detector to improve the detection rate of pedestrians in dense scenes. Secondly, using the information of human posture estimation as a guide, the noise-removed full-body appearance features were obtained as tracking clues, which greatly reduced the confusion in the association among multiple frames. HT-FF model achieves the best results on multiple indicators such as MOTA (Multiple Object Tracking Accuracy) and IDF1 (ID F1 Score) on benchmark dataset of pedestrian tracking in dense scenes — Head Tracking 21 (HT21). The HT-FF model can effectively alleviate the problem of lost and confused pedestrian tracking in dense scenes, and the proposed tracking model combining multiple clues is a new paradigm of pedestrian tracking model.

    Stock movement prediction with market dynamic hierarchical macro information
    Yafei ZHANG, Jing WANG, Yaoshuai ZHAO, Zhihao WU, Youfang LIN
    Journal of Computer Applications    2023, 43 (5): 1378-1384.   DOI: 10.11772/j.issn.1001-9081.2022030400

    The complex structure and diverse information of stock markets make stock movement prediction extremely challenging. However, most of the existing studies treat each stock as an individual or use graph structures to model complex higher-order relationships in stock markets, without considering the hierarchy and dynamics among stocks, industries and markets. Aiming at the above problems, a Dynamic Macro Memory Network (DMMN) was proposed, and price movement prediction was performed for multiple stocks simultaneously based on DMMN. In this method, the market macro-environmental information was modeled by the hierarchies of “stock-industry-market”, and long-term dependences of this information on time series were captured. Then, the market macro-environmental information was integrated with stock micro-characteristic information dynamically to enhance the ability of each stock to perceive the overall state of the market and capture the interdependences among stocks, industries, and markets indirectly. Experimental results on the collected CSI300 dataset show that compared with stock prediction methods based on Attentive Long Short-Term Memory (ALSTM) network, GCN-LSTM (Graph Convolutional Network with Long Short-Term Memory), Convolutional Neural Network (CNN) and other models, the DMMN-based method achieves better results in F1-score and Sharpe ratio, which are improved by 4.87% and 31.90% respectively compared with ALSTM, the best model among all comparison methods. This indicates that DMMN has better prediction performance and better practicability.

    Stock return prediction via multi-scale kernel adaptive filtering
    Xingheng TANG, Qiang GUO, Tianhui XU, Caiming ZHANG
    Journal of Computer Applications    2023, 43 (5): 1385-1393.   DOI: 10.11772/j.issn.1001-9081.2022030401

    In stock market, investors can predict the future stock return by capturing the potential trading patterns in historical data. The key issue for predicting stock return is how to find out the trading patterns accurately. However, it is generally difficult to capture them due to the influence of uncertain factors such as corporate performance, financial policies, and national economic growth. To solve this problem, a Multi-Scale Kernel Adaptive Filtering (MSKAF) method was proposed to capture the multi-scale trading patterns from past market data. In this method, in order to describe the multi-scale features of stocks, Stationary Wavelet Transform (SWT) was employed to obtain data components with different scales. The different trading patterns hidden in stock price fluctuations were contained in these data components. Then, the Kernel Adaptive Filtering (KAF) was used to capture the trading patterns with different scales to predict the future stock return. Experimental results show that compared with those of the prediction model based on Two-Stage KAF (TSKAF), the Mean Absolute Error (MAE) of the results generated by the proposed method is reduced by 10%, and the Sharpe Ratio (SR) of the results generated by the proposed method is increased by 8.79%, verifying that the proposed method achieves better stock return prediction performance.
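
    Kernel adaptive filtering, the building block named in this abstract, can be illustrated with Kernel Least Mean Squares (KLMS), one standard member of the KAF family. The abstract does not say which KAF variant MSKAF uses, and the wavelet decomposition step is omitted here; the class name, the Gaussian kernel and the parameter values are assumptions.

```python
import math


class KernelLMS:
    """Minimal Kernel Least Mean Squares filter with a Gaussian kernel."""

    def __init__(self, step_size=0.5, kernel_width=1.0):
        self.step_size = step_size
        self.kernel_width = kernel_width
        self.centers = []       # stored input vectors
        self.coefficients = []  # one coefficient per stored input

    def _kernel(self, a, b):
        squared = sum((x - y) ** 2 for x, y in zip(a, b))
        return math.exp(-squared / (2 * self.kernel_width ** 2))

    def predict(self, x):
        """Weighted sum of kernel evaluations against all stored inputs."""
        return sum(
            coef * self._kernel(center, x)
            for coef, center in zip(self.coefficients, self.centers)
        )

    def update(self, x, target):
        """Predict, then store x with a coefficient proportional to the error."""
        error = target - self.predict(x)
        self.centers.append(tuple(x))
        self.coefficients.append(self.step_size * error)
        return error


# Online use: predict the next value of a toy series from the two previous values.
series = [0.1, 0.3, 0.2, 0.4, 0.3, 0.5, 0.4, 0.6]
model = KernelLMS(step_size=0.5, kernel_width=0.5)
errors = [
    model.update(series[t - 2:t], series[t])
    for t in range(2, len(series))
]
print(errors)
```

    In a multi-scale setting, one such filter could be trained per wavelet component and the per-scale predictions combined, but that arrangement is a reading of the abstract rather than a confirmed detail of the method.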

    Extraction of PM2.5 diffusion characteristics based on candlestick pattern matching
    Rui XU, Shuang LIANG, Hang WAN, Yimin WEN, Shiming SHEN, Jian LI
    Journal of Computer Applications    2023, 43 (5): 1394-1400.   DOI: 10.11772/j.issn.1001-9081.2022030437

    Most existing air quality prediction methods focus on simple time series data for trend prediction, and ignore the pollutant transport and diffusion laws and the corresponding classified pattern features. In order to solve the above problem, a PM2.5 diffusion characteristic extraction method based on Candlestick Pattern Matching (CPM) was proposed. Firstly, the basic periodic candlestick charts were generated from a large number of historical PM2.5 sequences by using the convolution idea of Convolutional Neural Network (CNN). Then, the concentration patterns of different candlestick chart feature vectors were clustered and analyzed by using the distance formula. Finally, combining the unique advantages of CNN in image recognition, a hybrid model integrating graphical features and time series feature sequences was formed, and the trend reversal that would be caused by candlestick charts with reversal signals was judged. Experimental results on the monitoring time series dataset of Guilin air quality online monitoring stations show that compared with the VGG (Visual Geometry Group)-based method which uses the single time series data, the accuracy of the CPM-based method is improved by 1.9 percentage points. It can be seen that the CPM-based method can effectively extract the trend features of PM2.5 and be used for predicting the periodic change of pollutant concentration in the future.
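
    Turning a concentration series into candlesticks by sliding a window over it can be sketched as follows. The window length, the stride and the sample readings are assumptions for illustration; the abstract does not give the period used or how the window moves.

```python
def to_candlesticks(series, period, stride=None):
    """Summarize each window of readings as open, high, low and close values.

    stride defaults to period (non-overlapping windows); a smaller stride
    gives overlapping windows. Incomplete trailing windows are dropped.
    """
    stride = period if stride is None else stride
    candles = []
    for start in range(0, len(series) - period + 1, stride):
        window = series[start:start + period]
        candles.append({
            "open": window[0],
            "high": max(window),
            "low": min(window),
            "close": window[-1],
        })
    return candles


def is_rising(candle):
    """A candle rises when the period ends at a higher level than it began."""
    return candle["close"] > candle["open"]


# Hypothetical PM2.5 readings, grouped into candles of four readings each.
readings = [35, 40, 38, 50, 48, 30, 25, 33]
candles = to_candlesticks(readings, period=4)
print(candles)
print([is_rising(c) for c in candles])
```

    Each candle compresses a window into four numbers, which is what allows candles to be compared as feature vectors and clustered into recurring patterns.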

2024 Vol.44 No.4

Honorary Editor-in-Chief: ZHANG Jingzhong
Editor-in-Chief: XU Zongben
Associate Editor: SHEN Hengtao XIA Zhaohui
Domestic Post Distribution Code: 62-110
Foreign Distribution Code: M4616
Address:
No. 9, 4th Section of South Renmin Road, Chengdu 610041, China
Tel: 028-85224283-803
  028-85222239-803
Website: www.joca.cn
E-mail: bjb@joca.cn