Most Read articles

    Survey and prospect of large language models
    Xiaolin QIN, Xu GU, Dicheng LI, Haiwen XU
    Journal of Computer Applications    2025, 45 (3): 685-696.   DOI: 10.11772/j.issn.1001-9081.2025010128
    Abstract views: 1201 | HTML views: 93 | PDF (2035KB) downloads: 2350

    Large Language Models (LLMs) are a class of language models composed of artificial neural networks with a vast number of parameters (typically billions of weights or more). They are trained on large amounts of unlabeled text using self-supervised or semi-supervised learning, and they are the core of current generative Artificial Intelligence (AI) technologies. Compared to traditional language models, LLMs demonstrate stronger language understanding and generation capabilities, supported by substantial computational power, extensive parameters, and large-scale data, and they perform well in tasks such as machine translation, question answering systems, and dialogue generation. Most existing surveys focus on the theoretical construction and training techniques of LLMs, while systematic exploration of the industry-level application practices of LLMs and the evolution of the technological ecosystem remains insufficient. Therefore, after introducing the foundational architecture, training techniques, and development history of LLMs, the current general key technologies of LLMs and the advanced technologies integrated with LLM bases were analyzed. Then, by summarizing existing research, the challenges faced by LLMs in practical applications were elaborated, including data bias, model hallucination, and computational resource consumption, and an outlook was provided on the ongoing development trends of LLMs.

    Review on bimodal emotion recognition based on speech and text
    Lingmin HAN, Xianhong CHEN, Wenmeng XIONG
    Journal of Computer Applications    2025, 45 (4): 1025-1034.   DOI: 10.11772/j.issn.1001-9081.2024030319
    Abstract views: 688 | HTML views: 68 | PDF (1625KB) downloads: 2124

    Emotion recognition is a technology that allows computers to recognize and understand human emotions. It plays an important role in many fields and is a key development direction of artificial intelligence. Therefore, the research status of bimodal emotion recognition based on speech and text was summarized. Firstly, the representation spaces of emotion were classified and elaborated. Secondly, emotion databases were classified according to their emotion representation spaces, and the common multi-modal emotion databases were summed up. Thirdly, the methods of bimodal emotion recognition based on speech and text were introduced, covering feature extraction, modal fusion, and decision classification. In particular, the modal fusion methods were highlighted and divided into four categories: feature-level fusion, decision-level fusion, model-level fusion, and multi-level fusion. In addition, the results of a series of bimodal emotion recognition methods based on speech and text were compared and analyzed. Finally, the application scenarios, challenges, and future development directions of emotion recognition were introduced. This review aims to analyze and summarize work on multi-modal emotion recognition, especially bimodal emotion recognition based on speech and text, and to provide valuable information for emotion recognition research.
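    Of the four fusion categories named above, the first two can be illustrated with minimal stand-ins (the toy vectors and the equal weighting are illustrative assumptions, not the models surveyed in the article): feature-level fusion concatenates modality features before classification, while decision-level fusion combines per-modality predictions.

```python
def feature_level_fusion(speech_feat, text_feat):
    """Feature-level (early) fusion: concatenate the two modalities'
    feature vectors before they reach a classifier."""
    return speech_feat + text_feat

def decision_level_fusion(speech_probs, text_probs, w=0.5):
    """Decision-level (late) fusion: weighted average of each modality's
    class-probability outputs."""
    return [w * s + (1 - w) * t for s, t in zip(speech_probs, text_probs)]

# Toy speech/text features and two-class probability outputs.
fused_feat = feature_level_fusion([0.2, 0.7], [0.9])
fused_dec = decision_level_fusion([0.8, 0.2], [0.4, 0.6])
```

Model-level and multi-level fusion interleave the modalities inside the network itself, so they need an actual model to demonstrate.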

    ScholatGPT: a large language model for academic social networks and its intelligent applications
    Chengzhe YUAN, Guohua CHEN, Dingding LI, Yuan ZHU, Ronghua LIN, Hao ZHONG, Yong TANG
    Journal of Computer Applications    2025, 45 (3): 755-764.   DOI: 10.11772/j.issn.1001-9081.2024101477
    Abstract views: 483 | HTML views: 28 | PDF (2602KB) downloads: 1568

    To address the limitations of existing Large Language Models (LLMs) in processing cross-domain knowledge, updating real-time academic information, and ensuring output quality, ScholatGPT, a scholar LLM based on Academic Social Networks (ASNs), was proposed. In ScholatGPT, the abilities of precise semantic retrieval and dynamic knowledge update were enhanced by integrating Knowledge-Graph Augmented Generation (KGAG) and Retrieval-Augmented Generation (RAG), and optimization and fine-tuning were used to improve the generation quality of academic text. Firstly, a scholar knowledge graph was constructed from relational data in SCHOLAT, with LLMs employed to enrich the graph semantically. Then, a KGAG-based retrieval model was introduced and combined with RAG to realize multi-path hybrid retrieval, thereby enhancing retrieval precision. Finally, fine-tuning techniques were applied to optimize the model's generation quality in academic fields. Experimental results demonstrate that ScholatGPT achieves a precision of 83.2% in academic question answering tasks, outperforming GPT-4o and AMiner AI by 69.4 and 11.5 percentage points respectively, and performs well in tasks such as scholar profiling, representative work identification, and research field classification. Furthermore, ScholatGPT obtains stable and competitive results in answer relevance, coherence, and readability, achieving a good balance between specialization and readability. Additionally, ScholatGPT-based intelligent applications such as a scholar think tank and an academic information recommendation system effectively improve the efficiency of academic resource acquisition.
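    The multi-path hybrid retrieval idea, merging knowledge-graph hits with similarity-retrieval hits into one ranked candidate list, can be sketched as follows. The toy corpus, the scoring, and the fusion weight alpha are illustrative assumptions, not ScholatGPT's implementation; a lexical-overlap score stands in for dense vector similarity.

```python
def kg_retrieve(query_entities, kg_edges):
    """Knowledge-graph path: return passages attached to edges that
    touch any entity mentioned in the query."""
    hits = {}
    for head, relation, tail, passage in kg_edges:
        if head in query_entities or tail in query_entities:
            hits[passage] = hits.get(passage, 0.0) + 1.0
    return hits

def vector_retrieve(query_terms, corpus):
    """Similarity path: lexical overlap as a crude stand-in for
    dense-vector similarity scoring."""
    hits = {}
    for passage in corpus:
        overlap = len(query_terms & set(passage.lower().split()))
        if overlap:
            hits[passage] = overlap / len(query_terms)
    return hits

def hybrid_rank(kg_hits, vec_hits, alpha=0.5):
    """Fuse both retrieval paths with a weighted sum and rank candidates."""
    scores = {}
    for passage, s in kg_hits.items():
        scores[passage] = scores.get(passage, 0.0) + alpha * s
    for passage, s in vec_hits.items():
        scores[passage] = scores.get(passage, 0.0) + (1 - alpha) * s
    return sorted(scores, key=scores.get, reverse=True)

kg_edges = [("scholar_a", "authored", "paper_1", "paper_1 studies graph retrieval")]
corpus = ["paper_1 studies graph retrieval", "paper_2 studies image synthesis"]
ranked = hybrid_rank(kg_retrieve({"scholar_a"}, kg_edges),
                     vector_retrieve({"graph", "retrieval"}, corpus))
```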

    Crop disease recognition method based on multi-modal data fusion
    Wei CHEN, Changyong SHI, Chuanxiang MA
    Journal of Computer Applications    2025, 45 (3): 840-848.   DOI: 10.11772/j.issn.1001-9081.2024091297
    Abstract views: 450 | HTML views: 9 | PDF (2997KB) downloads: 696

    Current deep learning-based methods for crop disease recognition rely on specific crop disease image datasets for image representation learning, and do not consider the importance of text features in assisting image feature learning. To enhance the model's feature extraction and disease recognition capabilities for crop disease images more effectively, a Crop Disease Recognition method through multi-modal data fusion based on Contrastive Language-Image Pre-training (CDR-CLIP) was proposed. Firstly, high-quality image-text pair datasets for disease recognition were constructed to enhance image feature representation with textual information. Then, a multi-modal fusion strategy was applied to integrate text and image features effectively, which strengthened the model's capability to distinguish diseases. Finally, specialized pre-training and fine-tuning strategies were designed to optimize the model's performance on specific crop disease recognition tasks. Experimental results demonstrate that CDR-CLIP achieves disease recognition accuracies of 99.31% and 87.66%, with F1 scores of 99.04% and 87.56%, on the PlantVillage and AI Challenger 2018 crop disease datasets, respectively. On the PlantDoc dataset, CDR-CLIP achieves a mean Average Precision (mAP@0.5) of 51.10%, showing its strong performance advantage.
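    The CLIP-style matching that underlies CDR-CLIP can be pictured as zero-shot classification: the predicted class is the text prompt whose embedding is closest to the image embedding. The 3-dimensional embeddings below are hypothetical placeholders for real encoder outputs, not values from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def zero_shot_classify(image_emb, text_embs):
    """CLIP-style matching: predict the class whose text-prompt embedding
    is most similar to the image embedding."""
    sims = [cosine(image_emb, t) for t in text_embs]
    return max(range(len(sims)), key=sims.__getitem__)

# Hypothetical embeddings for prompts such as "healthy leaf" / "leaf rust".
text_embs = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.1]]
image_emb = [0.9, 0.1, 0.2]
pred = zero_shot_classify(image_emb, text_embs)
```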

    Personalized learning recommendation in collaboration of knowledge graph and large language model
    Xuefei ZHANG, Liping ZHANG, Sheng YAN, Min HOU, Yubo ZHAO
    Journal of Computer Applications    2025, 45 (3): 773-784.   DOI: 10.11772/j.issn.1001-9081.2024070971
    Abstract views: 444 | HTML views: 15 | PDF (1570KB) downloads: 1032

    As an important research topic in the field of smart education, personalized learning recommendation has the core goal of using recommendation algorithms and models to provide learners with effective learning resources that match their individual learning needs, interests, abilities, and histories, so as to improve learning outcomes. Current recommendation methods suffer from problems such as cold start, data sparsity, poor interpretability, and over-personalization, and the combination of knowledge graphs and Large Language Models (LLMs) provides strong support for solving these problems. Firstly, the concepts and current research status of personalized learning recommendation were reviewed. Secondly, the concepts of knowledge graphs and LLMs and their specific applications in personalized learning recommendation were discussed respectively. Thirdly, the collaborative application methods of knowledge graphs and LLMs in personalized learning recommendation were summarized. Finally, future development directions of knowledge graphs and LLMs in personalized learning recommendation were prospected, providing reference and inspiration for continuous development and innovative practice in this field.

    Cross-domain few-shot classification model based on relation network and Vision Transformer
    Yiqin YAN, Chuan LUO, Tianrui LI, Hongmei CHEN
    Journal of Computer Applications    2025, 45 (4): 1095-1103.   DOI: 10.11772/j.issn.1001-9081.2023121852
    Abstract views: 443 | HTML views: 10 | PDF (2414KB) downloads: 115

    Aiming at the poor classification accuracy of few-shot learning models under domain shift, a cross-domain few-shot classification model based on relation network and ViT (Vision Transformer), called ReViT (Relation ViT), was proposed. Firstly, ViT was introduced as the feature extractor, and a pre-trained deep neural network was employed to solve the insufficient feature expression ability of shallow neural networks. Secondly, a shallow convolutional network was used as a task adapter to enhance the knowledge transfer ability of the model, and a non-linear classifier was constructed on the basis of the relation network and the channel attention mechanism. Thirdly, the feature extractor and the task adapter were integrated to enhance the generalization ability of the model. Finally, a four-stage learning strategy of "pre-training - meta-training - fine-tuning - meta-testing" was adopted to train the model, which further improved the cross-domain classification performance of ReViT through effective integration of transfer learning and meta learning. Experimental results using average classification accuracy as the evaluation metric show that ReViT performs well on cross-domain few-shot classification problems. Specifically, the classification accuracies of ReViT under in-domain and out-of-domain scenarios are improved by 5.82 and 1.71 percentage points, respectively, compared to the sub-optimal model on Meta-Dataset. Compared to the sub-optimal model on the three sub-problems EuroSAT (European SATellite data), CropDisease, and ISIC (International Skin Imaging Collaboration) of the BCDFSL (Broader study of Cross-Domain Few-Shot Learning) dataset, the classification accuracies of ReViT are improved by 1.00, 1.54, and 2.43 percentage points under 5-way 5-shot, and by 0.13, 0.97, and 3.40 percentage points under 5-way 20-shot, respectively; under 5-way 50-shot on CropDisease, the accuracy is improved by 0.36 percentage points. It can be seen that ReViT has good classification accuracy in image classification tasks with sparse samples.

    Federated parameter-efficient fine-tuning technology for large model based on pruning
    Hui ZENG, Shiyu XIONG, Yongzheng DI, Hongzhou SHI
    Journal of Computer Applications    2025, 45 (3): 715-724.   DOI: 10.11772/j.issn.1001-9081.2024030322
    Abstract views: 401 | HTML views: 13 | PDF (2395KB) downloads: 392

    With the continuously increasing importance of data privacy, fine-tuning a Pre-trained Foundational Model (PFM) for downstream tasks has become increasingly challenging, leading to the emergence of federated learning research based on PFMs. However, PFMs pose significant challenges to federated learning systems, especially in terms of local computation and communication. Therefore, corresponding solutions were proposed for the two main stages of federated learning, local computation and aggregation communication: a local efficient fine-tuning mode and a ring-shaped local aggregation mode. In the first mode, a model pruning algorithm based on Parameter-Efficient Fine-Tuning (PEFT) was employed to reduce local computation and communication costs. In the second mode, the centralized aggregation method was replaced with a distributed local aggregation scheme to enhance communication efficiency during the aggregation stage. Experimental results demonstrate that the proposed federated parameter-efficient fine-tuning framework for large models performs well in terms of both final performance and efficiency.
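    The ring-shaped local aggregation mode can be pictured as gossip averaging around a ring: each client mixes its parameters with its ring predecessor's instead of uploading them to a central server. This is a simplified synchronous sketch under our own assumptions, not the paper's exact protocol.

```python
def ring_aggregate(client_params, rounds=1):
    """Gossip-style ring aggregation: each round, every client averages its
    parameter vector with its ring predecessor's, so models mix without a
    central server (synchronous simplification)."""
    n = len(client_params)
    params = [list(p) for p in client_params]
    for _ in range(rounds):
        params = [
            [(params[i][k] + params[(i - 1) % n][k]) / 2
             for k in range(len(params[i]))]
            for i in range(n)
        ]
    return params

mixed = ring_aggregate([[0.0], [2.0], [4.0]])
```

Repeated rounds drive all clients toward the global average while each round only needs neighbor-to-neighbor communication.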

    Review of interpretable deep knowledge tracing methods
    Jinxian SUO, Liping ZHANG, Sheng YAN, Dongqi WANG, Yawen ZHANG
    Journal of Computer Applications    2025, 45 (7): 2043-2055.   DOI: 10.11772/j.issn.1001-9081.2024070970
    Abstract views: 390 | HTML views: 31 | PDF (2726KB) downloads: 1424

    Knowledge Tracing (KT) is a cognitive diagnostic method that aims to model a learner's mastery of learned knowledge by analyzing the learner's historical question answering records, and ultimately to predict the learner's future question answering performance. Knowledge tracing techniques based on deep neural network models have become a hot research topic in the knowledge tracing field due to their strong feature extraction capabilities and superior prediction performance. However, deep learning-based knowledge tracing models often lack good interpretability. Clear interpretability enables learners and teachers to fully understand the reasoning process and prediction results of knowledge tracing models, facilitating the formulation of learning plans tailored to the current knowledge state, and at the same time enhances the trust of learners and teachers in these models. Therefore, interpretable Deep Knowledge Tracing (DKT) methods were reviewed. Firstly, the development of knowledge tracing and the definition and necessity of interpretability were introduced. Secondly, improvement methods proposed to address the lack of interpretability in DKT models were summarized from the perspectives of feature extraction and internal model enhancement. Thirdly, the related publicly available datasets were introduced, the influences of dataset features on interpretability were analyzed, how to evaluate knowledge tracing models from both performance and interpretability perspectives was discussed, and the performance of DKT models on different datasets was sorted out. Finally, possible future research directions to address current issues in DKT models were proposed.

    Recognition and optimization of hallucination phenomena in large language models
    Jing HE, Yang SHEN, Runfeng XIE
    Journal of Computer Applications    2025, 45 (3): 709-714.   DOI: 10.11772/j.issn.1001-9081.2024081190
    Abstract views: 388 | HTML views: 18 | PDF (1539KB) downloads: 739

    Large Language Models (LLMs) may generate hallucinations, which makes them difficult to apply fully to various fields of real life, especially the medical field; moreover, there has been no high-quality LLM hallucination evaluation dataset or corresponding evaluation of the degree of LLM hallucination. To address these problems, a method for identifying and optimizing LLM hallucinations in the medical question answering field was proposed. Firstly, based on the publicly available dataset Huatuo, an LLM hallucination evaluation dataset for medical question answering was constructed by combining GPT-4 generated question answers and manual annotation. Secondly, based on the constructed hallucination evaluation dataset, the concept of "hallucination rate" was defined. By designing prompts that require the models under test to answer "yes" or "no", the degree of hallucination of each LLM was tested and quantified, and the "YES MAN" hallucination phenomenon of LLMs was discovered. Thirdly, a low hallucination rate LLM, GPT-4, was used as LeaderAI to provide prior knowledge to assist LLMs with high hallucination rates in making judgments. Finally, to explore whether multiple different LLMs make mistakes on the same question, the concept of "hallucination collision" was defined, and the hallucination collision situations of different LLMs in the medical question answering field were revealed based on a probability statistical method. Experimental results show that the introduction of LeaderAI improves the performance of LLMs with high hallucination rates, so that they can handle the "YES MAN" hallucination phenomenon in medical question answering with a low hallucination rate. Moreover, current LLMs have a low probability of hallucinating on the same single question (a collision).
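    The two quantities defined above, hallucination rate and hallucination collision, reduce to simple counts over yes/no answers. The toy answer lists below, including a "YES MAN" model that always answers yes, are illustrative rather than taken from the paper's dataset.

```python
def hallucination_rate(answers, gold):
    """Fraction of yes/no answers that contradict the annotated label."""
    wrong = sum(1 for a, g in zip(answers, gold) if a != g)
    return wrong / len(gold)

def collision_rate(answers_a, answers_b, gold):
    """'Hallucination collision': both models are wrong on the same question."""
    both = sum(1 for a, b, g in zip(answers_a, answers_b, gold)
               if a != g and b != g)
    return both / len(gold)

gold = ["yes", "no", "yes", "no"]
yes_man = ["yes", "yes", "yes", "yes"]  # a "YES MAN" model: always answers yes
model_b = ["yes", "no", "no", "no"]
```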

    Self-supervised learning method using minimal prior knowledge
    Junyi ZHU, Leilei CHANG, Xiaobin XU, Zhiyong HAO, Haiyue YU, Jiang JIANG
    Journal of Computer Applications    2025, 45 (4): 1035-1041.   DOI: 10.11772/j.issn.1001-9081.2024030366
    Abstract views: 377 | HTML views: 29 | PDF (1521KB) downloads: 451

    To reduce the heavy demand for supervision information in supervised learning, a self-supervised learning method based on minimal prior knowledge was proposed. Firstly, the unlabeled data were clustered on the basis of prior knowledge of the data, or initial labels were generated for the unlabeled data based on their center distances to labeled data. Secondly, data were selected randomly after labeling, and a machine learning method was chosen to build sub-models. Thirdly, the weight and error of each data extraction were calculated to obtain the average error of the data as the data-label degree for each dataset, and an iteration threshold was set based on the initial data-label degree. Finally, the termination condition was determined by comparing the data-label degree with the threshold during the iteration process. Experimental results on 10 UCI public datasets show that, compared with unsupervised learning algorithms such as K-means, supervised learning methods such as Support Vector Machine (SVM), and mainstream self-supervised learning methods such as TabNet (Tabular Network), the proposed method achieves high classification accuracy on unbalanced datasets without using labels and on balanced datasets using limited labels.

    Clustering federated learning algorithm for heterogeneous data
    Qingli CHEN, Yuanbo GUO, Chen FANG
    Journal of Computer Applications    2025, 45 (4): 1086-1094.   DOI: 10.11772/j.issn.1001-9081.2024010132
    Abstract views: 376 | HTML views: 10 | PDF (2335KB) downloads: 2429

    Federated Learning (FL) is a new machine learning model construction paradigm with great potential for privacy preservation and communication efficiency, but in real Internet of Things (IoT) scenarios there is data heterogeneity between client nodes, and learning a single unified global model leads to a decrease in model accuracy. To solve this problem, a Clustering Federated Learning based on Feature Distribution (CFLFD) algorithm was proposed. In this algorithm, the features extracted from the model by each client node were reduced by Principal Component Analysis (PCA), and the results were clustered so that client nodes with similar data distributions could collaborate with each other, thereby achieving higher model accuracy. To demonstrate the effectiveness of the algorithm, extensive experiments were conducted on three datasets against four benchmark algorithms. The results show that CFLFD improves model accuracy by 1.12 and 3.76 percentage points over FedProx on the CIFAR10 and Office-Caltech10 datasets, respectively.
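    The clustering step can be sketched with a minimal k-means over per-client feature descriptors. The 2-d descriptors below stand in for PCA-reduced features and are purely illustrative; a production system would use a library implementation with proper initialization.

```python
def dist2(p, q):
    """Squared Euclidean distance between two descriptors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k=2, iters=10):
    """Minimal k-means, standing in for the clustering applied to the
    clients' PCA-reduced feature descriptors."""
    centers = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return assign

# Hypothetical descriptors: two clients per underlying data distribution.
clients = [[0.1, 0.0], [0.2, 0.1], [5.0, 5.1], [5.2, 4.9]]
groups = kmeans(clients)
```

Clients ending up in the same group would then aggregate only with each other.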

    Source code vulnerability detection method based on Transformer-GCN
    Chen LIANG, Yisen WANG, Qiang WEI, Jiang DU
    Journal of Computer Applications    2025, 45 (7): 2296-2303.   DOI: 10.11772/j.issn.1001-9081.2024070998
    Abstract views: 376 | HTML views: 3 | PDF (3389KB) downloads: 878

    Existing deep learning-based methods for source code vulnerability detection often suffer from severe loss of syntax and semantics in the target code, and from neural network models allocating weights to the graph nodes (edges) of the target code unreasonably. To address these issues, a method named VulATGCN for detecting source code vulnerabilities was proposed on the basis of Code Property Graph (CPG) and Adaptive Transformer-Graph Convolutional Network (AT-GCN). In this method, CPG was used to represent source code, CodeBERT was combined for node vectorization, and graph centrality analysis was employed to extract deep structural features, thereby capturing the code's syntax and semantic information in a multi-dimensional way. After that, the AT-GCN model was designed by integrating the strengths of the Transformer-based self-attention mechanism, which excels at capturing long-range dependencies, and the Graph Convolutional Network (GCN), which is proficient at capturing local features, thereby realizing fusion learning and precise extraction of features from regions of different importance. Experimental results on the real vulnerability datasets Big-Vul and SARD show that VulATGCN achieves an average F1 score of 82.9%, which is 10.4% to 132.9% higher than those of deep learning-based vulnerability detection methods such as VulSniper, VulMPFF, and MGVD, with an average increase of approximately 52.9%.
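    The GCN half of AT-GCN aggregates each node's neighborhood before a linear map. Below is a deliberately simplified mean-aggregation layer (not the spectral normalization of standard GCNs, and not VulATGCN's implementation) to show the local-feature smoothing that complements the Transformer's long-range attention.

```python
def gcn_layer(adj, feats, weight):
    """One simplified graph-convolution layer: mean-aggregate each node's
    neighbors (with a self-loop), then apply a linear map given as a list
    of output columns."""
    n, d = len(feats), len(feats[0])
    out = []
    for i in range(n):
        neigh = [j for j in range(n) if adj[i][j] or j == i]
        agg = [sum(feats[j][k] for j in neigh) / len(neigh) for k in range(d)]
        out.append([sum(a * w for a, w in zip(agg, col)) for col in weight])
    return out

# Two connected nodes with 1-d features and an identity weight column:
# each node's output blends its own feature with its neighbor's.
smoothed = gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]])
```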

    Recommendation algorithm of graph contrastive learning based on hybrid negative sampling
    Renjie TIAN, Mingli JING, Long JIAO, Fei WANG
    Journal of Computer Applications    2025, 45 (4): 1053-1060.   DOI: 10.11772/j.issn.1001-9081.2024040419
    Abstract views: 375 | HTML views: 21 | PDF (1954KB) downloads: 744

    Contrastive Learning (CL) can extract self-supervised signals from raw data, providing strong support for addressing data sparsity in recommender systems. However, most existing CL-based recommendation algorithms focus on improving model structures and data augmentation methods, while ignoring the importance of enhancing negative sample quality and uncovering the potential implicit relationships between users and items in recommendation tasks. To address this issue, a Hybrid negative Sampling-based Graph Contrastive Learning recommendation algorithm (HSGCL) was proposed. Firstly, differing from uniform sampling from real data, a positive sample mixing method was used to inject positive sample information into negative samples. Secondly, informative hard negative samples were created through a skip-mix method. Meanwhile, multiple views were generated by altering the graph structure using Node Dropout (ND), and controlled uniform noise smoothing was introduced in the embedding space to adjust the uniformity of the learned representations. Finally, the main recommendation task and the CL task were trained jointly. Numerical experiments were conducted on three public datasets: Douban-Book, Yelp2018, and Amazon-Kindle. The results show that, compared to the baseline model Light Graph Convolution Network (LightGCN), the proposed algorithm improves Recall@20 by 23%, 13%, and 7%, and Normalized Discounted Cumulative Gain (NDCG@20) by 32%, 14%, and 5%, respectively, and performs excellently in enhancing the diversity of negative sample embedding information. It can be seen that, by improving the negative sampling method and data augmentation, the proposed algorithm improves negative sample quality, the uniformity of the representation distribution, and the accuracy of recommendation.
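    The positive-mixing idea, injecting positive-sample information into candidate negatives and then keeping the hardest one, can be sketched as follows. The mixing coefficients, embeddings, and dot-product hardness score are illustrative assumptions, not HSGCL's exact scheme.

```python
import random

def positive_mix(pos_emb, neg_embs, seed=0):
    """Inject positive-sample information into each candidate negative via
    a random convex mix (illustrative coefficients, not HSGCL's schedule)."""
    rng = random.Random(seed)
    mixed = []
    for neg in neg_embs:
        alpha = rng.random()  # per-candidate mixing coefficient in [0, 1)
        mixed.append([alpha * p + (1 - alpha) * x for p, x in zip(pos_emb, neg)])
    return mixed

def hardest_negative(user_emb, candidates):
    """Keep the candidate scoring highest against the user embedding."""
    return max(candidates, key=lambda e: sum(u * x for u, x in zip(user_emb, e)))

mixed = positive_mix([1.0, 0.0], [[0.0, 1.0], [0.5, 0.5]])
```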

    Developer recommendation for open-source projects based on collaborative contribution network
    Lan YOU, Yuang ZHANG, Yuan LIU, Zhijun CHEN, Wei WANG, Xing ZENG, Zhangwei HE
    Journal of Computer Applications    2025, 45 (4): 1213-1222.   DOI: 10.11772/j.issn.1001-9081.2024040454
    Abstract views: 375 | HTML views: 3 | PDF (4564KB) downloads: 58

    Recommending developers for open-source projects is of great significance to the construction of the open-source ecosystem. Different from traditional software development, the developers, projects, organizations, and correlations in the open-source field reflect the characteristics of openly collaborative projects, and their embedded semantics help recommend developers accurately for open-source projects. Therefore, a Developer Recommendation method based on Collaborative Contribution Network (DRCCN) was proposed. Firstly, a CCN was constructed by utilizing the contribution relationships among Open-Source Software (OSS) developers, OSS projects, and OSS organizations. Then, based on the CCN, a three-layer deep heterogeneous GraphSAGE (Graph SAmple and aggreGatE) Graph Neural Network (GNN) model was constructed to predict the links between developer nodes and open-source project nodes and to generate the corresponding embedding pairs. Finally, according to the prediction results, the K-Nearest Neighbor (KNN) algorithm was adopted to complete the developer recommendation. The proposed model was trained and tested on a GitHub dataset, and the experimental results show that, compared to the contrastive learning model for sequential recommendation CL4SRec (Contrastive Learning for Sequential Recommendation), DRCCN improves the precision, recall, and F1 score by approximately 10.7%, 2.6%, and 4.2%, respectively. It can be seen that the proposed model can provide an important reference for developer recommendation in open-source community projects.
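    The final KNN step can be sketched directly: given a developer embedding and project embeddings from the link-prediction model, recommend the k nearest projects. The repository names and 2-d embeddings below are hypothetical.

```python
def knn_recommend(dev_emb, project_embs, k=2):
    """Recommend the k projects whose embeddings lie nearest (squared
    Euclidean distance) to the developer's embedding."""
    def dist(item):
        name, emb = item
        return sum((a - b) ** 2 for a, b in zip(dev_emb, emb))
    return [name for name, _ in sorted(project_embs.items(), key=dist)[:k]]

# Hypothetical embeddings produced by the link-prediction model.
projects = {"repo_a": [0.0, 1.0], "repo_b": [1.0, 0.0], "repo_c": [0.9, 0.1]}
recs = knn_recommend([1.0, 0.0], projects)
```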

    Bias challenges of large language models: identification, evaluation, and mitigation
    Yuemei XU, Yuqi YE, Xueyi HE
    Journal of Computer Applications    2025, 45 (3): 697-708.   DOI: 10.11772/j.issn.1001-9081.2024091350
    Abstract views: 374 | HTML views: 16 | PDF (2112KB) downloads: 732

    Aiming at the safety and controllability problems caused by biases in the outputs of Large Language Models (LLMs), the research status, techniques, and limitations related to biases in existing LLMs were reviewed and analyzed in depth from three aspects: bias identification, evaluation, and mitigation. Firstly, three key techniques of LLMs were summarized to study the fundamental reasons why intrinsic biases in LLMs are inevitable. Secondly, the biases in LLMs were categorized into three types, linguistic bias, demographic bias, and evaluation bias, and the characteristics and causes of these biases were explored. Thirdly, a systematic review of existing LLM bias evaluation benchmarks was carried out, and the strengths and weaknesses of general-purpose, language-specific, and task-specific benchmarks were discussed. Finally, current LLM bias mitigation techniques were analyzed in depth from both the model bias mitigation and data bias mitigation perspectives, and directions for their future refinement were pointed out. At the same time, the analysis indicated future research directions for biases in LLMs: multi-cultural attribute evaluation of bias, lightweight bias mitigation techniques, and enhancement of the interpretability of biases.

    Multi-strategy retrieval-augmented generation method for military domain knowledge question answering systems
    Yanping ZHANG, Meifang CHEN, Changhai TIAN, Zibo YI, Wenpeng HU, Wei LUO, Zhunchen LUO
    Journal of Computer Applications    2025, 45 (3): 746-754.   DOI: 10.11772/j.issn.1001-9081.2024060833
    Abstract views: 373 | HTML views: 16 | PDF (1254KB) downloads: 553

    Military domain knowledge question answering systems based on Retrieval-Augmented Generation (RAG) have gradually become an important tool for modern intelligence personnel to collect and analyze intelligence. Focusing on the poor portability of current RAG application strategies in hybrid retrieval, as well as the semantic drift easily caused by unnecessary query rewriting, a Multi-Strategy Retrieval-Augmented Generation (MSRAG) method was proposed. Firstly, the retrieval model was matched adaptively to recall relevant text based on the query characteristics of the user input. Secondly, a text filter was utilized to extract the key text fragments able to answer the question. Thirdly, content validity was assessed by the text filter to trigger query rewriting based on synonym expansion, and the initial query was merged with the rewritten information and used as input to the retrieval controller for more targeted re-retrieval. Finally, the key text fragments able to answer the question were merged with the question, prompt engineering was applied to form the input of the answer generation model, and the response generated by the model was returned to the user. Experimental results show that, compared to a convex linear combination RAG method, MSRAG improves ROUGE-L (Recall-Oriented Understudy for Gisting Evaluation Longest common subsequence) by 14.35 percentage points on the Military domain dataset (Military) and by 5.83 percentage points on the Medical dataset. It can be seen that MSRAG has strong universality and portability, reduces the semantic drift caused by unnecessary query rewriting, and effectively helps large language models generate more accurate answers.
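    The control flow described above, retrieve, filter, and rewrite only when the validity check fails, can be sketched with toy stubs standing in for the retrieval model, text filter, and generator. The corpus and synonym table are illustrative assumptions.

```python
def msrag_answer(query, retrieve, filter_relevant, rewrite, generate, min_hits=1):
    """Retrieve and filter; only rewrite and re-retrieve when the filtered
    evidence fails the validity check, limiting needless semantic drift."""
    passages = filter_relevant(query, retrieve(query))
    if len(passages) < min_hits:
        expanded = query + " " + rewrite(query)  # merge initial + rewritten query
        passages = filter_relevant(query, retrieve(expanded))
    return generate(query, passages)

# Toy stubs standing in for the retrieval model, text filter, and generator.
CORPUS = {"cardiac": "Cardiac arrest guidance passage."}
SYNONYMS = {"heart attack": "cardiac arrest"}
retrieve = lambda q: [p for key, p in CORPUS.items() if key in q]
filter_relevant = lambda q, ps: ps
rewrite = lambda q: SYNONYMS.get(q, q)
generate = lambda q, ps: ps[0] if ps else "no answer"

answer = msrag_answer("heart attack", retrieve, filter_relevant, rewrite, generate)
```

The first retrieval misses, the synonym-expanded query hits, and the generator answers from the re-retrieved passage.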

    YOLOv5s-MRD: efficient fire and smoke detection algorithm for complex scenarios based on YOLOv5s
    Yang HOU, Qiong ZHANG, Zixuan ZHAO, Zhengyu ZHU, Xiaobo ZHANG
    Journal of Computer Applications    2025, 45 (4): 1317-1324.   DOI: 10.11772/j.issn.1001-9081.2024040527
    Abstract views: 371 | HTML views: 16 | PDF (4304KB) downloads: 736

    Current fire and smoke detection methods mainly rely on on-site inspection by staff, which results in low efficiency and poor real-time performance. Therefore, an efficient fire and smoke detection algorithm for complex scenarios based on YOLOv5s, called YOLOv5s-MRD (YOLOv5s-MPDIoU-RevCol-Dyhead), was proposed. Firstly, the MPDIoU (Maximized Position-Dependent Intersection over Union) method was employed to modify the bounding box loss function, enhancing the accuracy and efficiency of Bounding Box Regression (BBR) by adapting to both overlapping and non-overlapping scenarios. Secondly, the RevCol (Reversible Column) network concept was applied to reconstruct the backbone of YOLOv5s into a multi-column network architecture, and reversible links were incorporated across the layers of the model so that the retention of feature information was maximized, thereby improving the network's feature extraction capability. Finally, with the integration of Dynamic Head detection heads, scale awareness, spatial awareness, and task awareness were unified, improving the detection heads' accuracy and effectiveness significantly without additional computational cost. Experimental results on the DFS (Data of Fire and Smoke) dataset demonstrate that, compared to the original YOLOv5s algorithm, the proposed algorithm achieves a 9.3% increase in mAP@0.5 (mean Average Precision), a 6.6% improvement in prediction accuracy, and a 13.8% increase in recall. It can be seen that the proposed algorithm can meet the requirements of current fire and smoke detection application scenarios.
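    As we understand the published MPDIoU definition, it penalizes plain IoU by the squared distances between the two boxes' top-left and bottom-right corners, normalized by the squared image diagonal; a sketch for axis-aligned (x1, y1, x2, y2) boxes follows (a loss would then be 1 - MPDIoU).

```python
def mpdiou(box_a, box_b, img_w, img_h):
    """MPDIoU: IoU minus the normalized squared distances between the two
    boxes' top-left and bottom-right corners. Boxes are (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union else 0.0
    diag2 = img_w ** 2 + img_h ** 2
    d_tl = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2   # top-left corner distance
    d_br = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2   # bottom-right corner distance
    return iou - d_tl / diag2 - d_br / diag2
```

Unlike plain IoU, the corner penalties keep the gradient informative even when the boxes do not overlap at all.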

    Table and Figures | Reference | Related Articles | Metrics
    Dual-population dual-stage evolutionary algorithm for complex constrained multi-objective optimization problems
    Zhichao YUAN, Lei YANG, Jinglin TIAN, Xiaowei WEI, Kangshun LI
    Journal of Computer Applications    2025, 45 (8): 2656-2665.   DOI: 10.11772/j.issn.1001-9081.2024081130
    Abstract369)   HTML0)    PDF (2608KB)(397)       Save

    For Constrained Multi-Objective Optimization Problems (CMOPs) with complex constraints, balancing the algorithm's convergence and diversity effectively while ensuring strict constraint satisfaction is a significant challenge. Therefore, a Dual-Population Dual-Stage Evolutionary Algorithm (DPDSEA) was proposed. In this algorithm, two independently evolving populations, a main population and a secondary population, were introduced and updated using feasibility rules and an improved epsilon constraint-handling method, respectively. In the first stage, the main and secondary populations were employed to explore the Constrained Pareto Front (CPF) and the Unconstrained Pareto Front (UPF), respectively, so as to obtain positional information about the two fronts. In the second stage, a classification method was designed to classify CMOPs based on the positions of the UPF and the CPF, so that specific evolutionary strategies were executed for different types of CMOPs. Additionally, a random perturbation strategy was proposed to perturb the secondary population evolved near the CPF randomly and generate individuals on the CPF, thereby promoting the convergence and distribution of the main population on the CPF. Finally, experiments were conducted on the LIRCMOP and DASCMOP test sets to compare the proposed algorithm with six representative algorithms: CMOES (Constrained Multi-Objective Optimization based on Even Search), dp-ACS (dual-population evolutionary algorithm based on Adaptive Constraint Strength), c-DPEA (Dual-population based Evolutionary Algorithm for constrained multi-objective optimization), CAEAD (Constrained Evolutionary Algorithm based on Alternative Evolution and Degeneration), BiCo (evolutionary algorithm with Bidirectional Coevolution), and DDCMOEA (Dual-stage Dual-Population Evolutionary Algorithm for Constrained Multiobjective Optimization). The results show that DPDSEA achieves 15 best Inverted Generational Distance (IGD) values and 12 best Hyper Volume (HV) values on 23 problems, demonstrating DPDSEA's significant performance advantages in handling complex CMOPs.
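The epsilon constraint-handling idea used to update the secondary population can be sketched roughly as below; the paper's improved epsilon method will differ in detail, and all names here are illustrative:

```python
def epsilon_dominates(a, b, eps):
    """Epsilon constraint-handling comparison (hedged sketch).

    Each solution is (objectives, constraint_violation), objectives to be
    minimized. If both violations are within the epsilon threshold (or are
    equal), the solutions are compared by Pareto dominance on objectives;
    otherwise the solution with the smaller violation wins.
    """
    (fa, cva), (fb, cvb) = a, b
    if (cva <= eps and cvb <= eps) or cva == cvb:
        no_worse = all(x <= y for x, y in zip(fa, fb))
        better_somewhere = any(x < y for x, y in zip(fa, fb))
        return no_worse and better_somewhere
    return cva < cvb
```

Relaxing epsilon early lets slightly infeasible solutions survive (helping the population cross infeasible regions), while tightening it later enforces strict constraint satisfaction.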

    Table and Figures | Reference | Related Articles | Metrics
    Chinese spelling correction algorithm based on multi-modal information fusion
    Qing ZHANG, Fan YANG, Yuhan FANG
    Journal of Computer Applications    2025, 45 (5): 1528-1534.   DOI: 10.11772/j.issn.1001-9081.2024050628
    Abstract366)   HTML5)    PDF (1480KB)(123)       Save

    The goal of Chinese Spelling Correction (CSC) is to detect and correct character- or word-level errors in user-input Chinese text, which commonly arise from semantic, phonetic, or glyphic similarities among Chinese characters. However, existing models often neglect local information, fail to fully capture phonetic and glyphic similarities among different Chinese characters, and do not integrate these similarities with semantic information effectively. To address these issues, a new CSC algorithm based on multimodal information fusion, namely PWSpell, was proposed. This algorithm utilized a convolutional attention mechanism to focus on local semantic information, employed Pinyin encoding to capture phonetic similarities among characters, and, for the first time, introduced Wubi encoding into the CSC domain to capture glyphic similarities among Chinese characters. Additionally, it selectively integrated these two types of similarity information with semantic information processed by BERT (Bidirectional Encoder Representation from Transformers). Experimental results demonstrate that PWSpell improves error detection accuracy, precision, and F1-score, as well as correction precision and F1-score, on the SIGHAN 2015 test set, with an increase of at least one percentage point in correction precision. Ablation experimental results also validate that the design of each module in PWSpell effectively improves its performance.

    Table and Figures | Reference | Related Articles | Metrics
    Dynamic detection method of eclipse attacks for blockchain node analysis
    Shuo ZHANG, Guokai SUN, Yuan ZHUANG, Xiaoyu FENG, Jingzhi WANG
    Journal of Computer Applications    2025, 45 (8): 2428-2436.   DOI: 10.11772/j.issn.1001-9081.2024081101
    Abstract361)   HTML9)    PDF (1546KB)(69)       Save

    Eclipse attacks, as a significant threat to the blockchain network layer, can isolate the attacked node from the entire network by controlling its network connections, thus affecting its ability to receive block and transaction information. On this basis, attackers can also launch double-spending and other attacks, causing substantial damage to the blockchain system. To address this issue, a dynamic detection method of eclipse attacks for blockchain node analysis was proposed by incorporating deep learning models. Firstly, the Node Comprehensive Resilience Index (NCRI) was utilized to represent multidimensional attribute features of the nodes, and a Graph ATtention network (GAT) was introduced to update the node features of the network topology dynamically. Secondly, a Convolutional Neural Network (CNN) was employed to fuse the multidimensional features of the nodes. Finally, a Multi-Layer Perceptron (MLP) was used to predict the vulnerability of the entire network. Experimental results indicate that the method achieves an accuracy of up to 89.80% under varying intensities of eclipse attacks and maintains stable performance in continuously changing blockchain networks.
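A toy version of the GAT-style neighbor aggregation used to update node features, with scalar features and a deliberately simplified attention score (a real GAT computes logits with a learned attention vector and LeakyReLU over concatenated features); everything here is an illustrative assumption:

```python
import math

def gat_layer(features, adj, w):
    """Single-head GAT-style update (hedged sketch, scalar features).

    Each node's new feature is a softmax-weighted sum of its neighbors'
    (and its own) linearly transformed features; the attention logit here
    is simply the product of the two transformed features.
    """
    h = [w * x for x in features]  # shared linear transform
    out = []
    for i, row in enumerate(adj):
        nbrs = [j for j, e in enumerate(row) if e] + [i]  # add self-loop
        logits = [h[i] * h[j] for j in nbrs]
        m = max(logits)                      # stabilized softmax
        exps = [math.exp(l - m) for l in logits]
        s = sum(exps)
        out.append(sum((e / s) * h[j] for e, j in zip(exps, nbrs)))
    return out
```

On a fully connected toy graph with identical features, the update reduces to the shared linear transform, which is a quick sanity check for the softmax weights.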

    Table and Figures | Reference | Related Articles | Metrics
    Multivariate time series prediction method combining local and global correlation
    Xiang WANG, Zhixiang CHEN, Guojun MAO
    Journal of Computer Applications    2025, 45 (9): 2806-2816.   DOI: 10.11772/j.issn.1001-9081.2024091267
    Abstract361)   HTML9)    PDF (2188KB)(96)       Save

    Concerning the insufficient integration of local and global dependencies in the existing time series models, a method integrating local and global correlations for multivariate time series prediction, namely PatchLG (Patch-integrated Local-Global correlation method), was proposed. The proposed method was based on three key components: 1) segmenting the time series into multiple patches, thereby preserving the locality of the time series while making it easier for the model to capture global dependencies; 2) utilizing depthwise separable convolution and the self-attention mechanism to model local and global correlations; 3) decomposing the time series into trend and seasonal components, predicting the two in parallel, and combining their prediction results to obtain the final result. Experimental results on seven benchmark datasets demonstrate that PatchLG achieves average improvements of 3.0% and 2.9% in Mean-Square Error (MSE) and Mean Absolute Error (MAE), respectively, compared to the optimal baseline method PatchTST (Patch Time Series Transformer), while keeping actual running time and memory usage low, validating the effectiveness of PatchLG in time series prediction.
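The patching step in component 1) can be sketched in a few lines; the function name and the stride convention are assumptions for illustration, not taken from the paper:

```python
def make_patches(series, patch_len, stride):
    """Split a time series into (possibly overlapping) patches.

    Patching preserves local structure inside each patch while shortening
    the sequence that the global (attention) branch has to model.
    """
    return [series[i:i + patch_len]
            for i in range(0, len(series) - patch_len + 1, stride)]
```

For example, a length-10 series with patch length 4 and stride 2 yields 4 patches instead of 10 time steps for the attention branch to attend over.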

    Table and Figures | Reference | Related Articles | Metrics
    Efficient fine-tuning method of large language models for test case generation
    Peng CAO, Guangqi WEN, Jinzhu YANG, Gang CHEN, Xinyi LIU, Xuechun JI
    Journal of Computer Applications    2025, 45 (3): 725-731.   DOI: 10.11772/j.issn.1001-9081.2024111598
    Abstract360)   HTML20)    PDF (1215KB)(526)       Save

    Data-driven automated generation of unit test cases suffers from low coverage and poor readability, and struggles to meet the increasing demand for testing. Recently, Large Language Models (LLMs) have shown great potential in code generation tasks. However, due to differences in the functional and coding styles of code data, LLMs face the challenges of catastrophic forgetting and resource constraints. To address these problems, a transfer learning approach that fine-tunes coding style and functional style simultaneously was proposed, and an efficient fine-tuning training method was developed for LLMs generating unit test cases. Firstly, widely used instruction datasets were adopted to align the LLM with instructions, the instruction sets were divided by task type, and the weight increments carrying task-specific features were extracted and stored. Secondly, an adaptive style extraction module, combining noise-resistant learning and coding-style backtracking learning, was designed to handle diverse coding styles. Finally, the functional and coding-style increments were jointly trained on the target domain, thereby realizing efficient adaptation and fine-tuning on target domains with limited resources. Experimental results of test case generation on the SF110 Corpus of Classes dataset indicate that the proposed method outperforms the comparison methods. Compared to the mainstream code generation LLMs Codex, Code Llama and DeepSeek-Coder, the proposed method improves the compilation rate by 0.8%, 43.5% and 33.8%, respectively; the branch coverage by 3.1%, 1.0%, and 17.2%, respectively; and the line coverage by 4.1%, 6.5%, and 15.5%, respectively; verifying the superiority of the proposed method in code generation tasks.
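Storing task-specific weight increments and merging them on demand follows the low-rank adapter pattern; a minimal sketch under that assumption (the paper's increment extraction and joint training are more involved, and all names are illustrative):

```python
def apply_increment(w, b, a, scale=1.0):
    """Merge a low-rank task-specific increment into a base weight matrix.

    W' = W + scale * (B @ A). One (B, A) pair can be stored per task
    (e.g. one for coding style, one for functionality) and merged on
    demand, so the base model weights stay untouched.
    """
    rows, cols, rank = len(w), len(w[0]), len(a)
    delta = [[sum(b[i][k] * a[k][j] for k in range(rank)) for j in range(cols)]
             for i in range(rows)]
    return [[w[i][j] + scale * delta[i][j] for j in range(cols)]
            for i in range(rows)]
```

Because the increment has rank `len(a)`, storing it costs far fewer parameters than a full copy of `w`, which is what makes per-task increments affordable.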

    Table and Figures | Reference | Related Articles | Metrics
    Review of optimization methods for end-to-end speech-to-speech translation
    Wei ZONG, Yue ZHAO, Yin LI, Xiaona XU
    Journal of Computer Applications    2025, 45 (5): 1363-1371.   DOI: 10.11772/j.issn.1001-9081.2024050666
    Abstract357)   HTML28)    PDF (2566KB)(233)       Save

    Speech-to-Speech Translation (S2ST) is an emerging research direction in the intelligent speech field, aiming to translate spoken language from one language into another seamlessly. With increasing demands for cross-linguistic communication, S2ST has garnered significant attention, driving continuous research. Traditional cascaded models face numerous challenges in S2ST, including error propagation, inference latency, and the inability to translate languages without a writing system. To address these issues, achieving direct S2ST using end-to-end models has become a key research focus. Based on a comprehensive survey of end-to-end S2ST models, a detailed analysis and summary of various end-to-end S2ST models was provided, the existing related technologies were reviewed, and the challenges were grouped into three categories: modeling burden, data scarcity, and real-world application, with a focus on how existing work has addressed each category. The extensive comprehension and generative capabilities of Large Language Models (LLMs) offer new possibilities for S2ST while presenting additional challenges. Effective applications of LLMs in S2ST were also discussed, and potential future development directions were outlined.

    Table and Figures | Reference | Related Articles | Metrics
    Multi-view and multi-scale contrastive learning for graph collaborative filtering
    Weichao DANG, Xinyu WEN, Gaimei GAO, Chunxia LIU
    Journal of Computer Applications    2025, 45 (4): 1061-1068.   DOI: 10.11772/j.issn.1001-9081.2024030393
    Abstract356)   HTML13)    PDF (1493KB)(427)       Save

    A Multi-View and Multi-Scale Contrastive Learning for graph collaborative filtering (MVMSCL) model was proposed to address the limitations of a single view and the data sparsity in graph collaborative filtering recommendation methods. Firstly, an initial interaction graph was constructed on the basis of user-item interactions, and multiple potential intentions in user-item interactions were considered to build a multi-intention decomposition view. Secondly, the adjacency matrix was improved using high-order relationships to construct a collaborative neighbor view. Thirdly, irrelevant noise interactions were removed to construct the adaptively enhanced initial interaction graph and multi-intention decomposition view. Finally, contrastive learning paradigms at local, cross-layer, and global scales were introduced to generate self-supervised signals, thereby improving the recommendation performance. Experimental results on three public datasets, Gowalla, Amazon-book and Tmall, demonstrate that the recommendation performance of MVMSCL surpasses that of the comparison models. Compared with the optimal baseline model DCCF (Disentangled Contrastive Collaborative Filtering framework), MVMSCL has the Recall@20 increased by 5.7%, 14.5% and 10.0%, respectively, and the Normalized Discounted Cumulative Gain (NDCG@20) increased by 4.6%, 17.9% and 11.5%, respectively.
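The self-supervised signals in such multi-scale contrastive schemes are typically built from an InfoNCE-style objective; a per-anchor sketch of that standard building block (the temperature value and names are illustrative, not from the paper):

```python
import math

def info_nce(anchor_sims, pos_index, temperature=0.2):
    """InfoNCE-style contrastive loss for one anchor (hedged sketch).

    anchor_sims holds similarities between an anchor view of a node and
    all candidate views; minimizing the loss pulls the positive pair
    together and pushes the remaining (negative) pairs apart.
    """
    logits = [s / temperature for s in anchor_sims]
    m = max(logits)                              # log-sum-exp stabilization
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[pos_index] - log_sum)
```

Raising the positive pair's similarity lowers the loss, which is the property the local, cross-layer, and global contrasts all rely on.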

    Table and Figures | Reference | Related Articles | Metrics
    Dynamic UAV path planning based on modified whale optimization algorithm
    Xingwang WANG, Qingyang ZHANG, Shouyong JIANG, Yongquan DONG
    Journal of Computer Applications    2025, 45 (3): 928-936.   DOI: 10.11772/j.issn.1001-9081.2024030370
    Abstract354)   HTML8)    PDF (7205KB)(1830)       Save

    A dynamic Unmanned Aerial Vehicle (UAV) path planning method based on a Modified Whale Optimization Algorithm (MWOA) was proposed for the problem of UAV path planning in environments with complex terrains. Firstly, by analyzing the mountain terrain, dynamic targets, and threat zones, a three-dimensional dynamic environment and a UAV route model were established. Secondly, an adaptive step-size Gaussian walk strategy was proposed to balance the algorithm’s abilities of global exploration and local exploitation. Finally, a supplementary correction strategy was proposed to correct the optimal individual in the population and, combined with a differential evolution strategy, the population was prevented from falling into local optima while the convergence accuracy of the algorithm was improved. To verify the effectiveness of MWOA, MWOA and intelligent algorithms such as the Whale Optimization Algorithm (WOA) and the Artificial Hummingbird Algorithm (AHA) were used to solve the CEC2022 test functions, and were validated in the designed dynamic UAV environment model. Comparative analysis of the simulation results shows that, compared with the traditional WOA, MWOA improves the convergence accuracy by 6.1% and reduces the standard deviation by 44.7%. These results show that the proposed MWOA has faster convergence and higher accuracy, and can handle UAV path planning problems effectively.
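The adaptive step-size Gaussian walk can be sketched as below; the shrinking schedule shown is an assumption for illustration, not the schedule from the paper:

```python
import random

def gaussian_walk(position, best, progress):
    """Adaptive step-size Gaussian walk toward the best individual (sketch).

    Early in the search (progress near 0) the step scale is large,
    favoring global exploration; late in the search (progress near 1)
    it shrinks, concentrating moves near the best individual for local
    exploitation.
    """
    sigma = 1.0 - 0.9 * progress  # illustrative shrinking schedule
    return [b + random.gauss(0.0, sigma) * (p - b)
            for p, b in zip(position, best)]
```

When an individual already coincides with the best solution, the walk leaves it in place, since the Gaussian step scales the (zero) difference vector.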

    Table and Figures | Reference | Related Articles | Metrics
    Multi-scale information fusion time series long-term forecasting model based on neural network
    Lanhao LI, Haojun YAN, Haoyi ZHOU, Qingyun SUN, Jianxin LI
    Journal of Computer Applications    2025, 45 (6): 1776-1783.   DOI: 10.11772/j.issn.1001-9081.2024070930
    Abstract354)   HTML9)    PDF (1260KB)(82)       Save

    Time series data come from a wide range of social fields, from meteorology to finance to medicine. Accurate long-term prediction is a key issue in time series data analysis, processing, and research. To exploit the correlations among different scales in time series data, a multi-scale information fusion time series long-term forecasting model based on neural networks, named ScaleNN, was proposed to better handle the multi-scale problem in time series data and achieve more accurate long-term forecasts. Firstly, a fully connected neural network and a convolutional neural network were combined to extract global and local information effectively, and the two were aggregated for prediction. Then, by introducing a compression mechanism in the global information representation module, longer sequence input was accepted with a lighter structure, which increased the perceptual range of the model and improved its performance. Extensive experimental results demonstrate that ScaleNN outperforms PatchTST (Patch Time Series Transformer), a leading model in this field, on multiple real-world datasets. Specifically, the running time is shortened by 35% while only 19% of the parameters are required. It can be seen that ScaleNN can be applied widely to time series prediction problems in various fields, providing a foundation for forecasting in areas such as traffic flow prediction and weather forecasting.

    Table and Figures | Reference | Related Articles | Metrics
    Group recommendation model by graph neural network based on multi-perspective learning
    Cong WANG, Yancui SHI
    Journal of Computer Applications    2025, 45 (4): 1205-1212.   DOI: 10.11772/j.issn.1001-9081.2024030337
    Abstract352)   HTML3)    PDF (2528KB)(163)       Save

    Focusing on the problem that the existing group recommendation models based on Graph Neural Networks (GNNs) struggle to fully utilize explicit and implicit interaction information, a Group Recommendation by GNN based on Multi-perspective learning (GRGM) model was proposed. Firstly, hypergraphs, bipartite graphs, and hypergraph projections were constructed from the group interaction data, and a GNN suited to the characteristics of each graph was adopted to extract the graph’s node features, thereby fully expressing the explicit and implicit relationships among users, groups, and items. Then, a multi-perspective information fusion strategy was proposed to obtain the final group and item representations. Experimental results on the Mafengwo, CAMRa2011, and Weeplaces datasets show that, compared to the baseline model ConsRec, the GRGM model improves the Hit Ratio (HR@5, HR@10) and Normalized Discounted Cumulative Gain (NDCG@5, NDCG@10) by 3.38%, 1.96% and 3.67%, 3.84%, respectively, on the Mafengwo dataset, by 2.87%, 1.18% and 0.96%, 1.62%, respectively, on the CAMRa2011 dataset, and by 2.41%, 1.69% and 4.35%, 2.60%, respectively, on the Weeplaces dataset. It can be seen that the GRGM model has better recommendation performance than the baseline models.
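The reported HR@K and NDCG@K metrics, for a single held-out target item, follow the standard leave-one-out definitions sketched here (function names are illustrative):

```python
import math

def hit_ratio_at_k(ranked_items, target, k):
    """HR@K: 1 if the held-out item appears in the top-K recommendations."""
    return 1.0 if target in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, target, k):
    """NDCG@K for one held-out item: 1/log2(rank+1) if hit, else 0.

    Unlike HR@K, NDCG@K rewards placing the target higher in the list.
    """
    if target in ranked_items[:k]:
        rank = ranked_items.index(target) + 1
        return 1.0 / math.log2(rank + 1)
    return 0.0
```

Dataset-level scores are the averages of these per-user values over all test users.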

    Table and Figures | Reference | Related Articles | Metrics
    Design and practice of intelligent tutoring algorithm based on personalized student capability perception
    Yanmin DONG, Jiajia LIN, Zheng ZHANG, Cheng CHENG, Jinze WU, Shijin WANG, Zhenya HUANG, Qi LIU, Enhong CHEN
    Journal of Computer Applications    2025, 45 (3): 765-772.   DOI: 10.11772/j.issn.1001-9081.2024101550
    Abstract346)   HTML3)    PDF (2239KB)(381)       Save

    With the rapid development of Large Language Models (LLMs), dialogue assistants based on LLMs have emerged as a new learning aid for students. These assistants generate answers through interactive Q&A, helping students solve problems and improve their learning efficiency. However, the existing conversational assistants ignore students’ personalized needs and fail to provide the personalized answers required for “tailored instruction”. To address this, a personalized conversational assistant framework based on student capability perception was proposed, which consists of two main modules: a capability perception module that analyzes students’ exercise records to infer their knowledge proficiency, and a personalized answer generation module that creates personalized answers based on the students’ capabilities. Three implementation paradigms, instruction-based, small-model-driven, and agent-based, were designed to explore the framework’s practical effects. In the instruction-based assistant, the inference capabilities of LLMs were used to infer students’ knowledge proficiency from their exercise records and help generate personalized answers; in the small-model-driven assistant, a Deep Knowledge Tracing (DKT) model was employed to estimate students’ knowledge proficiency; in the agent-based assistant, tools such as student capability perception, personalized detection, and answer correction were integrated using the LLM agent method to assist answer generation. Comparison experiments using the Chat General Language Model (ChatGLM) and GPT4o_mini demonstrate that LLMs applying all three paradigms can provide personalized answers for students, with the agent-based paradigm achieving higher accuracy, indicating its superior student capability perception and personalized answer generation.

    Table and Figures | Reference | Related Articles | Metrics
    Encrypted traffic classification method based on Attention-1DCNN-CE
    Haijun GENG, Yun DONG, Zhiguo HU, Haotian CHI, Jing YANG, Xia YIN
    Journal of Computer Applications    2025, 45 (3): 872-882.   DOI: 10.11772/j.issn.1001-9081.2024030325
    Abstract345)   HTML5)    PDF (2750KB)(1926)       Save

    To address the problems of low multi-classification accuracy, poor generalization, and susceptibility to privacy invasion in traditional encrypted traffic identification methods, a multi-classification deep learning model that combines the Attention mechanism (Attention) with a one-Dimensional Convolutional Neural Network (1DCNN), namely Attention-1DCNN-CE, was proposed. This model consists of three core components: 1) in the dataset preprocessing stage, the spatial relationships among packets in the original data stream were retained, and a cost-sensitive matrix was constructed on the basis of the sample distribution; 2) based on the preliminary extraction of encrypted traffic features, the Attention and 1DCNN models were used to deeply mine and compress the global and local features of the traffic; 3) in response to the challenge of data imbalance, combining the cost-sensitive matrix with the Cross Entropy (CE) loss function improved the classification accuracy on minority-class samples significantly, thereby optimizing the overall performance of the model. Experimental results show that on the BOT-IOT and TON-IOT datasets, the overall identification accuracy of this model is higher than 97%. Additionally, on the public datasets ISCX-VPN and USTC-TFC, this model performs excellently, achieving performance similar to that of ET-BERT (Encrypted Traffic BERT) without the need for pre-training. Compared to Payload Encoding Representation from Transformer (PERT) on the ISCX-VPN dataset, this model improves the F1 score in application type detection by 29.9 percentage points. These results validate the effectiveness of this model, which provides a solution for encrypted traffic identification and malicious traffic detection.
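Combining a cost matrix with the CE loss amounts to scaling each sample's loss by a class-dependent cost; a one-sample sketch under that assumption (the paper's construction of the cost-sensitive matrix from the sample distribution is more elaborate, and names are illustrative):

```python
import math

def cost_sensitive_ce(probs, label, class_costs):
    """Cost-sensitive cross-entropy for one sample (hedged sketch).

    Scaling the usual -log p(label) term by a per-class cost makes
    mistakes on rare (minority) classes more expensive, so the model
    is pushed to classify them correctly despite class imbalance.
    """
    return -class_costs[label] * math.log(probs[label] + 1e-12)
```

With equal predicted probabilities, a class with cost 3.0 contributes three times the loss of a class with cost 1.0, which is exactly the rebalancing effect intended.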

    Table and Figures | Reference | Related Articles | Metrics
    Two-stage data selection method for classifier with low energy consumption and high performance
    Shuangshuang CUI, Hongzhi WANG, Jiahao ZHU, Hao WU
    Journal of Computer Applications    2025, 45 (6): 1703-1711.   DOI: 10.11772/j.issn.1001-9081.2024060883
    Abstract343)   HTML33)    PDF (2107KB)(140)       Save

    Aiming at the problems of large training data size, long training time, and high carbon emission when constructing classification models from massive data, a two-stage data selection method, TSDS (Two-Stage Data Selection), was proposed for low energy consumption and high classifier performance. Firstly, the clustering centers were determined by a modified cosine similarity, and the sample data was split and hierarchically clustered on the basis of dissimilar points. Then, the clustering results were sampled adaptively according to the data distribution to obtain a high-quality subset. Finally, the subset was used to train the classification model, which accelerated the training process while improving the model accuracy. Support Vector Machine (SVM) and Multi-Layer Perceptron (MLP) classification models were constructed on six datasets, including Spambase, Bupa and Phoneme, to verify the performance of TSDS. Experimental results show that when the sample data compression ratio reaches 85.00%, TSDS can improve the classification model accuracy by 3 to 10 percentage points while accelerating model training, reducing the energy consumption of SVM classifiers by 93.76% on average and that of MLP classifiers by 75.41% on average. It can be seen that TSDS can shorten the training time, reduce the energy consumption, and improve classifier performance in big data classification scenarios, thereby helping to achieve the “carbon peaking and carbon neutrality” goal.
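The adaptive sampling of clustering results can be approximated by size-proportional sampling with a per-cluster minimum; a deterministic sketch of that idea (TSDS's actual rule adapts to the data distribution and is more elaborate):

```python
def adaptive_sample(clusters, ratio):
    """Sample from each cluster in proportion to its size (hedged sketch).

    Keeping at least one point per cluster preserves small but distinct
    regions of the data distribution while heavily compressing the
    large, redundant clusters.
    """
    subset = []
    for cluster in clusters:
        k = max(1, round(len(cluster) * ratio))
        subset.extend(cluster[:k])  # deterministic stand-in for sampling
    return subset
```

A cluster of 100 points shrinks to 50 at ratio 0.5, while a 2-point cluster still contributes at least one representative, so rare patterns survive the compression.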

    Table and Figures | Reference | Related Articles | Metrics
    Review of research on efficiency of federated learning
    Lina GE, Mingyu WANG, Lei TIAN
    Journal of Computer Applications    2025, 45 (8): 2387-2398.   DOI: 10.11772/j.issn.1001-9081.2024081119
    Abstract341)   HTML115)    PDF (702KB)(678)       Save

    Federated learning is a distributed machine learning framework that effectively addresses the data silo problem and is crucial for ensuring privacy protection for individuals and organizations. However, enhancing the efficiency of federated learning remains a pressing issue because of the high costs introduced by its inherent characteristics. Therefore, a comprehensive summary and investigation of current mainstream research on improving the efficiency of federated learning was provided. Firstly, the background of efficient federated learning, including its origins and core ideas, was reviewed, and the concepts and classification of federated learning were explained. Secondly, the efficiency challenges faced by federated learning were discussed and categorized into heterogeneity problems, personalization problems, and communication cost issues. Thirdly, on this basis, detailed solutions to these efficiency problems were analyzed and discussed, and the research on federated learning efficiency was categorized into two areas, model compression optimization methods and communication optimization methods, and investigated. Fourthly, through comparative analysis, the advantages and disadvantages of each federated learning method were summarized, and the challenges that still exist in efficient federated learning were expounded. Finally, future research directions in the efficient federated learning field were given.
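As background for the communication-cost discussion, the baseline federated averaging step that the surveyed compression and communication methods try to make cheaper can be sketched as below (flattened weight vectors, illustrative names):

```python
def fedavg(client_weights, client_sizes):
    """Federated averaging of client model weights (sketch).

    After each round of local training, the server aggregates every
    client's parameters weighted by its local dataset size. Each round
    requires transmitting full model weights in both directions, which
    is the communication cost that compression methods target.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(dim)]
```

A client with three times the data pulls the global model three times as strongly toward its local weights.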

    Table and Figures | Reference | Related Articles | Metrics
    DU-FastGAN: lightweight generative adversarial network based on dynamic-upsample
    Guoyu XU, Xiaolong YAN, Yidan ZHANG
    Journal of Computer Applications    2025, 45 (10): 3067-3073.   DOI: 10.11772/j.issn.1001-9081.2024101535
    Abstract340)   HTML68)    PDF (3450KB)(291)       Save

    In recent years, Generative Adversarial Networks (GANs) have been widely used for data augmentation, which can solve the problem of insufficient training samples effectively and has important research significance for model training. However, the existing GAN models for data augmentation have problems such as high requirements on datasets and unstable model convergence, which can lead to distortion and deformation of the generated images. Therefore, a lightweight GAN based on dynamic upsampling, DU-FastGAN (Dynamic-Upsample-FastGAN), was proposed for data augmentation. Firstly, a generator was constructed with a dynamic-upsample module, which enables the generator to use upsampling methods of different granularities based on the size of the current feature map, thereby reconstructing textures and enhancing the overall structure and local detail quality of the synthesized images. Secondly, in order to enable the model to better capture the global information flow of images, a weight information skip connection module was proposed to reduce the disturbance of convolution and pooling operations on features, thereby improving the model’s ability to learn different features and making details of the generated images more realistic. Finally, a feature loss function was designed to improve the generation quality by calculating the relative distance between the corresponding feature maps during the sampling process. Experimental results show that compared with methods such as FastGAN, MixDL (Mixup-based Distance Learning), and RCL-master (Reverse Contrastive Learning-master), DU-FastGAN achieves a maximum reduction of 23.47% in FID (Fréchet Inception Distance) on 10 small datasets, thereby reducing distortion and deformation in the generated images effectively and improving their quality. At the same time, DU-FastGAN achieves lightweight overhead, with model training time within 600 min.

    Table and Figures | Reference | Related Articles | Metrics
    Deep symbolic regression method based on Transformer
    Pengcheng XU, Lei HE, Chuan LI, Weiqi QIAN, Tun ZHAO
    Journal of Computer Applications    2025, 45 (5): 1455-1463.   DOI: 10.11772/j.issn.1001-9081.2024050609
    Abstract337)   HTML6)    PDF (3565KB)(695)       Save

    To address the challenges of reduced population diversity and sensitivity to hyperparameters when solving Symbolic Regression (SR) problems with genetic evolutionary algorithms, a Deep Symbolic Regression Technique (DSRT) method based on Transformer was proposed. This method employed the autoregressive capability of the Transformer to generate expression symbol sequences. A transformation of the fitness between the data and the generated expression sequence then served as the reward, and the model parameters were updated through deep reinforcement learning, so that the model learned to output expression sequences that fitted the data better; as the model converged, the optimal expression was identified. The effectiveness of the DSRT method was validated on the SR benchmark dataset Nguyen, where it was compared with the DSR (Deep Symbolic Regression) and GP (Genetic Programming) algorithms within 200 iterations. Experimental results confirm the validity of the DSRT method. Additionally, the influence of various parameters on the DSRT method was discussed, and an experiment to predict the formula for the surface pressure coefficient of an aircraft airfoil using the NACA4421 dataset was performed. The obtained formula was compared with the Kármán-Tsien formula, yielding a mathematical formula with a lower Root Mean Square Error (RMSE).

    Table and Figures | Reference | Related Articles | Metrics
    Vehicular digital evidence preservation and access control based on consortium blockchain
    Xin SHAO, Zigang CHEN, Xingchun YANG, Haihua ZHU, Wenjun LUO, Long CHEN, Yousheng ZHOU
    Journal of Computer Applications    2025, 45 (6): 1902-1910.   DOI: 10.11772/j.issn.1001-9081.2024030263
    Abstract335)   HTML25)    PDF (2356KB)(332)       Save

    Frequent vehicle traffic accidents remain a serious practical problem. In order to ensure the trusted preservation and legal use of vehicular digital evidence, it is necessary to adopt advanced security technologies and strict access control mechanisms. Aiming at the preservation and sharing requirements of digital evidence on vehicle devices, an evidence preservation and access control scheme based on a consortium blockchain was proposed. Firstly, based on consortium blockchain technology and the InterPlanetary File System (IPFS), on-chain and off-chain storage of the digital evidence was realized, with the confidentiality of the evidence guaranteed by symmetric keys and its integrity verified by hash values. Secondly, in the processes of uploading, managing, and downloading the digital evidence, an access control mechanism combining attributes and roles was introduced to realize fine-grained and dynamic access control management, thereby ensuring legal access and sharing of the evidence. Finally, a comparison with related schemes and a performance analysis were conducted. Experimental results show that the proposed scheme provides confidentiality, integrity, and non-repudiation, and remains stable under a large number of concurrent requests.
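The on-chain/off-chain split rests on hash-based integrity checking: the (encrypted) evidence file goes off-chain while only its digest is anchored on the chain. A minimal sketch with SHA-256 (key management, IPFS interaction, and access control are omitted; names are illustrative):

```python
import hashlib

def store_evidence(evidence: bytes) -> str:
    """Compute the fingerprint to anchor on-chain for off-chain evidence.

    The raw (encrypted) file goes to off-chain storage such as IPFS;
    only this hash is recorded on the blockchain.
    """
    return hashlib.sha256(evidence).hexdigest()

def verify_evidence(evidence: bytes, on_chain_hash: str) -> bool:
    """Integrity check: recompute the hash of a downloaded file and
    compare it with the value anchored on the chain."""
    return hashlib.sha256(evidence).hexdigest() == on_chain_hash
```

Any single-bit modification of the downloaded file changes the digest, so tampering between upload and download is detectable without trusting the off-chain store.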

    Table and Figures | Reference | Related Articles | Metrics
    Emotion recognition method compatible with missing modal reasoning
    Bing YIN, Zhenhua LING, Yin LIN, Changfeng XI, Ying LIU
    Journal of Computer Applications    2025, 45 (9): 2764-2772.   DOI: 10.11772/j.issn.1001-9081.2024091262
    Abstract333)   HTML7)    PDF (1596KB)(127)       Save

    Aiming at the model-compatibility problem caused by modality absence in real complex scenes, an emotion recognition method supporting input from any available modality was proposed. Firstly, during the pre-training and fine-tuning stages, a modality-random-dropout training strategy was adopted to ensure model compatibility during reasoning. Secondly, a spatio-temporal masking strategy and a feature fusion strategy based on a cross-modal attention mechanism were proposed to reduce the risk of model over-fitting and to enhance cross-modal feature fusion, respectively. Finally, to solve the noisy label problem brought by inconsistent emotion labels across modalities, an adaptive denoising strategy based on multi-prototype clustering was proposed: class centers were set for different modalities, and noisy labels were removed by comparing the consistency between the clustering categories of each modality’s features and their labels. Experimental results show that on a self-built dataset, compared with the baseline Audio-Visual Hidden unit Bidirectional Encoder Representation from Transformers (AV-HuBERT), the proposed method improves the Weighted Average Recall (WAR) by 6.98 percentage points in modality-aligned reasoning, by 4.09 percentage points when the video modality is absent, and by 33.05 percentage points when the audio modality is absent; on the public video dataset DFEW, the proposed method achieves the highest WAR, reaching 68.94%.
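A minimal sketch of the modality-random-dropout idea (the names and the drop probability are assumptions, not the paper's values): whole modalities are zeroed out at random in each training batch, with at least one modality always retained, so the model learns to cope with any subset of inputs at inference time.

```python
import numpy as np

rng = np.random.default_rng(42)

def modality_random_dropout(batch, p_drop=0.3):
    """Randomly zero out whole modalities during training.

    `batch` maps a modality name (e.g. "audio", "video") to its feature array.
    Each modality is dropped independently with probability `p_drop`, but at
    least one modality is always kept so every sample stays informative."""
    names = list(batch)
    keep = [n for n in names if rng.random() >= p_drop]
    if not keep:                       # never drop every modality at once
        keep = [rng.choice(names)]
    return {n: (feat if n in keep else np.zeros_like(feat))
            for n, feat in batch.items()}
```

Applied during both pre-training and fine-tuning, this exposes the model to every missing-modality pattern it may encounter at reasoning time.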

    Table and Figures | Reference | Related Articles | Metrics
    Multi-target detection algorithm for traffic intersection images based on YOLOv9
    Yanhua LIAO, Yuanxia YAN, Wenlin PAN
    Journal of Computer Applications    2025, 45 (8): 2555-2565.   DOI: 10.11772/j.issn.1001-9081.2024071020
    Abstract333)   HTML26)    PDF (5505KB)(867)       Save

    Aiming at the problems of complex traffic intersection images, namely the difficulty of detecting small targets, the tendency of targets to occlude one another, and the color distortion, noise, and blurring caused by changes in weather and lighting, a multi-target detection algorithm for traffic intersection images based on YOLOv9 (You Only Look Once version 9), called ITD-YOLOv9 (Intersection Target Detection-YOLOv9), was proposed. Firstly, the CoT-CAFRNet (Chain-of-Thought prompted Content-Aware Feature Reassembly Network) image enhancement network was designed to improve image quality and optimize input features. Secondly, the iterative Channel Adaptive Feature Fusion (iCAFF) module was added to enhance feature extraction for small targets as well as overlapped and occluded targets. Thirdly, the feature fusion pyramid structure BiHS-FPN (Bi-directional High-level Screening Feature Pyramid Network) was proposed to enhance multi-scale feature fusion capability. Finally, the IF-MPDIoU (Inner-Focaler-Minimum Point Distance based Intersection over Union) loss function was designed to focus on key samples and enhance generalization ability by adjusting variable factors. Experimental results show that on the self-made dataset and the SODA10M dataset, the ITD-YOLOv9 algorithm achieves detection accuracies of 83.8% and 56.3% and detection speeds of 64.8 frames/s and 57.4 frames/s, respectively; compared with the YOLOv9 algorithm, the detection accuracies are improved by 3.9 and 2.7 percentage points, respectively. It can be seen that the proposed algorithm realizes multi-target detection at traffic intersections effectively.
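For reference, the base MPDIoU term underlying the IF-MPDIoU loss can be sketched as follows: the standard IoU is penalized by the normalized squared distances between the top-left and bottom-right corners of the two boxes. The Inner and Focaler weightings that the paper adds on top are omitted; this is a sketch of the commonly published MPDIoU formula, not the paper's exact loss.

```python
def mpdiou(box1, box2, img_w, img_h):
    """Minimum-Point-Distance IoU between two (x1, y1, x2, y2) boxes,
    normalized by the image diagonal. Base term only; the paper's
    IF-MPDIoU adds Inner/Focaler weighting on top of this."""
    # intersection rectangle
    ix1, iy1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    ix2, iy2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    a1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    a2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    iou = inter / (a1 + a2 - inter + 1e-9)
    # squared distances between matching corners
    d1 = (box1[0] - box2[0]) ** 2 + (box1[1] - box2[1]) ** 2  # top-left
    d2 = (box1[2] - box2[2]) ** 2 + (box1[3] - box2[3]) ** 2  # bottom-right
    norm = img_w ** 2 + img_h ** 2
    return iou - d1 / norm - d2 / norm

def mpdiou_loss(box1, box2, img_w, img_h):
    """Loss form: identical boxes give 0, mismatched boxes give larger values."""
    return 1.0 - mpdiou(box1, box2, img_w, img_h)
```

Unlike plain IoU, the corner-distance penalties keep the gradient informative even when the predicted and ground-truth boxes do not overlap at all.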

    Table and Figures | Reference | Related Articles | Metrics
    Multi-source data representation learning model based on tensorized graph convolutional network and contrastive learning
    Yufei LONG, Yuchen MOU, Ye LIU
    Journal of Computer Applications    2025, 45 (5): 1372-1378.   DOI: 10.11772/j.issn.1001-9081.2024071001
    Abstract328)   HTML17)    PDF (821KB)(267)       Save

    To address the issues of existing multi-source data representation learning models in processing large-scale, complex, and high-dimensional data, namely the tendency to overlook high-order associations among different sources and the susceptibility to noise, a Multi-Source data representation learning model based on Tensorized Graph convolutional network and Contrastive learning, namely MS-TGC, was proposed. Firstly, the K-Nearest Neighbors (KNN) algorithm and a Graph Convolutional Network (GCN) were used to unify the dimensions of the multi-source data, forming tensorized multi-source data. Then, a defined tensor graph convolution operator was applied to perform high-dimensional graph convolution operations, enabling simultaneous learning of intra-source and inter-source information. Finally, a multi-source contrastive learning paradigm was constructed to enhance the accuracy of representation learning on noisy data and to improve robustness against noise by incorporating contrastive constraints based on semantic consistency and label consistency. Experimental results show that when the labeled sample ratio is 0.3, MS-TGC achieves semi-supervised classification accuracies 1.36 and 5.53 percentage points higher than CONMF (Co-consensus Orthogonal Non-negative Matrix Factorization) on the BDGP and 20newsgroup datasets, respectively. These results indicate that MS-TGC effectively captures inter-source correlations, reduces noise interference, and achieves high-quality multi-source data representations.
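The first step, building a kNN graph over samples and then applying graph convolution, can be sketched as below. This is a plain single-source version for illustration; the tensorized multi-source operator itself is not reproduced here.

```python
import numpy as np

def knn_graph(X, k=2):
    """Build a symmetric k-nearest-neighbour adjacency matrix from features X
    (n_samples x n_features), the first step toward the tensorized graphs."""
    n = X.shape[0]
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a sample is not its own neighbour
    A = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(d[i])[:k]:   # k closest samples to i
            A[i, j] = A[j, i] = 1.0      # symmetrize the edge
    return A

def gcn_layer(A, H, W):
    """One graph-convolution step: add self-loops, symmetrically normalize,
    then propagate D^{-1/2} (A + I) D^{-1/2} H W (activation omitted)."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W
```

Running one such layer per source with a shared output dimension yields same-sized feature matrices that can be stacked into the tensor the model operates on.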

    Table and Figures | Reference | Related Articles | Metrics
    Entity-relation extraction strategy in Chinese open-domains based on large language model
    Yonggang GONG, Shuhan CHEN, Xiaoqin LIAN, Qiansheng LI, Hongming MO, Hongyu LIU
    Journal of Computer Applications    2025, 45 (10): 3121-3130.   DOI: 10.11772/j.issn.1001-9081.2024101536
    Abstract327)   HTML12)    PDF (3025KB)(165)       Save

    Large Language Models (LLMs) suffer from unstable extraction performance in Entity-Relation Extraction (ERE) tasks in Chinese open domains, and have low precision in recognizing texts and annotated categories in certain specific fields. Therefore, a Chinese open-domain entity-relation extraction strategy based on LLMs, called Multi-Level Dialog Strategy for Large Language Model (MLDS-LLM), was proposed. In this strategy, the superior semantic understanding and transfer learning capabilities of LLMs were used to achieve entity-relation extraction through multi-turn dialogues over different tasks. Firstly, structured summaries were generated by the LLM based on the structured logic of the open-domain text and a Chain-of-Thought (CoT) mechanism, thereby avoiding the relational and factual hallucinations generated by the model as well as its inability to consider subsequent information. Then, the limitations of the context window were mitigated through a text simplification strategy and the introduction of a replaceable vocabulary. Finally, multi-level prompt templates were constructed on the basis of the structured summaries and simplified texts; the influence of the temperature parameter on ERE was explored using the LLaMA-2-70B model, and the Precision, Recall, F1 value (F1), and Exact Match (EM) of entity-relation extraction with LLaMA-2-70B were tested before and after applying the proposed strategy. Experimental results demonstrate that the proposed strategy enhances the performance of the LLM in Named Entity Recognition (NER) and Relation Extraction (RE) on five Chinese datasets from different domains, such as CL-NE-DS, DiaKG, and CCKS2021. In particular, on the DiaKG and IEPA datasets, which are highly specialized and on which the model's zero-shot results are poor, compared with few-shot prompt tests, the model's NER precision is improved by 9.3 and 6.7 percentage points respectively with EM values increased by 2.7 and 2.2 percentage points respectively, and its RE precision is improved by 12.2 and 16.0 percentage points respectively with F1 values increased by 10.7 and 10.0 percentage points respectively, proving that the proposed strategy enhances the ERE performance of LLMs effectively and alleviates the problem of unstable model performance.
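The multi-level prompting idea (a structured summary first, then NER, then RE, with each turn conditioning on the previous one) might be sketched as below. The templates and the `build_dialog` helper are hypothetical illustrations, not the prompts used in the paper, and the LLM is abstracted as caller-supplied callables so that no specific model API is assumed.

```python
# Hypothetical prompt templates illustrating the multi-turn structure.
SUMMARY_PROMPT = (
    "Read the passage and produce a structured summary, reasoning step by step "
    "(chain of thought) before stating the summary.\n\nPassage:\n{passage}"
)
NER_PROMPT = (
    "Given this structured summary:\n{summary}\n\n"
    "and the simplified text:\n{text}\n\n"
    "List every named entity with its type, one per line."
)
RE_PROMPT = (
    "Entities found:\n{entities}\n\n"
    "From the simplified text:\n{text}\n\n"
    "Output each relation as a (head, relation, tail) triple."
)

def build_dialog(passage, simplified, summarize, extract):
    """Run the multi-turn pipeline. `summarize` and `extract` are
    caller-supplied callables wrapping the LLM, so each turn's output
    feeds the next turn's prompt."""
    summary = summarize(SUMMARY_PROMPT.format(passage=passage))
    entities = extract(NER_PROMPT.format(summary=summary, text=simplified))
    triples = extract(RE_PROMPT.format(entities=entities, text=simplified))
    return summary, entities, triples
```

Separating summarization from extraction keeps each turn's prompt short, which is how the strategy works within the model's context-window limits.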

    Table and Figures | Reference | Related Articles | Metrics
    Federated learning fairness algorithm based on personalized submodel and K-means clustering
    Zhongrui JING, Xuebin CHEN, Yinlong JIAN, Qi ZHONG, Zhenbo ZHANG
    Journal of Computer Applications    2025, 45 (12): 3747-3756.   DOI: 10.11772/j.issn.1001-9081.2024121794
    Abstract324)   HTML37)    PDF (995KB)(177)       Save

    Traditional Federated Learning (FL) does not consider collaborative fairness, leading to a mismatch between the reward obtained by a client and its actual contribution. To address this issue, a Federated learning fairness algorithm based on Personalized Submodel and K-means clustering (FedPSK) was proposed. Firstly, the neurons in the neural network were clustered according to their activation patterns, and only the importance of the cluster-center neurons was evaluated; the score of each cluster-center neuron was used to represent the scores of the other neurons in its cluster, which reduced the time consumption of neuron evaluation. Then, the number of neurons included in each client's submodel and their labels were selected through a hierarchical selection method, and a submodel with a complete neural network structure was constructed for each client. Finally, collaborative fairness was achieved by distributing the submodels to the clients. Experimental results on different datasets show that FedPSK improves the correlation coefficient of the fairness measurement by 2.70% compared with FedSAC (Federated learning framework with dynamic Submodel Allocation for Collaborative fairness), and reduces time overhead by at least 84.12% compared with FedSAC. It can be seen that FedPSK improves the fairness of the FL algorithm and greatly reduces the execution time, verifying the efficiency of the proposed algorithm.
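The cost-saving trick in the first step, clustering neurons by activation pattern and scoring only one representative per cluster, can be sketched as follows. A plain k-means stands in for the paper's clustering, and `score_fn` is a placeholder for whatever importance measure is used; both are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def kmeans(X, k, iters=20):
    """Plain k-means (illustrative, not necessarily the paper's exact procedure)."""
    X = np.asarray(X, dtype=float)
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(axis=0)
    return labels, centers

def neuron_scores(activations, k, score_fn):
    """Cluster neurons by their activation patterns (rows of `activations`),
    evaluate `score_fn` only on the neuron nearest each cluster center, and
    broadcast that score to the rest of the cluster."""
    activations = np.asarray(activations, dtype=float)
    labels, centers = kmeans(activations, k)
    scores = np.empty(len(activations))
    for c in range(k):
        members = np.where(labels == c)[0]
        rep = members[np.argmin(
            np.linalg.norm(activations[members] - centers[c], axis=1))]
        scores[members] = score_fn(rep)   # one evaluation per cluster, not per neuron
    return scores
```

With k clusters instead of n neurons to evaluate, the number of importance evaluations drops from n to k, which is the source of the reported time savings.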

    Table and Figures | Reference | Related Articles | Metrics
    Speaker-emotion voice conversion method with limited corpus based on large language model and pre-trained model
    Chaofeng LU, Ye TAO, Lianqing WEN, Fei MENG, Xiugong QIN, Yongjie DU, Yunlong TIAN
    Journal of Computer Applications    2025, 45 (3): 815-822.   DOI: 10.11772/j.issn.1001-9081.2024010013
    Abstract321)   HTML4)    PDF (1966KB)(703)       Save

    Aiming at the problems that little research has combined speaker conversion with emotional voice conversion, and that the emotional corpora of a target speaker in actual scenes are usually too small to train a strongly generalizing model from scratch, a Speaker-Emotion Voice Conversion with Limited corpus (LSEVC) method was proposed, fusing a large language model with a pre-trained emotional speech synthesis model. Firstly, the large language model was used to generate text with the required emotion tags. Secondly, the pre-trained emotional speech synthesis model was fine-tuned with the target speaker’s corpus to embed the target speaker. Thirdly, emotional speech was synthesized from the generated text for data augmentation. Fourthly, the synthesized speech and the source speech were used to co-train the speaker-emotion voice conversion model. Finally, to further enhance the speaker similarity and emotional similarity of the converted speech, the model was fine-tuned using the source target speaker’s emotional speech. Experiments were conducted on publicly available corpora and a Chinese fiction corpus. Experimental results show that the proposed method outperforms CycleGAN-EVC, Seq2Seq-EVC-WA2, SMAL-ET2, and other methods on the evaluation indicators Emotional similarity Mean Opinion Score (EMOS), Speaker similarity Mean Opinion Score (SMOS), Mel Cepstral Distortion (MCD), and Word Error Rate (WER).
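One of the reported metrics, Mel Cepstral Distortion, has a standard closed form that can be sketched directly. Frame alignment (e.g. by dynamic time warping) and the common exclusion of the 0th energy coefficient are left to the caller; this is the textbook formula, not a detail from the paper.

```python
import numpy as np

def mel_cepstral_distortion(mc_ref, mc_conv):
    """Mel Cepstral Distortion (in dB) between two aligned mel-cepstrum
    sequences of shape (frames, coefficients):
    MCD = mean over frames of (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2).
    Lower values mean the converted speech is spectrally closer to the reference."""
    mc_ref = np.asarray(mc_ref, dtype=float)
    mc_conv = np.asarray(mc_conv, dtype=float)
    diff = mc_ref - mc_conv
    const = (10.0 / np.log(10.0)) * np.sqrt(2.0)
    return float(np.mean(const * np.sqrt((diff ** 2).sum(axis=1))))
```

Identical sequences give an MCD of exactly 0 dB, and the metric grows with the per-frame Euclidean distance between cepstra, which is why it is paired with WER and the opinion scores in the evaluation.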

    Table and Figures | Reference | Related Articles | Metrics
Honorary Editor-in-Chief: ZHANG Jingzhong
Editor-in-Chief: XU Zongben
Associate Editor: SHEN Hengtao XIA Zhaohui
Domestic Post Distribution Code: 62-110
Foreign Distribution Code: M4616
Address:
No. 9, 4th Section of South Renmin Road, Chengdu 610041, China
Tel: 028-85224283-803
  028-85222239-803
Website: www.joca.cn
E-mail: bjb@joca.cn