Project Articles

    Frontier research and typical applications of large models

    Since their debut, "large models" (large-scale pre-trained models) have advanced at a remarkable pace. As a transformative new paradigm leading the field of artificial intelligence, large models provide powerful knowledge representation and aggregation capabilities and have demonstrated visual perception and logical reasoning abilities comparable to those of the human brain. Supported by ultra-large parameter scales, massive training data, and powerful computing resources, large-scale foundation models (such as CLIP, SAM, GPT-4, Sora, and LLaMA) have become the cornerstone of a wide range of downstream tasks, exhibiting strong task performance and outstanding generalization. They have achieved important breakthroughs and broad application in natural language processing, computer vision, and industrial digitalization and intelligentization, and their technical influence now extends beyond computer science, making them a key driver of interdisciplinary innovation.

    In view of the enormous impact of large model technology on the field of computer applications, Journal of Computer Applications invited its editorial board members and their research teams to contribute papers and issued an open call for submissions, ultimately selecting 20 papers that introduce, from multiple perspectives, the frontier research status and typical application scenarios of large models.

    Among these articles are typical applications of large language models in natural language understanding, knowledge graph construction, knowledge-based question answering systems, and personalized learning systems; application examples of vision foundation models for image processing and analysis; and explorations of multimodal large models. They include research and discussion of core large model techniques such as pre-training, data augmentation, parameter fine-tuning, and prompt learning, as well as introductions to applying large model results across industries and application scenarios, and, notably, analyses of and countermeasures for problems such as the bias and hallucination brought about by large models.

    Editorial board members of our journal — Professor Lyu Jiancheng of Sichuan University; Professors Liu Qi and Chen Enhong of the University of Science and Technology of China; Professor Chen Bin of Harbin Institute of Technology (Shenzhen); Professor Tang Yong of South China Normal University; and Professor Cao Xiaochun of Sun Yat-sen University — together with their teams, provide fairly concrete and comprehensive introductions to the technical approaches and solutions for applying large models in natural language understanding, object detection, intelligent tutoring, social networks, and other areas. In addition, Professor Qin Xiaolin of the University of Chinese Academy of Sciences was specially invited to write a survey that "introduces the foundational architecture, training techniques, and development history of large models; analyzes the current key technologies of general-purpose large models and advanced integration technologies built on large model bases; further elaborates the challenges large models face in practical applications, including data bias, model hallucination, and computational resource consumption; and offers an outlook on the future of large models".

    This specially organized Journal of Computer Applications topic, "Frontier Research and Typical Applications of Large Models", focuses on the latest progress, technical innovations, and typical applications of large model research, and aims to promote knowledge sharing and collaboration among academia, industry, and interdisciplinary fields.

    The contributing institutions include research institutes engaged in large model research and application, such as the Institute of Computing Technology, Chinese Academy of Sciences; the Chengdu Institute of Computer Applications, Chinese Academy of Sciences; the State Key Laboratory of Artificial Intelligence; the International Research Institute of Artificial Intelligence, Harbin Institute of Technology (Shenzhen); the Guangdong Provincial Laboratory of Artificial Intelligence and Digital Economy; the Anhui Provincial Laboratory of Artificial Intelligence; and the Hubei Provincial Key Laboratory of Big Data Intelligent Analysis and Industry Application. They also include universities engaged in such work, among them the University of Science and Technology of China, Sichuan University, Sun Yat-sen University, South China Normal University, Beihang University, Northeastern University, Hohai University, Shanxi University, Soochow University, Beijing Foreign Studies University, Naval University of Engineering, Hubei University, and Guizhou University, as well as enterprises developing and applying large models, such as the iFLYTEK Research Institute of Artificial Intelligence, State Grid Information & Telecommunication Group, State Grid Electric Power Research Institute, and the CHN Energy Transportation Technology Research Institute. The authors and their teams are highly accomplished in artificial intelligence, and especially in large model research; they come from authoritative research institutions across the country, with broadly representative geographic distribution. We believe their research results on large models will be of considerable reference value to researchers and practitioners in related fields.

    Given the rushed organization schedule and the editors' limited expertise, this topic can offer only a partial glimpse of large models, a technology still developing rapidly and finding ever wider application. We nevertheless hope it will spark further discussion, raise awareness of this important technology, strengthen the dissemination of results in this area, and promote the technical innovation and practical deployment of large models.

    Survey and prospect of large language models
    Xiaolin QIN, Xu GU, Dicheng LI, Haiwen XU
    Journal of Computer Applications    2025, 45 (3): 685-696.   DOI: 10.11772/j.issn.1001-9081.2025010128

    Large Language Models (LLMs) are a class of language models built from artificial neural networks with vast numbers of parameters (typically billions of weights or more). Trained on large amounts of unlabeled text using self-supervised or semi-supervised learning, they are the core of current generative Artificial Intelligence (AI) technologies. Compared with traditional language models, LLMs demonstrate stronger language understanding and generation capabilities, supported by substantial computational power, extensive parameters, and large-scale data, and are widely applied with good performance in tasks such as machine translation, question answering systems, and dialogue generation. Most existing surveys focus on the theoretical construction and training techniques of LLMs, while systematic exploration of industry-level application practices and the evolution of the technological ecosystem remains insufficient. Therefore, building on an introduction to the foundational architecture, training techniques, and development history of LLMs, the current key general-purpose technologies of LLMs and advanced integration technologies built on LLM bases were analyzed. Then, by summarizing existing research, the challenges LLMs face in practical applications were elaborated, including data bias, model hallucination, and computational resource consumption, and an outlook was provided on the ongoing development trends of LLMs.

    Table and Figures | Reference | Related Articles | Metrics
    Bias challenges of large language models: identification, evaluation, and mitigation
    Yuemei XU, Yuqi YE, Xueyi HE
    Journal of Computer Applications    2025, 45 (3): 697-708.   DOI: 10.11772/j.issn.1001-9081.2024091350

    To address the safety and controllability problems caused by biases in the output of Large Language Models (LLMs), the research status, techniques, and limitations related to biases in existing LLMs were sorted and analyzed in depth from three aspects: bias identification, evaluation, and mitigation. Firstly, three key techniques of LLMs were summarized to study the fundamental reasons for LLMs' unavoidable intrinsic biases. Secondly, biases in LLMs were categorized into three types — linguistic bias, demographic bias, and evaluation bias — and the characteristics and causes of each were explored. Thirdly, a systematic review of existing LLM bias evaluation benchmarks was carried out, and the strengths and weaknesses of general-purpose, language-specific, and task-specific benchmarks were discussed. Finally, current LLM bias mitigation techniques were analyzed in depth from both model and data perspectives, and directions for their future refinement were pointed out. At the same time, the analysis identified future research directions for biases in LLMs: multi-cultural evaluation of bias attributes, lightweight bias mitigation techniques, and enhancement of the interpretability of biases.

    Recognition and optimization of hallucination phenomena in large language models
    Jing HE, Yang SHEN, Runfeng XIE
    Journal of Computer Applications    2025, 45 (3): 709-714.   DOI: 10.11772/j.issn.1001-9081.2024081190

    Large Language Models (LLMs) may generate hallucinations, which makes them difficult to apply fully in many real-world fields, especially medicine; moreover, there has been no high-quality LLM hallucination evaluation dataset or corresponding evaluation of the degree of LLM hallucination. To address these problems, a method for identifying and optimizing LLM hallucinations in the medical question answering field was proposed. Firstly, based on the publicly available dataset Huatuo, an LLM hallucination evaluation dataset for medical question answering was constructed by combining GPT-4-generated question answers with manual annotation. Secondly, based on the constructed dataset, the concept of "hallucination rate" was defined: by designing prompts that required the models under test to answer "yes" or "no", the degree of hallucination of each LLM was tested and quantified, and the "YES MAN" hallucination phenomenon of LLMs was discovered. Thirdly, GPT-4, an LLM with a low hallucination rate, was used as a LeaderAI to provide prior knowledge that assists high-hallucination-rate LLMs in making judgments. Finally, to explore whether multiple different LLMs make mistakes on the same question, the concept of "hallucination collision" was defined, and probabilistic statistical methods were used to reveal the hallucination collision behavior of different LLMs in medical question answering. Experimental results show that introducing a LeaderAI improves the performance of LLMs with high hallucination rates, allowing them to cope with the "YES MAN" phenomenon in medical question answering at a low hallucination rate. Moreover, current LLMs have a low probability of hallucinating on the same single question (collisions).
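The "hallucination rate" and "hallucination collision" statistics described in this abstract reduce to simple counts over yes/no probes. The sketch below is an illustrative reconstruction under that assumption; the function names and data layout are hypothetical, not the paper's implementation:

```python
from typing import List

def hallucination_rate(predictions: List[str], gold: List[str]) -> float:
    """Fraction of yes/no probe questions a model answers incorrectly."""
    wrong = sum(p != g for p, g in zip(predictions, gold))
    return wrong / len(gold)

def collision_rate(per_model_errors: List[List[bool]]) -> float:
    """Fraction of questions on which *every* model hallucinates at once
    (a "hallucination collision" in the abstract's terminology).
    per_model_errors[m][q] is True if model m erred on question q."""
    n_questions = len(per_model_errors[0])
    collisions = sum(all(errs[i] for errs in per_model_errors)
                     for i in range(n_questions))
    return collisions / n_questions
```

A LeaderAI would then be consulted only for models whose measured rate exceeds some chosen threshold.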

    Federated parameter-efficient fine-tuning technology for large model based on pruning
    Hui ZENG, Shiyu XIONG, Yongzheng DI, Hongzhou SHI
    Journal of Computer Applications    2025, 45 (3): 715-724.   DOI: 10.11772/j.issn.1001-9081.2024030322

    With the continuously increasing importance of data privacy, fine-tuning Pre-trained Foundation Models (PFMs) for downstream tasks has become increasingly challenging, prompting research on federated learning based on PFMs. However, PFMs pose significant challenges to federated learning systems, especially in local computation and communication. Therefore, corresponding solutions were proposed for the two main stages of federated learning — local computation and aggregation communication — namely a locally efficient fine-tuning mode and a ring-shaped local aggregation mode. In the first mode, a model pruning algorithm based on Parameter-Efficient Fine-Tuning (PEFT) was employed to reduce local computation and communication costs. In the second mode, the centralized aggregation method was replaced with a distributed local aggregation scheme to enhance communication efficiency during the aggregation stage. Experimental results demonstrate that the proposed federated parameter-efficient fine-tuning framework for large models performs well in both final performance and efficiency.
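The ring-shaped local aggregation mode can be pictured as each client adding its (pruned, parameter-efficient) update to a running sum that travels around a logical ring, with the final hop averaging, so no central server ever holds all updates. A minimal sketch under that reading; names are illustrative and this makes no claim to match the paper's exact protocol:

```python
from typing import List

def ring_aggregate(client_updates: List[List[float]]) -> List[float]:
    """Average client parameter updates by circulating a running sum
    around a logical ring instead of through a central server."""
    n = len(client_updates)
    running = [0.0] * len(client_updates[0])
    # Each client in ring order adds its local update to the token.
    for update in client_updates:
        running = [r + u for r, u in zip(running, update)]
    # The last hop divides by n; the averaged model then circulates back.
    return [r / n for r in running]
```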

    Efficient fine-tuning method of large language models for test case generation
    Peng CAO, Guangqi WEN, Jinzhu YANG, Gang CHEN, Xinyi LIU, Xuechun JI
    Journal of Computer Applications    2025, 45 (3): 725-731.   DOI: 10.11772/j.issn.1001-9081.2024111598

    Data-driven automated generation of unit test cases suffers from low coverage and poor readability, and struggles to meet the increasing demand for testing. Recently, Large Language Models (LLMs) have shown great potential in code generation tasks. However, because code data differ in functional and coding styles, LLMs face the challenges of catastrophic forgetting and resource constraints. To address these problems, a transfer learning approach that fine-tunes coding and functional styles simultaneously was proposed, yielding an efficient fine-tuning training method for LLMs generating unit test cases. Firstly, widely used instruction datasets were adopted to align the LLM with instructions, the instruction sets were divided by task type, and weight increments carrying task-specific features were extracted and stored. Secondly, an adaptive style extraction module was designed to handle various coding styles, using noise-resistant learning and coding style backtracking learning within the module. Finally, the functional and coding style increments were jointly trained on the target domain, realizing efficient adaptation and fine-tuning on target domains with limited resources. Experimental results of test case generation on the SF110 Corpus of Classes dataset indicate that the proposed method outperforms the comparison methods. Compared with the mainstream code generation LLMs — Codex, Code Llama, and DeepSeek-Coder — the proposed method increases the compilation rate by 0.8%, 43.5%, and 33.8% respectively; branch coverage by 3.1%, 1.0%, and 17.2% respectively; and line coverage by 4.1%, 6.5%, and 15.5% respectively, verifying its superiority in code generation tasks.

    Commonsense question answering model based on cross-modal contrastive learning
    Yuanlong WANG, Tinghua LIU, Hu ZHANG
    Journal of Computer Applications    2025, 45 (3): 732-738.   DOI: 10.11772/j.issn.1001-9081.2024081139

    Commonsense Question Answering (CQA), a task in the intelligent question answering field, aims to use commonsense knowledge to automatically answer questions described in natural language and obtain accurate answers. The task typically demands background commonsense knowledge to enhance a model's problem-solving capability. Most related methods rely on extracting and utilizing commonsense from textual data; however, commonsense is often implicit and not always represented directly in text, which limits the applicability and effectiveness of these methods. Therefore, a CQA model based on cross-modal contrastive learning was proposed to make full use of cross-modal information to enrich the expression of commonsense knowledge. Firstly, a cross-modal commonsense representation module was designed to integrate commonsense bases with a cross-modal large model, obtaining cross-modal commonsense representations. Secondly, to enhance the model's ability to distinguish among options, contrastive learning was carried out on the cross-modal representations of questions and options. Finally, a softmax layer was used to generate relevance scores for question-option pairs, and the option with the highest score was taken as the final predicted answer. Experimental results on the public datasets CommonSenseQA (CSQA) and OpenBookQA (OBQA) show that, compared with DEKCOR (DEscriptive Knowledge for COmmonsense question answeRing), the proposed model improves accuracy by 1.46 and 0.71 percentage points respectively.
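The final scoring step — a softmax over question-option relevance, with the arg-max option predicted — can be sketched as follows, with cosine similarity standing in for the learned relevance function (an assumption made purely for illustration):

```python
import math
from typing import List

def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def score_options(question_vec: List[float],
                  option_vecs: List[List[float]]) -> List[float]:
    """Softmax over question-option similarities; the option with the
    highest score is taken as the predicted answer."""
    sims = [cosine(question_vec, o) for o in option_vecs]
    m = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in sims]
    z = sum(exps)
    return [e / z for e in exps]
```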

    Visual question answering model based on association and fusion of multiple semantic features
    Hao ZHOU, Chao WANG, Guoheng CUI, Tingjin LUO
    Journal of Computer Applications    2025, 45 (3): 739-745.   DOI: 10.11772/j.issn.1001-9081.2024050660

    Bridging the semantic gaps between visual images and text-based questions is key to improving the reasoning accuracy of Visual Question Answering (VQA) models. However, most existing models rely on extracting low-level image features and using attention mechanisms to reason toward answers, ignoring the important role that high-level image semantic features, such as relationship and attribute features, play in visual reasoning. To solve these problems, a VQA model based on multi-semantic association and fusion was proposed to establish semantic associations between questions and images. Firstly, based on a scene graph generation framework, multiple semantic features in images were extracted and refined as feature input to the VQA model, fully exploring the information in visual scenes. Secondly, to enhance the semantic value of image features, an information filter was designed to remove noise and redundant information from them. Finally, a multi-layer attention fusion and reasoning module was designed to fuse the multiple image semantics with question features and strengthen the semantic association between important image regions and the questions. Experimental results show that, compared with the Bilinear Attention Network (BAN) and Coarse-to-Fine Reasoning (CFR) models, the proposed model improves accuracy on the VQA2.0 test set by 2.9 and 0.4 percentage points respectively, and on the GQA test set by 17.2 and 0.3 percentage points respectively, demonstrating that it better understands the semantics of image scenes and answers compositional visual questions.

    Multi-strategy retrieval-augmented generation method for military domain knowledge question answering systems
    Yanping ZHANG, Meifang CHEN, Changhai TIAN, Zibo YI, Wenpeng HU, Wei LUO, Zhunchen LUO
    Journal of Computer Applications    2025, 45 (3): 746-754.   DOI: 10.11772/j.issn.1001-9081.2024060833

    Military domain knowledge question answering systems based on Retrieval-Augmented Generation (RAG) have gradually become an important tool for modern intelligence personnel to collect and analyze intelligence. Focusing on the poor portability of current RAG hybrid retrieval strategies and on the semantic drift easily caused by unnecessary query rewriting, a Multi-Strategy Retrieval-Augmented Generation (MSRAG) method was proposed. Firstly, the retrieval model was matched adaptively to recall relevant text based on the characteristics of the user's query. Secondly, a text filter was utilized to extract the key text fragments capable of answering the question. Thirdly, the text filter assessed content validity to trigger query rewriting based on synonym expansion, and the initial query was merged with the rewritten information and fed to the retrieval controller for more targeted re-retrieval. Finally, the key text fragments were merged with the question, prompt engineering was used to drive the answer generation model, and the model's response was returned to the user. Experimental results show that, compared with the convex linear combination RAG method, MSRAG improves ROUGE-L (Recall-Oriented Understudy for Gisting Evaluation-Longest common subsequence) by 14.35 percentage points on the military domain dataset (Military) and by 5.83 percentage points on the medical dataset. MSRAG thus has strong universality and portability, reduces the semantic drift caused by unnecessary query rewriting, and effectively helps large language models generate more accurate answers.
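The retrieve → filter → conditionally-rewrite → re-retrieve → generate loop described here can be sketched as plain control flow. All callables below are hypothetical stand-ins for the paper's components, not its API:

```python
from typing import Callable, List

def msrag_answer(query: str,
                 retrieve: Callable[[str], List[str]],
                 filter_fragments: Callable[[List[str]], List[str]],
                 is_sufficient: Callable[[List[str]], bool],
                 rewrite: Callable[[str], str],
                 generate: Callable[[str, List[str]], str]) -> str:
    """Multi-strategy RAG control flow: rewriting is triggered only when
    the filtered evidence is judged insufficient, which limits
    rewrite-induced semantic drift."""
    fragments = filter_fragments(retrieve(query))
    if not is_sufficient(fragments):
        # Merge the original query with its synonym-expanded rewrite
        # and re-retrieve with the combined query.
        merged_query = query + " " + rewrite(query)
        fragments = filter_fragments(retrieve(merged_query))
    return generate(query, fragments)
```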

    ScholatGPT: a large language model for academic social networks and its intelligent applications
    Chengzhe YUAN, Guohua CHEN, Dingding LI, Yuan ZHU, Ronghua LIN, Hao ZHONG, Yong TANG
    Journal of Computer Applications    2025, 45 (3): 755-764.   DOI: 10.11772/j.issn.1001-9081.2024101477

    To address the limitations of existing Large Language Models (LLMs) in processing cross-domain knowledge, updating real-time academic information, and ensuring output quality, ScholatGPT, a scholar LLM based on Academic Social Networks (ASNs), was proposed. In ScholatGPT, precise semantic retrieval and dynamic knowledge updating were enhanced by integrating Knowledge-Graph Augmented Generation (KGAG) and Retrieval-Augmented Generation (RAG), and optimization and fine-tuning were used to improve the quality of generated academic text. Firstly, a scholar knowledge graph was constructed from relational data in SCHOLAT, with LLMs employed to enrich the graph semantically. Then, a KGAG-based retrieval model was introduced and combined with RAG to realize multi-path hybrid retrieval, enhancing retrieval precision. Finally, fine-tuning techniques were applied to optimize the quality of generation in academic fields. Experimental results demonstrate that ScholatGPT achieves a precision of 83.2% in academic question answering tasks, outperforming GPT-4o and AMiner AI by 69.4 and 11.5 percentage points respectively, and performs well in tasks such as scholar profiling, representative work identification, and research field classification. Furthermore, ScholatGPT obtains stable and competitive results in answer relevance, coherence, and readability, achieving a good balance between specialization and readability. ScholatGPT-based intelligent applications, such as a scholar think tank and an academic information recommendation system, effectively improve the efficiency of academic resource acquisition.

    Design and practice of intelligent tutoring algorithm based on personalized student capability perception
    Yanmin DONG, Jiajia LIN, Zheng ZHANG, Cheng CHENG, Jinze WU, Shijin WANG, Zhenya HUANG, Qi LIU, Enhong CHEN
    Journal of Computer Applications    2025, 45 (3): 765-772.   DOI: 10.11772/j.issn.1001-9081.2024101550

    With the rapid development of Large Language Models (LLMs), LLM-based dialogue assistants have emerged as a new learning method for students: they generate answers through interactive Q&A, helping students solve problems and improve learning efficiency. However, existing conversational assistants ignore students' personalized needs and fail to provide the personalized answers required to "teach students according to their aptitude". To address this, a personalized conversational assistant framework based on student capability perception was proposed, consisting of two main modules: a capability perception module that analyzes students' exercise records to infer their knowledge proficiency, and a personalized answer generation module that creates personalized answers based on their capabilities. Three implementation paradigms — instruction-based, data-driven, and agent-based — were designed to explore the framework's practical effects. In the instruction-based assistant, the inference capabilities of LLMs were used to infer students' knowledge proficiency from their exercise records and so help generate personalized answers; in the data-driven assistant, a small Deep Knowledge Tracing (DKT) model was employed to generate students' knowledge proficiency; in the agent-based assistant, tools such as student capability perception, personalization detection, and answer correction were integrated via an LLM agent to assist answer generation. Comparison experiments using Chat General Language Model (ChatGLM) and GPT4o_mini demonstrate that all three paradigms enable LLMs to provide personalized answers for students, with the agent-based paradigm achieving higher accuracy, indicating its superior student capability perception and personalized answer generation.
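The capability-perception step — turning exercise records into per-concept proficiency that conditions the answer prompt — can be sketched with a toy frequency estimator standing in for the DKT model; all names, the record format, and the prompt template are illustrative assumptions:

```python
from typing import Dict, List, Tuple

def knowledge_proficiency(records: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Per-concept correct-answer rate from (concept, is_correct) exercise
    records; a toy stand-in for a learned knowledge tracing model."""
    totals: Dict[str, int] = {}
    correct: Dict[str, int] = {}
    for concept, ok in records:
        totals[concept] = totals.get(concept, 0) + 1
        correct[concept] = correct.get(concept, 0) + int(ok)
    return {c: correct[c] / totals[c] for c in totals}

def personalize_prompt(question: str,
                       proficiency: Dict[str, float],
                       threshold: float = 0.5) -> str:
    """Prepend a hint asking the assistant to cover the student's weak
    concepts before answering."""
    weak = sorted(c for c, p in proficiency.items() if p < threshold)
    hint = f"Explain prerequisites for: {', '.join(weak)}. " if weak else ""
    return hint + question
```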

    Personalized learning recommendation in collaboration of knowledge graph and large language model
    Xuefei ZHANG, Liping ZHANG, Sheng YAN, Min HOU, Yubo ZHAO
    Journal of Computer Applications    2025, 45 (3): 773-784.   DOI: 10.11772/j.issn.1001-9081.2024070971

    As an important research topic in smart education, personalized learning recommendation aims to use recommendation algorithms and models to provide learners with effective learning resources that match their individual learning needs, interests, abilities, and histories, so as to improve learning outcomes. Current recommendation methods suffer from problems such as cold start, data sparsity, poor interpretability, and over-personalization, and the combination of knowledge graphs and Large Language Models (LLMs) provides strong support for solving them. Firstly, the concepts and current research status of personalized learning recommendation were reviewed. Secondly, the concepts of knowledge graphs and LLMs and their specific applications in personalized learning recommendation were discussed. Thirdly, methods for applying knowledge graphs and LLMs collaboratively in personalized learning recommendation were summarized. Finally, future development directions for knowledge graphs and LLMs in personalized learning recommendation were outlined, providing reference and inspiration for continued development and innovative practice in the field.

    Construction of digital twin water conservancy knowledge graph integrating large language model and prompt learning
    Yan YANG, Feng YE, Dong XU, Xuejie ZHANG, Jin XU
    Journal of Computer Applications    2025, 45 (3): 785-793.   DOI: 10.11772/j.issn.1001-9081.2024050570

    Constructing a digital twin water conservancy construction knowledge graph to mine potential relationships between water conservancy construction objects can help relevant personnel optimize water conservancy construction design schemes and decision-making processes. Aiming at the interdisciplinary and complex knowledge structure of digital twin water conservancy construction, and at problems such as the insufficient learning and low extraction accuracy of general knowledge extraction models in the water conservancy domain, a Digital Twin water conservancy construction Knowledge Extraction method based on a Large Language Model (DTKE-LLM) was proposed to improve the accuracy of knowledge extraction. In this method, a local Large Language Model (LLM) was deployed through LangChain and integrated with digital twin water conservancy domain knowledge, prompt learning was used to fine-tune the LLM, and the LLM's semantic understanding and generation capabilities were exploited to extract knowledge. At the same time, a heterogeneous entity alignment strategy was designed to optimize entity extraction results. Comparison and ablation experiments were carried out on a water conservancy domain corpus to verify the effectiveness of DTKE-LLM. The comparison experiments demonstrate that DTKE-LLM outperforms the deep learning-based BiLSTM-CRF (Bidirectional Long Short-Term Memory Conditional Random Field) named entity recognition model and the general information extraction model UIE (Universal Information Extraction) in precision. The ablation experiments show that, compared with ChatGLM2-6B (Chat Generative Language Model 2.6 Billion), DTKE-LLM improves the F1 scores of entity extraction and relation extraction by 5.5 and 3.2 percentage points respectively. The proposed method thus realizes the construction of a digital twin water conservancy construction knowledge graph while ensuring the quality of knowledge graph construction.
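The prompt-driven extraction step can be illustrated by assembling an instruction that asks a local LLM for (head, relation, tail) triples over domain entity types. The template below is a hypothetical sketch, not DTKE-LLM's actual prompt:

```python
from typing import List

def build_extraction_prompt(passage: str, entity_types: List[str]) -> str:
    """Assemble an instruction asking an LLM to extract typed entities and
    relations from a domain passage as (head, relation, tail) triples."""
    types = ", ".join(entity_types)
    return (f"Extract entities of types [{types}] and the relations "
            f"between them from the passage below. "
            f"Answer as (head, relation, tail) triples.\n"
            f"Passage: {passage}")
```

In a LangChain-style pipeline, the returned string would be sent to the locally deployed model and the triples parsed from its reply.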

    Synaesthesia metaphor analysis based on large language model and data augmentation
    Kun SHENG, Zhongqing WANG
    Journal of Computer Applications    2025, 45 (3): 794-800.   DOI: 10.11772/j.issn.1001-9081.2024091251

    Chinese synaesthesia metaphor analysis is a specific subtask of the metaphor domain. The uneven distribution of sensory words in synaesthesia corpora leads to data sparsity in Chinese synaesthesia metaphor datasets. To address this issue, sparse sensory word data from the real training data were used as prompts, and additional synthetic samples were generated by a large language model for data augmentation. To prevent the noise introduced by synthetic data from degrading model performance, a data augmentation framework based on a large language model was constructed, and a scoring mechanism and a label error optimization mechanism were applied to reduce the distributional differences between synthetic and real data. Experimental results show that the proposed framework generates high-quality synthetic data to expand the dataset, achieving an overall F1 value of 68.5% in the sensory word extraction and sensory domain classification tasks, an improvement of 2.7 percentage points over the baseline model T5 (Text-To-Text Transfer Transformer) trained only on the real training data.

    Large language model prompt generation method for engineering drawing understanding
    Chenwei SUN, Junli HOU, Xianggen LIU, Jiancheng LYU
    Journal of Computer Applications    2025, 45 (3): 801-807.   DOI: 10.11772/j.issn.1001-9081.2024101537

    In recent years, Large Language Models (LLMs) have demonstrated excellent language understanding and dialogue capabilities in fields such as natural language processing and computer vision. However, in professional fields they can produce inference results inconsistent with the correct answers, which poses significant challenges to applying LLMs in tasks demanding precise and accurate decisions. To solve this problem, a rule-guided Post Prompt generation method for Large Language Models (PP-LLM) was proposed. By generating post prompts, the original problem was transformed into two easier sub-problems, thereby introducing expert knowledge and reducing the difficulty of task learning. Specifically, knowledge-guided rules were used to transform the output portion of the supervised dataset into a combination of a post prompt and the original output. The PP-LLM method changes neither the training nor the inference process of the model and adds no computational cost. Experimental results show that the PP-LLM method significantly improves the accuracy of inference results and narrows the gap between model predictions and actual answers: compared with the results obtained without the method, the F1 and ROUGE (Recall-Oriented Understudy for Gisting Evaluation) values improve significantly. This work improves the reliability of LLMs in professional applications and provides new ideas for LLM generation technology.
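The core data transformation — rewriting each supervised example so its target becomes a rule-derived post prompt followed by the original answer, splitting the task into two easier sub-problems — can be sketched as below. The dictionary keys and the rule are illustrative assumptions, not the paper's format:

```python
from typing import Callable, Dict

def add_post_prompt(example: Dict[str, str],
                    rule: Callable[[str], str]) -> Dict[str, str]:
    """Rewrite one supervised example so its target is a knowledge-guided
    post prompt followed by the original answer. Training and inference
    procedures are unchanged; only the target text is transformed."""
    post_prompt = rule(example["input"])
    return {"input": example["input"],
            "output": post_prompt + "\n" + example["output"]}
```

At inference time the model then emits the intermediate post prompt before the final answer, making its reasoning step explicit.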

    Text-based person retrieval method based on multi-granularity shared semantic center association
    Bin KANG, Bin CHEN, Junjie WANG, Yulin LI, Junzhi ZHAO, Weizhi XIAN
    Journal of Computer Applications    2025, 45 (3): 808-814.   DOI: 10.11772/j.issn.1001-9081.2024101434

    Text-based person retrieval aims to identify a specific person using a textual description as the query. Existing state-of-the-art methods typically design multiple alignment mechanisms to establish correspondence between cross-modal data at both global and local levels, but neglect the mutual influence among these mechanisms. To address this, a multi-granularity shared semantic center association mechanism was proposed to explore the promoting and inhibiting effects between global and local alignments. Firstly, a multi-granularity cross-alignment module was introduced to enhance image-sentence and region-word interactions, achieving multi-level alignment of cross-modal data in a joint embedding space. Then, a shared semantic center was established as a learnable semantic hub, and associations between global and local features were used to enhance semantic consistency among the different alignment mechanisms and promote the collaborative effect of global and local features. In the shared semantic center, local and global cross-modal similarity relationships between image and text features were calculated, providing complementary measures from both perspectives and maximizing the positive effects among the multiple alignment mechanisms. Finally, experiments on the CUHK-PEDES dataset show that the proposed method significantly improves Rank-1 by 8.69 percentage points and mean Average Precision (mAP) by 6.85 percentage points over the baseline method. The method also achieves excellent performance on the ICFG-PEDES and RSTPReid datasets, significantly surpassing all compared methods.

    Speaker-emotion voice conversion method with limited corpus based on large language model and pre-trained model
    Chaofeng LU, Ye TAO, Lianqing WEN, Fei MENG, Xiugong QIN, Yongjie DU, Yunlong TIAN
    Journal of Computer Applications    2025, 45 (3): 815-822.   DOI: 10.11772/j.issn.1001-9081.2024010013

    Little research has combined speaker conversion with emotional voice conversion, and the emotional corpora of a target speaker in real scenarios are usually too small to train a well-generalizing model from scratch. To address this, a Speaker-Emotion Voice Conversion with Limited corpus (LSEVC) method was proposed that fuses a large language model with a pre-trained emotional speech synthesis model. Firstly, the large language model was used to generate text with the required emotion tags. Secondly, the pre-trained emotional speech synthesis model was fine-tuned with the target speaker's corpus to embed the target speaker. Thirdly, emotional speech was synthesized from the generated text for data augmentation. Fourthly, the synthesized speech and the source and target speech were used to co-train the speaker-emotion voice conversion model. Finally, to further enhance the speaker similarity and emotional similarity of the converted speech, the model was fine-tuned with the target speaker's emotional speech. Experiments were conducted on publicly available corpora and a Chinese fiction corpus. Experimental results show that the proposed method outperforms CycleGAN-EVC, Seq2Seq-EVC-WA2, SMAL-ET2, and other methods on the evaluation metrics Emotional similarity Mean Opinion Score (EMOS), Speaker similarity Mean Opinion Score (SMOS), Mel Cepstral Distortion (MCD), and Word Error Rate (WER).

    Vision foundation model-driven pixel-level image anomaly detection method
    Zhenhua XUE, Qiang LI, Chao HUANG
    Journal of Computer Applications    2025, 45 (3): 823-831.   DOI: 10.11772/j.issn.1001-9081.2024091398
    Abstract (35)   HTML (6)    PDF (3364KB) (27)       Save

    Previous anomaly detection methods have achieved high-precision detection in specific scenarios, but their applicability is constrained by a lack of generalizability and automation. Thus, a Vision Foundation Model (VFM)-driven pixel-level image anomaly detection method, namely SSMOD-Net (State Space Model driven-Omni Dimensional Net), was proposed with the aim of achieving more accurate industrial defect detection. Unlike the existing methods, SSMOD-Net achieved automated prompting of SAM (Segment Anything Model) without fine-tuning SAM, making it particularly suitable for scenarios that require processing large-scale industrial visual data. The core of SSMOD-Net is a novel prompt encoder driven by a state space model, which generated prompts dynamically based on the input image of SAM. With this design, the model was allowed to introduce additional guidance information through the prompt encoder while preserving SAM's architecture, thereby enhancing detection accuracy. A residual multi-scale module, constructed on the state space model, was integrated into the prompt encoder to use multi-scale and global information comprehensively. Through iterative search, the module found optimal prompts in the prompt space and provided them to SAM as high-dimensional tensors, thereby strengthening the model's ability to recognize industrial anomalies. Moreover, the proposed method required no modifications to SAM, avoiding complex fine-tuning and training schedules. Experimental results on several datasets show that the proposed method has excellent performance, achieving better results in mE (mean E-measure), Mean Absolute Error (MAE), Dice, and Intersection over Union (IoU) than methods such as AutoSAM and SAM-EG (SAM with Edge Guidance framework for efficient polyp segmentation).
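The residual multi-scale idea can be illustrated on a 1-D toy signal; this sketch (plain average pooling plus nearest-neighbour upsampling with a residual add) only gestures at the state-space-model-based module in the paper and shares none of its actual operators:

```python
def avg_pool(xs, k):
    """Average-pool a 1-D sequence with window and stride k."""
    return [sum(xs[i:i + k]) / len(xs[i:i + k]) for i in range(0, len(xs), k)]

def upsample(xs, n):
    """Nearest-neighbour upsample a pooled sequence back to length n."""
    return [xs[min(i * len(xs) // n, len(xs) - 1)] for i in range(n)]

def residual_multiscale(x, scales=(1, 2, 4)):
    """Aggregate coarse-to-fine views of x, average them, and add the
    result back to the input -- the residual connection keeps the
    original signal while injecting multi-scale context."""
    fused = [0.0] * len(x)
    for k in scales:
        up = upsample(avg_pool(x, k), len(x))
        fused = [f + u / len(scales) for f, u in zip(fused, up)]
    return [xi + fi for xi, fi in zip(x, fused)]
```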

    Privacy preserving localization of surveillance images based on large vision models
    Qiang LI, Shaoxiong BAI, Yuan XIONG, Wei YUAN
    Journal of Computer Applications    2025, 45 (3): 832-839.   DOI: 10.11772/j.issn.1001-9081.2024101538
    Abstract (55)   HTML (3)    PDF (3015KB) (29)       Save

    Visual localization of surveillance images is an important technology in industrial intelligence. Existing visual localization algorithms do not protect the privacy information in images and may leak sensitive content during data transmission. To address this problem, a localization method for surveillance images based on Large Vision Models (LVMs) was proposed. Firstly, an architecture for LVM privacy-preserving visual localization was designed to transfer the style of input images using a few prompts and reference images. Then, a feature matching algorithm for the style-transferred images was designed to estimate the camera pose. Experimental results on public datasets show that the localization error of the proposed algorithm is relatively small, demonstrating that the algorithm reduces privacy leakage significantly while maintaining localization accuracy.
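The feature matching step on style-transferred images can be sketched with a standard Lowe's ratio test; the brute-force squared-distance matcher below is a generic illustration (not the paper's algorithm), useful because style transfer tends to create ambiguous matches that the ratio test filters out:

```python
def match_features(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a to its nearest neighbour in
    desc_b, keeping a match only when the best squared distance is
    clearly smaller than the second best (Lowe's ratio test)."""
    matches = []
    for i, da in enumerate(desc_a):
        dists = sorted((sum((x - y) ** 2 for x, y in zip(da, db)), j)
                       for j, db in enumerate(desc_b))
        if len(dists) >= 2 and dists[0][0] < (ratio ** 2) * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches
```

The surviving correspondences would then feed a pose solver (e.g. essential-matrix estimation with RANSAC) to recover the camera pose.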

    Crop disease recognition method based on multi-modal data fusion
    Wei CHEN, Changyong SHI, Chuanxiang MA
    Journal of Computer Applications    2025, 45 (3): 840-848.   DOI: 10.11772/j.issn.1001-9081.2024091297
    Abstract (72)   HTML (3)    PDF (2997KB) (48)       Save

    Current deep learning-based methods for crop disease recognition rely on specific crop disease image datasets for image representation learning, and do not consider the importance of text features in assisting image feature learning. To enhance the model's feature extraction and disease recognition capabilities for crop disease images more effectively, a Crop Disease Recognition method through multi-modal data fusion based on Contrastive Language-Image Pre-training (CDR-CLIP) was proposed. Firstly, high-quality image-text pair datasets for disease recognition were constructed to enhance image feature representation with textual information. Then, a multi-modal fusion strategy was applied to integrate text and image features effectively, strengthening the model's ability to distinguish diseases. Finally, specialized pre-training and fine-tuning strategies were designed to optimize the model's performance on specific crop disease recognition tasks. Experimental results demonstrate that CDR-CLIP achieves disease recognition accuracies of 99.31% and 87.66%, with F1 values of 99.04% and 87.56%, on the PlantVillage and AI Challenger 2018 crop disease datasets, respectively. On the PlantDoc dataset, CDR-CLIP achieves a mean Average Precision (mAP@0.5) of 51.10%, showing its strong performance advantage.
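The CLIP-style image-text scoring that underlies such a method can be sketched as zero-shot classification over per-class text embeddings (the function names and the toy embeddings are illustrative; a real system would obtain both embeddings from trained encoders):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def classify_disease(image_emb, class_text_embs):
    """CLIP-style recognition: score one image embedding against a
    text embedding per disease class (e.g. encoded from 'a photo of a
    leaf with <disease>') and return the index of the best class."""
    scores = [cosine(image_emb, t) for t in class_text_embs]
    return max(range(len(scores)), key=scores.__getitem__)
```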

    Chinese spelling correction method based on LLM with multiple inputs
    Can MA, Ruizhang HUANG, Lina REN, Ruina BAI, Yaoyao WU
    Journal of Computer Applications    2025, 45 (3): 849-855.   DOI: 10.11772/j.issn.1001-9081.2024091325
    Abstract (59)   HTML (3)    PDF (946KB) (17)       Save

    Chinese Spelling Correction (CSC) is an important research task in Natural Language Processing (NLP). Existing CSC methods based on Large Language Models (LLMs) may produce corrected results that deviate semantically from the original content. Therefore, a CSC method based on an LLM with multiple inputs was proposed. The method consists of two stages: multi-input candidate set construction and LLM correction. In the first stage, a multi-input candidate set was constructed from the correction results of several small models. In the second stage, LoRA (Low-Rank Adaptation) was employed to fine-tune the LLM, so that, with the aid of the LLM's reasoning capability, sentences without spelling errors were deduced from the multi-input candidate set and used as the final correction results. Experimental results on the public datasets SIGHAN13, SIGHAN14, SIGHAN15 and revised SIGHAN15 show that the proposed method improves the correction F1 value by 9.6, 24.9, 27.9, and 34.2 percentage points, respectively, compared to Prompt-GEN-1, a method that generates correction results directly with an LLM. Compared with the second-best small correction model, the proposed method improves the correction F1 value by 1.0, 1.1, 0.4, and 2.4 percentage points, respectively, verifying its ability to enhance the effect of CSC tasks.
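The two-stage pipeline can be sketched as follows; the majority-vote selector is a hypothetical stand-in for the LoRA-fine-tuned LLM of the second stage, kept only to make the candidate-set idea concrete:

```python
from collections import Counter

def build_candidate_set(source, small_model_outputs):
    """Stage 1: the source sentence plus each small model's correction,
    deduplicated in order, form the multi-input candidate set."""
    cands = []
    for s in [source] + list(small_model_outputs):
        if s not in cands:
            cands.append(s)
    return cands

def select_correction(source, small_model_outputs):
    """Stage 2 stand-in: where the paper deduces the error-free sentence
    with a fine-tuned LLM, this sketch takes the majority answer among
    the small models and falls back to the source when none repeats."""
    best, n = Counter(small_model_outputs).most_common(1)[0]
    return best if n > 1 else source
```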

Honorary Editor-in-Chief: ZHANG Jingzhong
Editor-in-Chief: XU Zongben
Associate Editor: SHEN Hengtao XIA Zhaohui
Domestic Post Distribution Code: 62-110
Foreign Distribution Code: M4616
Address: No. 9, 4th Section of South Renmin Road, Chengdu 610041, China
Tel: 028-85224283-803, 028-85222239-803
Website: www.joca.cn
E-mail: bjb@joca.cn