Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (8): 2442-2447.DOI: 10.11772/j.issn.1001-9081.2024081083

• The 21st CCF Conference on Web Information Systems and Applications (WISA 2024) •

Dual-stage prompt tuning method for automated preference alignment

Tao FENG1,2, Chen LIU1,2()   

  1. School of Information Science and Technology, North China University of Technology, Beijing 100144, China
  2. Beijing Key Laboratory of Large-scale Streaming Data Integration and Analysis Technology, North China University of Technology, Beijing 100144, China
  • Received:2024-08-02 Revised:2024-09-02 Accepted:2024-09-05 Online:2024-09-12 Published:2025-08-10
  • Contact: Chen LIU
  • About author: FENG Tao, born in 2000, M. S. candidate. His research interests include distributed systems and cloud computing.
  • Supported by:
    National Natural Science Foundation of China(62061136006);Guangzhou Science and Technology Plan — Key Research and Development Program(202206030009)


Abstract:

User prompts often lack domain-specific expertise and terminology, which makes it difficult for Large Language Models (LLMs) to understand user intent accurately and to generate output that meets the requirements of a given field. To address the preference alignment problem that LLMs face when applied to vertical domains, an Automated Preference Alignment Dual-Stage Prompt Tuning (APADPT) method was proposed. APADPT refines input prompts by constructing a supervised fine-tuning dataset that encodes human preferences and by using an LLM to perform semantic analysis and evaluation of pairwise replies. After dual-stage training, the model masters general-domain prompt optimization rules and further specializes them to the characteristics of the target vertical domain. Experimental results in the medical domain show that APADPT significantly improves the preference alignment consistency of both API-based and open-source LLMs, with the average win rate increased by 9.5% to 20.5% at the same model parameter count. In addition, the method exhibits good robustness and generalization across open-source models of different parameter scales, offering a new optimization strategy for applying LLMs to vertical specialized domains and helping to improve model performance while preserving generalization and adaptability.
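The pipeline the abstract describes — building a preference dataset by having a judge LLM evaluate pairwise replies, then applying a general rewriting stage followed by a domain-specialized one — can be sketched as follows. This is a minimal, hypothetical illustration: the function names, data shapes, and stubbed LLM calls are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of an APADPT-style two-stage prompt tuning pipeline.
# The LLM calls are passed in as plain callables so the control flow is
# runnable without any model; real usage would plug in actual model APIs.

from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # reply preferred by the judge LLM
    rejected: str  # reply the judge ranked lower


def build_sft_dataset(prompts, generate, judge):
    """Construct a supervised fine-tuning dataset with human-style preferences:
    sample two replies per prompt and let a judge LLM pick the better one."""
    data = []
    for p in prompts:
        a, b = generate(p), generate(p)
        # judge(p, a, b) is True when reply `a` is preferred over `b`
        chosen, rejected = (a, b) if judge(p, a, b) else (b, a)
        data.append(PreferencePair(p, chosen, rejected))
    return data


def tune_prompt(prompt, general_stage, domain_stage):
    """Dual-stage refinement: a general-domain rewriter cleans up the prompt,
    then a vertical-domain stage (e.g. medical) adds field-specific framing."""
    return domain_stage(general_stage(prompt))
```

A toy run with stub callables shows the intended flow: `tune_prompt("  my stomach hurts ", stage1, stage2)` would first normalize the raw prompt and then append domain-specific instructions before it reaches the answering model.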

Key words: Large Language Model (LLM), vertical field optimization, preference alignment, prompt optimization

