Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (8): 2442-2447.DOI: 10.11772/j.issn.1001-9081.2024081083

• The 21st CCF Conference on Web Information Systems and Applications (WISA 2024) •

Dual-stage prompt tuning method for automated preference alignment

Tao FENG1,2, Chen LIU1,2()   

  1. School of Information Science and Technology, North China University of Technology, Beijing 100144, China
  2. Beijing Key Laboratory of Large-scale Streaming Data Integration and Analysis Technology, North China University of Technology, Beijing 100144, China
  • Received:2024-08-02 Revised:2024-09-02 Accepted:2024-09-05 Online:2024-09-12 Published:2025-08-10
  • Contact: Chen LIU
  • About author: FENG Tao, born in 2000, M. S. candidate. His research interests include distributed systems and cloud computing.
  • Supported by:
    National Natural Science Foundation of China(62061136006);Guangzhou Science and Technology Plan — Key Research and Development Program(202206030009)


Abstract:

User prompts often lack domain-specific expertise and terminology, which makes it difficult for Large Language Models (LLMs) to understand user intent accurately and to generate output that meets the requirements of a given field. To address the preference alignment problem that LLMs face when applied to vertical domains, an Automated Preference Alignment Dual-Stage Prompt Tuning (APADPT) method was proposed. APADPT refines input prompts by constructing a supervised fine-tuning dataset that encodes human preferences and by using an LLM to perform semantic analysis and evaluation of pairwise replies. After dual-stage training, the model masters general-domain prompt optimization rules and further specializes them to the characteristics of the target vertical domain. Experimental results in the medical domain show that APADPT significantly improves the preference alignment consistency of both API-based and open-source LLMs, with the average win rate increased by 9.5% to 20.5% at the same model parameter count. In addition, the method exhibits good robustness and generalization across open-source models of different parameter scales, offering a new optimization strategy for applying LLMs to vertical specialized domains and helping to improve model performance while preserving generalization and adaptability.
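The pipeline the abstract describes — building a preference dataset by having a judge LLM evaluate pairwise replies, then applying a general rewriting stage followed by a domain-specialized one — can be sketched as follows. This is a minimal, hypothetical illustration: the function names, data shapes, and stubbed LLM calls are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of an APADPT-style two-stage prompt tuning pipeline.
# The LLM calls are passed in as plain callables so the control flow is
# runnable without any model; real usage would plug in actual model APIs.

from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # reply preferred by the judge LLM
    rejected: str  # reply the judge ranked lower


def build_sft_dataset(prompts, generate, judge):
    """Construct a supervised fine-tuning dataset with human-style preferences:
    sample two replies per prompt and let a judge LLM pick the better one."""
    data = []
    for p in prompts:
        a, b = generate(p), generate(p)
        # judge(p, a, b) is True when reply `a` is preferred over `b`
        chosen, rejected = (a, b) if judge(p, a, b) else (b, a)
        data.append(PreferencePair(p, chosen, rejected))
    return data


def tune_prompt(prompt, general_stage, domain_stage):
    """Dual-stage refinement: a general-domain rewriter cleans up the prompt,
    then a vertical-domain stage (e.g. medical) adds field-specific framing."""
    return domain_stage(general_stage(prompt))
```

A toy run with stub callables shows the intended flow: `tune_prompt("  my stomach hurts ", stage1, stage2)` would first normalize the raw prompt and then append domain-specific instructions before it reaches the answering model.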

Key words: Large Language Model (LLM), vertical field optimization, preference alignment, prompt optimization

