Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (3): 801-807. DOI: 10.11772/j.issn.1001-9081.2024101537

• Frontier Research and Typical Applications of Large Models •


Large language model prompt generation method for engineering drawing understanding

Chenwei SUN1, Junli HOU2, Xianggen LIU1, Jiancheng LYU1()   

1. College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
    2. Southwest Institute of Electronic Equipment, Chengdu, Sichuan 610036, China
  • Received: 2024-10-15  Revised: 2024-12-20  Accepted: 2024-12-26  Online: 2025-02-07  Published: 2025-03-10
  • Contact: Jiancheng LYU
  • About author: SUN Chenwei, born in 2000 in Jinan, Shandong, M.S. candidate. His research interests include natural language processing and artificial intelligence.
    HOU Junli, born in 1970 in Hebi, Henan, Ph.D., senior engineer. His research interests include artificial intelligence.
    LIU Xianggen, born in 1993 in Chengdu, Sichuan, Ph.D., associate professor. His research interests include natural language processing.
  • Supported by:
    National Natural Science Foundation of China (62206192); National Key Research and Development Program of China (2024YFB3312503); Major Science and Technology Project of Sichuan Province (2024ZDZX0003)


Abstract:

In recent years, Large Language Models (LLMs) have demonstrated excellent language understanding and dialogue capabilities in fields such as natural language processing and computer vision. However, in professional domains they often produce inference results that are inconsistent with the correct answers, which poses significant challenges to the application of LLMs in precise and accurate decision-making tasks. To address this problem, a rule-guided Post Prompt of Large Language Model (PP-LLM) generation method was proposed. In this method, by generating a post prompt, the original problem was transformed into two sub-problems that are easier to solve, thereby introducing expert knowledge and reducing the difficulty of task learning. Specifically, knowledge-guided rules were used to transform the output part of the supervised dataset into a combination of a post prompt and the original output. The PP-LLM method changes neither the training nor the inference process of the model, and adds no computational cost. Experimental results show that the PP-LLM method improves the accuracy of inference results significantly and narrows the gap between model predictions and actual answers: compared with the results obtained without the proposed method, metrics such as F1 score and Recall-Oriented Understudy for Gisting Evaluation (ROUGE) are improved significantly. The above work improves the reliability of LLMs in professional applications and provides new ideas for LLM generation technology.
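The dataset transformation summarized above (rewriting each supervised output into a post prompt followed by the original output, so the model first solves an easier sub-problem) can be sketched as follows. This is a minimal illustration under assumed conventions: the rule, the `make_post_prompt` and `transform_example` names, and the field-before-colon format are hypothetical, not the paper's actual implementation.

```python
# Sketch of a rule-guided post-prompt transformation (illustrative only).

def make_post_prompt(output: str) -> str:
    """Hypothetical knowledge-guided rule: announce which field the answer
    concerns before stating the answer itself, splitting one hard prediction
    into two easier sub-problems (predict the field, then the value)."""
    field, _, _ = output.partition(":")
    return f"The answer concerns the field '{field.strip()}'."

def transform_example(example: dict) -> dict:
    """Rewrite a supervised pair so the training target is the post prompt
    followed by the original output. Only the dataset changes; the model's
    training and inference procedures stay exactly as before."""
    post_prompt = make_post_prompt(example["output"])
    return {
        "input": example["input"],
        "output": f"{post_prompt}\n{example['output']}",
    }

example = {
    "input": "What is the surface roughness marked on part A?",
    "output": "surface roughness: Ra 1.6",
}
# Prints the post prompt on one line, then the original answer.
print(transform_example(example)["output"])
```

Because only the targets in the supervised dataset are rewritten, this transformation adds no inference-time cost, matching the abstract's claim that training and inference are unchanged.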

Key words: engineering drawing, Large Language Model (LLM), data augmentation, multi-modal, prompt

CLC Number: