Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (3): 801-807.DOI: 10.11772/j.issn.1001-9081.2024101537
• Frontier research and typical applications of large models •
Chenwei SUN1, Junli HOU2, Xianggen LIU1, Jiancheng LYU1
Received: 2024-10-15
Revised: 2024-12-20
Accepted: 2024-12-26
Online: 2025-02-07
Published: 2025-03-10
Contact: Jiancheng LYU
About author: SUN Chenwei, born in 2000 in Jinan, Shandong, M. S. candidate. His research interests include natural language processing and artificial intelligence.
Supported by:
CLC Number:
Chenwei SUN, Junli HOU, Xianggen LIU, Jiancheng LYU. Large language model prompt generation method for engineering drawing understanding[J]. Journal of Computer Applications, 2025, 45(3): 801-807.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2024101537
| Model | Dataset | Quenching & tempering: Precision | Quenching & tempering: Recall | Quenching & tempering: F1 | Process level: Precision | Process level: Recall | Process level: F1 | Sentence level: BLEU | Sentence level: ROUGE-1 | Sentence level: ROUGE-2 | Sentence level: ROUGE-L |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Qwen-VL | w/ PP | 64.3 | 90.0 | 75.0 | 64.2 | 51.7 | 57.3 | 47.4 | 58.0 | 36.0 | 58.0 |
| | w/o PP | 45.5 | 41.7 | 43.5 | 58.8 | 53.1 | 55.8 | 45.3 | 56.3 | 35.3 | 56.3 |
| VisualGLM | w/ PP | 57.1 | 38.1 | 45.7 | 73.2 | 35.0 | 47.4 | 20.5 | 47.4 | 26.6 | 47.4 |
| | w/o PP | 13.3 | 17.5 | 15.1 | 77.0 | 30.8 | 44.0 | 22.5 | 43.8 | 30.8 | 43.8 |
| CogVLM | w/ PP | 81.0 | 77.3 | 79.1 | 54.4 | 59.3 | 56.7 | 51.2 | 59.0 | 32.0 | 59.0 |
| | w/o PP | 78.9 | 65.2 | 71.4 | 53.6 | 51.6 | 52.6 | 45.4 | 56.2 | 30.0 | 56.2 |
| InternVL | w/ PP | 50.0 | 87.5 | 63.6 | 50.9 | 51.8 | 51.3 | 41.4 | 53.9 | 29.9 | 53.9 |
| | w/o PP | 33.3 | 66.7 | 44.4 | 48.9 | 53.1 | 50.9 | 41.0 | 53.4 | 28.8 | 53.4 |
| LLaVA | w/ PP | 13.3 | 100.0 | 23.5 | 58.2 | 41.9 | 48.7 | 36.5 | 51.4 | 27.5 | 51.4 |
| | w/o PP | 8.3 | 14.3 | 10.5 | 44.6 | 51.9 | 47.9 | 39.3 | 50.7 | 26.3 | 50.7 |
| Yi-VL | w/ PP | 31.7 | 11.7 | 17.1 | 27.3 | 48.7 | 35.0 | 37.9 | 54.1 | 31.2 | 54.1 |
| | w/o PP | 7.7 | 20.0 | 11.1 | 18.3 | 18.1 | 18.2 | 26.0 | 40.2 | 16.8 | 40.2 |
Tab. 1 Inference results of six different models after training with the PP-LLM method or on the original dataset
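The metrics in Tab. 1 combine step-level matching (precision/recall/F1) with sentence-level text-overlap scores (BLEU, ROUGE). The sketch below illustrates one plausible way such statistics could be computed; it is not the paper's implementation. The ";" step delimiter, the exact-match rule for a correct step, the character-level ROUGE-1, and all names and sample strings are assumptions made for illustration only.

```python
# Minimal illustrative sketch (assumptions, not the paper's code): steps are split
# on ";", a predicted step counts as correct only on an exact string match, and
# ROUGE-1 F is computed as character-level unigram overlap (suitable for Chinese).
from collections import Counter

def split_steps(process_text: str) -> list[str]:
    """Split a generated process description into individual steps (assumed ';'-delimited)."""
    return [s.strip() for s in process_text.split(";") if s.strip()]

def step_level_prf(predicted: str, reference: str) -> tuple[float, float, float]:
    """Process-level precision/recall/F1 under exact step matching."""
    pred_steps, ref_steps = split_steps(predicted), split_steps(reference)
    hits = sum((Counter(pred_steps) & Counter(ref_steps)).values())
    precision = hits / len(pred_steps) if pred_steps else 0.0
    recall = hits / len(ref_steps) if ref_steps else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def rouge1_f(predicted: str, reference: str) -> float:
    """Sentence-level ROUGE-1 F as character-level unigram-overlap F1."""
    pred_chars = Counter(predicted.replace(" ", ""))
    ref_chars = Counter(reference.replace(" ", ""))
    overlap = sum((pred_chars & ref_chars).values())
    if not overlap:
        return 0.0
    p = overlap / sum(pred_chars.values())
    r = overlap / sum(ref_chars.values())
    return 2 * p * r / (p + r)

if __name__ == "__main__":
    pred = "正火; 淬火; 回火"          # hypothetical model output
    gold = "正火; 调质(淬火+回火)"      # hypothetical reference process
    print(step_level_prf(pred, gold))  # (precision, recall, F1)
    print(rouge1_f(pred, gold))
```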
| Model | Dataset | Average number of process steps | Average process length (characters) |
| --- | --- | --- | --- |
| Qwen-VL | w/ PP | 2.810 | 23.430 |
| | w/o PP | 2.920 | 17.160 |
| VisualGLM | w/ PP | 1.840 | 19.170 |
| | w/o PP | 1.000 | 5.000 |
| CogVLM | w/ PP | 3.770 | 32.600 |
| | w/o PP | 3.580 | 22.850 |
| InternVL | w/ PP | 4.110 | 31.840 |
| | w/o PP | 3.990 | 23.650 |
| LLaVA | w/ PP | 2.500 | 20.870 |
| | w/o PP | 3.790 | 21.860 |
| Yi-VL | w/ PP | 5.860 | 26.690 |
| | w/o PP | 3.180 | 10.530 |
Tab. 2 Comparison of average process steps and character lengths for six models
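The two statistics in Tab. 2 are simple corpus averages over the generated processes. The sketch below shows one way such averages could be computed; the ";" step delimiter, the per-process character count, and the sample outputs are illustrative assumptions, since the paper's exact counting convention is not reproduced here.

```python
# Minimal illustrative sketch for the Tab. 2 statistics (assumptions, not the
# paper's code): ";" delimits steps; character length is counted per generated
# process with whitespace removed.
def table2_stats(generated_processes: list[str]) -> tuple[float, float]:
    """Return (average number of process steps, average character length)."""
    step_counts, char_lengths = [], []
    for text in generated_processes:
        steps = [s for s in text.split(";") if s.strip()]
        step_counts.append(len(steps))
        char_lengths.append(len(text.replace(" ", "")))
    n = len(generated_processes)
    return sum(step_counts) / n, sum(char_lengths) / n

# Hypothetical outputs from one model on three drawings:
outputs = ["正火; 淬火; 回火", "调质处理", "退火; 正火"]
print(table2_stats(outputs))  # -> (average steps, average characters)
```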