Journal of Computer Applications, 2026, Vol. 46, Issue (6): 1801-1810. DOI: 10.11772/j.issn.1001-9081.2025060730
• Artificial intelligence •
Lili HE1,2,3, Meng CAO1,2,3, Lei ZHANG1,2,3, Hongjun PAN3,4, Yi LIU1,2,3, Chengxin SUN5
Received:2025-07-02
Revised:2025-08-28
Accepted:2025-09-02
Online:2025-09-12
Published:2026-06-10
Contact: Hongjun PAN
About author: HE Lili, born in 1979 in Jiamusi, Heilongjiang, Ph.D., professor, CCF member. Her research interests include privacy protection and information security.
Lili HE, Meng CAO, Lei ZHANG, Hongjun PAN, Yi LIU, Chengxin SUN. Sign language generation model based on Kolmogorov-Arnold network and diffusion Transformer[J]. Journal of Computer Applications, 2026, 46(6): 1801-1810.
Original Chinese citation: 何丽丽, 曹勐, 张磊, 潘洪军, 刘义, 孙成心. 基于Kolmogorov-Arnold网络与扩散Transformer的手语生成模型[J]. 计算机应用, 2026, 46(6): 1801-1810.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2025060730
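| Generative model | Representative methods | Generation quality | Generation speed | Generation mechanism | Recommended scenarios |
|---|---|---|---|---|---|
| Transformer models | PT, T2S-GPT, MoMP | Well suited to generating coherent sequences, but prone to losing fine details (e.g., subtle finger movements) | Autoregressive inference is slow because frames must be generated one by one | Autoregressive frame-by-frame prediction | Scenarios with high real-time requirements but moderate detail requirements; resource-constrained applications; projects that need rapid prototyping |
| Diffusion models | G2P-DDM, MS2SL, SignDiff | Progressive generation that can finely restore local details, but requires more computing resources | Denoising can be parallelized, but multi-step iteration is still slower than single-pass inference | Progressive denoising generation | High-fidelity sign language animation; virtual human interaction with controllable emotion and style; complex scenarios with multimodal input |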
Tab. 1 Comparison between Transformer models and diffusion models
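
The two generation mechanisms contrasted in Tab. 1 can be illustrated with a minimal sketch. It assumes toy one-dimensional "poses" and two hypothetical stand-in functions, `toy_next_frame` and `toy_denoise`, in place of trained networks; it is not the KDT model or any of the cited methods. It only shows why autoregressive decoding needs one sequential call per generated frame, whereas diffusion-style decoding needs one call per denoising step, each of which refines all frames together.

```python
import random


def autoregressive_generate(step_fn, first_frame, num_frames):
    """Frame-by-frame generation: each new frame depends on the previous one."""
    frames = [first_frame]
    calls = 0
    while len(frames) < num_frames:
        frames.append(step_fn(frames[-1]))  # must wait for the previous frame
        calls += 1
    return frames, calls


def diffusion_generate(denoise_fn, num_frames, num_steps, seed=0):
    """Progressive denoising: every step updates the whole sequence at once."""
    rng = random.Random(seed)
    frames = [rng.gauss(0.0, 1.0) for _ in range(num_frames)]  # start from noise
    calls = 0
    for step in range(num_steps, 0, -1):
        frames = denoise_fn(frames, step)  # all frames refined in one call
        calls += 1
    return frames, calls


# Hypothetical stand-ins for trained networks (assumptions, not from the paper).
def toy_next_frame(prev):
    return prev + 1.0


def toy_denoise(frames, step):
    # Move each value a fraction 1/step of the way toward a fixed toy target.
    target = [float(i) for i in range(len(frames))]
    return [x + (t - x) / step for x, t in zip(frames, target)]


ar_frames, ar_calls = autoregressive_generate(toy_next_frame, 0.0, 8)
df_frames, df_calls = diffusion_generate(toy_denoise, 8, 4)
print("autoregressive calls:", ar_calls)
print("diffusion calls:", df_calls)
```

In this toy setting the number of autoregressive calls grows with sequence length, while the number of diffusion calls is fixed by the step count. Real per-call costs differ, so the sketch says nothing about actual wall-clock speed.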
| Dataset | BLEU-1 | BLEU-4 | ROUGE | FID | WER |
|---|---|---|---|---|---|
| RWTH-Phoenix-2014T validation set | 26.14 | 8.65 | 26.34 | 2.05 | 74.24 |
| RWTH-Phoenix-2014T test set | 25.98 | 8.79 | 26.91 | 2.29 | 75.04 |
| CSL-Daily validation set | 51.41 | 23.41 | 48.14 | 1.35 | 42.51 |
| CSL-Daily test set | 50.18 | 21.83 | 45.72 | 1.29 | 42.12 |
Tab. 2 Evaluation scores of KDT on different datasets
| Model | BLEU-1 | BLEU-4 | ROUGE | FID | WER |
|---|---|---|---|---|---|
| PT | 5.53 | 2.17 | 4.74 | 0.72 | 32.13 |
| KDT | 13.53 | 4.76 | 12.51 | 0.64 | 25.62 |
Tab. 3 Evaluation scores on RWTH-BOSTON-104 dataset
| Model | Val. BLEU-1 | Val. BLEU-4 | Val. ROUGE | Val. FID | Val. WER | Test BLEU-1 | Test BLEU-4 | Test ROUGE | Test FID | Test WER |
|---|---|---|---|---|---|---|---|---|---|---|
| PT (Baseline) | 11.62 | 3.76 | 11.74 | 2.85 | 98.53 | 12.12 | 3.54 | 10.16 | 3.22 | 98.36 |
| PT+KAN | 14.42 | 4.38 | 15.35 | 2.78 | 86.42 | 13.91 | 4.45 | 14.87 | 2.69 | 83.62 |
| KDT | 26.14 | 8.65 | 26.34 | 2.05 | 74.24 | 25.98 | 8.79 | 26.91 | 2.29 | 75.04 |
Tab. 4 Ablation experimental results
| Model | Val. BLEU-1 | Val. BLEU-4 | Val. ROUGE | Val. FID | Val. WER | Test BLEU-1 | Test BLEU-4 | Test ROUGE | Test FID | Test WER |
|---|---|---|---|---|---|---|---|---|---|---|
| PT | 11.62 | 3.76 | 10.74 | 2.85 | 98.53 | 12.14 | 3.53 | 10.16 | 3.22 | 98.36 |
| GCDM | 22.88 | 7.64 | 23.35 | - | 82.81 | 22.20 | 7.91 | 23.20 | - | 81.94 |
| Sign-IDD | 24.18 | 8.09 | 24.87 | 2.22 | 77.72 | 24.18 | 8.87 | 25.87 | 2.46 | 76.66 |
| KDT | 26.14 | 8.65 | 26.34 | 2.05 | 74.24 | 25.98 | 8.98 | 26.91 | 2.29 | 75.03 |
Tab. 5 Comparison of scores of different models on RWTH-Phoenix-2014T dataset
[1] ZHANG L, WANG Z Y, LIAN S S, et al. Deep learning-based sign language translation: past, present, and future[J]. Application Research of Computers, 2025, 42(8): 2241-2254. (in Chinese)
[2] GUO D, TANG S G, HONG R C, et al. Review of sign language recognition, translation and generation[J]. Computer Science, 2021, 48(3): 60-70. (in Chinese)
[3] YANG X W, ZHANG Z C, KUANG L Q, et al. Key technologies of human-computer interaction based on virtual hand[J]. Journal of Computer Applications, 2015, 35(10): 2945-2949. (in Chinese)
[4] XUE Y, ZHANG Y X. Survey on deep neural architecture search[J]. Netinfo Security, 2023, 23(9): 58-74. (in Chinese)
[5] LONG G Y, CHEN Y Q, XING Y B. Text correction and completion method in continuous sign language recognition[J]. Journal of Computer Applications, 2021, 41(3): 694-698. (in Chinese)
[6] LUO Y, LI D, ZHANG Y. Chinese sign language recognition based on spatial-temporal attention network[J]. Semiconductor Optoelectronics, 2020, 41(3): 414-419. (in Chinese)
[7] GLAUERT J R W, ELLIOTT R, COX S J, et al. VANESSA: a system for communication between deaf and hearing people[J]. Technology and Disability, 2006, 18(4): 207-216.
[8] WANG Z Q, GAO W. A method to synthesize Chinese sign language based on virtual human technologies[J]. Journal of Software, 2002, 13(10): 2051-2056. (in Chinese)
[9] FANG S, CHEN C, WANG L, et al. SignLLM: sign language production large language models[C]// Proceedings of the 2025 IEEE/CVF International Conference on Computer Vision Workshops. Piscataway: IEEE, 2025: 6681-6693.
[10] SAUNDERS B, CAMGOZ N C, BOWDEN R. Progressive Transformers for end-to-end sign language production[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12356. Cham: Springer, 2020: 687-705.
[11] MA X, JIN R, WANG J, et al. Attentional bias for hands: cascade dual-decoder Transformer for sign language production[J]. IET Computer Vision, 2024, 18(5): 696-708.
[12] XIE P, PENG T, DU Y, et al. Sign language production with latent motion Transformer[C]// Proceedings of the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2024: 3012-3022.
[13] LIU Z, WANG Y, VAIDYA S, et al. KAN: Kolmogorov-Arnold networks[EB/OL]. [2025-05-09].
[14] LIU C F, SUN H, DONG H. Molecular amplification time series prediction research combining Transformer with Kolmogorov-Arnold network[J]. Journal of Graphics, 2024, 45(6): 1256-1265. (in Chinese)
[15] YANG X, WANG X. Kolmogorov-Arnold Transformer[EB/OL]. [2024-09-16].
[16] GUO X, WANG Y, DU T, et al. ContraNorm: a contrastive learning perspective on oversmoothing and beyond[EB/OL]. [2023-05-02].
[17] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 6000-6010.
[18] ZHANG Y, MA C M, LIU S D, et al. Multi-scale feature enhanced Transformer network for efficient semantic segmentation[J]. Opto-Electronic Engineering, 2024, 51(12): No.240237. (in Chinese)
[19] XING C Y, WANG Z P, ZHANG G M, et al. IoT device identification method based on pre-trained Transformers[J]. Netinfo Security, 2024, 24(8): 1277-1290. (in Chinese)
[20] KAPOOR P, MUKHOPADHYAY R, HEGDE S B, et al. Towards automatic speech to sign language generation[C]// Proceedings of the INTERSPEECH 2021. [S.l.]: International Speech Communication Association, 2021: 3700-3704.
[21] HWANG E J, LEE H, PARK J C. A gloss-free approach with discrete representations[C]// Proceedings of the IEEE 18th International Conference on Automatic Face and Gesture Recognition. Piscataway: IEEE, 2024: 1-6.
[22] YIN A, LI H, SHEN K, et al. T2S-GPT: dynamic vector quantization for autoregressive sign language production from text[C]// Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg: ACL, 2024: 3345-3356.
[23] SAUNDERS B, CAMGOZ N C, BOWDEN R. Mixed signals: sign language production via a mixture of motion primitives[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 1899-1909.
[24] XIE P, ZHANG Q, PENG T, et al. G2P-DDM: generating sign pose sequence from gloss sequence with discrete diffusion model[C]// Proceedings of the 38th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2024: 6234-6242.
[25] MUGHAL M H, DABRAL R, HABIBIE I, et al. ConvoFusion: multi-modal conversational diffusion for co-speech gesture synthesis[C]// Proceedings of the 2024 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2024: 1388-1398.
[26] CHEN J, LIU Y, WANG J, et al. DiffSHEG: a diffusion-based approach for real-time speech-driven holistic 3D expression and gesture generation[C]// Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2024: 7352-7361.
[27] TANG S, HE J, GUO D, et al. Sign-IDD: iconicity disentangled diffusion for sign language production[C]// Proceedings of the 39th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2025: 7266-7274.
[28] MA J, WANG W, YANG Y, et al. MS2SL: multimodal spoken data-driven continuous sign language production[C]// Findings of the Association for Computational Linguistics: ACL 2024. Stroudsburg: ACL, 2024: 7241-7254.
[29] FANG S, SUI C, ZHOU Y, et al. SignDiff: diffusion model for American sign language production[C]// Proceedings of the IEEE 19th International Conference on Automatic Face and Gesture Recognition. Piscataway: IEEE, 2025: 1-11.
[30] CAMGOZ N C, HADFIELD S, KOLLER O, et al. Neural sign language translation[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7784-7793.
[31] DREUW P, RYBACH D, DESELAERS T, et al. Speech recognition techniques for a sign language recognition system[C]// Proceedings of the INTERSPEECH 2007. [S.l.]: International Speech Communication Association, 2007: 2513-2516.
[32] ZHOU H, ZHOU W, QI W, et al. Improving sign language translation with monolingual data by sign back-translation[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 1316-1325.
[33] TANG S, XUE F, WU J, et al. Gloss-driven conditional diffusion models for sign language production[J]. ACM Transactions on Multimedia Computing Communications and Applications, 2025, 21(4): No.105.
[34] SUN J W, ZHANG B, SI N W, et al. Lightweight malicious traffic detection method based on knowledge distillation[J]. Netinfo Security, 2025, 25(6): 859-871. (in Chinese)