Journal of Computer Applications, 2026, Vol. 46, Issue (6): 1801-1810. DOI: 10.11772/j.issn.1001-9081.2025060730

• Artificial intelligence •

Sign language generation model based on Kolmogorov-Arnold network and diffusion Transformer

Lili HE1,2,3, Meng CAO1,2,3, Lei ZHANG1,2,3, Hongjun PAN3,4, Yi LIU1,2,3, Chengxin SUN5

  1. College of Information and Electronic Technology, Jiamusi University, Jiamusi Heilongjiang 154007, China
  2. Heilongjiang Provincial Key Laboratory of Autonomous Intelligence and Information Processing (Jiamusi University), Jiamusi Heilongjiang 154007, China
  3. Jiamusi Key Laboratory of Satellite Navigation Technology and Equipment Engineering Technology, Jiamusi University, Jiamusi Heilongjiang 154007, China
  4. Handan Vocational College of Science and Technology, Handan Hebei 056046, China
  5. Experimental Training and Equipment Management Center, Jiamusi University, Jiamusi Heilongjiang 154007, China
  • Received: 2025-07-02 Revised: 2025-08-28 Accepted: 2025-09-02 Online: 2025-09-12 Published: 2026-06-10
  • Contact: Hongjun PAN
  • About author: HE Lili, born in 1979, Ph.D., professor. Her research interests include privacy protection and information security.
    CAO Meng, born in 2001, M.S. candidate. His research interests include natural language processing and computer vision.
    ZHANG Lei, born in 1982, Ph.D., professor. His research interests include information security and privacy protection.
    LIU Yi, born in 1979, M.S., associate professor. His research interests include privacy protection and image processing.
    SUN Chengxin, born in 1980, M.S. Her research interests include information construction and social management.
    First author contact: PAN Hongjun, born in 1973, lecturer. His research interests include natural language processing and image processing.
  • Supported by:
    Scientific Research Project of Fundamental Research Funds for the Heilongjiang Provincial Higher Education Institutions (18KYYWF0941); Research Special Project on Theoretical Course Teaching Reform of Ideological and Political Courses in Colleges and Universities in Heilongjiang Education and Teaching Reform Project (SJGSX2024008); Heilongjiang Provincial Undergraduate College Outstanding Young Teachers Basic Research Support Program (YQJH2024239); Joint Fund Cultivation Project of the Natural Science Foundation of Heilongjiang (PL2024F002); Excellent Innovation Team Construction Project of Fundamental Research Funds for the Heilongjiang Provincial Higher Education Institutions (2022-KYYWF-0654); Key Research Course of Economic and Social Development of Heilongjiang Province (WY2025012); Teaching Reform Project of Jiamusi University (2023JY6-36); "East Pole" Academic Team of Jiamusi University (DJXSTD202417)

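Sign language generation model based on Kolmogorov-Arnold network and diffusion Transformer

HE Lili1,2,3, CAO Meng1,2,3, ZHANG Lei1,2,3, PAN Hongjun3,4, LIU Yi1,2,3, SUN Chengxin5

  1. College of Information and Electronic Technology, Jiamusi University, Jiamusi, Heilongjiang 154007
  2. Heilongjiang Provincial Key Laboratory of Autonomous Intelligence and Information Processing (Jiamusi University), Jiamusi, Heilongjiang 154007
  3. Jiamusi Key Laboratory of Satellite Navigation Technology and Equipment Engineering Technology, Jiamusi University, Jiamusi, Heilongjiang 154007
  4. Handan Vocational College of Science and Technology, Handan, Hebei 056046
  5. Experimental Training and Equipment Management Center, Jiamusi University, Jiamusi, Heilongjiang 154007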
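  • Corresponding author: PAN Hongjun
  • About the authors: HE Lili (born in 1979), female, native of Jiamusi, Heilongjiang, professor, Ph.D., CCF member. Main research interests: privacy protection, information security.
    CAO Meng (born in 2001), male, native of Daxing'anling, Heilongjiang, M.S. candidate. Main research interests: natural language processing, computer vision.
    ZHANG Lei (born in 1982), male, native of Suihua, Heilongjiang, professor, Ph.D., CCF member. Main research interests: information security, privacy protection.
    LIU Yi (born in 1979), male, native of Wangkui, Heilongjiang, associate professor, M.S., CCF member. Main research interests: privacy protection, image processing.
    SUN Chengxin (born in 1980), female, native of Jiamusi, Heilongjiang, M.S. Main research interests: information construction, social management.
    First contact: PAN Hongjun (born in 1973), native of Daming, Hebei, lecturer. Main research interests: natural language processing, image processing.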
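  • Funding:
    Scientific Research Project of Fundamental Research Funds for the Heilongjiang Provincial Higher Education Institutions (18KYYWF0941); Research Special Project on Theoretical Course Teaching Reform of Ideological and Political Courses in Colleges and Universities in Heilongjiang Education and Teaching Reform Project (SJGSX2024008); Heilongjiang Provincial Undergraduate College Outstanding Young Teachers Basic Research Support Program (YQJH2024239); Joint Fund Cultivation Project of the Natural Science Foundation of Heilongjiang (PL2024F002); Excellent Innovation Team Construction Project of Fundamental Research Funds for the Heilongjiang Provincial Higher Education Institutions (2022-KYYWF-0654); Key Research Course of Economic and Social Development of Heilongjiang Province (WY2025012); Teaching Reform Project of Jiamusi University (2023JY6-36); "East Pole" Academic Team of Jiamusi University (DJXSTD202417)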

Abstract:

Existing sign language generation models extract local information poorly, which leads to blurry outputs, lost detail, and unevenly distributed features. To address these problems, a sign language generation model based on the Kolmogorov-Arnold Network (KAN) and the diffusion Transformer, named KDT, was proposed. Firstly, the nonlinear approximation capability of KAN was used to fit complex data distributions, enhancing detail representation and motion fluency between video frames and addressing the blurriness of videos generated by traditional Multilayer Perceptron (MLP) models. Then, Contrast Normalization (ContraNorm) replaced the original normalization and calibrated differences in feature scale, which addressed the uneven feature distribution and kept the model stable when data quality was poor or interference was present. Finally, the diffusion Transformer refined random noise into the target sequence through multi-step iterative optimization, addressing the detail loss of traditional models. Experimental results on the validation set of the RWTH-Phoenix-2014T continuous sign language dataset show that, compared with the Sign-IDD (Sign-Iconicity Disentangled Diffusion) model, KDT improves the BLEU-1 (Bilingual Evaluation Understudy 1-gram) and ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metrics by 8.1% and 5.9%, respectively, and reduces the Word Error Rate (WER) by 4.5%. These results verify the effectiveness of the model in enriching video detail and improving the fluency of sign language movements.
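
The abstract reports WER without defining it. As a minimal sketch, assuming the standard definition (word-level edit distance divided by the number of reference words), the metric could be computed as below. The function name `word_error_rate` and the example gloss sequences are hypothetical and not taken from the paper. The evaluation pipeline, for example whether WER is measured on glosses back-translated from generated poses, is not stated in the abstract and is not modeled here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumes the standard WER definition; the paper's exact evaluation
    setup is not described in the abstract.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical gloss sequences, for illustration only.
    reference = "MORGEN REGEN NORD WIND STARK"
    hypothesis = "MORGEN REGEN SUED WIND"
    # One substitution (NORD -> SUED) and one deletion (STARK): 2 / 5
    print(word_error_rate(reference, hypothesis))  # prints 0.4
```

Lower WER is better, whereas higher BLEU-1 and ROUGE are better. The abstract does not say whether the reported 4.5% reduction is absolute (percentage points) or relative to the Sign-IDD score.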

Key words: sign language video generation, machine translation, deep learning, Transformer, sequence modeling

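Abstract:

To address the blurry generation results, detail loss, and uneven feature distribution that arise in sign language generation because existing models extract local information inadequately, a sign language generation model (KDT) based on the Kolmogorov-Arnold network (KAN) and the diffusion Transformer is proposed. First, the nonlinear approximation capability of KAN is used to fit complex data distributions, improving detail representation and motion fluency between video frames and addressing the blurry videos generated by traditional multilayer perceptron (MLP) models. Second, contrast normalization (ContraNorm) replaces the original normalization and resolves the uneven feature distribution by calibrating differences in feature scale, so that the model remains stable when data quality is poor or interference is present. Finally, the diffusion Transformer achieves a refined evolution from random noise toward the target sequence through multi-step iterative optimization, addressing the detail loss of traditional models. Experimental results on the validation set of the RWTH-Phoenix-2014T continuous sign language dataset show that, compared with the Sign-IDD (Sign-Iconicity Disentangled Diffusion) model, the proposed model improves the BLEU-1 (Bilingual Evaluation Understudy 1-gram) and ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metrics by 8.1% and 5.9%, respectively, and reduces the word error rate (WER) by 4.5%. These results verify the effectiveness of the model in enriching video detail and improving the fluency of sign language movements.

Key words: sign language video generation, machine translation, deep learning, Transformer, sequence modeling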
