Sign language generation model based on Kolmogorov-Arnold network and diffusion Transformer
Lili HE, Meng CAO, Lei ZHANG, Hongjun PAN, Yi LIU, Chengxin SUN
Journal of Computer Applications, 2026, 46(6): 1801-1810. DOI: 10.11772/j.issn.1001-9081.2025060730

Existing sign language generation models extract local information insufficiently, which leads to blurry results, loss of detail, and uneven feature distributions. To address these problems, a sign language generation model based on the Kolmogorov-Arnold Network (KAN) and the diffusion Transformer, abbreviated KDT, is proposed. First, the nonlinear approximation capability of KAN is used to fit complex data distributions, which enhances detail representation and motion fluency between video frames and mitigates the blurriness of videos generated by traditional Multilayer Perceptron (MLP) models. Second, Contrast Normalization (ContraNorm) replaces the original normalization; by calibrating differences in feature scales, it addresses the uneven feature distribution and keeps the model stable when the data are of poor quality or contain interference. Finally, the diffusion Transformer refines random noise into the target sequence through multi-step iterative optimization, which addresses the detail loss of traditional models. Experimental results on the validation set of the RWTH-Phoenix-2014T continuous sign language dataset show that, compared with the Sign-IDD (Sign-Iconicity Disentangled Diffusion) model, the proposed model improves the BLEU-1 (Bilingual Evaluation Understudy, 1-gram) and ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metrics by 8.1% and 5.9%, respectively, and reduces the Word Error Rate (WER) by 4.5%. These results verify the effectiveness of the model in enhancing the richness of video details and the fluency of sign language movements.
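For readers unfamiliar with the WER metric reported above, the following is a minimal sketch of its conventional definition: the word-level edit distance (substitutions, deletions, insertions) between a reference and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example strings are hypothetical and for illustration only; this is not the authors' evaluation code, and the abstract does not specify how their hypotheses are obtained.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name), not the paper's evaluation script.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # matching i reference words against an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference; lower values indicate closer agreement with the reference.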
