Sign language generation model based on Kolmogorov-Arnold network and diffusion Transformer

Lili HE, Meng CAO, Lei ZHANG, Hongjun PAN, Yi LIU, Chengxin SUN

Journal of Computer Applications 2026, 46 (6): 1801-1810. DOI: 10.11772/j.issn.1001-9081.2025060730

Existing sign language generation models extract local information insufficiently, which leads to blurry outputs, loss of detail, and uneven feature distributions. To address these problems, a sign language generation model based on the Kolmogorov-Arnold Network (KAN) and Diffusion Transformer (KDT) is proposed. First, the nonlinear approximation capability of KAN is used to fit complex data distributions, enhancing detail representation and motion fluency between video frames and thereby reducing the blurriness of videos generated by traditional Multilayer Perceptron (MLP) models. Second, Contrast Normalization (ContraNorm) replaces the original normalization; by calibrating differences in feature scale, it addresses the uneven feature distribution and keeps the model stable under poor data quality and interference. Finally, a diffusion Transformer refines random noise into the target sequence through multi-step iterative optimization, addressing the detail loss of traditional models. Experimental results on the validation set of the RWTH-Phoenix-2014T continuous sign language dataset show that, compared with the Sign-IDD (Sign-Iconicity Disentangled Diffusion) model, the proposed model improves the BLEU-1 (Bilingual Evaluation Understudy 1-gram) and ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metrics by 8.1% and 5.9%, respectively, and reduces the Word Error Rate (WER) by 4.5%. These results verify the effectiveness of the model in enhancing the richness of video details and the fluency of sign language movements.
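
For readers unfamiliar with the WER metric cited above, the following is a minimal sketch of its standard definition: the word-level edit distance between a reference and a hypothesis, divided by the reference length. It assumes whitespace tokenization, and the function name `word_error_rate` is hypothetical; it is not the authors' evaluation pipeline, which the abstract does not describe.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: tokens are separated by whitespace. This illustrates the
    textbook definition only, not the evaluation code used in the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

Lower values are better, and the value can exceed 1.0 when the hypothesis contains many inserted words.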