Journal of Computer Applications, 2021, Vol. 41, Issue (3): 694-698. DOI: 10.11772/j.issn.1001-9081.2020060798

Topic: Artificial intelligence


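Text correction and completion method in continuous sign language recognition

LONG Guangyu1, CHEN Yiqiang1,2, XING Yunbing2

  1. School of Computer Science & School of Cyberspace Science, Xiangtan University, Xiangtan, Hunan 411105, China;
  2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2020-06-11 Revised: 2020-10-20 Publication date: 2021-03-10 Release date: 2020-12-22
  • Corresponding author: CHEN Yiqiang
  • About the authors: LONG Guangyu (born 1995), female, from Yizhou, Guangxi, master's student, CCF member; research interests: natural language processing, data mining. CHEN Yiqiang (born 1973), male, from Xiangtan, Hunan, research professor, Ph.D., CCF distinguished member; research interests: ubiquitous computing, wearable computing, intelligent human-computer interaction. XING Yunbing (born 1982), male, from Zhangjiakou, Hebei, senior engineer, M.S.; research interests: sign language interaction, perceptual computing, health monitoring.
  • Funding:
    National Key Research and Development Program of China (2018YFC2002603).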

Text correction and completion method in continuous sign language recognition

LONG Guangyu1, CHEN Yiqiang1,2, XING Yunbing2   

  1. School of Computer Science & School of Cyberspace Science, Xiangtan University, Xiangtan, Hunan 411105, China;
  2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2020-06-11 Revised: 2020-10-20 Online: 2021-03-10 Published: 2020-12-22
  • Supported by:
    This work is partially supported by the National Key Research and Development Program of China (2018YFC2002603).

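Abstract: The text output of video-based continuous sign language recognition suffers from semantic ambiguity and disordered word order. To address this, a two-step method is proposed that converts the sign language text produced by continuous sign language recognition into fluent, intelligible Chinese text. In the first step, the recognition result is reordered using natural sign language rules and an N-gram language model. In the second step, a Bidirectional Long Short-Term Memory (Bi-LSTM) network is trained on a general-purpose Chinese quantifier (measure word) dataset to compensate for the absence of quantifiers in sign language grammar, thereby improving sentence fluency. Absolute accuracy and the proportion of the longest correct subsequence are used as the evaluation metrics for text ordering. Experimental results show that the proposed method achieves an absolute accuracy of 77.06% and a longest-correct-subsequence proportion of 86.55% for text ordering, and an accuracy of 97.23% for quantifier completion. The method effectively improves the fluency and intelligibility of the text output of continuous sign language recognition. It has been successfully applied to video-based continuous sign language recognition, improving barrier-free communication between hearing-impaired and hearing people.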

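Keywords: continuous sign language recognition, N-gram language model, text ordering, Bidirectional Long Short-Term Memory (Bi-LSTM) network, quantifier completion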
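
The first step summarized in the abstract, reordering recognized words with an N-gram language model, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy corpus, the function names, the add-one smoothing, and the brute-force search over permutations. The abstract states that the proposed method also applies natural sign language rules, which it does not spell out and which are not modeled here.

```python
import math
from collections import Counter
from itertools import permutations


def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def log_prob(tokens, unigrams, bigrams):
    """Add-one-smoothed bigram log-probability of a token sequence."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + list(tokens) + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(padded, padded[1:])
    )


def reorder(tokens, unigrams, bigrams):
    """Return the highest-scoring permutation (brute force, short inputs only)."""
    best = max(permutations(tokens), key=lambda p: log_prob(p, unigrams, bigrams))
    return list(best)


# Hypothetical toy corpus of word-segmented Chinese sentences.
corpus = [
    ["我", "喜欢", "苹果"],
    ["我", "喜欢", "香蕉"],
    ["他", "喜欢", "苹果"],
]
unigrams, bigrams = train_bigram(corpus)

# A sign-language-style word order (object first) as hypothetical recognizer output.
reordered = reorder(["苹果", "我", "喜欢"], unigrams, bigrams)
print(reordered)
```

Exhaustive search grows factorially with sentence length, so a practical system would need a beam search or rule-based pruning; the sketch only shows how an N-gram score can rank candidate word orders.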

Abstract: To address the semantic ambiguity and disordered word order in the text results of video-based continuous sign language recognition, a two-step method was proposed to convert the sign language text of the recognition result into fluent and understandable Chinese text. In the first step, natural sign language rules and an N-gram language model were used to perform text ordering on the continuous sign language recognition results. In the second step, a Bidirectional Long Short-Term Memory (Bi-LSTM) network model was trained on a general-purpose Chinese quantifier dataset to solve the problem that sign language grammar has no quantifiers, so as to improve the fluency of the sentences. The absolute accuracy and the proportion of the longest correct subsequence were adopted as the evaluation metrics for text ordering. Experimental results showed that the text ordering results of the proposed method had an absolute accuracy of 77.06% and a longest-correct-subsequence proportion of 86.55%, and that the accuracy of quantifier completion was 97.23%. The proposed method can effectively improve the fluency and intelligibility of the text results of continuous sign language recognition. It has been successfully applied to video-based continuous sign language recognition, improving the barrier-free communication experience between hearing-impaired and hearing people.

Key words: continuous sign language recognition, N-gram language model, text ordering, Bidirectional Long Short-Term Memory (Bi-LSTM) network, quantifier completion
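
The two text ordering metrics named in the abstract can be sketched as follows. The abstract does not give their formulas, so the definitions below are assumptions: absolute accuracy is taken to be the fraction of sentences whose predicted order matches the reference exactly, and the longest-correct-subsequence proportion is taken to be the longest common subsequence length divided by the reference length, averaged over sentences. The function names and sample data are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def absolute_accuracy(predictions, references):
    """Assumed definition: share of sentences reproduced exactly."""
    exact = sum(p == r for p, r in zip(predictions, references))
    return exact / len(references)


def lcs_proportion(predictions, references):
    """Assumed definition: mean of LCS length over reference length."""
    ratios = [lcs_length(p, r) / len(r) for p, r in zip(predictions, references)]
    return sum(ratios) / len(ratios)


# Hypothetical predictions and references (word-segmented sentences).
predictions = [["我", "喜欢", "苹果"], ["他", "书", "看"]]
references = [["我", "喜欢", "苹果"], ["他", "看", "书"]]

acc = absolute_accuracy(predictions, references)
lcs = lcs_proportion(predictions, references)
print(f"absolute accuracy: {acc:.2%}, LCS proportion: {lcs:.2%}")
```

Under these assumed definitions the second metric is the more lenient one, because it gives partial credit to sentences that are mostly in the right order, which is consistent with the reported 86.55% exceeding the 77.06% absolute accuracy.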

CLC number: