Journal of Computer Applications (计算机应用)

• Artificial Intelligence and Advanced Computing (人工智能与先进计算)

Chinese question answering pattern learning based on self-training mechanism and Web
(基于互联网和self-training的中文问答模式学习)

Zhi-sheng LI (李志圣), Yue-heng SUN (孙越恒), Pi-lian HE (何丕廉), Yue-xian HOU (候越先)

  1. School of Computer Science and Technology, Tianjin University (天津大学 计算机科学与技术学院)
  • Received: 2007-12-14 Revised: 1900-01-01 Online: 2008-06-01 Published: 2008-06-01
  • Contact: Zhi-sheng LI

Abstract: Existing approaches to question answering (QA) pattern learning define patterns and score candidate answers too simply, and their learning process depends on manually labeled corpora. This paper extends the pattern definition with skeleton patterns, which are verb and noun sequences mined from Web text. It also introduces a self-training mechanism into QA pattern learning: the system first learns from a labeled QA pair, then searches the Web, automatically selects the QA pairs it judges most reliable, and retrains on them. Finally, the heuristic rules are extended to improve the scoring of candidate answers. Experimental results show that the proposed method effectively improves the performance of a Chinese QA system.

Key words: Web, QA pattern, self-training, machine learning
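
The self-training loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the in-memory `SNIPPETS` list stands in for Web search results, the snippets are English toy data rather than Chinese Web text, and the function names (`search`, `learn_patterns`, `score_candidates`, `self_train`), the confidence measure, and the `threshold` value are all assumptions made for illustration. The skeleton patterns over verb and noun sequences and the heuristic scoring rules are omitted; a pattern here is simply the text found between a question term and its answer.

```python
import re
from collections import Counter

# Hypothetical stand-in for Web search results; a real system would query a search engine.
SNIPPETS = [
    "Mozart was born in Salzburg in 1756.",
    "Einstein was born in Ulm in 1879.",
    "Einstein was born in Ulm, a city in Germany.",
    "Newton, born in Woolsthorpe, studied at Cambridge.",
    "Einstein, born in Ulm, later moved to Zurich.",
]

def search(term):
    """Return the snippets that mention the term (placeholder for a Web query)."""
    return [s for s in SNIPPETS if term in s]

def learn_patterns(pairs):
    """Count the text found between each question term and its known answer."""
    patterns = Counter()
    for term, answer in pairs:
        for snippet in search(term):
            m = re.search(re.escape(term) + r"(.+?)" + re.escape(answer), snippet)
            if m:
                patterns[m.group(1)] += 1
    return patterns

def score_candidates(term, patterns):
    """Score each candidate answer by the total weight of the patterns that extract it."""
    scores = Counter()
    for snippet in search(term):
        for infix, weight in patterns.items():
            m = re.search(re.escape(term) + re.escape(infix) + r"(\w+)", snippet)
            if m:
                scores[m.group(1)] += weight
    return scores

def self_train(seed_pairs, questions, threshold=0.8, rounds=3):
    """Grow the labeled set with confidently answered questions, retraining each round."""
    labeled = list(seed_pairs)
    pending = list(questions)
    for _ in range(rounds):
        patterns = learn_patterns(labeled)
        still_pending = []
        for term in pending:
            scores = score_candidates(term, patterns)
            if scores:
                answer, best = scores.most_common(1)[0]
                # Assumed confidence: share of the total score held by the top candidate.
                if best / sum(scores.values()) >= threshold:
                    labeled.append((term, answer))
                    continue
            still_pending.append(term)
        if len(still_pending) == len(pending):
            break  # nothing new was labeled, so further rounds would not change anything
        pending = still_pending
    return labeled, learn_patterns(labeled)

# Start from a single labeled QA pair, as in the abstract.
labeled, patterns = self_train([("Mozart", "Salzburg")], ["Einstein", "Newton"])
print(labeled)
print(patterns)
```

In this toy run, the seed pair yields only the pattern " was born in ", which is enough to label Einstein in the first round. Retraining on the enlarged set adds the pattern ", born in ", which then extracts an answer for Newton in the second round. This is the kind of bootstrapping that self-training is meant to provide, although the selection criterion used in the paper may differ.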