计算机应用 ›› 2013, Vol. 33 ›› Issue (08): 2375-2378.

• 典型应用 • 上一篇    下一篇

基于Trie树的相似字符串查找算法

刘丽霞1,2,张志强2   

  1. 1. 哈尔滨工程大学 计算机科学与技术学院,哈尔滨 150001
    2. 闽南理工学院 信息管理系,福建 石狮 362700;
  • 收稿日期:2013-03-05 修回日期:2013-04-18 出版日期:2013-08-01 发布日期:2013-09-11
  • 通讯作者: 刘丽霞
  • 作者简介:刘丽霞(1986-),女,黑龙江桦南人,助教,硕士,主要研究方向:信息检索;
    张志强(1973-),男,河北雄县人,教授,博士,CCF高级会员,主要研究方向:信息检索、智能信息处理、数据库。
  • 基金资助:
    国家自然科学基金资助项目

Similar string search algorithm based on Trie tree

Li-Xia LIU1,2,ZHANG Zhiqiang2   

  1. 1. College of Computer Science and Technology, Harbin Engineering University, Harbin Heilongjiang 150001, China
    2. Department of Information and Management, Minnan University of Science and Technology, Shishi Fujian 362700,China
  • Received:2013-03-05 Revised:2013-04-18 Online:2013-09-11 Published:2013-08-01
  • Contact: Li-Xia LIU

摘要: 基于Trie树的相似字符串查找算法是利用编辑距离的阈值来计算每个节点的活跃节点集,已有算法由于存在大量的冗余计算,导致时间复杂度和空间复杂度都比较高。针对这个问题,采用了基于活跃节点的对称性和动态规划算法的思想对已有算法进行改进,并对活跃节点集进行了修剪,提出了New-Trie-Stack算法。该算法避免了活跃节点的重复计算,以及已有算法在保存所有已遍历节点的活跃节点集时的空间开销。实验结果表明New-Trie-Stack算法在时间复杂度和空间复杂度上都有明显的下降。

关键词: Trie树, 相似字符串, 编辑距离, 活跃节点, 动态规划

Abstract: Similar string search algorithms based on Trie tree need to compute active-node set of a node by editing distance threshold. A large number of redundant computation leads to a high time and space complexity. A new algorithm named New-Trie-Stack was proposed, which utilized the symmetrical properties of active-node set and the dynamic programming method to improve the performance. It could avoid the redundancy cost on active-node set computing and storing; moreover, active-node sets were pruned. The experimental results show that New-Trie-Stack algorithm has lower time complexity and space complexity.

Key words: Trie tree, similar string, edit distance, active-node, dynamic programming

中图分类号: