Similar string search algorithm based on Trie tree

Journal of Computer Applications ›› 2013, Vol. 33 ›› Issue (08): 2375-2378.

LIU Lixia¹,², ZHANG Zhiqiang²

1. College of Computer Science and Technology, Harbin Engineering University, Harbin, Heilongjiang 150001, China
2. Department of Information Management, Minnan University of Science and Technology, Shishi, Fujian 362700, China

Received: 2013-03-05  Revised: 2013-04-18  Published: 2013-08-01  Online: 2013-09-11
Corresponding author: LIU Lixia
About the authors: LIU Lixia (born 1986), female, from Huanan, Heilongjiang; teaching assistant, M.S.; research interest: information retrieval.
ZHANG Zhiqiang (born 1973), male, from Xiongxian, Hebei; professor, Ph.D., CCF senior member; research interests: information retrieval, intelligent information processing, databases.
Supported by:
National Natural Science Foundation of China

Abstract

Trie-based similar string search algorithms use an edit distance threshold to compute the active-node set of each node. Existing algorithms perform a large amount of redundant computation, which leads to high time and space complexity. To address this problem, the New-Trie-Stack algorithm is proposed. It improves on existing algorithms by exploiting the symmetry of active nodes together with dynamic programming, and it prunes the active-node sets. The algorithm avoids recomputing active nodes, and it avoids the space overhead that existing algorithms incur by storing the active-node sets of all traversed nodes. Experimental results show that New-Trie-Stack clearly reduces both time complexity and space complexity.

Key words: Trie tree, similar string, edit distance, active node, dynamic programming
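
The abstract does not give the details of New-Trie-Stack, so the following is only a minimal sketch of the general technique it builds on: searching a trie for strings within an edit distance threshold, using one dynamic-programming row per trie node and an explicit stack for the traversal. It is a common baseline, not the authors' algorithm, and it does not model active-node sets or their symmetry. The names `TrieNode`, `build_trie`, `similar_strings`, and `tau` are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One trie node; `word` is set only where a dictionary string ends."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.word: Optional[str] = None


def build_trie(words: List[str]) -> TrieNode:
    """Insert every word into a fresh trie and return its root."""
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.word = w
    return root


def similar_strings(root: TrieNode, query: str, tau: int) -> List[Tuple[str, int]]:
    """Return sorted (word, distance) pairs whose edit distance to query is <= tau."""
    results: List[Tuple[str, int]] = []
    # Row for the empty prefix: distance to query[:j] is j insertions.
    first_row = list(range(len(query) + 1))
    stack = [(root, first_row)]
    while stack:
        node, row = stack.pop()
        # row[-1] is the edit distance between this node's prefix and the full query.
        if node.word is not None and row[-1] <= tau:
            results.append((node.word, row[-1]))
        for ch, child in node.children.items():
            # Extend the prefix by one character: one new DP row.
            new_row = [row[0] + 1]
            for j in range(1, len(query) + 1):
                cost = 0 if query[j - 1] == ch else 1
                new_row.append(min(new_row[j - 1] + 1,   # insertion
                                   row[j] + 1,           # deletion
                                   row[j - 1] + cost))   # match or substitution
            # Prune: if every entry exceeds tau, no descendant can match.
            if min(new_row) <= tau:
                stack.append((child, new_row))
    return sorted(results)
```

Each stack entry carries only the row of the node being expanded, so memory in this sketch grows with the number of pending nodes rather than with all nodes visited so far. For example, `similar_strings(build_trie(["trie", "tree", "try"]), "tree", 1)` keeps `"tree"` and `"trie"` and discards `"try"`.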

CLC number: TP391.3

Cite this article: LIU Lixia, ZHANG Zhiqiang. Similar string search algorithm based on Trie tree[J]. Journal of Computer Applications, 2013, 33(08): 2375-2378.
