Journal of Computer Applications ›› 2010, Vol. 30 ›› Issue (05): 1379-1382.
• Typical applications •
张丽霞, 宋鸿陟
Abstract: A multiple-compression scheme is proposed based on the characteristics of DNA sequence data. Its first step is to enlarge the alphabet: the DNA sequence is encoded in 0/1 (binary) form, and each group of eight bits is then converted into one ASCII character, so that data originally written with only four characters is rewritten over an alphabet of 256 characters. In the second step, the enlarged data is compressed a second time, separately with the Huffman coding algorithm, which is based on a statistical model, and with the Burrows-Wheeler algorithm, which is based on a transform model. Finally, the compression performance of all the algorithms is compared. The results show that the multiple-compression algorithm achieves a better compression ratio than the other algorithms compared.
Key words: DNA sequence data, multiple-compression, Huffman coding, Burrows-Wheeler coding
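The alphabet-enlargement step described in the abstract can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the 2-bit mapping in `CODES`, the zero-padding of a short final group, and the function names `pack`, `unpack`, and `second_stage` are all hypothetical, since the abstract does not specify them. For the second stage, Python's standard `bz2` module (a Burrows-Wheeler based compressor) is used purely as a stand-in for the Huffman and Burrows-Wheeler coders that the paper compares.

```python
import bz2

# Hypothetical 2-bit code per base; the abstract does not give the mapping.
CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Encode each base in 2 bits and pack four bases into one byte (0..255)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        # Assumption: a short final group is left-aligned and zero-padded.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, n_bases: int) -> str:
    """Invert pack(); n_bases is needed to discard the padding bits."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:n_bases])


def second_stage(packed: bytes) -> bytes:
    """Stand-in second compression pass over the enlarged alphabet."""
    return bz2.compress(packed)
```

Because padding makes the packed form ambiguous about length, the original number of bases has to be stored alongside the packed bytes so that `unpack` can restore the sequence exactly.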
张丽霞, 宋鸿陟. Multiple compression of DNA sequence data[J]. Journal of Computer Applications, 2010, 30(05): 1379-1382.
URL: https://www.joca.cn/EN/Y2010/V30/I05/1379