Multiple-compression of DNA sequence data

Journal of Computer Applications, 2010, Vol. 30, Issue (05): 1379-1382.

Section: Typical applications


Received: 2009-10-30; Revised: 2009-12-24; Online: 2010-05-04; Published: 2010-05-01


ZHANG Lixia (张丽霞)¹, SONG Hongzhi (宋鸿陟)²

1. College of Informatics, South China Agricultural University
2.

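Corresponding author: ZHANG Lixia
Funding:
National Natural Science Foundation of China; Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education (31st batch)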

Abstract

Abstract: Based on the characteristics of DNA sequence data, the concept of multiple-compression was proposed. The first step of multiple-compression is to enlarge the alphabet: the DNA sequence data were first encoded with a 0/1 (binary) code, and every eight bits were then converted into one 8-bit (extended ASCII) character. This step enlarges the original four-character DNA alphabet to 256 characters. In the second step, the enlarged DNA sequence data were compressed a second time, separately with Huffman coding, which is based on a statistical model, and with the Burrows-Wheeler algorithm, which is based on a transformation model. Finally, the compression performance of all the algorithms was compared. The results show that the multiple-compression algorithm achieves a better compression ratio than the other algorithms.
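The alphabet-enlargement step can be illustrated with a minimal Python sketch. The paper does not state which 0/1 code it assigns to each base, so the mapping A=00, C=01, G=10, T=11 below is an assumption, as are the helper names `pack_dna` and `unpack_dna`; the sketch also assumes the input contains only the uppercase bases A, C, G and T. Each packed byte holds four bases and can take any of 256 values, which is the enlarged alphabet.

```python
# Hypothetical 2-bit code for each base; the paper does not state its mapping.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {v: k for k, v in BASE_TO_BITS.items()}


def pack_dna(seq: str) -> bytes:
    """Pack four bases into each byte (8 bits = 4 x 2-bit codes).

    The last byte is zero-padded when len(seq) is not a multiple of 4,
    so the original length must be stored separately to unpack.
    """
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        # Left-align a short final chunk by shifting in zero bits.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack_dna(data: bytes, length: int) -> str:
    """Inverse of pack_dna: recover the first `length` bases."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases[:length])


seq = "ACGTACGTTG"
packed = pack_dna(seq)
print(len(seq), "bases ->", len(packed), "bytes:", list(packed))
# prints: 10 bases -> 3 bytes: [27, 27, 224]
assert unpack_dna(packed, len(seq)) == seq
```

Because the final byte is zero-padded, the sequence length has to be kept alongside the packed bytes; how the original method handles a trailing partial byte is not described.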

Key words: DNA sequence data, multiple-compression, Huffman coding, Burrows-Wheeler coding
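The second compression pass can be sketched in the same hedged way. Below, a textbook Huffman coder (written here for illustration, not taken from the paper) is applied to the packed bytes, and Python's standard `bz2` module, which implements a Burrows-Wheeler-transform-based compressor, stands in for the Burrows-Wheeler coding stage; the paper's exact pipeline may differ. The 2-bit mapping, the helper names `pack`, `huffman_code_lengths` and `huffman_payload_bits`, and the use of bits per base as the compression measure are all assumptions, and the toy input is synthetic rather than data from the paper.

```python
import bz2
import heapq
from collections import Counter

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # hypothetical 2-bit code per base


def pack(seq: str) -> bytes:
    """First pass: four 2-bit bases per byte (256-symbol alphabet)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4].ljust(4, "A")  # 'A' pads with 00 bits
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        out.append(byte)
    return bytes(out)


def huffman_code_lengths(data: bytes) -> dict[int, int]:
    """Huffman code length in bits for each byte value occurring in data."""
    freq = Counter(data)
    if len(freq) == 1:
        # A lone symbol still needs one bit per occurrence.
        return {sym: 1 for sym in freq}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth in subtree}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees moves every symbol in them one level deeper.
        merged = {sym: d + 1 for sym, d in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def huffman_payload_bits(data: bytes) -> int:
    """Bits needed for the Huffman-coded data (code table not counted)."""
    lengths = huffman_code_lengths(data)
    return sum(lengths[b] for b in data)


# Synthetic, highly repetitive toy input (not data from the paper).
seq = "ACGT" * 500 + "AAAACCCCGGGGTTTT" * 100
packed = pack(seq)

results = {
    "ASCII text (8 bits/base)": 8 * len(seq),
    "2-bit packing only": 8 * len(packed),
    "packing + Huffman": huffman_payload_bits(packed),
    "packing + bz2 (BWT-based)": 8 * len(bz2.compress(packed)),
}
for name, bits in results.items():
    print(f"{name}: {bits / len(seq):.3f} bits/base")
```

The Huffman figure excludes the code table a real decoder would need, and on small or non-repetitive inputs the fixed overhead of `bz2` can outweigh its gains, so these numbers say nothing about the results reported in the paper.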


Cite this article: ZHANG Lixia, SONG Hongzhi. Multiple-compression of DNA sequence data [J]. Journal of Computer Applications, 2010, 30(05): 1379-1382.
