Linzip - Compression Programming

Overview

I've always had an interest in compression technology so I decided to take a crack at my own implementation of a Burrows-Wheeler Transform (BWT) based compressor. You can download it here.

Algorithm

There are 4 basic steps used in most BWT compressors (including my own):

1. Burrows-Wheeler Transform
2. Move-to-front enconding
3. Run-length encoding
4. Arithmetic coding

These are all relatively simple operations and are explained extremely well on the following pages:

STL's Bwtzip page.
Charles bloom's page.
Data compression info.

Implementation

The implemenation uses C++ and STL quite extensively, and currently needs a bit of a redesign to allow better handling of large files. However, the code is quite clean and should serve as a good example of how to go about implementing the algorithms.

Results

These are the results of applying the compressor to the Calgary corpus.
bib     :      111261 ->      27955 = bpc: 2.010
book1   :      768771 ->     239417 = bpc: 2.491
book2   :      610856 ->     163913 = bpc: 2.147
geo     :      102400 ->      60598 = bpc: 4.734
news    :      377109 ->     120993 = bpc: 2.567
obj1    :       21504 ->      10870 = bpc: 4.044
obj2    :      246814 ->      80128 = bpc: 2.597
paper1  :       53161 ->      16833 = bpc: 2.533
paper2  :       82199 ->      25289 = bpc: 2.461
pic     :      513216 ->      48700 = bpc: 0.759
progc   :       39611 ->      12853 = bpc: 2.596
progl   :       71646 ->      16090 = bpc: 1.797
progp   :       49379 ->      11140 = bpc: 1.805
trans   :       93695 ->      18867 = bpc: 1.611
Took 32.026 seconds
Input size: 3141622
Total compressed size: 853646
Avg bpc = 2.174
These results compare quite well to most compressors, bzip is at 2.09bpc and gzip is 2.59bpc. With further improvements (especially to the frequency modelling) excellent results could be achieved. Also, an implementation of distance coding is planned for the somewhat near future, which should help the compression ratio significantly.

Back to Projects page

Back to projects.