## Overview

I've always had an interest in compression technology, so I decided to take a crack at my own implementation of a Burrows-Wheeler Transform (BWT) based compressor. You can download it here.

### Algorithm

There are four basic steps used in most BWT compressors (including my own):
1. Burrows-Wheeler Transform

2. Move-to-front encoding

3. Run-length encoding

4. Arithmetic coding
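To make the first two stages concrete, here is a minimal C++/STL sketch of a forward BWT and move-to-front encoder. This is an illustrative toy, not the code from the download: the function names are mine, and the BWT sorts rotations naively in O(n² log n), whereas a real compressor would use a suffix-array construction.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Naive forward BWT: sort all rotations of s, output the last column.
// primary_index receives the row holding the original string, which the
// decoder needs to invert the transform.
std::string bwt(const std::string& s, std::size_t& primary_index) {
    const std::size_t n = s.size();
    std::vector<std::size_t> rot(n);
    std::iota(rot.begin(), rot.end(), 0);
    std::sort(rot.begin(), rot.end(), [&](std::size_t a, std::size_t b) {
        // Lexicographically compare the rotations starting at a and b.
        for (std::size_t i = 0; i < n; ++i) {
            char ca = s[(a + i) % n], cb = s[(b + i) % n];
            if (ca != cb) return ca < cb;
        }
        return false;
    });
    std::string out(n, '\0');
    for (std::size_t i = 0; i < n; ++i) {
        if (rot[i] == 0) primary_index = i;
        out[i] = s[(rot[i] + n - 1) % n];  // last column of sorted rotations
    }
    return out;
}

// Move-to-front: emit each byte's position in a recency list, then move
// that byte to the front. Runs of equal symbols become runs of zeros,
// which the later RLE and arithmetic-coding stages exploit.
std::vector<unsigned char> mtf(const std::string& s) {
    std::vector<unsigned char> table(256);
    std::iota(table.begin(), table.end(), 0);
    std::vector<unsigned char> out;
    for (unsigned char c : s) {
        auto it = std::find(table.begin(), table.end(), c);
        out.push_back(static_cast<unsigned char>(it - table.begin()));
        table.erase(it);
        table.insert(table.begin(), c);
    }
    return out;
}
```

For example, `bwt("banana", idx)` yields `"nnbaaa"` with `idx == 3`; feeding that through `mtf` turns the repeated characters into zeros.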

These are all relatively simple operations and are explained extremely well on the following pages:

- STL's bwtzip page
- Charles Bloom's page
- Data Compression Info

## Implementation

The implementation uses C++ and the STL quite extensively, and currently needs a bit of a redesign to allow better handling of large files. However, the code is quite clean and should serve as a good example of how to go about implementing the algorithms.
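As a taste of the kind of STL-based code involved, here is a hedged sketch of the run-length stage applied to the move-to-front output. This is my own simplification, not the actual implementation: it replaces each run of zeros with a 0 followed by the run length, whereas bzip2-style compressors use dedicated RUNA/RUNB symbols.

```cpp
#include <cstddef>
#include <vector>

// Run-length encode the zero runs in an MTF output stream.
// A run of n zeros becomes the pair (0, n); other symbols pass through.
std::vector<unsigned> rle_zeros(const std::vector<unsigned char>& in) {
    std::vector<unsigned> out;
    for (std::size_t i = 0; i < in.size(); ) {
        if (in[i] == 0) {
            unsigned run = 0;
            while (i < in.size() && in[i] == 0) { ++run; ++i; }
            out.push_back(0);
            out.push_back(run);  // run length follows the zero marker
        } else {
            out.push_back(in[i++]);
        }
    }
    return out;
}
```

Because the BWT groups similar contexts together and MTF maps repeats to zeros, these zero runs dominate the stream, so even this simple scheme shrinks it considerably before the arithmetic coder sees it.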

### Results

These are the results of applying the compressor to the Calgary corpus:

| File   | Original size | Compressed size |   bpc |
|--------|--------------:|----------------:|------:|
| bib    |        111261 |           27955 | 2.010 |
| book1  |        768771 |          239417 | 2.491 |
| book2  |        610856 |          163913 | 2.147 |
| geo    |        102400 |           60598 | 4.734 |
| news   |        377109 |          120993 | 2.567 |
| obj1   |         21504 |           10870 | 4.044 |
| obj2   |        246814 |           80128 | 2.597 |
| paper1 |         53161 |           16833 | 2.533 |
| paper2 |         82199 |           25289 | 2.461 |
| pic    |        513216 |           48700 | 0.759 |
| progc  |         39611 |           12853 | 2.596 |
| progl  |         71646 |           16090 | 1.797 |
| progp  |         49379 |           11140 | 1.805 |
| trans  |         93695 |           18867 | 1.611 |

Total input size: 3141622 bytes; total compressed size: 853646 bytes; average: 2.174 bpc. Compression took 32.026 seconds.

These results compare quite well to most compressors: bzip achieves 2.09 bpc and gzip 2.59 bpc. With further improvements (especially to the frequency modelling), excellent results could be achieved. An implementation of distance coding is also planned for the somewhat near future, which should help the compression ratio significantly.