Overview
I've always had an interest in compression technology so I decided to take a crack at my own implementation of a Burrows-Wheeler Transform (BWT) based compressor. You can download it here.
Algorithm
There are 4 basic steps used in most BWT compressors (including my own):
1. Burrows-Wheeler Transform
2. Move-to-front enconding
3. Run-length encoding
4. Arithmetic coding
These are all relatively simple operations and are explained extremely well on the following pages:
STL's Bwtzip page.
Charles bloom's page.
Data compression info.
Implementation
The implemenation uses C++ and STL quite extensively, and currently needs a bit of a redesign to allow better handling of large files. However, the code is quite clean and should serve as a good example of how to go about implementing the algorithms.
Results
These are the results of applying the compressor to the Calgary corpus.