Description

Implementation of LZ77

The idea is to use a sliding window to compress and decompress. The window w contains the n last tokens you have (de)compressed. It is initialized with a default element (ini :: a). The size of the window is a parameter.

You have a sequence of tokens s :: [a] to compress. You seek for the longest proper prefix p of s that is present in (w ++ dropLast p). The recursive definition allows for multiple repetition of the same pattern. E.g. w = xxxxx102, s = 1021021023 -> p has length 9 and position (n-3)

The maximum length of p is parametric. p can't be equal to s, it is a *proper* prefix, meaning that at least the last token of s is not in p.

Each compression step returns a (Code pos len c :: Code a) where pos and len are respectively the position and length of the prefix in the window. "c" is the token just after the prefix in s. When len is 0, only one character of s is encoded.

The final result is a sequence of (Code a). It can be later transformed into a new sequence of [a] by encoding positions and lengths into "a" values (e.g. imagine "a" is a byte (Word8)).

The decompression algorithm is similar. It uses the same kind of window. It decompresses an input s :: [Code a] into a [a].

From the paper: "A Universal Algorithm for Sequential Data Compression" Ziv, Jacob; Lempel, Abraham (May 1977) IEEE Transactions on Information Theory 23 (3): 337–343

Synopsis

# Documentation

data Code a Source #

A code (prefix + single value)

Constructors

 Code FieldscodePosition :: IntPosition of the prefix in the windowcodeLength :: IntLength of the prefixcodeElem :: aValue after the prefix

Instances

 Show a => Show (Code a) Source # MethodsshowsPrec :: Int -> Code a -> ShowS #show :: Code a -> String #showList :: [Code a] -> ShowS #

compress :: Eq a => Int -> Int -> a -> [a] -> [Code a] Source #

Compress a sequence, using LZ77

ls is the maximal word length n is the buffer length ini is the value filling the buffer initially

decompress :: Int -> Int -> a -> [Code a] -> [a] Source #

Decompress a sequence, using LZ77

ls is the maximal word length n is the buffer length ini is the value filling the buffer initially