1. Binary modules

haskus-binary is a package containing a set of modules dedicated to the manipulation of binary data. It provides data types that map to those of other languages such as C, and more.

All these modules are in Haskus.Format.Binary.

We don’t rely on external tools such as C2HS to provide bindings to C libraries. There are several reasons for that:

  • We don’t want to depend on .h files;

  • .h files often contain peculiarities that are difficult to handle automatically;

  • Documentation and code of the resulting Haskell files are often very bad:

    • No haddock
    • Very low-level (e.g. #define constants are not turned into data types with Enum instances)

Instead, haskus-binary lets you write bindings in pure Haskell code and provides many utilities to make this process easy.

1.1. Word, Int

The Word module contains data types representing unsigned words (Word8, Word16, Word32, etc.) and signed integers (Int8, Int16, Int32, etc.). It also contains some C types such as CSize, CShort, CUShort, CLong, CULong, etc.

1.1.1. Endianness

Words and Ints are stored (i.e., read and written) using host endianness (byte ordering). AsBigEndian and AsLittleEndian data types in the Endianness module allow you to force a different endianness.

The following example shows a data type containing a field for each endianness variant. We explain how to use this kind of data type as a C structure later in this document.

data Dummy = Dummy
   { fieldX :: Word32                -- ^ 32-bit unsigned word (host endianness)
   , fieldY :: AsBigEndian Word32    -- ^ 32-bit unsigned word (big-endian)
   , fieldZ :: AsLittleEndian Word32 -- ^ 32-bit unsigned word (little-endian)
   } deriving (Generic,Storable)

We can also explicitly change the endianness with the following methods: hostToBigEndian, hostToLittleEndian, bigEndianToHost, littleEndianToHost, reverseBytes.

Each of the four conversion methods is equivalent to either id or reverseBytes, depending on the host endianness.
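As a rough illustration of why a conversion is either id or a byte swap, the sketch below reimplements two of them for Word32 using only base. The names toBigEndian32 and toLittleEndian32 are hypothetical stand-ins and not part of haskus-binary; byteSwap32 and targetByteOrder come from base.

```haskell
import Data.Word (Word32, byteSwap32)
import GHC.ByteOrder (ByteOrder (..), targetByteOrder)

-- Hypothetical stand-in for hostToBigEndian, specialised to Word32:
-- a no-op on big-endian hosts, a byte swap on little-endian ones.
toBigEndian32 :: Word32 -> Word32
toBigEndian32 = case targetByteOrder of
  BigEndian    -> id
  LittleEndian -> byteSwap32

-- Hypothetical stand-in for hostToLittleEndian (the mirror image).
toLittleEndian32 :: Word32 -> Word32
toLittleEndian32 = case targetByteOrder of
  BigEndian    -> byteSwap32
  LittleEndian -> id
```

Whatever the host, exactly one of the two functions swaps bytes, so composing them always behaves like byteSwap32.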

1.2. Bits

The Bits module allows you to perform bitwise operations on data types supporting them.
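The kind of operation meant here can be sketched with Data.Bits from base. Whether the Haskus Bits module exposes the same names is an assumption, so the example sticks to base; highByte, isOdd and setBits are hypothetical helpers.

```haskell
import Data.Bits (popCount, shiftR, testBit, (.&.))
import Data.Word (Word16, Word8)

-- Hypothetical helper: keep the most significant byte of a 16-bit word.
highByte :: Word16 -> Word8
highByte w = fromIntegral ((w `shiftR` 8) .&. 0xFF)

-- Hypothetical helper: a word is odd when its bit 0 is set.
isOdd :: Word16 -> Bool
isOdd w = testBit w 0

-- Hypothetical helper: number of bits set in a word.
setBits :: Word16 -> Int
setBits = popCount
```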

1.3. Buffer

A Buffer is basically a strict ByteString with a better name and better integration with the Storable type class.

1.4. Structures

You can map C data structures to Haskell data types as follows:

{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric #-}

import Haskus.Format.Binary.Storable
import Haskus.Format.Binary.Word   -- Word8, Word64 (the Word module described above)
import Haskus.Utils.Types.Generics (Generic)

data StructX = StructX
   { xField0 :: Word8
   , xField1 :: Word64
   } deriving (Show,Generic,Storable)

The Storable instance handles the alignment of the fields as a C non-packed structure would (i.e., there are 7 padding bytes between xField0 and xField1).
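The padding rule can be made concrete with a small, library-independent sketch. The layout function below is hypothetical (it is not how haskus-binary derives its instances); it applies the usual C rule for non-packed structures: each field starts at the next multiple of its alignment, and the total size is rounded up to the largest alignment.

```haskell
-- Round n up to the next multiple of a (a must be positive).
alignUp :: Int -> Int -> Int
alignUp a n = ((n + a - 1) `div` a) * a

-- Given the (size, alignment) of each field in declaration order,
-- return the offset of every field and the total structure size.
layout :: [(Int, Int)] -> ([Int], Int)
layout fields = (reverse offsets, alignUp structAlign end)
  where
    -- A structure is aligned like its most strictly aligned field.
    structAlign = maximum (1 : map snd fields)
    (offsets, end) = foldl step ([], 0) fields
    -- Place one field at the first suitably aligned position.
    step (offs, pos) (size, align) =
      let off = alignUp align pos
      in (off : offs, off + size)
```

Assuming 1 byte for Word8 and 8 bytes with 8-byte alignment for Word64, layout [(1,1),(8,8)] gives offsets [0,8] and a total size of 16, which accounts for the 7 padding bytes.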

peek and poke can be used to read and write the data structure in memory.
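For readers unfamiliar with peek and poke, here is a minimal round trip on a plain Word64 using the Storable class from base. haskus-binary's own Storable class may differ in detail, so treat this as an illustration of the idea only; roundTrip is a hypothetical helper name.

```haskell
import Data.Word (Word64)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Storable (peek, poke)

-- Write a value into temporary memory, then read it back.
roundTrip :: Word64 -> IO Word64
roundTrip x = alloca $ \ptr -> do
  poke ptr x   -- store x at the pointed-to address
  peek ptr     -- load it again from the same address
```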

1.4.1. Nesting

Data structures can be nested:

data StructY = StructY
   { yField0 :: StructX
   , yField1 :: Word64
   } deriving (Show,Generic,Storable)

1.4.2. Arrays (or Vectors)

haskus-binary supports vectors: fixed-size sequences of correctly aligned Storable elements. You can define a vector as follows:

{-# LANGUAGE DataKinds #-}

import Haskus.Format.Binary.Vector as V

v :: Vector 5 Word16

Vectors are storable, so you can peek and poke them from memory. Alternatively, you can create them from a list:

Just v = fromList [1,2,3,4,5]
Just v = fromList [1,2,3,4,5,6] -- this fails dynamically
Just v = fromList [1,2,3,4]     -- this fails dynamically

-- take at most 5 elements then fill with 0: v = [1,2,3,4,5]
v = fromFilledList 0 [1,2,3,4,5,6]

-- take at most 5 elements then fill with 7: v = [1,2,3,7,7]
v = fromFilledList 7 [1,2,3]

-- take at most 4 (!) elements then fill with 0: v = [1,2,3,0,0]
v = fromFilledListZ 0 [1,2,3]

-- useful for zero-terminated strings: s = "too long \NUL"
s :: Vector 10 CChar
s = fromFilledListZ 0 (fmap castCharToCChar "too long string")
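The truncation and padding rules used in the examples above can be modelled on plain lists. The three functions below are hypothetical list-level models with an explicit size argument n (the real functions take the size from the Vector type); they only mirror the behaviour described in the comments above.

```haskell
-- Model of fromList for a vector of size n:
-- succeed only when the list has exactly n elements.
exactList :: Int -> [a] -> Maybe [a]
exactList n xs
  | length xs == n = Just xs
  | otherwise      = Nothing

-- Model of fromFilledList: keep at most n elements, pad with z.
fillTo :: Int -> a -> [a] -> [a]
fillTo n z xs = take n (xs ++ repeat z)

-- Model of fromFilledListZ: keep at most n-1 elements, so that at
-- least one trailing z is always present.
fillToZ :: Int -> a -> [a] -> [a]
fillToZ n z xs = fillTo n z (take (n - 1) xs)
```

For instance, fillToZ 10 '\NUL' "too long string" keeps the first 9 characters and appends a single terminator, as in the CChar example.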

You can concatenate several vectors into a single one:

import Haskus.Utils.HList

x = fromFilledList 0 [1,2,3,4] :: Vector 4 Int
y = fromFilledList 0 [5,6]     :: Vector 2 Int
z = fromFilledList 0 [7,8,9]   :: Vector 3 Int

v = V.concat (x `HCons` y `HCons` z `HCons` HNil)

> :t v
v :: Vector 9 Int

> v
fromList [1,2,3,4,5,6,7,8,9]

You can also safely drop or take elements of a vector, or index into it:

import Haskus.Format.Binary.Vector as V

v :: Vector 5 Int
v = fromFilledList 0 [1,2,3,4,5,6]

-- v2 = [1,2]
v2 = V.take @2 v

-- won't compile (8 > 5)
v2 = V.take @8 v

-- v2 = [3,4,5]
v2 = V.drop @2 v

-- x = 3
x = V.index @2 v
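One way such compile-time checks can be obtained is with type-level naturals from GHC.TypeLits. The sketch below is a hypothetical, list-backed model (Vec, takeV and indexV are invented names, not the haskus-binary implementation) showing how a size constraint turns an out-of-range request into a type error.

```haskell
{-# LANGUAGE AllowAmbiguousTypes #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}

import Data.Proxy (Proxy (..))
import GHC.TypeLits

-- Hypothetical list-backed vector tagged with its length.
-- The constructor is assumed to be applied only to lists of length n.
newtype Vec (n :: Nat) a = Vec [a]
  deriving (Eq, Show)

-- Keep the first m elements; the constraint m <= n is solved by the
-- type checker, so asking for too many elements is a type error.
takeV :: forall m n a. (KnownNat m, m <= n) => Vec n a -> Vec m a
takeV (Vec xs) = Vec (take (fromIntegral (natVal (Proxy @m))) xs)

-- Read the element at index i; i + 1 <= n rejects out-of-range indices.
indexV :: forall i n a. (KnownNat i, i + 1 <= n) => Vec n a -> a
indexV (Vec xs) = xs !! fromIntegral (natVal (Proxy @i))
```

With v = Vec [1,2,3,4,5] :: Vec 5 Int, takeV @2 v type-checks, whereas takeV @8 v is rejected because 8 <= 5 cannot be solved.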

Finally, you can obtain a list of the values:

> V.toList v
[1,2,3,4,5]

1.4.3. Enums

If you have a C enum (or a set of #define’s) whose values are consecutive and start from 0, you can do:

{-# LANGUAGE DeriveAnyClass #-}

import Haskus.Format.Binary.Enum

data MyEnum
   = MyEnumX
   | MyEnumY
   | MyEnumZ
   deriving (Show,Eq,Enum,CEnum)

If the values are not consecutive or don’t start from 0, you can write your own CEnum instance:

-- Add 1 to the enum number to get the valid value
instance CEnum MyEnum where
   fromCEnum = (+1) . fromIntegral . fromEnum
   toCEnum   = toEnum . (\x -> x-1) . fromIntegral
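To see what this instance computes, the same arithmetic can be written as two plain functions over Word32, without the CEnum class. encode and decode are hypothetical names standing in for fromCEnum and toCEnum; the choice of Word32 is also an assumption.

```haskell
import Data.Word (Word32)

data MyEnum = MyEnumX | MyEnumY | MyEnumZ
  deriving (Show, Eq, Enum, Bounded)

-- Stand-in for fromCEnum: constructor tag plus 1.
encode :: MyEnum -> Word32
encode = (+ 1) . fromIntegral . fromEnum

-- Stand-in for toCEnum: the inverse mapping (subtract 1, then toEnum).
decode :: Word32 -> MyEnum
decode = toEnum . subtract 1 . fromIntegral
```

Here the three constructors map to 1, 2 and 3, and decode (encode x) returns x for every constructor.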

To use an Enum as a field in a structure, use EnumField:

data StructZ = StructZ
   { zField0 :: StructX
   , zField1 :: EnumField Word32 MyEnum
   } deriving (Show,Generic,Storable)

The first type parameter of EnumField indicates the backing word type (i.e. the size of the field in the structure). For instance, you can use Word8, Word16, Word32 and Word64.

To create or extract an EnumField, use the methods:

fromEnumField :: CEnum a => EnumField b a -> a
toEnumField   :: CEnum a => a -> EnumField b a

We use a separate CEnum class, very similar to Enum, because Enum is special: its derived instance has access to data constructor tags. If we redefined Enum ourselves, we could no longer use fromEnum to get the data constructor tag.

1.4.4. Bit sets (or “flags”)

We often use flags that are combined in a single word. Each flag is associated with a bit of the word: if the bit is set the flag is active, otherwise the flag isn’t active.

haskus-binary uses the CBitSet class to get the bit offset of each flag. By default, it uses the Enum instance to get the bit offsets as in the following example:

{-# LANGUAGE DeriveAnyClass #-}

import Haskus.Format.Binary.BitSet

data Flag
   = FlagX  -- bit 0
   | FlagY  -- bit 1
   | FlagZ  -- bit 2
   deriving (Show,Eq,Enum,CBitSet)

If you want to use different bit offsets, you can define your own CBitSet instance:

-- Add 1 to the enum number to get the valid bit offset
instance CBitSet Flag where
   toBitOffset   = (+1) . fromEnum
   fromBitOffset = toEnum . (\x -> x-1)

To use a bit set as a field in a structure, use BitSet:

data StructZ = StructZ
   { zField0 :: ...
   , zField1 :: BitSet Word32 Flag
   } deriving (Show,Generic,Storable)

The first type parameter of BitSet indicates the backing word type (i.e. the size of the field in the structure). For instance, you can use Word8, Word16, Word32 and Word64.

Use the following methods to manipulate the BitSet:

fromBits     :: (CBitSet a, FiniteBits b) => b -> BitSet b a
toBits       :: (CBitSet a, FiniteBits b) => BitSet b a -> b
member       :: (CBitSet a, FiniteBits b) => BitSet b a -> a -> Bool
notMember    :: (CBitSet a, FiniteBits b) => BitSet b a -> a -> Bool
toList       :: (CBitSet a, FiniteBits b) => BitSet b a -> [a]
fromList     :: (CBitSet a, FiniteBits b, Foldable m) => m a -> BitSet b a
intersection :: FiniteBits b => BitSet b a -> BitSet b a -> BitSet b a
union        :: FiniteBits b => BitSet b a -> BitSet b a -> BitSet b a

Note that we don’t check if bit offsets are outside of the backing word. You have to choose a backing word that is large enough.
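The idea behind a bit set can be sketched with Data.Bits from base alone. packFlags, hasFlag and unpackFlags below are hypothetical helpers that use the Enum tag as the bit offset (the default described above); they are not the BitSet API itself.

```haskell
import Data.Bits (FiniteBits, setBit, testBit, zeroBits)
import Data.Word (Word8)

data Flag = FlagX | FlagY | FlagZ
  deriving (Show, Eq, Enum, Bounded)

-- Set one bit per flag, using the constructor tag as the bit offset.
packFlags :: FiniteBits b => [Flag] -> b
packFlags = foldl (\w f -> setBit w (fromEnum f)) zeroBits

-- A flag is active when its bit is set.
hasFlag :: FiniteBits b => b -> Flag -> Bool
hasFlag w f = testBit w (fromEnum f)

-- List the active flags in constructor order.
unpackFlags :: FiniteBits b => b -> [Flag]
unpackFlags w = filter (hasFlag w) [minBound .. maxBound]

-- Example backing word: FlagX (bit 0) and FlagZ (bit 2) give 5.
example :: Word8
example = packFlags [FlagX, FlagZ]
```

The caveat about the backing word shows up with base as well: setBit (0 :: Word8) 8 is 0, so a flag at offset 8 would be silently lost in an 8-bit word.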

1.4.5. Unions

A union provides several ways to access the same buffer of memory. To use them with haskus-binary, you need to give the list of available representations in a type as follows:

{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DataKinds #-}

import Haskus.Format.Binary.Union

u :: Union '[Word8, Word64, Vector 5 Word16]

Unions are storable so you can use them as fields in storable structures or you can directly peek/poke them.

You can retrieve a member of the union with fromUnion. The extracted type must be a member of the union; otherwise the code won’t compile.

fromUnion u :: Word64
fromUnion u :: Word8
fromUnion u :: Vector 5 Word16
fromUnion u :: Word32 -- won't compile!

To create a new union from one of its members, use toUnion or toUnionZero. The latter sets the remaining bytes of the buffer to 0. In the example, the union uses 10 bytes (5 * 2 for Vector 5 Word16) and we write 8 bytes (the size of a Word64), so two bytes are either left uninitialized (toUnion) or set to 0 (toUnionZero).

u :: Union '[Word8,Word64,Vector 5 Word16]
u = toUnion (0x1122334455667788 :: Word64)

-- on a little-endian host; the last element comes from the two uninitialized bytes
> print (fromUnion u :: Vector 5 Word16)
fromList [30600,21862,13124,4386,49850]

-- or
u = toUnionZero (0x1122334455667788 :: Word64)
> print (fromUnion u :: Vector 5 Word16)
fromList [30600,21862,13124,4386,0]
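The reinterpretation performed by a union can be reproduced with raw memory operations from base. The sketch below is a hypothetical model of the toUnionZero case (viewAsWord16s is an invented name): it zeroes a 10-byte buffer, writes a Word64 at its start, then reads the same bytes back as five Word16 values.

```haskell
import Data.Word (Word16, Word64)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Array (peekArray)
import Foreign.Marshal.Utils (fillBytes)
import Foreign.Ptr (castPtr)
import Foreign.Storable (poke)

-- View the bytes of a Word64, followed by two zero bytes, as 5 Word16.
viewAsWord16s :: Word64 -> IO [Word16]
viewAsWord16s w = allocaBytes 10 $ \buf -> do
  fillBytes buf 0 10          -- zero the whole 10-byte buffer
  poke (castPtr buf) w        -- overwrite the first 8 bytes
  peekArray 5 (castPtr buf)   -- reinterpret the buffer as 5 x Word16
```

On a little-endian host, viewAsWord16s 0x1122334455667788 yields [30600,21862,13124,4386,0], the same values as the toUnionZero example; a big-endian host would list the first four in the opposite order.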

1.4.6. Bit fields

You may need to define bit fields over words. For instance, you can have a Word16 split into 3 fields X, Y and Z composed of 5, 9 and 2 bits respectively.

             |    X    |        Y        | Z |
w :: Word16  |0 0 0 0 0|0 0 0 0 0 0 0 0 0|0 0|

You define it as follows:

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeApplications #-}

import Haskus.Format.Binary.BitField

w :: BitFields Word16 '[ BitField 5 "X" Word8
                       , BitField 9 "Y" Word16
                       , BitField 2 "Z" Word8
                       ]
w = BitFields 0x0102

Note that each field has its own associated type (e.g. Word8 for X and Z) that must be large enough to hold the number of bits for the field.

Operations on BitFields expect that the cumulative size of the fields is equal to the whole word size: use a padding field if necessary.

You can extract and update the value of a field by its name:

x = extractField @"X" w
z = extractField @"Z" w
w' = updateField @"Y" 0x100 w
-- w' = 0x402

z = extractField @"XXX" w -- won't compile

w'' = withField @"Y" (+2) w
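Underneath, field access of this kind comes down to shifting and masking. The sketch below models the X/Y/Z layout with plain Data.Bits operations; Field, mask, extract and update are hypothetical names, and placing the first field in the most significant bits is an assumption inferred from the 0x402 result above.

```haskell
import Data.Bits (complement, shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word16)

-- A field is described by (offset from the least significant bit, width).
type Field = (Int, Int)

-- Assumed layout: X occupies the top 5 bits, then Y (9 bits), then Z (2 bits).
fieldX, fieldY, fieldZ :: Field
fieldX = (11, 5)
fieldY = (2, 9)
fieldZ = (0, 2)

-- Bit mask covering exactly the bits of a field.
mask :: Field -> Word16
mask (off, width) = ((1 `shiftL` width) - 1) `shiftL` off

-- Read a field: isolate its bits, then shift them down to bit 0.
extract :: Field -> Word16 -> Word16
extract f@(off, _) w = (w .&. mask f) `shiftR` off

-- Write a field: clear its bits, then insert the (truncated) new value.
update :: Field -> Word16 -> Word16 -> Word16
update f@(off, _) v w =
  (w .&. complement (mask f)) .|. ((v `shiftL` off) .&. mask f)
```

With these definitions, extract fieldY 0x0102 is 0x40 and update fieldY 0x100 0x0102 is 0x402, matching the value of w' above.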

Fields can also be ‘BitSet’ or ‘EnumField’:

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE DeriveAnyClass #-}

import Haskus.Format.Binary.BitField
import Haskus.Format.Binary.Enum
import Haskus.Format.Binary.BitSet

data A = A0 | A1 | A2 | A3 deriving (Show,Enum,CEnum)

data B = B0 | B1 deriving (Show,Enum,CBitSet)

w :: BitFields Word16 '[ BitField 5 "X" (EnumField Word8 A)
                       , BitField 9 "Y" Word16
                       , BitField 2 "Z" (BitSet Word8 B)
                       ]
w = BitFields 0x1503

BitFields are storable and can be used in storable structures.

You can easily pattern-match on all the fields at the same time with matchFields and matchNamedFields. Each creates a tuple containing one value (and its name with matchNamedFields) per field.

> matchFields w
(EnumField A2,320,fromList [B0,B1])

> matchNamedFields w
(("X",EnumField A2),("Y",320),("Z",fromList [B0,B1]))