Serialization and deserialization are tedious tasks, often riddled with boiler-plate and bug-prone code. There's a plethora of tools, such as parser combinator libraries, which aim to assist us in some of these tasks, and it seems that new tools never stop popping up. This hints that there are probably no well known satisfactory solutions to these problems, which probably means that most of these tools are ad-hoc rather than principled high-quality solutions.
Are there no good solutions out there? Actually, there are!
Back when I was programming in Python, I have used Construct to great satisfaction. But for a long time now I have been using Haskell, and have found no equivalent to it, which I've actually wanted for more than 10 years! So perhaps I should start looking into making one? Hence this post.
In this post we'll introduce Construct's declarative approach and then discuss how to implement an equivalent principled solution in Haskell, based on optics.
Construct: declarative parsers and builders
Construct is declarative.
What do I mean by that? A declarative implementation of a parser should describe the format, not how to parse it!
From this description, the computer will automatically figure out how to parse. Not only that, it will also know how to build, and ideally even generate extra artifacts like documentation for the format!
Here's how Construct defines the format of IPv4 packets:
= Struct(
ipv4 "header" / BitStruct(
"version" / Const(4, Nibble),
"length" / ExprAdapter(Nibble,
= lambda obj, ctx: obj * 4,
decoder = lambda obj, ctx: obj // 4,
encoder
),
),
..."total_length" / Int16ub,
..."options" / Bytes(this.header.length - 20),
"body" / Bytes(this.total_length - this.header.length),
)
In this simple declaration we created a parser, builder, and the data type for IPv4 packets!
Also, I'd like to highlight the richness of the format. It isn't a
simple "struct" with fields of known types, but rather a
"dependently-typed" one, where the size of the options
field depends on the value of header.length
! Haskell parser
combinators libraries which support such dependencies tend to identify
as "monadic parser
combinators" libraries, in constrast to "applicative
parser combinators" which don't support them.
Parsers and builders as prisms
What is the essence of parsing and building?
data Codec myType encoded = Codec
build :: myType -> encoded
{ parse :: encoded -> Maybe myType
, }
(Caveat: one may desire meaningful parse errors using
Either ParseError
rather of Maybe
. We'll
ignore this issue for now)
Looking for a principled solution, one may notice that this type is
equivalent to Prism
from the popular lens
package!
‘There are only two hard things in Computer Science: cache invalidation and naming things.’ - Phil Karlton
Naming things is hard, and we want principled approaches and code
re-use, so we'll choose to use the existing Prism
rather
than make an equivalent new ad-hoc type. Hopefully this will also enable
enjoying the fruits of the existing lens
eco-system.
Let's demonstrate with a simplified IP-like protocol:
- Word8: The constant 7
- Word16: Body length
- Word16: Origin address
- Word16: Destination address
- Body-length bytes: Packet body
> simplifiedIp # ((2, 3), "Hello World!")
"\a\NUL\f\NUL\STX\NUL\ETXHello World!"
> "\a\NUL\f\NUL\STX\NUL\ETXHello World!" ^? simplifiedIp
Just ((2,3),"Hello World!")
simplifiedIp :: Prism' ByteString ((Word16, Word16), ByteString)
= takeSimplifiedIp . secondOnly ""
simplifiedIp
takeSimplifiedIp ::
Prism' ByteString (((Word16, Word16), ByteString), ByteString)
=
takeSimplifiedIp . firstOnly 7 . -- Remove the constant byte 7
_Cons . -- (bodyLen, rest)
takeWord16 . aside takeWord16) .
aside (takeWord16 -- (bodyLen, (origin, (dest, rest)))
-- Reordering (this is somewhat painful):
. -- (bodyLen, ((origin, dest), rest))
aside retuple . -- ((bodyLen, (origin, dest)), rest)
retuple . -- (((origin, dest), bodyLen), rest)
asideFirst swapped . -- ((origin, dest), (bodyLen, rest))
from retuple
. -- ((origin, dest), (body, remainder))
aside takeBytes -- (((origin, dest), body), remainder) retuple
This uses some combinators from Control.Lens
and some
extra combinators defined below:
Parse-build prism combinators
firstOnly :: Eq e => e -> Prism' (e, a) a
= asideFirst (only x) . iso snd ((,) ())
firstOnly x
secondOnly :: Eq e => e -> Prism' (a, e) a
= swapped . firstOnly x
secondOnly x
asideFirst :: APrism s t a b -> Prism (s, e) (t, e) (a, e) (b, e)
= swapped . aside l . swapped
asideFirst l
-- Tuple shuffling Iso
retuple ::
Iso
(a0, (a1, a2)) (b0, (b1, b2))
((a0, a1), a2) ((b0, b1), b2)=
retuple
iso-> ((w0, w1), r))
(\(w0, (w1, r)) -> (w0, (w1, r)))
(\((w0, w1), r)
takeWord16 :: Prism' ByteString (Word16, ByteString)
= _Cons . aside _Cons . retuple . asideFirst (from word16Bytes)
takeWord16
word16Bytes :: Iso' Word16 (Word8, Word8)
=
word16Bytes
iso%~ fromIntegral) . (`divMod` 256))
((both -> fromIntegral w1 * 256 + fromIntegral w0)
(\(w1, w0)
takeBytes ::
Integral a =>
Prism' (a, ByteString) (ByteString, ByteString)
=
takeBytes
prism'->
( \(x, y) fromIntegral (ByteString.length x), x <> y))
(->
(\(count, x) ByteString.splitAt (fromIntegral count) x <$
fromIntegral count <= ByteString.length x)) guard (
Conclusion
We've succesfully crafted prisms to parse and build our structure!
Now let's review some drawbacks of the presented approach:
- We used anonymous tuples where Construct uses named fields
- Using tuples isn't so type-safe nor is it descriptive
- Reordering our tuples to fit the combinators was painful
- To do what Construct does, we would need structural records, where we can add fields to the structure one at a time (note that this may be possible using a package such as vinyl?)
- Our parsers don't perform error reporting
- When writing parsers for programming languages this is quite crucial! (Imagine getting a syntax error with the compiler not even pointing out what line it's at)
I believe that both of these problems can be solved, resulting in a powerful and ergonomic principled solution. In my next post I'll describe an approach to add error reporting to our prisms.
Notes:
- AJ. Kennedy's "Pickler Combinators" paper (JFP 2004) described parsing+building combinators in Haskell using a custom data type (not using prisms) and demonstrated some more useful combinators for parsing
- Apparently T. Filiba, the author of Construct, has attempted re-implementing it in Haskell in 2014, but gave up
- Image credit: Kelvinsong [CC0]
Discussion: