Following my previous post which
suggested the use of Prism
s for parsing and building, using
a binary format example - I also want to show how the same idea can work
nicely for parsing and building programming language syntax.
Simple AST example
data Expr
= Lit Int
| Add Expr Expr
| Mul Expr Expr
deriving (Show, Eq, Generic)
'Expr makePrisms '
Here's our Prism
for parsing and building the above
AST:
> expr # Mul (Add (Lit 1) (Lit 2)) (Lit 3)
"(1 + 2) * 3"
expr :: Prism' String Expr
=
expr . -- convert string to tokens
tokens . -- take the expression
takeExpr -- and there should be no remaining tokens
secondOnly []
takeExpr :: Prism' [String] (Expr, [String])
=
takeExpr "+" _Add $ -- Additions of
infixOpLeftRecursion "*" _Mul $ -- multiplications of
infixOpLeftRecursion -- literals or
tryMatch (asideFirst _Lit) . asideFirst _Show) $
(_Cons . firstOnly "(" . -- expressions in parentheses
_Cons . aside (_Cons . firstOnly ")") takeExpr
This uses the following combinators:
_Cons
,_Show
, andaside
fromControl.Lens
firstOnly
,secondOnly
, andasideFirst
from the previous posttokens
,infixOpLeftRecursion
, andtryMatch
are defined in the appendix at the bottom
Observations
In the previous post, Prism
s didn't match up to Python's
Construct
in encoding binary protocols, where Construct made good use of
structural duck types (though this appears solvable with some effort).
However, for programming language syntax Prism
s seem very
elegant imho.
Note how we harness optics' parametricity and composition. In the
previous post we parsed ByteString
s but here we parse
String
and we start by converting them to tokens (ie
[String]
) and parse that.
Renegade prisms
Unlike the previous post's lawful Prism
s, this post's
parsing is lossy, so it breaks the Traversal
laws:
> "1 + (2*3)" & expr %~ id
"1 + 2 * 3"
If one desires lawful parsing Prisms, their AST representation has to represent white-space and redundant parentheses.
A Prism
law that is kept is that if you parse what you
built you do get it back:
import Test.QuickCheck.Arbitrary.ADT
= (expr # e) ^? expr == Just e
propParseBack e
instance Arbitrary Expr where
= genericArbitrary
arbitrary = genericShrink
shrink
> quickCheck propParseBack
+++ OK, passed 100 tests.
Caveat: meaninful parse errors
When parsing with this Prism
fails, it offers no useful
error-reporting. But do I believe that this is solvable and I'll address
it in future posts.
Request for feedback
- Do you think that some extra combinators used here
(
asideFirst
,firstOnly
, etc) should belong inlens
? - Or prehaps these combinators should belong in a separate package? How would you call it?
- Any suggestions as for naming these combinators? Other code improvements?
- Image credit: Does anyone know who is the artist for the opening image? (I found it on the internets)
Btw: Thanks to Eyal Lotem for reading drafts of this.
Discussion:
Appendix
AST parse-build prism combinators
-- Extend a base parsing prism with applications of an operator
infixOpLeftRecursion ::
Eq a =>
-> -- The operator's text
a Prism' expr (expr, expr) -> -- The operator constructor's prism
Prism' [a] (expr, [a]) -> -- The base parsing prism
Prism' [a] (expr, [a])
=
infixOpLeftRecursion operatorText cons sub
leftRecursion cons. firstOnly operatorText . sub) . retuple)
(aside (_Cons
sub
-- Extend a base parsing prism with extensions to its right side
leftRecursion ::
Prism' whole cons ->
Prism' (whole, state) (cons, state) ->
Prism' state (whole, state) ->
Prism' state (whole, state)
=
leftRecursion cons extend base fmap parseExtends . (^? base))
prism' build (where
=
build (x, state) maybe
# (x, state))
(base . (extend #) . (, state)) (x ^? cons)
(build =
parseExtends x ^? extend <&> _1 %~ (cons #) & maybe x parseExtends
x
-- Add an encoding for a sum-type constructor to an existing prism
tryMatch ::
Prism' whole cons -> -- The sum-type constructor prism
Prism' src cons -> -- Parse the constructor contents
Prism' src whole -> -- Prism to encode the other options
Prism' src whole
=
tryMatch cons parse fallback -> (x ^? parse <&> (cons #)) <|> x ^? fallback)
prism' build (\x where
= maybe (fallback # x) (parse #) (x ^? cons)
build x
-- Transform a string into tokens
tokens :: Iso' String [String]
=
tokens foldr addToken "")
iso splitTokens (where
"" = x
addToken x
addToken [x] y| Char.generalCategory x == Char.OpenPunctuation = x : y
:ys)
addToken x (y| Char.generalCategory y == Char.ClosePunctuation = x <> (y:ys)
= x <> " " <> y
addToken x y =
isOp `elem` [Char.MathSymbol, Char.OtherPunctuation]) .
(Char.generalCategory
= (`elem` "()[]{}")
isParen "" = []
splitTokens :s:xs) | Char.isSpace s = [x] : splitTokens xs
splitTokens (x:xs) | Char.isSpace s = splitTokens xs
splitTokens (s:xs) | isParen x = [x] : splitTokens xs
splitTokens (x:xs) =
splitTokens (xcase splitTokens xs of
-> [[x]]
[] :ys) : zs) | not (isParen y) && isOp x == isOp y -> (x:y:ys) : zs
((y-> [x] : ys ys
- The
retuple
Iso
was defined in the previous post tryMatch
takes two prisms from the unparsed source and from the resulting structure to the matched pattern. If there were optics for inversed prisms and partial isomorphisms then these could be combined into one argument and the existingfailing
combinator could replacetryMatch
.