17.2: Composite Grammars with PetitParser
- Page ID
- 43765
\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)
\( \newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\)
( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\)
\( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)
\( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\)
\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)
\( \newcommand{\Span}{\mathrm{span}}\)
\( \newcommand{\id}{\mathrm{id}}\)
\( \newcommand{\Span}{\mathrm{span}}\)
\( \newcommand{\kernel}{\mathrm{null}\,}\)
\( \newcommand{\range}{\mathrm{range}\,}\)
\( \newcommand{\RealPart}{\mathrm{Re}}\)
\( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)
\( \newcommand{\Argument}{\mathrm{Arg}}\)
\( \newcommand{\norm}[1]{\| #1 \|}\)
\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)
\( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\AA}{\unicode[.8,0]{x212B}}\)
\( \newcommand{\vectorA}[1]{\vec{#1}} % arrow\)
\( \newcommand{\vectorAt}[1]{\vec{\text{#1}}} % arrow\)
\( \newcommand{\vectorB}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vectorC}[1]{\textbf{#1}} \)
\( \newcommand{\vectorD}[1]{\overrightarrow{#1}} \)
\( \newcommand{\vectorDt}[1]{\overrightarrow{\text{#1}}} \)
\( \newcommand{\vectE}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash{\mathbf {#1}}}} \)
\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)
\(\newcommand{\avec}{\mathbf a}\) \(\newcommand{\bvec}{\mathbf b}\) \(\newcommand{\cvec}{\mathbf c}\) \(\newcommand{\dvec}{\mathbf d}\) \(\newcommand{\dtil}{\widetilde{\mathbf d}}\) \(\newcommand{\evec}{\mathbf e}\) \(\newcommand{\fvec}{\mathbf f}\) \(\newcommand{\nvec}{\mathbf n}\) \(\newcommand{\pvec}{\mathbf p}\) \(\newcommand{\qvec}{\mathbf q}\) \(\newcommand{\svec}{\mathbf s}\) \(\newcommand{\tvec}{\mathbf t}\) \(\newcommand{\uvec}{\mathbf u}\) \(\newcommand{\vvec}{\mathbf v}\) \(\newcommand{\wvec}{\mathbf w}\) \(\newcommand{\xvec}{\mathbf x}\) \(\newcommand{\yvec}{\mathbf y}\) \(\newcommand{\zvec}{\mathbf z}\) \(\newcommand{\rvec}{\mathbf r}\) \(\newcommand{\mvec}{\mathbf m}\) \(\newcommand{\zerovec}{\mathbf 0}\) \(\newcommand{\onevec}{\mathbf 1}\) \(\newcommand{\real}{\mathbb R}\) \(\newcommand{\twovec}[2]{\left[\begin{array}{r}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\ctwovec}[2]{\left[\begin{array}{c}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\threevec}[3]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\cthreevec}[3]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\fourvec}[4]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\cfourvec}[4]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\fivevec}[5]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\cfivevec}[5]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\mattwo}[4]{\left[\begin{array}{rr}#1 \amp #2 \\ #3 \amp #4 \\ \end{array}\right]}\) \(\newcommand{\laspan}[1]{\text{Span}\{#1\}}\) \(\newcommand{\bcal}{\cal B}\) \(\newcommand{\ccal}{\cal C}\) \(\newcommand{\scal}{\cal S}\) \(\newcommand{\wcal}{\cal W}\) \(\newcommand{\ecal}{\cal E}\) \(\newcommand{\coords}[2]{\left\{#1\right\}_{#2}}\) \(\newcommand{\gray}[1]{\color{gray}{#1}}\) \(\newcommand{\lgray}[1]{\color{lightgray}{#1}}\) \(\newcommand{\rank}{\operatorname{rank}}\) \(\newcommand{\row}{\text{Row}}\) \(\newcommand{\col}{\text{Col}}\) \(\renewcommand{\row}{\text{Row}}\) \(\newcommand{\nul}{\text{Nul}}\) \(\newcommand{\var}{\text{Var}}\) \(\newcommand{\corr}{\text{corr}}\) \(\newcommand{\len}[1]{\left|#1\right|}\) \(\newcommand{\bbar}{\overline{\bvec}}\) \(\newcommand{\bhat}{\widehat{\bvec}}\) \(\newcommand{\bperp}{\bvec^\perp}\) \(\newcommand{\xhat}{\widehat{\xvec}}\) \(\newcommand{\vhat}{\widehat{\vvec}}\) \(\newcommand{\uhat}{\widehat{\uvec}}\) \(\newcommand{\what}{\widehat{\wvec}}\) \(\newcommand{\Sighat}{\widehat{\Sigma}}\) \(\newcommand{\lt}{<}\) \(\newcommand{\gt}{>}\) \(\newcommand{\amp}{&}\) \(\definecolor{fillinmathshade}{gray}{0.9}\)In the previous section we saw the basic principles of PetitParser and gave some introductory examples. In this section we are going to present a way to define more complicated grammars. We continue where we left off with the arithmetic expression grammar.
Writing parsers as a script as we did previously can be cumbersome, especially when grammar productions are mutually recursive and refer to each other in complicated ways. Furthermore a grammar specified in a single script makes it unnecessary hard to reuse specific parts of that grammar. Luckily there is PPCompositeParser
to the rescue.
Defining the grammar
As an example let’s create a composite parser using the same expression grammar we built in the last section but this time we define it inside a class subclass of PPCompositeParser
.
Code \(\PageIndex{1}\) (Pharo): Creating a class to hold our arithmetic expression grammar
PPCompositeParser subclass: #ExpressionGrammar instanceVariableNames: '' classVariableNames: '' poolDictionaries: '' category: 'PetitTutorial'
Again we start with the grammar for an integer number. Define the method number as follows:
Code \(\PageIndex{2}\) (Pharo): Implementing our first parser as a method
ExpressionGrammar>>number ^ #digit asParser plus flatten trim ==> [ :str | str asNumber ]
Every production in ExpressionGrammar
is specified as a method that returns its parser. Similarly, we define the productions term
, prod
, mul
, and prim
. Productions refer to each other by reading the respective instance variable of the same name and PetitParser takes care of initializing these instance variables for you automatically. We let Pharo automatically add the necessary instance variables as we refer to them for the first time. We obtain the following class definition:
Code \(\PageIndex{3}\) (Pharo): Creating a class to hold our arithmetic expression grammar
PPCompositeParser subclass: #ExpressionGrammar instanceVariableNames: 'add prod term mul prim parens number' classVariableNames: '' poolDictionaries: '' category: 'PetitTutorial'
Code \(\PageIndex{4}\) (Pharo): Defining more expression grammar parsers, this time with no associated action
ExpressionGrammar>>term ^ add / prod ExpressionGrammar>>add ^ prod , $+ asParser trim , term ExpressionGrammar>>prod ^ mul / prim ExpressionGrammar>>mul ^ prim , $* asParser trim , prod ExpressionGrammar>>prim ^ parens / number ExpressionGrammar>>parens ^ $( asParser trim , term , $) asParser trim
Contrary to our previous implementation we do not define the production actions yet (what we previously did by using PPParser»==>
); and we factor out the parts for addition (add
), multiplication (mul
), and parenthesis (parens
) into separate productions. This will give us better reusability later on. For example, a subclass may override such methods to produce slightly different production output. Usually, production methods are categorized in a protocol named grammar
(which can be refined into more specific protocol names when necessary such as grammar-literals
).
Last but not least we define the starting point of the expression grammar. This is done by overriding PPCompositeParser»start
in the ExpressionGrammar
class:
Code \(\PageIndex{5}\) (Pharo): Defining the starting point of our expression grammar parser
ExpressionGrammar>>start ^ term end
Instantiating the ExpressionGrammar
gives us an expression parser that returns a default abstract-syntax tree:
Code \(\PageIndex{6}\) (Pharo): Testing our parser on simple arithmetic expressions
parser := ExpressionGrammar new. parserparse:'1+2*3'. → #(1$+#(2$*3)) parser parse: '(1 + 2) * 3'. → #(#($( #(1 $+ 2) $)) $* 3)
Writing dependent grammars
You can easily reuse parsers defined by other grammars. For example, imagine you want to create a new grammar that reuses the definition of number
in the ExpressionGrammar
we have just defined. For this, you have to declare a dependency to ExpressionGrammar
:
Code \(\PageIndex{7}\) (Pharo): Reusing the number parser from the ExpressionGrammar grammar
PPCompositeParser subclass: #MyNewGrammar instanceVariableNames: 'number' classVariableNames: '' poolDictionaries: '' category: 'PetitTutorial' MyNewGrammar class>>dependencies "Answer a collection of PPCompositeParser classes that this parser directly dependends on." ^ {ExpressionGrammar} MyNewGrammar>>number "Answer the same parser as ExpressionGrammar>>number." ^ (self dependencyAt: ExpressionGrammar) number
Defining an evaluator
Now that we have defined a grammar we can reuse this definition to implement an evaluator. To do this we create a subclass of ExpressionGrammar
called ExpressionEvaluator
.
Code \(\PageIndex{8}\) (Pharo): Separating the grammar from the evaluator by creating a subclass
ExpressionGrammar subclass: #ExpressionEvaluator instanceVariableNames: '' classVariableNames: '' poolDictionaries: '' category: 'PetitTutorial'
We then redefine the implementation of add
, mul
and parens
with our evaluation semantics. This is accomplished by calling the super implementation and adapting the returned parser as shown in the following methods.
Code \(\PageIndex{9}\) (Pharo): Refining the definition of some parsers to evaluate arithmetic expressions
ExpressionEvaluator>>add ^ super add ==> [ :nodes | nodes first + nodes last ] ExpressionEvaluator>>mul ^ super mul ==> [ :nodes | nodes first * nodes last ] ExpressionEvaluator>>parens ^ super parens ==> [ :nodes | nodes second ]
The evaluator is now ready to be tested:
Code \(\PageIndex{10}\) (Pharo): Testing our evaluator on simple arithmetic expressions
parser := ExpressionEvaluator new. parser parse: '1 + 2 * 3'. → 7 parser parse: '(1 + 2) * 3'. → 9
Defining a Pretty-Printer
We can reuse the grammar for example to define a simple pretty printer. This is as easy as subclassing ExpressionGrammar
again!
Code \(\PageIndex{11}\) (Pharo): Separating the grammar from the pretty printer by creating a subclass
ExpressionGrammar subclass: #ExpressionPrinter instanceVariableNames: '' classVariableNames: '' poolDictionaries: '' category: 'PetitTutorial' ExpressionPrinter>>add ^ super add ==> [:nodes | nodes first , ' + ' , nodes third] ExpressionPrinter>>mul ^ super mul ==> [:nodes | nodes first , ' * ' , nodes third] ExpressionPrinter>>number ^ super number ==> [:num | num printString] ExpressionPrinter>>parens ^ super parens ==> [:node | '(' , node second , ')']
This pretty printer can be tried out as shown by the following expressions.
Code \(\PageIndex{12}\) (Pharo): Testing our pretty printer on simple arithmetic expressions
parser := ExpressionPrinter new. parser parse: '1+2 *3'. → '1 + 2 * 3' parser parse: '(1+ 2 )* 3'. → '(1 + 2) * 3'
Easy expressions with PPExpressionParser
PetitParser proposes a powerful tool to create expressions; PPExpressionParser
is a parser to conveniently define an expression grammar with prefix, post-fix, and left- and right-associative infix operators. The operator-groups are defined in descending precedence.
Code \(\PageIndex{13}\) (Pharo): The ExpressionGrammar we previously defined can be implemented in few lines
| expression parens number | expression := PPExpressionParser new. parens := $( asParser token trim , expression , $) asParser token trim ==> [ :nodes | nodes second ]. number := #digit asParser plus flatten trim ==> [ :str | str asNumber ]. expression term: parens / number. expression group: [ :g | g left: $* asParser token trim do: [ :a :op :b | a * b ]. g left: $/ asParser token trim do: [ :a :op :b | a / b ] ]; group: [ :g | g left: $+ asParser token trim do: [ :a :op :b | a + b ]. g left: $- asParser token trim do: [ :a :op :b | a - b ] ].
Code \(\PageIndex{14}\) (Pharo): Now our parser is also able to manage subtraction and division
expression parse: '1-2/3'. → (1/3)
How do you decide when to create a subclass of PPCompositeParser
or instantiate PPExpressionParser
? On the one hand, you should instantiate a PPExpressionParser
if you want to do a small parser for a small task. On the other hand, if you have a grammar that’s composed of many parsers, you should subclass PPCompositeParser
.