17.2: Composite Grammars with PetitParser

Last updated
Save as PDF

Page ID: 43765

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

In the previous section we saw the basic principles of PetitParser and gave some introductory examples. In this section we are going to present a way to define more complicated grammars. We continue where we left off with the arithmetic expression grammar.

Writing parsers as a script as we did previously can be cumbersome, especially when grammar productions are mutually recursive and refer to each other in complicated ways. Furthermore a grammar specified in a single script makes it unnecessary hard to reuse specific parts of that grammar. Luckily there is PPCompositeParser to the rescue.

Defining the grammar

As an example let’s create a composite parser using the same expression grammar we built in the last section but this time we define it inside a class subclass of PPCompositeParser.

Code \(\PageIndex{1}\) (Pharo): Creating a class to hold our arithmetic expression grammar

PPCompositeParser subclass: #ExpressionGrammar
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'PetitTutorial'

Again we start with the grammar for an integer number. Define the method number as follows:

Code \(\PageIndex{2}\) (Pharo): Implementing our first parser as a method

ExpressionGrammar>>number
    ^ #digit asParser plus flatten trim ==> [ :str | str asNumber ]

Every production in ExpressionGrammar is specified as a method that returns its parser. Similarly, we define the productions term, prod, mul, and prim. Productions refer to each other by reading the respective instance variable of the same name and PetitParser takes care of initializing these instance variables for you automatically. We let Pharo automatically add the necessary instance variables as we refer to them for the first time. We obtain the following class definition:

Code \(\PageIndex{3}\) (Pharo): Creating a class to hold our arithmetic expression grammar

PPCompositeParser subclass: #ExpressionGrammar
    instanceVariableNames: 'add prod term mul prim parens number'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'PetitTutorial'

Code \(\PageIndex{4}\) (Pharo): Defining more expression grammar parsers, this time with no associated action

ExpressionGrammar>>term
    ^ add / prod

ExpressionGrammar>>add
    ^ prod , $+ asParser trim , term

ExpressionGrammar>>prod
    ^ mul / prim

ExpressionGrammar>>mul
    ^ prim , $* asParser trim , prod

ExpressionGrammar>>prim
    ^ parens / number

ExpressionGrammar>>parens
    ^ $( asParser trim , term , $) asParser trim

Contrary to our previous implementation we do not define the production actions yet (what we previously did by using PPParser»==>); and we factor out the parts for addition (add), multiplication (mul), and parenthesis (parens) into separate productions. This will give us better reusability later on. For example, a subclass may override such methods to produce slightly different production output. Usually, production methods are categorized in a protocol named grammar (which can be refined into more specific protocol names when necessary such as grammar-literals).

Last but not least we define the starting point of the expression grammar. This is done by overriding PPCompositeParser»start in the ExpressionGrammar class:

Code \(\PageIndex{5}\) (Pharo): Defining the starting point of our expression grammar parser

ExpressionGrammar>>start
    ^ term end

Instantiating the ExpressionGrammar gives us an expression parser that returns a default abstract-syntax tree:

Code \(\PageIndex{6}\) (Pharo): Testing our parser on simple arithmetic expressions

parser := ExpressionGrammar new.
parserparse:'1+2*3'.            →    #(1$+#(2$*3))
parser parse: '(1 + 2) * 3'.    →    #(#($( #(1 $+ 2) $)) $* 3)

Writing dependent grammars

You can easily reuse parsers defined by other grammars. For example, imagine you want to create a new grammar that reuses the definition of number in the ExpressionGrammar we have just defined. For this, you have to declare a dependency to ExpressionGrammar:

Code \(\PageIndex{7}\) (Pharo): Reusing the number parser from the ExpressionGrammar grammar

PPCompositeParser subclass: #MyNewGrammar
    instanceVariableNames: 'number'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'PetitTutorial'

MyNewGrammar class>>dependencies
    "Answer a collection of PPCompositeParser classes that this parser directly dependends on."
    ^ {ExpressionGrammar}

MyNewGrammar>>number
    "Answer the same parser as ExpressionGrammar>>number."
    ^ (self dependencyAt: ExpressionGrammar) number

Defining an evaluator

Now that we have defined a grammar we can reuse this definition to implement an evaluator. To do this we create a subclass of ExpressionGrammar called ExpressionEvaluator.

Code \(\PageIndex{8}\) (Pharo): Separating the grammar from the evaluator by creating a subclass

ExpressionGrammar subclass: #ExpressionEvaluator
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'PetitTutorial'

We then redefine the implementation of add, mul and parens with our evaluation semantics. This is accomplished by calling the super implementation and adapting the returned parser as shown in the following methods.

Code \(\PageIndex{9}\) (Pharo): Refining the definition of some parsers to evaluate arithmetic expressions

ExpressionEvaluator>>add
    ^ super add ==> [ :nodes | nodes first + nodes last ]

ExpressionEvaluator>>mul
    ^ super mul ==> [ :nodes | nodes first * nodes last ]

ExpressionEvaluator>>parens
    ^ super parens ==> [ :nodes | nodes second ]

The evaluator is now ready to be tested:

Code \(\PageIndex{10}\) (Pharo): Testing our evaluator on simple arithmetic expressions

parser := ExpressionEvaluator new.
parser parse: '1 + 2 * 3'.      → 7
parser parse: '(1 + 2) * 3'.    → 9

Defining a Pretty-Printer

We can reuse the grammar for example to define a simple pretty printer. This is as easy as subclassing ExpressionGrammar again!

Code \(\PageIndex{11}\) (Pharo): Separating the grammar from the pretty printer by creating a subclass

ExpressionGrammar subclass: #ExpressionPrinter
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'PetitTutorial'

ExpressionPrinter>>add
    ^ super add ==> [:nodes | nodes first , ' + ' , nodes third]

ExpressionPrinter>>mul
    ^ super mul ==> [:nodes | nodes first , ' * ' , nodes third]

ExpressionPrinter>>number
    ^ super number ==> [:num | num printString]

ExpressionPrinter>>parens
    ^ super parens ==> [:node | '(' , node second , ')']

This pretty printer can be tried out as shown by the following expressions.

Code \(\PageIndex{12}\) (Pharo): Testing our pretty printer on simple arithmetic expressions

parser := ExpressionPrinter new.
parser parse: '1+2 *3'.        → '1 + 2 * 3'
parser parse: '(1+ 2 )* 3'.    → '(1 + 2) * 3'

Easy expressions with PPExpressionParser

PetitParser proposes a powerful tool to create expressions; PPExpressionParser is a parser to conveniently define an expression grammar with prefix, post-fix, and left- and right-associative infix operators. The operator-groups are defined in descending precedence.

Code \(\PageIndex{13}\) (Pharo): The ExpressionGrammar we previously defined can be implemented in few lines

| expression parens number |
expression := PPExpressionParser new.
parens := $( asParser token trim , expression , $) asParser token trim
==> [ :nodes | nodes second ].
number := #digit asParser plus flatten trim ==> [ :str | str asNumber ].

expression term: parens / number.

expression
    group: [ :g |
        g left: $* asParser token trim do: [ :a :op :b | a * b ].
        g left: $/ asParser token trim do: [ :a :op :b | a / b ] ];
    group: [ :g |
        g left: $+ asParser token trim do: [ :a :op :b | a + b ].
        g left: $- asParser token trim do: [ :a :op :b | a - b ] ].

Code \(\PageIndex{14}\) (Pharo): Now our parser is also able to manage subtraction and division

expression parse: '1-2/3'.    → (1/3)

How do you decide when to create a subclass of PPCompositeParser or instantiate PPExpressionParser? On the one hand, you should instantiate a PPExpressionParser if you want to do a small parser for a small task. On the other hand, if you have a grammar that’s composed of many parsers, you should subclass PPCompositeParser.