Skip to main content
Engineering LibreTexts

17.2: Composite Grammars with PetitParser

  • Page ID
    43765
  • \( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

    In the previous section we saw the basic principles of PetitParser and gave some introductory examples. In this section we are going to present a way to define more complicated grammars. We continue where we left off with the arithmetic expression grammar.

    Writing parsers as a script as we did previously can be cumbersome, especially when grammar productions are mutually recursive and refer to each other in complicated ways. Furthermore a grammar specified in a single script makes it unnecessary hard to reuse specific parts of that grammar. Luckily there is PPCompositeParser to the rescue.

    Defining the grammar

    As an example let’s create a composite parser using the same expression grammar we built in the last section but this time we define it inside a class subclass of PPCompositeParser.

    Code \(\PageIndex{1}\) (Pharo): Creating a class to hold our arithmetic expression grammar

    PPCompositeParser subclass: #ExpressionGrammar
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: 'PetitTutorial'
    

    Again we start with the grammar for an integer number. Define the method number as follows:

    Code \(\PageIndex{2}\) (Pharo): Implementing our first parser as a method

    ExpressionGrammar>>number
        ^ #digit asParser plus flatten trim ==> [ :str | str asNumber ]
    

    Every production in ExpressionGrammar is specified as a method that returns its parser. Similarly, we define the productions term, prod, mul, and prim. Productions refer to each other by reading the respective instance variable of the same name and PetitParser takes care of initializing these instance variables for you automatically. We let Pharo automatically add the necessary instance variables as we refer to them for the first time. We obtain the following class definition:

    Code \(\PageIndex{3}\) (Pharo): Creating a class to hold our arithmetic expression grammar

    PPCompositeParser subclass: #ExpressionGrammar
        instanceVariableNames: 'add prod term mul prim parens number'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'PetitTutorial'
    

    Code \(\PageIndex{4}\) (Pharo): Defining more expression grammar parsers, this time with no associated action

    ExpressionGrammar>>term
        ^ add / prod
    
    ExpressionGrammar>>add
        ^ prod , $+ asParser trim , term
    
    ExpressionGrammar>>prod
        ^ mul / prim
    
    ExpressionGrammar>>mul
        ^ prim , $* asParser trim , prod
    
    ExpressionGrammar>>prim
        ^ parens / number
    
    ExpressionGrammar>>parens
        ^ $( asParser trim , term , $) asParser trim
    

    Contrary to our previous implementation we do not define the production actions yet (what we previously did by using PPParser»==>); and we factor out the parts for addition (add), multiplication (mul), and parenthesis (parens) into separate productions. This will give us better reusability later on. For example, a subclass may override such methods to produce slightly different production output. Usually, production methods are categorized in a protocol named grammar (which can be refined into more specific protocol names when necessary such as grammar-literals).

    Last but not least we define the starting point of the expression grammar. This is done by overriding PPCompositeParser»start in the ExpressionGrammar class:

    Code \(\PageIndex{5}\) (Pharo): Defining the starting point of our expression grammar parser

    ExpressionGrammar>>start
        ^ term end
    

    Instantiating the ExpressionGrammar gives us an expression parser that returns a default abstract-syntax tree:

    Code \(\PageIndex{6}\) (Pharo): Testing our parser on simple arithmetic expressions

    parser := ExpressionGrammar new.
    parserparse:'1+2*3'.            →    #(1$+#(2$*3))
    parser parse: '(1 + 2) * 3'.    →    #(#($( #(1 $+ 2) $)) $* 3)
    

    Writing dependent grammars

    You can easily reuse parsers defined by other grammars. For example, imagine you want to create a new grammar that reuses the definition of number in the ExpressionGrammar we have just defined. For this, you have to declare a dependency to ExpressionGrammar:

    Code \(\PageIndex{7}\) (Pharo): Reusing the number parser from the ExpressionGrammar grammar

    PPCompositeParser subclass: #MyNewGrammar
        instanceVariableNames: 'number'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'PetitTutorial'
    
    MyNewGrammar class>>dependencies
        "Answer a collection of PPCompositeParser classes that this parser directly dependends on."
        ^ {ExpressionGrammar}
    
    MyNewGrammar>>number
        "Answer the same parser as ExpressionGrammar>>number."
        ^ (self dependencyAt: ExpressionGrammar) number 
    

    Defining an evaluator

    Now that we have defined a grammar we can reuse this definition to implement an evaluator. To do this we create a subclass of ExpressionGrammar called ExpressionEvaluator.

    Code \(\PageIndex{8}\) (Pharo): Separating the grammar from the evaluator by creating a subclass

    ExpressionGrammar subclass: #ExpressionEvaluator
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: 'PetitTutorial'
    

    We then redefine the implementation of add, mul and parens with our evaluation semantics. This is accomplished by calling the super implementation and adapting the returned parser as shown in the following methods.

    Code \(\PageIndex{9}\) (Pharo): Refining the definition of some parsers to evaluate arithmetic expressions

    ExpressionEvaluator>>add
        ^ super add ==> [ :nodes | nodes first + nodes last ]
    
    ExpressionEvaluator>>mul
        ^ super mul ==> [ :nodes | nodes first * nodes last ]
    
    ExpressionEvaluator>>parens
        ^ super parens ==> [ :nodes | nodes second ]
    

    The evaluator is now ready to be tested:

    Code \(\PageIndex{10}\) (Pharo): Testing our evaluator on simple arithmetic expressions

    parser := ExpressionEvaluator new.
    parser parse: '1 + 2 * 3'.      → 7
    parser parse: '(1 + 2) * 3'.    → 9
    

    Defining a Pretty-Printer

    We can reuse the grammar for example to define a simple pretty printer. This is as easy as subclassing ExpressionGrammar again!

    Code \(\PageIndex{11}\) (Pharo): Separating the grammar from the pretty printer by creating a subclass

    ExpressionGrammar subclass: #ExpressionPrinter
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: 'PetitTutorial'
    
    ExpressionPrinter>>add
        ^ super add ==> [:nodes | nodes first , ' + ' , nodes third]
    
    ExpressionPrinter>>mul
        ^ super mul ==> [:nodes | nodes first , ' * ' , nodes third]
    
    ExpressionPrinter>>number
        ^ super number ==> [:num | num printString]
    
    ExpressionPrinter>>parens
        ^ super parens ==> [:node | '(' , node second , ')']
    

    This pretty printer can be tried out as shown by the following expressions.

    Code \(\PageIndex{12}\) (Pharo): Testing our pretty printer on simple arithmetic expressions

    parser := ExpressionPrinter new.
    parser parse: '1+2 *3'.        → '1 + 2 * 3'
    parser parse: '(1+ 2 )* 3'.    → '(1 + 2) * 3'
    

    Easy expressions with PPExpressionParser

    PetitParser proposes a powerful tool to create expressions; PPExpressionParser is a parser to conveniently define an expression grammar with prefix, post-fix, and left- and right-associative infix operators. The operator-groups are defined in descending precedence.

    Code \(\PageIndex{13}\) (Pharo): The ExpressionGrammar we previously defined can be implemented in few lines

    | expression parens number |
    expression := PPExpressionParser new.
    parens := $( asParser token trim , expression , $) asParser token trim
    ==> [ :nodes | nodes second ].
    number := #digit asParser plus flatten trim ==> [ :str | str asNumber ].
    
    expression term: parens / number.
    
    expression
        group: [ :g |
            g left: $* asParser token trim do: [ :a :op :b | a * b ].
            g left: $/ asParser token trim do: [ :a :op :b | a / b ] ];
        group: [ :g |
            g left: $+ asParser token trim do: [ :a :op :b | a + b ].
            g left: $- asParser token trim do: [ :a :op :b | a - b ] ].
    

    Code \(\PageIndex{14}\) (Pharo): Now our parser is also able to manage subtraction and division

    expression parse: '1-2/3'.    → (1/3)
    

    How do you decide when to create a subclass of PPCompositeParser or instantiate PPExpressionParser? On the one hand, you should instantiate a PPExpressionParser if you want to do a small parser for a small task. On the other hand, if you have a grammar that’s composed of many parsers, you should subclass PPCompositeParser.


    This page titled 17.2: Composite Grammars with PetitParser is shared under a CC BY-SA 3.0 license and was authored, remixed, and/or curated by Alexandre Bergel, Damien Cassou, Stéphane Ducasse, Jannik Laval (Square Bracket Associates) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.