Skip to main content
Engineering LibreTexts

17.6: Packrat Parsers

  • Page ID
    43769
  • \( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

    In the beginning of the chapter, we have mentioned four parser methodologies, one of them was Packrat Parsers. We claimed that packrat parsing gives linear parse times. But in the debugging example we saw that original version of the BacktrackingParser parsed \( \mathit{input}_{c} \) of length 10 in 233 steps. And if you try to parse \( \mathit{longinput}_{c} = \) 'aaaaaaaaaaaaaaaaaaaac' (length 20), you will see that the original parser needs 969 steps. Indeed, the progress is not linear.

    The PetitParser framework does not use packrat parsing by default. You need to send the memoized message to enable packrat parsing. The memoized parser ensures that the parsing for the particular position in an input and the particular parser will be performed only once and the result will be remembered in a dictionary for a future use. The second time the parser wants to parse the input, the result will be looked up in the dictionary. This way, a lot of unnecessary parsing can be avoided. The disadvantage is that PetitParser needs much more memory to remember all the results of all the possible parsers at all the possible positions.

    To give you an example with a packrat parser, let us return back to the BacktrackingParser once again (see Code 18.5.4). As we have analyzed before, the problem was in the parser ab that constantly failed in the p -> ab -> ap -> p loop. Now we can do the trick and memoize the parser ab by updating the method ab as in Code \(\PageIndex{1}\). When the memoization is applied, we get the progress as in Figure \(\PageIndex{1}\) with the total number of 63 invocations for \( \mathit{input}_{c} \) and the 129 invocations for \( \mathit{longinput}_{c} \). With the minor modification of BacktrackingParser we got a linear parsing time (related to the length of the input) with a factor around 6.

    Code \(\PageIndex{1}\) (Pharo): Memoized version of the parser ab

    BacktrackingParser>>ab
        ^ ( 'b' asParser /
            ('a' asParser, ab)
          ) memoized
    
    Progress of the memoized version of the BacktrackingParser.
    Figure \(\PageIndex{1}\): Progress of the memoized version of the BacktrackingParser.

    This page titled 17.6: Packrat Parsers is shared under a CC BY-SA 3.0 license and was authored, remixed, and/or curated by Alexandre Bergel, Damien Cassou, Stéphane Ducasse, Jannik Laval (Square Bracket Associates) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.