
17.4: Case Study — A JSON Parser

    In this section we illustrate PetitParser by developing a JSON parser. JSON is a lightweight data-interchange format described at http://www.json.org. We use the specification on that website to define our own JSON parser.

    JSON is a simple format based on nested key/value pairs and arrays. The following example is taken from Wikipedia (http://en.Wikipedia.org/wiki/JSON).

    Code \(\PageIndex{1}\) (Pharo): An example of JSON

    { "firstName" : "John",
      "lastName" : "Smith",
      "age" : 25,
      "address" :
          { "streetAddress" : "21 2nd Street",
            "city" : "New York",
            "state" : "NY",
            "postalCode" : "10021" },
      "phoneNumber":
          [
              { "type" : "home",
                "number" : "212 555-1234" },
              { "type" : "fax",
                "number" : "646 555-4567" } ] }
    

    JSON consists of object definitions (between curly braces “{}”) and arrays (between square brackets “[]”). An object definition is a set of key/value pairs whereas an array is a list of values. The previous JSON example then represents an object (a person) with several key/value pairs (e.g., for the person’s first name, last name, and age). The address of the person is represented by another object while the phone number is represented by an array of objects.

    First we define a grammar as a subclass of PPCompositeParser. Let us call it PPJsonGrammar.

    Code \(\PageIndex{2}\) (Pharo): Defining the JSON grammar class

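    PPCompositeParser subclass: #PPJsonGrammar
        instanceVariableNames: 'object members pair array elements value stringToken string char charEscape charOctal charNormal numberToken number trueToken falseToken nullToken'
        classVariableNames: 'CharacterTable'
        poolDictionaries: ''
        category: 'PetitJson-Core'
    

    We define the CharacterTable class variable since we will later use it to parse strings. The instance variables name the productions defined in the rest of this section: a PPCompositeParser subclass needs one instance variable per production method so that productions can refer to each other by name.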

    Figure \(\PageIndex{1}\): Syntax diagram representation for the JSON object parser defined in Code \(\PageIndex{3}\).
    Figure \(\PageIndex{2}\): Syntax diagram representation for the JSON array parser defined in Code \(\PageIndex{4}\).

    Parsing objects and arrays

    The syntax diagrams for JSON objects and arrays are shown in Figure \(\PageIndex{1}\) and Figure \(\PageIndex{2}\). A PetitParser parser for JSON objects can be defined with the following code:

    Code \(\PageIndex{3}\) (Pharo): Defining the JSON parser for object as represented in Figure \(\PageIndex{1}\)

    PPJsonGrammar>>object
        ^ ${ asParser token trim , members optional , $} asParser token trim
    
    PPJsonGrammar>>members
        ^ pair separatedBy: $, asParser token trim
    
    PPJsonGrammar>>pair
        ^ stringToken , $: asParser token trim , value
    

    The only new thing here is the call to the PPParser»separatedBy: convenience method, which answers a new parser that parses the receiver (a pair here) one or more times, separated by its parameter parser (a comma here).
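
    To see what separatedBy: does in isolation, the following workspace sketch applies it to a simpler receiver than pair. The digit parser and the input strings are illustrative choices and are not part of the JSON grammar.

```smalltalk
"Illustrative only: digits separated by commas, not part of PPJsonGrammar."
| digits result |
digits := #digit asParser separatedBy: $, asParser.
result := digits parse: '1,2,3'.
"result holds the parsed digits interleaved with the separators"
result isPetitFailure.                  "false: a valid comma-separated list"
(digits parse: ',1') isPetitFailure.    "true: the receiver must match first"
```

    Because the receiver must match at least once, an empty list is not accepted by separatedBy: itself, which is why members is wrapped in optional in the object production.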

    Arrays are much simpler to parse, as shown in Code \(\PageIndex{4}\).

    Code \(\PageIndex{4}\) (Pharo): Defining the JSON parser for array as represented in Figure \(\PageIndex{2}\)

    PPJsonGrammar>>array
        ^ $[ asParser token trim ,
            elements optional ,
            $] asParser token trim
    
    PPJsonGrammar>>elements
        ^ value separatedBy: $, asParser token trim
    

    Parsing values

    In JSON, a value is either a string, a number, an object, an array, a Boolean (true or false), or null. The value parser is defined as below and represented in Figure \(\PageIndex{3}\):

    Code \(\PageIndex{5}\) (Pharo): Defining the JSON parser for value as represented in Figure \(\PageIndex{3}\)

    PPJsonGrammar>>value
        ^ stringToken / numberToken / object / array /
            trueToken / falseToken / nullToken 
    
    Figure \(\PageIndex{3}\): Syntax diagram representation for the JSON value parser defined in Code \(\PageIndex{5}\).

    A string requires quite some work to parse. A string starts and ends with double quotes. Between these double quotes is a sequence of characters. Each character is either an escape character, a Unicode character (handled by the production that the code below names charOctal), or a normal character. An escape character is composed of a backslash immediately followed by a special character (e.g., '\n' to get a new line in the string). A Unicode character is composed of a backslash, immediately followed by the letter 'u', immediately followed by 4 hexadecimal digits. Finally, a normal character is any character except a double quote (used to end the string) and a backslash (used to introduce an escape character).

    Code \(\PageIndex{6}\) (Pharo): Defining the JSON parser for string as represented in Figure \(\PageIndex{4}\)

    PPJsonGrammar>>stringToken
        ^ string token trim
    PPJsonGrammar>>string
        ^ $" asParser , char star , $" asParser
    PPJsonGrammar>>char
        ^ charEscape / charOctal / charNormal
    PPJsonGrammar>>charEscape
        ^ $\ asParser , (PPPredicateObjectParser anyOf: (String withAll: CharacterTable keys))
    PPJsonGrammar>>charOctal
        ^ '\u' asParser , (#hex asParser min: 4 max: 4)
    PPJsonGrammar>>charNormal
        ^ PPPredicateObjectParser anyExceptAnyOf: '\"'
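
    The \u rule can be tried on its own in a workspace. The following sketch rebuilds the body of charOctal as a standalone parser and appends end so that leftover input counts as a failure; the variable name and the sample inputs are illustrative.

```smalltalk
"Standalone copy of the charOctal body, for experimentation only."
| unicodeEscape |
unicodeEscape := ('\u' asParser , (#hex asParser min: 4 max: 4)) end.
(unicodeEscape parse: '\u00e9') isPetitFailure.    "false: exactly four hexadecimal digits"
(unicodeEscape parse: '\u00e') isPetitFailure.     "true: only three hexadecimal digits"
(unicodeEscape parse: '\u00e9f') isPetitFailure.   "true: end rejects the fifth digit"
```

    Pharo string literals do not treat the backslash specially, so '\u00e9' is a six-character string here.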
    

    The special characters allowed after a backslash and their meanings are defined in the CharacterTable dictionary, which we initialize in the initialize class method. Note that a class-side initialize method is called when the class is loaded into the system. If you have just created the initialize method, the class was loaded without it, so the method has not run yet. To execute it, evaluate PPJsonGrammar initialize in a workspace.

    Figure \(\PageIndex{4}\): Syntax diagram representation for the JSON string parser defined in Code \(\PageIndex{6}\).

    Code \(\PageIndex{7}\) (Pharo): Defining the JSON special characters and their meaning

    PPJsonGrammar class>>initialize
        CharacterTable := Dictionary new.
        CharacterTable
            at: $\ put: $\;
            at: $/ put: $/;
            at: $" put: $";
            at: $b put: Character backspace;
            at: $f put: Character newPage;
            at: $n put: Character lf;
            at: $r put: Character cr;
            at: $t put: Character tab
    

    Parsing numbers is only slightly simpler: a number can be positive or negative and integral or decimal. Additionally, a number can carry an exponent, as in floating-point notation.

    Code \(\PageIndex{8}\) (Pharo): Defining the JSON parser for number as represented in Figure \(\PageIndex{5}\)

    PPJsonGrammar>>numberToken
        ^ number token trim
    PPJsonGrammar>>number
        ^ $- asParser optional ,
            ($0 asParser / #digit asParser plus) ,
            ($. asParser , #digit asParser plus) optional ,
            (($e asParser / $E asParser) , ($- asParser / $+ asParser) optional , #digit asParser plus) optional
    
    Figure \(\PageIndex{5}\): Syntax diagram representation for the JSON number parser defined in Code \(\PageIndex{8}\).

    The attentive reader will have noticed a small difference between the syntax diagram in Figure \(\PageIndex{5}\) and the code in Code \(\PageIndex{8}\). Numbers in JSON cannot contain leading zeros: i.e., strings such as "01" do not represent valid numbers. The syntax diagram makes that explicit by allowing either a 0 or a digit between 1 and 9 followed by further digits. In the above code, the rule is implicit and relies on the fact that the choice combinator / is ordered: the parser on the right of / is only tried if the parser on the left fails. Thus, in ($0 asParser / #digit asParser plus), an input starting with 0 always matches the left alternative, which consumes only that single 0; any digit that follows is left unconsumed and makes the enclosing parser fail. The integer part is therefore either just a 0 or a sequence of digits not starting with 0.
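
    The effect of the ordered choice can be checked in a workspace. The sketch below isolates the integer part of the number production and appends end so that unconsumed input is reported as a failure; the variable name is illustrative.

```smalltalk
"Integer part of the number production, isolated for experimentation."
| integerPart |
integerPart := ($0 asParser / #digit asParser plus) end.
(integerPart parse: '0') isPetitFailure.     "false: a single zero is accepted"
(integerPart parse: '10') isPetitFailure.    "false: $0 fails on 1, so the digit sequence is tried"
(integerPart parse: '01') isPetitFailure.    "true: $0 matches, and the remaining 1 violates end"
```

    Once the left alternative has succeeded, the choice never goes back to try the right alternative, which is why '01' is rejected instead of being read as a two-digit sequence.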

    The other parsers are fairly trivial:

    Code \(\PageIndex{9}\) (Pharo): Defining missing JSON parsers

    PPJsonGrammar>>falseToken
        ^ 'false' asParser token trim
    PPJsonGrammar>>nullToken
        ^ 'null' asParser token trim
    PPJsonGrammar>>trueToken
        ^ 'true' asParser token trim
    

    The only piece missing is the start parser.

    Code \(\PageIndex{10}\) (Pharo): Defining the JSON start parser as being a value (Figure \(\PageIndex{3}\)) with nothing following

    PPJsonGrammar>>start
        ^ value end
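
    With start in place, the grammar can be exercised from a workspace. This sketch assumes that all the productions of this section are compiled in PPJsonGrammar and that the class declares an instance variable for each of them; the sample inputs are illustrative.

```smalltalk
"Workspace sketch: assumes PPJsonGrammar is fully defined as in this section."
| grammar |
PPJsonGrammar initialize.    "make sure CharacterTable is filled in"
grammar := PPJsonGrammar new.
(grammar parse: '{ "name" : "John", "tags" : [1, 2.5e3, true, null] }') isPetitFailure.    "false"
(grammar parse: '[1, 2') isPetitFailure.             "true: the closing bracket is missing"
(grammar parse: '{ "age" : 025 }') isPetitFailure.   "true: leading zero in a number"
```

    Since the grammar defines no actions, a successful parse answers nested collections of tokens and characters, not domain objects.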
    

    This page titled 17.4: Case Study — A JSON Parser is shared under a CC BY-SA 3.0 license and was authored, remixed, and/or curated by Alexandre Bergel, Damien Cassou, Stéphane Ducasse, Jannik Laval (Square Bracket Associates) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.