6.2: Strings of Matched Brackets

Last updated
Save as PDF

Page ID: 48322

Eric Lehman, F. Thomson Leighton, & Alberty R. Meyer
Google and Massachusetts Institute of Technology via MIT OpenCourseWare

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

Let \(\{\color{blue}],\color{red}[\color{black}\}^{*}\) be the set of all strings of square brackets. For example, the following two strings are in \(\{\color{blue}],\color{red}[\color{black}\}^{*}\):

\[\label{6.2.1} \color{red}[ \color{blue}]] \color{red}[[[ \quad \color{black} \text{ and } \quad \color{red}[\color{red}[\color{blue}]]\color{red}[\color{blue}]\]

A string, \(s \in \{\color{blue}],\color{red}[\color{black}\}^{*}\), is called a matched string if its brackets “match up” in the usual way. For example, the left hand string above is not matched because its second right bracket does not have a matching left bracket. The string on the right is matched.

We’re going to examine several different ways to define and prove properties of matched strings using recursively defined sets and functions. These properties are pretty straightforward, and you might wonder whether they have any particular relevance in computer science. The honest answer is “not much relevance any more.” The reason for this is one of the great successes of computer science, as explained in the text box below.

Expression Parsing

During the early development of computer science in the 1950’s and 60’s, creation of effective programming language compilers was a central concern. A key aspect in processing a program for compilation was expression parsing. One significant problem was to take an expression like

\[\nonumber x+y * z^{2} \div y+7\]

and put in the brackets that determined how it should be evaluated—should it bes

\[\nonumber \left[[x+y] * z^{2} \div y\right]+7, \text { or }\]

\[\nonumber x+\left[y * z^{2} \div[y+7]\right] \text { , or, }\]

\[\nonumber \left[x+\left[y * z^{2}\right]\right] \div[y+7], \text { or } \ldots ?\]

The Turing award (the “Nobel Prize” of computer science) was ultimately bestowed on Robert W. Floyd, for, among other things, discovering simple procedures that would insert the brackets properly.

In the 70’s and 80’s, this parsing technology was packaged into high-level compiler-compilers that automatically generated parsers from expression grammars. This automation of parsing was so effective that the subject no longer demanded attention. It had largely disappeared from the computer science curriculum by the 1990’s.

The matched strings can be nicely characterized as a recursive data type:

Definition \(\PageIndex{1}\)

Recursively define the set, RecMatch, of strings as follows:

Base case: \(\lambda \in \text{RecMatch}\).

Constructor case: If \(s, t \in \text{RecMatch}\), then

\[\nonumber \color{red}[ \color{black} s \color{blue}] \color{black}t \in \text{RecMatch}.\]

Here \(\color{red}[ \color{black} s \color{blue}] \color{black}t \) refers to the concatenation of strings which would be written in full as

\[\nonumber \color{red}[ \color{black} \cdot (s \cdot ( \color{blue} ] \color{black} \cdot t)).\]

From now on, we’ll usually omit the “\(\cdot ’s.\)”

Using this definition, \(\lambda \in \text{RecMatch}\) by the base case, so letting \(s = t = \lambda\) in the constructor case implies

\[\nonumber \color{red} [ \color{black} \lambda \color{blue}] \color{black}\lambda = \color{red}[ \color{blue}] \color{black} \in \text{RecMatch}.\]

Now,

\[\begin{aligned} \color{red}[ \color{black} \lambda \color{blue}] \color{red}[ \color{blue}] \color{black} &= \color{red}[ \color{blue}] \color{red}[ \color{blue}] \color{black} \in \text{RecMatch} & (\text{letting } s = \lambda , t = \color{red}[ \color{blue}] \color{black}) \\ \color{red}[\color{red}[\color{blue}]] \color{black} \lambda &= \color{red}[\color{red}[\color{blue}]] \color{black} \in \text{RecMatch} & (\text{letting } s = \color{red}[ \color{blue}] \color{black}, t = \lambda )\\ &\color{red}[\color{red}[\color{blue}]] \color{red}[\color{blue}] \color{black} \in \text{RecMatch} & (\text{letting } s = \color{red}[ \color{blue}] \color{black}, t = \color{red}[ \color{blue}] \color{black})\end{aligned}\]

are also strings in RecMatch by repeated applications of the constructor case; and so on.

It’s pretty obvious that in order for brackets to match, there had better be an equal number of left and right ones. For further practice, let’s carefully prove this from the recursive definitions.

Lemma. Every string in RecMatch has an equal number of left and right brackets.

Proof. The proof is by structural induction with induction hypothesis

\[\nonumber P(s) ::= \#_{\color{red}[} (s) = \#_{\color{blue}]} (s).\]

Base case: \(P(\lambda)\) holds because

\[\nonumber \#_{\color{red}[} (\lambda) = 0 = \#_{\color{blue}]} (\lambda)\]

by the base case of Definition 6.1.5 of \(\#_c ()\).

Constructor case: By structural induction hypothesis, we assume \(P(s)\) and \(P(t)\) and must show \(P(\color{red}[ \color{black}s \color{blue}] \color{black}t):\)

\[\nonumber \begin{array}{l}
\#_{\color{red}[}(\color{red}[ \color{black}s \color{blue}] \color{black}t) & =\#_{\color{red}[}\color{black}(\color{red}[\color{black}) + \#_{\color{red}[}(s)+\#_{\color{red}[}(\color{blue}]\color{black})+\#_{\color{red}[}\color{black}(t) & (\text {Lemma } 6.1.6) \\
& =1+\#_{\color{red}[}(s)+0+\#_{\color{red}[}(t) & (\text {def } \#_{\color{red}[}()) \\
& =1+\#_{\color{blue}]}(s)+0+\#_{\color{blue}]}(t) & \text {(by } P(s) \text { and } P(t)) \\
& =0+\#_{\color{blue}]}(s)+0+\#_{\color{blue}]}(t) \\
& = \#_{\color{blue}]}\color{black}(\color{red}[\color{black}) + \#_{\color{blue}]}(s)+\#_{\color{blue}]}\color{black}(\color{blue}]\color{black})+\#_{\color{blue}]}\color{black}(t) & (\text {def } \#_{\color{blue}]}())\\
&= \#_{\color{blue}]}\color{black}(\color{red}[ \color{black}s \color{blue}]\color{black}t) & (\text {Lemma } 6.1.6)
\end{array}\]

This completes the proof of the constructor case. We conclude by structural induction that \(P(s)\) holds for all \(s \in \text{RecMatch}\). \(\quad \blacksquare\)

Warning: When a recursive definition of a data type allows the same element to be constructed in more than one way, the definition is said to be ambiguous. We were careful to choose an unambiguous definition of RecMatch to ensure that functions defined recursively on its definition would always be well-defined. Recursively defining a function on an ambiguous data type definition usually will not work. To illustrate the problem, here’s another definition of the matched strings.

Definition \(\PageIndex{2}\)

Define the set, \(\text{AmbRecMatch} \subseteq \{\color{blue}],\color{red}[\color{black}\}^{*}\) recursively as follows:

Base case: \(\lambda \in \text{AmbRecMatch}\),

Constructor cases: if \(s,t \in \text{AmbRecMatch}\), then the strings \(\color{red}[ \color{black}s \color{blue}]\) and \(st\) are also in AmbRecMatch.

It’s pretty easy to see that the definition of AmbRecMatch is just another way to define RecMatch, that is AmbRecMatch D RecMatch (see Problem 6.15). The definition of AmbRecMatch is arguably easier to understand, but we didn’t use it because it’s ambiguous, while the trickier definition of RecMatch is unambiguous. Here’s why this matters. Let’s define the number of operations, \(f(s)\), to construct a matched string \(s\) recursively on the definition of \(s \in \text{AmbRecMatch}:\)

\[\begin{aligned}
f(\lambda) &::=0, & &(f \text { base case }) \\
f(\color{red}[ \color{black}s \color{blue}]) &::= 1+f(s), & & \\
f(st) &:=1+f(s)+f(t) . & &(f \text { concat case })
\end{aligned}\]

This definition may seem ok, but it isn’t: \(f(\lambda)\) winds up with two values, and consequently:

\[\nonumber \begin{array}{l}
0 & =f(\lambda) & (f \text { base case })) \\
& =f(\lambda \cdot \lambda) & \text {(concat def, base case) } \\
& =1+f(\lambda)+f(\lambda) & (f \text { concat case}), \\
& =1+0+0=1 & (f \text { base case}).
\end{array}\]

This is definitely not a situation we want to be in!