4.6: General Grammars
- Page ID
- 6734
\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)
\( \newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\)
( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\)
\( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)
\( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\)
\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)
\( \newcommand{\Span}{\mathrm{span}}\)
\( \newcommand{\id}{\mathrm{id}}\)
\( \newcommand{\Span}{\mathrm{span}}\)
\( \newcommand{\kernel}{\mathrm{null}\,}\)
\( \newcommand{\range}{\mathrm{range}\,}\)
\( \newcommand{\RealPart}{\mathrm{Re}}\)
\( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)
\( \newcommand{\Argument}{\mathrm{Arg}}\)
\( \newcommand{\norm}[1]{\| #1 \|}\)
\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)
\( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\AA}{\unicode[.8,0]{x212B}}\)
\( \newcommand{\vectorA}[1]{\vec{#1}} % arrow\)
\( \newcommand{\vectorAt}[1]{\vec{\text{#1}}} % arrow\)
\( \newcommand{\vectorB}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vectorC}[1]{\textbf{#1}} \)
\( \newcommand{\vectorD}[1]{\overrightarrow{#1}} \)
\( \newcommand{\vectorDt}[1]{\overrightarrow{\text{#1}}} \)
\( \newcommand{\vectE}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash{\mathbf {#1}}}} \)
\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)
\(\newcommand{\avec}{\mathbf a}\) \(\newcommand{\bvec}{\mathbf b}\) \(\newcommand{\cvec}{\mathbf c}\) \(\newcommand{\dvec}{\mathbf d}\) \(\newcommand{\dtil}{\widetilde{\mathbf d}}\) \(\newcommand{\evec}{\mathbf e}\) \(\newcommand{\fvec}{\mathbf f}\) \(\newcommand{\nvec}{\mathbf n}\) \(\newcommand{\pvec}{\mathbf p}\) \(\newcommand{\qvec}{\mathbf q}\) \(\newcommand{\svec}{\mathbf s}\) \(\newcommand{\tvec}{\mathbf t}\) \(\newcommand{\uvec}{\mathbf u}\) \(\newcommand{\vvec}{\mathbf v}\) \(\newcommand{\wvec}{\mathbf w}\) \(\newcommand{\xvec}{\mathbf x}\) \(\newcommand{\yvec}{\mathbf y}\) \(\newcommand{\zvec}{\mathbf z}\) \(\newcommand{\rvec}{\mathbf r}\) \(\newcommand{\mvec}{\mathbf m}\) \(\newcommand{\zerovec}{\mathbf 0}\) \(\newcommand{\onevec}{\mathbf 1}\) \(\newcommand{\real}{\mathbb R}\) \(\newcommand{\twovec}[2]{\left[\begin{array}{r}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\ctwovec}[2]{\left[\begin{array}{c}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\threevec}[3]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\cthreevec}[3]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\fourvec}[4]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\cfourvec}[4]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\fivevec}[5]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\cfivevec}[5]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\mattwo}[4]{\left[\begin{array}{rr}#1 \amp #2 \\ #3 \amp #4 \\ \end{array}\right]}\) \(\newcommand{\laspan}[1]{\text{Span}\{#1\}}\) \(\newcommand{\bcal}{\cal B}\) \(\newcommand{\ccal}{\cal C}\) \(\newcommand{\scal}{\cal S}\) \(\newcommand{\wcal}{\cal W}\) \(\newcommand{\ecal}{\cal E}\) \(\newcommand{\coords}[2]{\left\{#1\right\}_{#2}}\) \(\newcommand{\gray}[1]{\color{gray}{#1}}\) \(\newcommand{\lgray}[1]{\color{lightgray}{#1}}\) \(\newcommand{\rank}{\operatorname{rank}}\) \(\newcommand{\row}{\text{Row}}\) \(\newcommand{\col}{\text{Col}}\) \(\renewcommand{\row}{\text{Row}}\) \(\newcommand{\nul}{\text{Nul}}\) \(\newcommand{\var}{\text{Var}}\) \(\newcommand{\corr}{\text{corr}}\) \(\newcommand{\len}[1]{\left|#1\right|}\) \(\newcommand{\bbar}{\overline{\bvec}}\) \(\newcommand{\bhat}{\widehat{\bvec}}\) \(\newcommand{\bperp}{\bvec^\perp}\) \(\newcommand{\xhat}{\widehat{\xvec}}\) \(\newcommand{\vhat}{\widehat{\vvec}}\) \(\newcommand{\uhat}{\widehat{\uvec}}\) \(\newcommand{\what}{\widehat{\wvec}}\) \(\newcommand{\Sighat}{\widehat{\Sigma}}\) \(\newcommand{\lt}{<}\) \(\newcommand{\gt}{>}\) \(\newcommand{\amp}{&}\) \(\definecolor{fillinmathshade}{gray}{0.9}\)At the beginning of this chapter the general idea of a grammar as a set of rewriting or production rules was introduced. For most of the chapter, how- ever, we have restricted our attention to context-free grammars, in which production rules must be of the form \(A \longrightarrow x\) where \(A\) is a non-terminal symbol. In this section, we will consider general grammars, that is, gram- mars in which there is no such restriction on the form of production rules. For a general grammar, a production rule has the form \(u \longrightarrow x,\) where \(u\) is string that can contain both terminal and non-terminal symbols. For convenience, we will assume that u contains at least one non-terminal symbol, although even this restriction could be lifted without changing the class of languages that can be generated by grammars. Note that a context-free grammar is, in fact, an example of a general grammar, since production rules in a general grammar are allowed to be of the form \(A \longrightarrow x .\) They just don’t have to be of this form. I will use the unmodified term grammar to refer to general grammars.2 The definition of grammar is identical to the definition of context-free grammar, except for the form of the production rules:
Definition 4.6.
A grammar is a 4 -tuple \((V, \Sigma, P, S),\) where:
1. V is a finite set of symbols. The elements of V are the non-terminal symbols of the grammar.
2.\(\Sigma\) is a finite set of symbols such that \(V \cap \Sigma=\emptyset .\) The elements of \(\Sigma\) are the terminal symbols of the grammar.
3. \(P\) is a set of production rules. Each rule is of the form \(u \longrightarrow x\) where \(u\) and \(x\) are strings in \((V \cup \Sigma)^{*}\) and \(u\) contains at least one symbol from V.
4. \(S \in V . S\) is the start symbol of the grammar.
Suppose G is a grammar. Just as in the context-free case, the language generated by \(G\) is denoted by \(L(G)\) and is defined as \(L(G)=\{x \in\) \(\Sigma^{*} | S \Longrightarrow_{G}^{*} x \} .\) That is, a string \(x\) is in \(L(G)\) if and only if \(x\) is a string of terminal symbols and there is a derivation that produces x from the start symbol, S, in one or more steps.
The natural question is whether there are languages that can be generated by general grammars but that cannot be generated by context-free languages. We can answer this question immediately by giving an example of such a language. Let \(L\) be the language \(L=\left\{w \in\{a, b, c\}^{*} | n_{a}(w)=\right.\)\(n_{b}(w)=n_{c}(w) \} .\) We saw at the end of the last section that \(L\) is not context-free. However, L is generated by the following grammar:
\(S \longrightarrow S A B C\)
\(S \longrightarrow \varepsilon\)
\(A B \longrightarrow B A\)
\(B A \longrightarrow A B\)
\(A C \longrightarrow C A\)
\(C A \longrightarrow A C\)
\(B C \longrightarrow C B\)
\(C B \longrightarrow B C\)
\(A \longrightarrow a\)
\(B \longrightarrow b\)
\(C \rightarrow c\)
For this grammar, the set of non-terminals is {S, A, B, C} and the set of ter- minal symbols is {a, b, c}. Since both terminals and non-terminal symbols can occur on the left-hand side of a production rule in a general grammar, it is not possible, in general, to determine which symbols are non-terminal and which are terminal just by looking at the list of production rules. However, I will follow the convention that uppercase letters are always non-terminal symbols. With this convention, I can continue to specify a grammar simply by listing production rules.
The first two rules in the above grammar make it possible to produce the strings ε, ABC, ABCABC, ABCABCABC, and so on. Each of these strings contains equal numbers of A’s, B’s, and C’s. The next six rules allow the order of the non-terminal symbols in the string to be changed. They make it possible to arrange the A’s, B’s, and C’s into any arbitrary order. Note that these rules could not occur in a context-free grammar. The last three rules convert the non-terminal symbols A, B, and C into the corresponding terminal symbols a, b, and c. Remember that all the non-terminals must be eliminated in order to produce a string in L(G). Here, for example, is a derivation of the string baabcc using this grammar. In each line, the string that will be replaced on the next line is underlined.
\(S \Longrightarrow \underline{S} A B C\)
\(\Longrightarrow \underline{S} A B C A B C\)
\(\Longrightarrow A B C A B C\)
\(\Longrightarrow B A C A B C\)
\(\Longrightarrow B A A C B C\)
\(\Longrightarrow B A A B C C\)
\(\Longrightarrow b A A B C C\)
\(\Longrightarrow b a A B C C\)
\(\Longrightarrow b a a B C C\)
\(\Longrightarrow b a a b C C\)
\(\Longrightarrow b a a b c C\)
\(\Longrightarrow b a a b c c\)
We could produce any string in L in a similar way. Of course, this only shows that \(L \subseteq L(G) .\) To show that \(L(G) \subseteq L,\) we can observe that for any string \(w\) such that \(S \Longrightarrow^{*} w, n_{A}(w)+n_{a}(w)=n_{B}(w)+n_{b}(w)=\) \(n_{C}(w)+n_{c}(w) .\) This follows since the rule \(S \Longrightarrow S A B C\) produces strings in which \(n_{A}(w)=n_{B}(w)=n_{C}(w),\) and no other rule changes any of the quantities \(n_{A}(w)+n_{a}(w), n_{B}(w)+n_{b}(w),\) or \(n_{C}(w)+n_{c}(w) .\) After applying these rules to produce a string \(x \in L(G),\) we must have that \(n_{A}(x), n_{B}(x)\) and \(n_{C}(x)\) are zero. The fact that \(n_{a}(x)=n_{c}(x)=n_{c}(x)\) then follows from the fact that \(n_{A}(x)+n_{a}(x)=n_{B}(x)+n_{b}(x)=n_{C}(x)+n_{c}(x) .\) That is, \(x \in L\).
Our first example of a non-context-free language was \(\left\{a^{n} b^{n} c^{n} | n \in \mathbb{N}\right\}\). This language can be generated by a general grammar similar to the previous example. However, it requires some cleverness to force the a’s, b’s, and c’s into the correct order. To do this, instead of allowing A’s, B’s, andC’s to transform themselves spontaneously into a’s, b’s, and c’s, we use additional non-terminal symbols to transform them only after they are in the correct position. Here is a grammar that does this:
\(S \longrightarrow S A B C\)
\(S \longrightarrow X\)
\(B A \longrightarrow A B\)
\(C A \longrightarrow A C\)
\(C B \longrightarrow B C\)
\(X A \longrightarrow a X\)
\(X \rightarrow Y\)
\(Y B \longrightarrow b Y\)
\(Y \longrightarrow Z\)
\(Z C \longrightarrow c Z\)
\(Z \longrightarrow \varepsilon\)
Here, the first two rules produce one of the strings X, XABC, XABCABC,XABCABCABC, and so on. The next three rules allow A’s to move to the left and C’s to move to the right, producing a string of the form XAnBnCn, for some \(n \in \mathbb{N} .\) The rule \(X A \longrightarrow a X\) allows the \(X\) to move through the A’s from left to right, converting A’s to a’s as it goes. After converting theA’s, the X can be transformed into a Y . The Y will then move through theB’s, converting them to b’s. Then, the Y is transformed into a Z, which is responsible for converting C’s to c’s. Finally, an application of the rule \(Z \longrightarrow \varepsilon\) removes the \(Z,\) leaving the string \(a^{n} b^{n} c^{n}\).
Note that if the rule \(X \longrightarrow Y\) is applied before all the \(A^{\prime} s\) have been converted to a’s, then there is no way for the remaining A’s to be converted to a’s or otherwise removed from the string. This means that the derivation has entered a dead end, which can never produce a string that consists of terminal symbols only. The only derivations that can produce strings in the language generated by the grammar are derivations in which the X moves past all the A’s, converting them all to a’s. At this point in the derivation, the string is of the form \(a^{n} X u\) where \(u\) is a string consisting entirely of \(B^{\prime} s\) and \(C\) 's. At this point, the rule \(X \longrightarrow Y\) can be applied, producing the string \(a^{n} Y u .\) Then, if a string of terminal symbols is ever to be produced, the \(Y\) must move past all the \(B^{\prime} s,\) producing the string \(a^{n} b^{n} Y C^{n} .\) You can see that the use of three separate non-terminals, X, Y , and Z, is essential for forcing the symbols in \(a^{n} b^{n} c^{n}\) into the correct order.
For one more example, consider the language \(\left\{a^{n^{2}} | n \in \mathbb{N}\right\} .\) Like the other languages we have considered in this section, this language is not context-free. However, it can be generated by a grammar. Consider the grammar
\(S \longrightarrow D T E\)
\(T \longrightarrow B T A\)
\(T \longrightarrow \varepsilon\)
\(B A \longrightarrow A a B\)
\(B a \longrightarrow a B\)
\(B E \longrightarrow E\)
\(D A \longrightarrow D\)
\(D a \longrightarrow a D\)
\(D E \longrightarrow \varepsilon\)
The first three rules produce all strings of the form \(D B^{n} A^{n} E,\) for \(n \in \mathbb{N}\). Let's consider what happens to the string \(D B^{n} A^{n} E\) as the remaining rules are applied. The next two rules allow a B to move to the right until it reaches the E. Each time the B passes an A, a new a is generated, but a Bwill simply move past an a without generating any other characters. Once the \(B\) reaches the \(E,\) the rule \(B E \longrightarrow E\) makes the \(B\) disappear. Each \(B\) from the string \(D B^{n} A^{n} E\) moves past \(n A^{\prime} s\) and generates \(n\) a's. Since there are \(n B^{\prime} s,\) a total of \(n^{2} a^{\prime} s\) are generated. Now, the only way to get rid of the D at the beginning of the string is for it to move right through all the A’s and a’s until it reaches the E at the end of the string. As it does this, the rule \(D A \longrightarrow D\) eliminates all the \(A^{\prime} s\) from the string, leaving the string \(a^{n^{2}} D E\) . Applying the rule \(D E \longrightarrow \varepsilon\) to this gives \(a^{n^{2}} .\) This string contains no non-terminal symbols and so is in the language generated by the grammar. We see that every string of the form \(a^{n^{2}}\) is generated by the above grammar. Furthermore, only strings of this form can be generated by the grammar.
Given a fixed alphabet Σ, there are only countably many different lan- guages over Σ that can be generated by grammars. Since there are un- countably many different languages over Σ, we know that there are many languages that cannot be generated by grammars. However, it is surpris- ingly difficult to find an actual example of such a language.
As a first guess, you might suspect that just as \(\left\{a^{n} b^{n} | n \in \mathbb{N}\right\}\) is an example of a language that is not regular and \(\left\{a^{n} b^{n} c^{n} | n \in \mathbb{N}\right\}\) is an example of a language that is not context-free, so \(\left\{a^{n} b^{n} c^{n} d^{n} | n \in \mathbb{N}\right\}\) might be an example of a language that cannot be generated by any grammar. However, this is not the case. The same technique that was used to produce a grammar that generates \(\left\{a^{n} b^{n} c^{n} | n \in \mathbb{N}\right\}\) can also be used to produce a grammar for \(\left\{a^{n} b^{n} c^{n} d^{n} | n \in \mathbb{N}\right\} .\) In fact, the technique extends to similar languages based on any number of symbols.
Or you might guess that there is no grammar for the language \(\left\{a^{n} | n\right.\)is a prime number }. Certainly, producing prime numbers doesn’t seem like the kind of thing that we would ordinarily do with a grammar. Nevertheless, there is a grammar that generates this language. We will not actually write down the grammar, but we will eventually have a way to prove that it exists.
The language \(\left\{a^{n^{2}} | n \in \mathbb{N}\right\}\) really doesn't seem all that "grammatical" either, but we produced a grammar for it above. If you think about how this grammar works, you might get the feeling that its operation is more like “computation” than “grammar.” This is our clue. A grammar can be thought of as a kind of program, albeit one that is executed in a non- deterministic fashion. It turns out that general grammars are precisely as powerful as any other general-purpose programming language, such as Java or C++. More exactly, a language can be generated by a grammar if and only if there is a computer program whose output consists of a list contain- ing all the strings and only the strings in that language. Languages that have this property are said to be srecursively enumerable languages. (This term as used here is not closely related to the idea of a recursive subroutine.) The languages that can be generated by general grammars are precisely the recursively enumerable languages. We will return to this topic in the next chapter.
It turns out that there are many forms of computation that are precisely equivalent in power to grammars and to computer programs, and no one has ever found any form of computation that is more powerful. This is one of the great discoveries of the twentieth century, and we will investigate it further in the next chapter.
Exercises
- Find a derivation for the string caabcb, according to the first example grammar in this section. Find a derivation for the string aabbcc, according to the second example grammar in this section. Find a derivation for the stringaaaa, according to the third example grammar in this section.
- Consider the third sample grammar from this section, which generates the language \(\left\{a^{n^{2}} | n \in \mathbb{N}\right\}\). Is the non-terminal symbol D necessary in this gram- mar? What if the first rule of the grammar were replaced by \(S \longrightarrow T E\) and the last three rules were replaced by \(A \rightarrow \varepsilon\) and \(E \longrightarrow \varepsilon\) ? Would the resulting grammar still generate the same language? Why or why not?
- Find a grammar that generates the language \(L=\left\{w \in\{a, b, c, d\}^{*} | n_{a}(w)=\right.\)\(n_{b}(w)=n_{c}(w)=n_{d}(w) \}\). Let Σ be any alphabet. Argue that the language \(\left\{w \in \Sigma^{*} | \text { all symbols in } \Sigma \text { occur equally often in } w\right\}\) can be generated by a grammar.
- For each of the following languages, find a grammar that generates the lan- guage. In each case, explain how your grammar works.
a) \(\left\{a^{n} b^{n} c^{n} d^{n} | n \in \mathbb{N}\right\} \quad\) b) \(\left\{a^{n} b^{m} c^{n m} | n \in \mathbb{N} \text { and } m \in \mathbb{N}\right\}\)
c) \(\left\{w w | w \in\{a, b\}^{*}\right\} \quad\) d) \(\left\{w w w | w \in\{a, b\}^{*}\right\}\)
e) \(\left\{a^{2^{n}} | n \in \mathbb{N}\right\}\) f) \(\left\{w \in\{a, b, c\}^{*} | n_{a}(w)>n_{b}(w)>n_{c}(w)\right\}\)