Try Understanding SLR Parsing by Walking Through an Example
This blog is aiming to sort out some parsing concepts, especially, by walking through a simple CFG example.
Most concepts are from Stanford CS143 Compilers, and Compilers: Principles, Techniques, & Tools (A.K.A the Dragon Book). This blog is just reflecting and trying to deepen my own understanding of the topic, you should refer to the course and the book for stricter definitions.
CFG Example
$$ E \rightarrow E + E | id \tag{CFG-1}$$
LR(0) DFA
States of the DFA
- $I_0$:
- $E’ \rightarrow .E$
- $E \rightarrow .E + E$ ($\epsilon$ transition from above transition, similar below)
- $E \rightarrow .id$
- $I_1$:
- $E’ \rightarrow E.$
- $E \rightarrow E. + E$
- $I_2$:
- $E \rightarrow id.$
- $I_3$:
- $E \rightarrow E + .E$
- $E \rightarrow .E + E$
- $E \rightarrow .id$
- $I_4$:
- $E \rightarrow E + E.$
- $E \rightarrow E. + E$
Transition Diagram
Parsing Example
The table is produced by following example 4.43 of the Dragon Book.
Parsing $id + id$
Line | Stack | Symbols | Input | Action |
---|---|---|---|---|
1 | 0 | $ | id + id $ | shift to I2 |
2 | 0 2 | $ id | + id $ | reduce by E -> id |
3 | 0 1 | $ E | + id $ | shift to I3 |
4 | 0 1 3 | $ E + | id $ | shift to I2 |
5 | 0 1 3 2 | $ E + id | $ | reduce by E -> id |
6 | 0 1 3 4 | $ E + E | $ | reduce by E -> E + E |
7 | 0 1 | $ E | $ | accept |
I feel the ordering of the columns is a little misleading, I actually finish the stack column at last of each step. Here are some explanations and considerations from me to better understand this table.
-
In line 1, the initial state is
0
, Symbols are those in parsing stack, and Input are those yet to be parsed. -
In line 1, there’s nothing to be reduced, all we can do is shift.
-
In line 2:
- First update Symbols: from
$
to$ id
; - Then update Input: from
id + id $
to+ id $
; - Walk the DFA using the Symbol from start: initially we’re in state $I_0$, then transit to $I_2$ by
id
in Symbols.
The transition represented in the above graph is also called GOTO function. Here the transition from $I_0$ to $I_1$ can be annotated as $GOTO(I_0, E) = I_1$.
Here, the Stack column of Line 2 could also be updated as
0 2
. - First update Symbols: from
Parsing $id + id + id$
Line | Stack | Symbols | Input | Action |
---|---|---|---|---|
1 | 0 | $ | id + id + id $ | shift to I2 |
2 | 0 2 | $ id | + id + id $ | reduce by E -> id |
3 | 0 1 | $ E | + id + id $ | shift to I3 |
4 | 0 1 3 | $ E + | id + id $ | shift to I2 |
5 | 0 1 3 2 | $ E + id | + id $ | reduce by E -> id |
6 | 0 1 3 4 | $ E + E | + id $ | reduce or shift??? |
Here in line 6, we have a shift/reduce conflict, meaning we could not decide whether to reduce or to shift. In fact, we could parse the string either by shifting or reducing at the specific stage; but to avoid such problem, we should either specify left associative or right associative.
Some Concepts
Ambiguity
We say a grammar is ambiguous if a string of the language could be parsed into different parsing trees.
For example, $id + id + id$, which is a string belonging to the language defined by CFG-1, could be parsed in either $id + (id + id)$ or $(id + id) + id$.
This is because the Associativity is not specified here.
In some other cases, without specifying Precedence of operations can also lead to ambiguous grammar. For example, $id + id * id$ could be parsed as $(id + id) * id$ or $id + (id * id)$ by grammar $E \rightarrow E + E | E * E | id$.
Conflicts
There are mainly 2 conflicts mentioned in this context: shift/reduce conflict and reduce/reduce conflict.
-
shift/reduce conflict happens when the parser could not decide whether to shift or to reduce. In the above $id + id + id$ parsing example, we came across such a conflict.
-
reduce/reduce conflict happens when there’re multiple reduction productions.
A simple reduce-reduce conflict example
A no-brain example could be:
$$ E \rightarrow A | B $$ $$ A \rightarrow x $$ $$ B \rightarrow x $$ when we have an $x$ in the parsing stack, while we have 2 productions $A \rightarrow x$ and $B \rightarrow x$ producing $x$, the parser could not decide which one to apply.
A complicated reduce-reduce conflict example
Supposer the grammar is like: $$ S \rightarrow A(S)B | \epsilon $$ $$ A \rightarrow S | SB | x | \epsilon $$ $$ B \rightarrow SB | y $$
The initial SLR(1) parsing automaton state of the grammar contains following LR(0) items: $$ S’ \rightarrow .S $$ $$ S \rightarrow .A(S)B $$ $$ S \rightarrow . $$ $$ A \rightarrow .S $$ $$ A \rightarrow .SB $$ $$ A \rightarrow .x $$ $$ A \rightarrow . $$
This grammar has a reduce-reduce conflict on input (
.
Why?
Because (
is both in $FOLLOW(A)$ (line 1 of the grammar) and $FOLLOW(S)$ (line 2 of the grammar we know that $FOLLOW(A) \subseteq FOLLOW(S)$).
Intuitively, the fact tells us that (
is valid following both an A
and an S
.
Notice that both A
and S
could be reduced to an $\epsilon$,
that’s why here we’re facing a situation where there’s more than one valid reductions.
Decision of the Action
The root for the conflict is $I_4$, where we have 2 LR(0) items, which are:
$$
E \rightarrow E + E. \tag{Item-1}
$$
and
$$
E \rightarrow E. + E \tag{Item-2}
$$
.
For Item-1, we already reach the end,
the choice for it is reduction.
While for Item-2, if the next symbol is +
,
then it’s possible to do a shift.
In this case, we say we have a shift/reduce conflict on +
.
If you have any feedback to this article, feel free to comment here or send an email to me