A constructive proof is given to show that SLL(k) testing is not in NP. The proof focuses on the verification step of determining membership in NP. An instance of the problem is shown to generate an intractable number of items that must be verified, making the nondeterministic solution unverifiable in polynomial time. Since non-SLL(k) testing is well known to be NP-complete, the complement of an NP-complete problem is not in NP, so it follows that NP≠coNP. Here the abbreviation SLL(k) means strong LL(k), not its original usage for simple LL(k).
The class of nondeterministic polynomial (NP) problems is both interesting and useful in a practical sense. These are problems whose possible solution space is intractably large. However, the decision problems are phrased in such a way that a single solution answers the question. For the problem to be in NP, the solution must be verifiable in a reasonable amount of time. If this were not the case, then no proposed solution could be proven correct, and we are no better off than examining all possible ones.
The complement of an NP decision problem requires a "no" answer to all solutions. Intuitively this seems like a harder problem than just answering a single question. These complement problems are referred to as coNP and in sum they are the class coNP. The relationship of NP to coNP is not obviously clear given that nondeterminism itself tends to muddy the waters of any attempted analysis.
Strong LL(k) is a restricted version of LL(k) parsing in which the left context of the parse is not needed to make parsing decisions. The predicted production is determined by at most the next k lookahead symbols. A definition of the strong LL(k) grammars follows.
Definition: The FIRST_{k} set of a string of symbols in a grammar is a set of k-length strings of terminal symbols that may begin a sentential form derivable from the string of symbols in the grammar. More specifically, for a grammar G = ( N, T, P, S ) FIRST_{k}(α) = { w | α ⇒^{*} wβ } ∪ { ε if α ⇒^{*} ε } where α ∈ (N ∪ T)^{+} β ∈ (N ∪ T)^{*} w ∈ T^{+} For the case of the empty string: FIRST_{k}(ε) = { ε } Note that the length of w may be less than or equal to k, in which case β = ε.
Definition: The FOLLOW_{k} set of a string of symbols in a grammar is a set of k-length terminal symbol strings in the grammar that may follow the string of symbols in some sentential form derivable in the grammar. More specifically, for a grammar G = ( N, T, P, S ) FOLLOW_{k}(β) = { w | S ⇒^{*} αβγ and w ∈ FIRST_{k}(γ) } ∪ { ε if S ⇒^{*} αβ } where α,γ ∈ ( N ∪ T )^{*} β ∈ ( N ∪ T )^{+} w ∈ T^{+}
Definition: A grammar G = ( N, T, P, S ) is said to be strong LL(k), or SLL(k), for some fixed natural number k if for all nonterminals A, and for any two distinct A-productions in the grammar A → α A → β FIRST_{k}(α FOLLOW_{k}(A)) ∩ FIRST_{k}(β FOLLOW_{k}(A)) = Ø where α and β are strings from (N ∪ T)*
When both α and β derive strings that begin with the same terminal symbol, this is referred to as a level one conflict. If at lookahead level two the continuing derivation again includes conflicting terminals, this is referred to as a level two conflict. For an SLL(k) grammar, this can continue up through level k-1, but at level k there can be no conflicts. Analyzing only the conflicting lookahead strings level by level enables doing less work than the naive approach of comparing all of the possible k-length strings. Conflict analysis level by level is just another way of implementing the definition of SLL(k). It is no more or less accurate than full string SLL(k) analysis. In the worst case, it does the same amount of work as the naive analysis.
Definition: For any nonterminal A in a grammar, a conflict is defined as two or more different A-productions being predicted by the same lookahead symbol at the same lookahead level. More specifically, for a grammar G = ( N, T, P, S ) for all nonterminals A, and for any two distinct A-productions in the grammar A → α A → β where α and β are strings from (N ∪ T)*, if FIRST_{k}(α FOLLOW_{k}(A)) ∩ FIRST_{k}(β FOLLOW_{k}(A)) ≠ Ø then each intersection of these sets at each lookahead level n is a conflict.
SLL(k) testing means to test a grammar for the strong LL(k) property. The inputs to SLL(k) testing are a grammar G and a natural number k. A grammar is SLL(k) if and only if there are no conflicts at lookahead level k. So any algorithm whether deterministic or nondeterministic must in some way resolve all conflicts that are present in the grammar. A nondeterministic algorithm is able to answer the question of whether or not a given conflict is resolved at or before lookahead level k.
In the case of non-SLL(k) testing a nondeterministic algorithm is able to provide a single solution that illustrates that the grammar is not SLL(k). This would be an example of a conflict that is not resolved at lookahead level k or before. In the case of SLL(k) testing a nondeterministic algorithm must state that all conflicts are resolved at or before lookahead level k. In effect it must be able to list all of the conflicts stating correctly that each one is resolved at or before level k.
Consider the following grammar, G_{1}.
This grammar is SLL(k), derives every k-length string in its alphabet, and contains 2^{k} conflicts. Since k is an input to SLL(k) testing it should be clear that any certificate for this grammar must necessarily contain an exponential number of parts that must be individually verified. Each conflict is by nature and by definition separate from every other conflict. There is no way to somehow combine conflicts to verify more than one at a time. This exponential number of conflicts is not deterministically verifiable in polynomial time.
A skeptical reader might point out that this simple grammar is obviously SLL(k) by inspection, so there must be some easy way to verify it. Not really. A deterministic algorithm does not have human insight. It must plod through some predetermined sequence of steps to achieve its goal. In the case of this grammar and others like it, the very best that a verification algorithm can do is list out the conflicts stating correctly that each one is resolved at or before level k. Simply listing an exponential number of conflicts takes exponential time, even if no work at all is done in determining that the conflict is resolved. But of course a deterministic algorithm cannot guess, use an oracle, or just intuit the answer. It must execute a repeatable sequence of steps that shows definitively that the answer is correct.
Given an instance of a problem, it is sometimes possible to find a verification algorithm that works for just that instance. For grammar G_{1}, a simple algorithm might be to check that no two A_{n} productions contain an instance of the same terminal symbol. In combination with some other easy checks, this would verify that the grammar is SLL(k) in polynomial time. This particular strategy cannot be considered to be a deterministic algorithm as is required by the verification step of determining membership in NP. Clearly there are an unbounded number of context free grammars. Finding a unique verification algorithm for each of these grammars cannot be done in a reasonable amount of time.
Clearly for the case of k=1, SLL(k) verification is possible in polynomial time. Indeed the full SLL(1) construction of a parser is well known to be a polynomial function of the size of the grammar. For the case of k>1, SLL(k) verification would seem to be a function of the number and length of the lookahead strings that must be compared for equality. A nondeterministic algorithm can supply the lookahead strings, but verification requires that they be compared deterministically. There is nothing that can be done nondeterministically to simplify the deterministic string comparison that must be done. For example, sorting the strings cannot help. They still need to be compared terminal symbol by terminal symbol to determine whether or not they are equal.
An alternate view of the comparison of lookahead strings is to consider them as possibly conflicting at each lookahead level. This enables possibly short circuiting the string comparison at a level less than k. The analogy with a direct string comparison is that when comparing the strings from left to right, if a difference is found before the end of the strings, they are different and so no further comparison is needed.
Lemma 1. The complexity of SLL(k) verification is exponential.
Proof. At a minimum, any certificate for an SLL(k) grammar must provide the capability to verify deterministically that all conflicts in the grammar are resolved at or below level k. The best that a nondeterministic algorithm can do is to provide the conflicting lookahead strings to the verification algorithm for comparison. The actual string comparison must be done deterministically by the verification algorithm if the verification step is to have meaning. Note that not all possible lookahead strings need to be provided and considered, only those that represent an SLL(n) conflict for some n less than k.
The minimum that a verification algorithm can do is to compare the conflicting lookahead strings level by level up to the point where they are different, which resolves the conflicts. Each such comparison is done on a conflict in the grammar, so it follows that the complexity of SLL(k) verification is at least the number of conflicts in the grammar. Since the SLL(k) grammar G_{1} above contains an exponential number of conflicts, the complexity of SLL(k) verification is exponential.
Theorum 1. SLL(k) testing is not in NP.
Proof. By Lemma 1, any algorithm to verify that a given arbitrary grammar is SLL(k) requires greater than polynomial time. So by the well known definition of NP, SLL(k) testing is not in NP.
Theorum 2. NP≠coNP
Proof. Non-SLL(k) testing is well known to be NP-complete. The complement of non-SLL(k) testing is SLL(k) testing. By Theorum 1, SLL(k) testing is not in NP. Since the complement of an NP-complete problem is not in NP, it follows that NP≠coNP.
Theorum 3. P≠NP
Proof. By Theorum 2, P≠NP since it is well known that P is closed under complement.
One real world analogy to the P/NP question is to ask whether it is as easy to solve a problem as it is to recognize a solution to the problem. Clearly not as could be attested to by researchers in any field. A common reaction to a new solution to a problem is for one to wonder why he did not see it earlier since it is so obvious once seen. Not always, but often this is the case. More so for the simple solutions, which tend to be the best ones.
Another simplified statement of the P/NP question is to ask whether in the worst case scenario, it is necessary to consider all of the possible solutions in order to find the correct one. Is there an instance of the problem that requires this even though most often it is not required? Grammars for programming languages are the most common use for SLL(k). Unambiguous versions of these grammars tend to need relatively small values of k and tend to have a manageable number of conflicts at lookahead levels less than k. Yet as has been shown here there exists at least one grammar for which determining SLL(k)-ness seems to be intractable.
Hunt H.B., T.G. Szymanski, and J.D. Ullman (1975). "On the complexity of LR(k) testing," CACM, 18:12, pp. 707-716.