GELATO GELATO Research Topic:
LR Techniques for Parsing and Deduction



The class of grammars that can be deterministically analyzed using LR parsing with k lookahead symbols are called LR(k) grammars. They are useful for describing programming languages, but they are very limited when they are used for other purposes, for example parsing of natural languages. If we consider LR parsing tables in which each entry can contain several actions, we obtain non-deterministic LR parsing, often known as generalized LR parsing.

Several implementations of generalized LR parsing has been proposed, most of them using a graph-structured stack instead of a single stack in order to deal with multiple parses of a single sentence. Some of those implementations has been problems with cyclic and hidden left recursive constructions. Moreover, their complexity with respect to the input string is O(np+1), where p is the length of the longer right-hand side of a rule.

However, generalized LR(1) and LALR(1) parsing algorithms for arbitrary context-free grammars can be derived, in a natural way, from the well known Earley's algorithm, preserving cubic time complexity in the worst case but performing better in the average case and attaining linear complexity in the case of LR grammars. That complexity is achieved by the use of dynamic programming to represent the non-deterministic evolution of the stack instead of graph-structured stack representations, as has often been the case in previous approaches

The resulting algorithm can be extended to logic-based grammatical, such as Definite Clause Grammar, recovering the context-free backbone of the grammar in order to build the finite-state engine which guide the parsing process and augmenting the tabulated pieces of information with logical terms.

Selected Readings:

Note: On-line version of these papers are available in the COLE Publications page.


Send comments and suggestions to webmaster@coleweb.dc.fi.udc.es