GELATO Research Topic:
Definite Clause Grammars

The popularity of definite clause grammars (DCGs) is often related to natural language processing. In comparison with other formalisms, they seem to be particularly well-suited to control the perspicuity with which linguistic phenomena may be understood and expressed in actual language descriptions. However, descriptive adequation does not guarantee operational efficiency, and computational tractability is required if we intend to use descriptions for the mechanical processing. Though much research has been devoted to this subject, most of the practically usable work deals with the reduction of backtracking phenomena when parsing. To attain this goal, authors often follow two different approaches:

Reduce the search space, using control techniques. Here, we distinguish between dynamic techniques when control is made by rewriting programs and static ones when control is made by an external driver, which determines the action to be performed.
Incorporate dynamic programming techniques which ensure that all parses are made in parallel, eliminating backtracking processes.

Our goal is to combine the advantages of the preceding approaches, eliminating drawbacks. We focus on three aspects: Firstly, improve the quality of sharing, reducing the dependence on the syntactic context. Secondly, avoid extra evaluation work. Finally, reduce the search space by indexing the parse, and implementing a garbage collector facility. We chose as operational model the logical push-down automaton (LPDA), a formal engine that generalizes the dynamic programming aspects of earlier evaluation strategies.

Due to non-determinism of DCGs, it is convenient to merge the search space as much as possible. This saves on the space needed to represent items, and also on their processing. We exploit the possibilities of dynamic programming taking S¹ as the dynamic frame. We also index the parse by string position. So, we limit the search space at the time of recovery, as parsing progresses, by deleting information relating to earlier string positions. This relies on the concept of itemset, for which we associate a set of items to each token in the input string, that represents the state of the parsing process at that point of the scan. And we attach to each item a back pointer to the itemset associated to the input symbol at which we began to look for that configuration of the LPDA, as well as a pointer to the current itemset.

S¹ guarantees the best sharing quality for a given evaluation scheme, but the choice of this scheme can alter perceptibly the results. A balance between computational and sharing efficiency, and parser size is the best basis to decide. We focus on LALR(1)-like methods, which have a moderate splitting state phenomenon, improving both sharing and efficiency. To build the driver, we recover the context-free backbone of the DCG. We obtain terminals from the extensional database and non-terminals from heads in the intensional one. Terms with the same functor, but different number of arguments correspond to different symbols. We then build the LALR(1) automaton, probably non-deterministic. To communicate the driver and the logical engine, we augment items with the state in which the driver is.

Selected Readings:

M. Vilares Ferro and M. A. Alonso Pardo,
An LALR Extension for DCGs in Dynamic Programming,
in C. Martín Vide (ed.), Mathematical Linguistics II, John Benjamins Publishing Company, Amsterdam, The Netherlands, 1997.

Note: On-line version of these papers are available in the COLE Publications page.

Send comments and suggestions to webmaster@coleweb.dc.fi.udc.es

GELATO Research Topic: Definite Clause Grammars

Selected Readings:

GELATO Research Topic:
Definite Clause Grammars