GELATO Research Topic:The popularity of definite clause grammars (DCGs) is often related to natural language processing. In comparison with other formalisms, they seem to be particularly well-suited to control the perspicuity with which linguistic phenomena may be understood and expressed in actual language descriptions. However, descriptive adequation does not guarantee operational efficiency, and computational tractability is required if we intend to use descriptions for the mechanical processing. Though much research has been devoted to this subject, most of the practically usable work deals with the reduction of backtracking phenomena when parsing. To attain this goal, authors often follow two different approaches:
Due to non-determinism of DCGs, it is convenient to merge the search space as much as possible. This saves on the space needed to represent items, and also on their processing. We exploit the possibilities of dynamic programming taking S1 as the dynamic frame. We also index the parse by string position. So, we limit the search space at the time of recovery, as parsing progresses, by deleting information relating to earlier string positions. This relies on the concept of itemset, for which we associate a set of items to each token in the input string, that represents the state of the parsing process at that point of the scan. And we attach to each item a back pointer to the itemset associated to the input symbol at which we began to look for that configuration of the LPDA, as well as a pointer to the current itemset.
S1 guarantees the best sharing quality for a given evaluation scheme, but the choice of this scheme can alter perceptibly the results. A balance between computational and sharing efficiency, and parser size is the best basis to decide. We focus on LALR(1)-like methods, which have a moderate splitting state phenomenon, improving both sharing and efficiency. To build the driver, we recover the context-free backbone of the DCG. We obtain terminals from the extensional database and non-terminals from heads in the intensional one. Terms with the same functor, but different number of arguments correspond to different symbols. We then build the LALR(1) automaton, probably non-deterministic. To communicate the driver and the logical engine, we augment items with the state in which the driver is.