GELATO Research Topic:Grammar formalisms based on the encoding of grammatical information in unification-based strategies enjoy some currency both in linguistics and natural language processing. Such formalisms, as is the case of definite clause grammars (DCGs), can be thought of, by analogy to context-free grammars, as generalizing the notion of non-terminal symbol from a finite domain of atomic elements to a possibly infinite domain of directed graph structures. Although the use of infinite terms can be often avoided in practical applications, the potential offered by cyclic trees is appreciated in language development tasks, where a large completion domain allows the modeling effort to be saved. Unfortunately, in moving to an infinite non-terminal domain, standard methods of parsing may no longer be applicable to the formalism. Typically, the problem manifests itself as gross inefficiency or even non-termination of the algorithms.
One approach guaranteeing termination that has been investigated is based on restricting to some extent the parsing process, for example mandating a non-cyclic context-free backbone, coupling grammar and parsing design or parameterizing parsing algorithms with grammar dependent information. Another approach has been to extend the unification in order to provide the capability of traversing cyclic trees, for example substituting resolution by another unification mechanism such as natural deduction, considering functions and predicates as elements with the same order as variables or generalizing an available algorithm for traversing cyclic lists. We try to combine the advantages provided for the above approaches eliminating, insofar as is possible, their drawbacks.
Experience shows that the most efficient evaluation strategies seem to be those bottom-up approaches including a predictive phase in order to restrict the search space. So, our evaluation scheme is a bottom-up architecture optimized with a control provided by an LALR(1) driver and implemented using dynamic programming techniques. This parsing algorithm has no problems in dealing with non-determinism: it simply explores all possible alternatives in each point of the parsing process. This does not affect the level of sharing, but it can pose problems with termination due to the presence of cyclic structures. Therefore, a special mechanism for representing cyclic terms must be used. At this point, it is important to remark that this mechanism should not decrease the efficiency in the treatment of non cyclic structures. In this context, we have separated cyclic tree traversal in two phases:
From the previous discussion, we can translate the first phase in our traverse strategy to detect cycles in context-free grammars in a dynamic frame using an LALR(1) parser. Given that we have indexed the parse, it is sufficient to verify that in a same itemset the parsing process re-visits a state. In effect, this implies that an empty string has been parsed in a loop within the automaton.
To verify now that we can extend cyclicity to predicate symbols, it will be sufficient to test whether the terms implicated unify. In particular, we must generalize unification to detect cyclic terms with functional symbols. To prevent the unification to loop, the concept of substitution is generalized to include function and predicate symbol substitution. This means modifying the unification algorithm so that these symbols are treated in the same way as for variables. As our bottom-up approach guarantees that at least one of the terms implied in a substitution is ground, the extension of the unification algorithm is limited to one case, when a predicate symbol is present. Here, after testing the compatibility of name and arity with the other term, the algorithm establishes if the associated non-terminals in the driver have been generated in the same state, covering the same portion of the text. If all these comparisons succeed, unification could be possible and we look for cyclicity, but only when these non-terminals show a cyclic behavior in the LALR(1) driver. In this case, the algorithm verifies, one by one, the possible occurrence of repeated terms by comparing the addresses of these with those of the arguments of the other predicate symbol. The optimal sharing of the interpretation guarantees that cyclicity arises if and only if any of these comparisons succeed. In this last case, the unification algorithm stops on the pair of arguments concerned, while continuing with the rest of the arguments.