GELATO Research Topic:
Tree Adjoining Grammars and Related Frameworks

The class of Mildly Context-Sensitive Languages (MCSL) is placed between Contex-Free Languages and Context-Sensitive Languages. Several grammar formalisms can be used to describe the languages belonging to this class, Tree Adjoining Grammars (TAG) and Linear Indexed Grammars (LIG) being are the two most relevant ones. The importance of TAG is due to its ability to express certain linguistic fenomena naturally. The relevance of LIG is due to its greater suitability for computational treatment. In fact, parsing algorithms for TAG and related formalisms often consider the compilation of the source grammar to an equivalent LIG.

Indexed Grammars (IG) are an extension of Context-free Grammars in which a stack of indices is associated with each non-terminal symbol. Linear Indexed Grammars (LIG) are a restricted form of Indexed Grammars in which the index stack of at most one body non-terminal (the child) is related with the stack of the head non-terminal (the father). The other stacks of the production must have a bounded stack size.

Tree Adjoining Grammars are a extension of CFG that uses trees instead of productions as primary representing structure. Formally, a TAG is a 5-tuple (V_N,V_T,S,I,A), where V_N is a finite set of non-terminal symbols, V_T a finite set of terminal symbols, S the axiom of the grammar, I a finite set of initial trees and A a finite set of auxiliary trees. The union of I and A is the set of elementary trees. Internal nodes are labeled by non-terminals and leaf nodes by terminals or the empty string, except for exactly one leaf per auxiliary tree (the foot) which is labeled by the same non-terminal used as label of the its node. New trees are derived by adjoining: let be T a tree containing a node labeled by A and let be T' an auxiliary tree with root and foot node also labeled by B. Then, the adjoining of T' into T is obtained by excising the subtree of T which have as root the node labeled by B (which is called the adjunction node), attaching T' to that node and attaching the excised subtree to the foot of T'.

Different types of automata (Embedded Push-Down Automata, Bottom-up Embedded Push-Down Automata, 2-Stack Automata and so on) have been proposed for recognizing of MCSL, some to describe top-down strategies, some to describe bottom-up strategies, but none that are able to describe both kinds of strategies.

We have defined the class of strongly-driven 2-Stack Automata that may be used to describe parsing strategies for TAGs and LIGs, ranging from purely top-down to purely bottom-up, and presenting a tabular interpretation of these automata in complexity O(n⁶). The tabular interpretation follows the principles of Dynamic Programming: the derivations are broken into elementary sub-derivations that may be combined in different contexts to retrieve all possible derivations and may be represented in a compact way, allowing tabulation.

Another approach for defining a general automata model for MCSl could be to define a translation of LIG symbols to DCG, in which non-terminal symbols of LIG are unary predicates of DCG, stacks in the form of lists are the arguments of predicates, and variables are used to represent dependent parts of the stacks. Using this translation, Logic Push-Down Automata (LPDA) can be tuned in order to obtain an operational formalism to MCSL. Every parsing algorithm developed for DCG using LPDA could be applied to LIG. Completeness is therefore guaranteed, as in the general case, but the DCG obtained from LIG can not have the classical properties guaranteeing termination, like off-line parseability or depth-boundness. However, the termination can be ensured because each step in a derivation depends only on which of a finite set of ``states'' the derivation is in. This property is known as context-freeness property of LIG.

Selected Readings:

E. de la Clergerie and M. A. Alonso Pardo,
A Tabular Interpretation of a Class of 2-Stack Automata,
in Proc. of 17th International Conference on Computational Linguistics (COLING'98) and 36th Annual Meeting of the Association for Computational Linguistics (ACL'98), Montreal, Canada, 1998.
E. de la Clergerie, M. A. Alonso Pardo and D. Cabrero Souto,
A Tabular Interpretation of Bottom-up Automata for TAG,
in Proc. of TAG+4, Fourth International Workshop on Tree-Adjoining Grammars and R elated Frameworks, pp. 42-45, Philadelphia, PA, USA, 1998.

Note: On-line version of these papers are available in the COLE Publications page.

Send comments and suggestions to webmaster@coleweb.dc.fi.udc.es

GELATO Research Topic: Tree Adjoining Grammars and Related Frameworks

Selected Readings:

GELATO Research Topic:
Tree Adjoining Grammars and Related Frameworks