Finite-State Methods and Natural Language Processing - FSMNLP 2007
Sixth International Workshop
University of Potsdam, Germany
14 - 16 September 2007

http://www.ling.uni-potsdam.de/fsmnlp2007
mailto://fsmnlp2007@ling.uni-potsdam.de

Papers due: 3 June 2007

The aim of FSMNLP 2007 is to bring together members of the academic, research, and industrial communities working on finite-state based models in language technology, computational linguistics, linguistics and cognitive science, or on related theory or methods in fields such as computer science and mathematics. The workshop will be a forum for researchers working

* on NLP applications,
* on the theoretical and implementation aspects, or
* on their combination.

We invite novel high-quality papers related to themes including but not limited to:

1. NLP applications and linguistic aspects of finite-state methods

The topic includes but is not restricted to:

* speech, sign language, phonology, hyphenation, prosody
* scripts, text normalization, segmentation, tokenization, indexing
* morphology, stemming, lemmatisation, information retrieval, spelling correction
* syntax, POS tagging, partial parsing, disambiguation, information extraction
* machine translation, translation memories, glossing, dialect adaptation
* annotated corpora and treebanks, semi-automatic annotation, error mining, searching

2. Finite-state models of language

With this more focused topic (inside 1) we invite papers on aspects that motivate the sufficiency of finite-state methods or their subsets for capturing various requirements of natural language processing.

The topic includes but is not restricted to:

* performance, linguistic applicability, finite-state hypotheses
* Zipf's law and coverage, model checking against finite corpora
* regular approximations under parameterized complexity, limitations and definitions of relevant complexities such as ambiguity, recursion, crossings, rule applications, constraint violations, reduplication, exponents, discontinuity, path-width, and induction depth
* similarity inferences, dissimilation, segmental length, counter-freeness, asynchronous machines
* garden-path sentences, deterministic parsing, expected parses, Markov chains
* incremental parsing, uncertainty, reliability/variance in stochastic parsing, linear sequential machines

3. Practices for building lexical transducers for the world's languages

The topic accounts for the usability of finite-state methods in NLP. It includes but is not restricted to:

* required user training and consultation, learning curve of non-specialists
* questionnaires, discovery methods, adaptive computer-aided glossing and interlinearization
* example-based grammars, semi-automatic learning, user-driven learning (see also topic 6)
* low literacy level and restricted availability of training data, writing systems/phonology under development, new non-Roman scripts, endangered languages
* linguist's workbenches, stealth-to-wealth parser development
* experiences of using existing tools (e.g. TWOL) for computational morphology and phonology

4. Specification and implementation of sets, relations and multiplicities in NLP using finite automata

The topic includes but is not restricted to:

* regular rule formalisms, grammar systems, expressions, operations, closure properties, complexities
* algorithms for compilation, approximation, manipulation, optimization, and lazy evaluation of finite machines
* finite string and tree automata, transducers, morphisms and bimorphisms
* weights, registers, multiple tapes, alphabets, state covers and partitions, representations
* locality, constraint propagation, star-free languages, data vs. query complexity
* logical specification, MSO(SLR,matches), FO(Str,<), LTL, generalized restriction, local grammars

5. Constraint-based grammars and k-ary regular relations

With this more focused topic (inside 4) we invite researchers from related fields (computational linguists, mathematicians and computer scientists) into a discussion motivated by constraint-based, declarative approaches to morphology/phonology and the computational problems related to them. For example, regular relations in general are not closed under intersection, but restricted use of intersection of relations has proven useful in computational phonology and morphology and in implementations such as KIMMO, PC-KIMMO, TWOLC, SEMHE, AMAR, WFSC, etc. In the future, new useful approaches and implementations may come up. The approaches may also propagate to other application areas in natural language processing, including finite-state syntax and query languages for parallel annotations in linguistic corpora.

The topic includes but is not restricted to:

* multi-tape automata, same-length relations and partition-based morphology, Semitic morphology
* autosegmental phonology, shuffle, trajectories, synchronization, segmental anchoring, alignment constraints, syllable structure, partial-order reductions
* problems related to auto-intersection of multi-tape automata, e.g. marked Post Correspondence Problem
* varieties of regular languages and relations, descriptive complexity of finite-state based grammars
* automaton-based approaches to declarative constraint grammars, constraints in optimality theory
* parallel corpus annotations, register automata, acyclic timed automata

6. Machine learning of finite-state models of natural language

This topic includes but is not restricted to:

* learning regular rule systems, learning topologies of finite automata and transducers
* parameter estimation and smoothing, lexical openness
* computer-driven grammar writing, user-driven grammar learning, discovery procedures
* data scarcity, realistic variations of Gold's model, learnability and cognitive science
* incompletely specified finite-state networks
* model-theoretic grammars, gradient well/ill-formedness

7. Finite-state manipulation software (with relevance to the above themes)

This topic includes but is not restricted to:

* regular expression pre-compilers such as regexopt, xfst2fsa, standards and interfaces for finite-state based software components, conversion tools
* tools such as LEXC, Lextools, Intex, XFST, FSM, GRM, WFSC, FIRE Engine, FADD, FSA/UTR, SRILM, FIRE Station and Grail
* free or almost free software such as MIT FST, Carmel, RWTH FSA, FSA Utilities, FSM<2.0>, Unitex, OpenFIRE, Vaucanson, SFST, PCKIMMO, MONA, Hopskip, ASTL, UCFSM, HaLeX, SML, and WFST
* results obtainable with such exploration tools as automata, Autographe, Amore, and TESTAS
* visualization tools such as Graphviz and Vaucanson-G
* language-specific resources and descriptions, freely available benchmarking resources

The descriptions of the topics above are not meant to be exhaustive and should be read as extending to all traditional FSMNLP topics. Submitted papers or abstracts may fall into several categories.