-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

C A L L   F O R   P A P E R S

Building and Using Parallel Texts:
Data Driven Machine Translation and Beyond

An HLT-NAACL 2003 Workshop
Edmonton, Alberta
May 31 or June 1, 2003

http://www.cs.unt.edu/~rada/wpt

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

The goal of this workshop is to provide a forum for researchers working
on problems related to the creation and use of parallel text. Recent
events have demonstrated once again the importance of inter-language
communication, and have reinforced the need for advances in machine
translation (MT) and multi-lingual processing tools.

The workshop will center on the problem of building and using parallel
corpora, which are vital resources for efficiently deriving
multi-lingual text processing tools. In addition to regular papers, the
workshop also includes a shared task that will result in a comparative
evaluation of word alignment techniques.

We invite submissions of papers addressing any of the following issues:

- Construction of parallel corpora, including the automatic
  identification and harvesting of parallel corpora from the Web
- Methods to evaluate the quality of parallel corpora and word
  alignments
- Tools for processing parallel corpora, including automatic sentence
  alignment, word alignment, phrase alignment, detection of omissions
  and gaps in translations, and others
- Using parallel corpora for data driven Machine Translation
- Using parallel corpora for the derivation of language processing
  tools in new languages
- Using parallel corpora for automatic corpus annotation
- Language learning applied to parallel corpora
- Translation memory systems as a source of aligned corpora

While we invite submissions addressing any of the above topics, or
related issues, we particularly welcome work on parallel corpora for
languages with scarce resources.

We expect to make arrangements with a journal in Natural Language
Processing or Computational Linguistics for a special issue that will
include selected papers from this workshop.

Invited Speaker:
-=-=-=-=-=-=-=-=

Elliot Macklovitch, University of Montreal

Shared Task:
-=-=-=-=-=-=

All researchers who have a word alignment system available are invited
to participate in the shared task, individually or as part of a team.

Participants in the shared task will be provided with common sets of
training data, consisting of Romanian-English and French-English
parallel texts. Participants will be given approximately one month to
train their systems with this data, and then previously held out test
data will be released. Participants will run their alignment system on
this test data and submit their results, which will be evaluated using
a common set of metrics.

See the workshop website for details regarding the shared task.

Submission format:
-=-=-=-=-=-=-=-=-=

Submissions should consist of regular full papers of max. 7 pages,
formatted following the NAACL 2003 guidelines. In addition, teams
participating in the word alignment shared task are invited to submit
short papers (max. 4 pages) describing their systems and/or evaluation
methodology.

Send your submission (a ps or pdf file), prepared for anonymous review,
to both:

Rada Mihalcea, University of North Texas, rada@cs.unt.edu
and
Ted Pedersen, University of Minnesota, Duluth, tpederse@d.umn.edu

Important dates:
-=-=-=-=-=-=-=-=

Deadline for regular paper submissions: March 10
Deadline for results submissions: March 25 (shared task)
Deadline for short paper submissions: April 1 (shared task)
Notification of acceptance for regular papers: April 1
Deadline for camera-ready papers: April 10

Organisation Committee:
-=-=-=-=-=-=-=-=-=-=-=-

Rada Mihalcea, University of North Texas
Ted Pedersen, University of Minnesota, Duluth

Program Committee:
-=-=-=-=-=-=-=-=-=

Lars Ahrenberg, Linkoping University
Nicoletta Calzolari, University of Pisa
Tim Chklovski, Massachusetts Institute of Technology
Mona Diab, University of Maryland
Ulrich Germann, Information Sciences Institute
Daniel Gildea, University of Pennsylvania
Maria das Gracas Volpe Nunes, University of Sao Paulo
Nancy Ide, Vassar College
Lucia Helena Machado Rino, Federal University of Sao Carlos
Eduard Hovy, University of Southern California / Information Sciences Institute
Philippe Langlais, University of Montreal
Elliot Macklovitch, University of Montreal
Daniel Marcu, University of Southern California / Information Sciences Institute
Dan Melamed, New York University
Magnus Merkel, Linkoping University
Ruslan Mitkov, University of Wolverhampton
Hermann Ney, RWTH Aachen
Franz Och, Information Sciences Institute
Kemal Oflazer, Sabanci University
Kishore Papineni, IBM
Jessie Pinkham, Microsoft Research
Andrei Popescu-Belis, ISSCO/TIM/ETI University of Geneva
Florence Reeder, MITRE
Philip Resnik, University of Maryland
Antonio Ribeiro, Joint Research Centre, Ispra, Italy
Michel Simard, University of Montreal
Harold Somers, University of Manchester Institute of Science and Technology
Arturo Trujillo, Canon Research Centre Europe
Jean Veronis, University of Provence
Clare Voss, Army Research Lab
Yorick Wilks, University of Sheffield