Call for Papers
COLING/ACL 2006 Workshop

INFORMATION EXTRACTION BEYOND THE DOCUMENT

22nd July 2006, Sydney, Australia

Organisers:
Mary Elaine Califf (Illinois State University)
Mark A. Greenwood (University of Sheffield)
Mark Stevenson (University of Sheffield)
Roman Yangarber (University of Helsinki)

Traditional approaches to the development and evaluation of Information Extraction (IE) systems have relied on relatively small collections of up to a few hundred documents tagged with detailed semantic annotations. While this paradigm has enabled rapid advances in IE technology, it remains constrained by its dependence on annotated documents and does not make use of the information available in large corpora.

Alternative approaches, which make use of large text collections and inter-document information, are now beginning to emerge, as evidenced by a parallel rise of interest in learning from unlabelled data in AI in general. For example, some systems learn extraction patterns by exploiting information about their distribution across corpora; others exploit the redundancy of the Internet by assuming that facts with multiple mentions are more reliable. These approaches require large amounts of unannotated text, which is generally easy to obtain, and employ unsupervised or minimally supervised learning algorithms, as well as related techniques such as co-training and active learning.

These alternative approaches complement the established IE paradigm based on supervised training, and they now form a cohesive emerging trend in recent research. They will be the focus of this workshop.

There are several advantages to employing large text collections for IE. They provide enormous amounts of training data, albeit mostly unannotated. Facts can be extracted from, or verified across, multiple documents. Large text collections often contain vast amounts of redundancy in the form of multiple references to, or mentions of, closely related facts.
Redundancy can be exploited in the IE setting to identify trends and patterns within the text, e.g., by means of Data Mining techniques.

This workshop invites new, original work on learning extraction rules or identifying facts across document boundaries while exploiting sizable amounts of unlabelled text in the training stage, the extraction stage, or both. The workshop aims to bring together researchers from related areas such as Information Extraction, Data Mining, biomedical text processing, Question Answering, Information Retrieval, Machine Learning, identification of lexical relations (hyponymy, meronymy, etc.), multilingual text processing and the Semantic Web.

This workshop solicits papers on all relevant aspects, including algorithms, techniques and applications. Topics of particular interest include:

- Extraction of information described across documents
- Integration and mutual benefits of IE and Data Mining
- Extraction of information from massive corpora (such as the Internet)
- Mutual applications and interaction between Information Extraction and the Semantic Web
- Verification of information using external sources
- Exploiting cross-lingual and multilingual approaches for improving performance in IE

-------------------------
IMPORTANT DATES
-------------------------

Submission deadline: March 31st, 2006
Notification of acceptance: May 12th, 2006
Camera-ready papers due: May 29th, 2006

--------------------------
SUBMISSION INSTRUCTIONS
--------------------------

Authors are invited to submit original, unpublished work on the topic areas of the workshop. Submissions should follow the standard two-column formatting instructions for the main COLING/ACL 2006 conference. Submitted papers should be no longer than eight (8) pages, including references. We strongly recommend using the LaTeX and Microsoft Word style files, which will be available on the main conference website.
As reviewing will be blind, papers should not include the authors' names and affiliations. Furthermore, self-references that reveal the authors' identity, e.g., "We previously showed (Smith, 1991) ...", should be avoided. Instead, use citations such as "Smith previously showed (Smith, 1991) ...".

Submission will be electronic. Details will appear on the workshop website (http://nlp.shef.ac.uk/result/iebd06). Questions regarding the submission procedure should be directed to Mark Greenwood (mark@dcs.shef.ac.uk).

--------------------------
WORKSHOP ORGANIZERS
--------------------------

Mary Elaine Califf
School of Information Technology, Illinois State University

Mark A. Greenwood
Department of Computer Science, University of Sheffield

Mark Stevenson
Department of Computer Science, University of Sheffield

Roman Yangarber
Department of Computer Science, University of Helsinki

--------------------------
PROGRAM COMMITTEE
--------------------------

Markus Ackermann (University of Leipzig)
Amit Bagga (AskJeeves)
Roberto Basili (University of Rome, Tor Vergata)
Antal van den Bosch (Tilburg University)
Neus Català (Universitat Politècnica de Catalunya)
Walter Daelemans (University of Antwerp)
Jenny Rose Finkel (Stanford University)
Robert Gaizauskas (University of Sheffield)
Ralph Grishman (NYU)
Takaaki Hasegawa (NTT)
Heng Ji (NYU)
Nick Kushmerick (University College Dublin, Ireland)
Alberto Lavelli (ITC-irst, Italy)
Gideon Mann (Johns Hopkins University)
Ion Muslea (Language Weaver Inc.)
Chikashi Nobata (Sharp, Japan)
Ellen Riloff (University of Utah)
Tony Rose (Cognia Ltd.)
Stephen Soderland (University of Washington)
Kiyotaka Uchimoto (CRL, Japan)
Yorick Wilks (University of Sheffield)