Workshop on Machine Learning for Information Extraction

Monday 21 August 2000

to be held in conjunction with the 14th European Conference on Artificial Intelligence (ECAI), BERLIN, HUMBOLDT UNIVERSITY

Fabio Ciravegna (contact), ITC-irst Centro per la Ricerca Scientifica e Tecnologica
Roberto Basili, University of Roma Tor Vergata
Robert Gaizauskas, University of Sheffield

The exponential increase in the quantity of textual information held in digital archives has fuelled growing interest in computer-assisted techniques for information extraction from text (IE). IE systems, as understood by the applied natural language processing community, identify predetermined relevant information in text documents from some specific domain. Once extracted, the information can be used for a number of purposes: database population, text indexing, information highlighting, and so on.

While significant progress has been made in constructing such systems, stimulated in particular by the DARPA Message Understanding Conferences, it is generally agreed that the main barrier to wider use and commercialisation of IE is the difficulty of adapting systems to new applications and domains. With current technology, porting IE systems is both difficult and expensive, since changes generally need to be carried out manually by highly skilled experts. Moreover, some sources (e.g. Web pages) may change very rapidly in both format and content. Tracking all the changes and continuously re-adapting IE systems is very expensive, or even unfeasible, if done manually.

To address these difficulties there has been increasing interest in applying machine learning (ML) techniques to information extraction from text. Tasks to which ML has been applied include template design, template filling, named entity recognition and resource compilation (e.g. lexicons, knowledge structures, grammars). The kinds of sources analysed range from structured texts (e.g. Web pages) to semi-structured texts (e.g.
rental ads) to free texts (e.g. newspaper articles). The ML techniques used range from symbolic methods (e.g. inductive logic programming, transformation-based learning) to numerical methods (e.g. naive Bayes, maximum entropy).

However, the current situation is characterised by isolated experiments in which individual ML techniques are applied to specific IE tasks. What is lacking is a unifying view of the issue of adopting ML techniques for IE. The proposed workshop aims to establish a forum for discussing current and future trends in the application of ML to IE, with a specific focus on identifying such a unifying view.

The workshop has the following goals:

* to bring together communities of researchers that address the ML for IE problem from different perspectives (e.g. natural language processing, information retrieval, machine learning, information integration);
* to deepen the European IE community's understanding of the state of the art;
* to identify further IE-related problems for which ML techniques might be appropriate.

Particularly welcome are contributions concerning:

* descriptions of techniques adaptable to different languages, tasks and/or text typologies;
* proposals of unifying views on the current or future application of ML to IE.

In the interest of promoting as much discussion as possible, the number of paper presentations will be limited in favour of panels and posters. A final panel will discuss the research agenda for the coming years. Attendance will be limited to 30 participants.

Important dates

Submission deadline: 12 March 2000
Notification of acceptance: 7 May 2000
Camera-ready versions of accepted papers due: 7 June 2000

For further information, please contact Fabio Ciravegna (cirave@irst.itc.it).