Workshop on
               Machine Learning for Information Extraction
                           Monday 21 August 2000
              to be held in conjunction with the 14th European
               Conference on Artificial Intelligence (ECAI),
                        BERLIN, HUMBOLDT UNIVERSITY

             Fabio Ciravegna (contact), ITC-irst Centro per la
             Ricerca Scientifica e Tecnologica
             Roberto Basili, University of Roma Tor Vergata
             Robert Gaizauskas, University of Sheffield

The exponential increase in the quantity of textual information held in
digital archives has fuelled growing interest in computer-assisted
techniques for information extraction from text (IE). IE systems, as
understood by the applied natural language processing community, identify
predetermined relevant information in text documents from some specific
domain. Once extracted, the information can be used for a number of
purposes: database population, text indexing, information highlighting, and
so on. While significant progress in constructing such systems has been
made, stimulated in particular by the DARPA Message Understanding
Conferences, by general agreement the main barriers to wider use and
commercialisation of IE are the difficulties in adapting systems to new
applications and domains. Porting IE systems is both difficult and
expensive with current technology, since changes generally need to be
carried out manually by highly skilled experts. Moreover, some sources
(e.g. Web pages) may change very rapidly in both format and content.
Tracking all the changes and continuously re-adapting IE systems is very
expensive, or even infeasible, if done manually.

To address these difficulties there has been increasing interest in
applying machine learning (ML) techniques to Information Extraction from
text. Tasks to which ML has been applied include template design, template
filling, named entity recognition and resource compilation (e.g. lexicons,
knowledge structures, grammars). The kinds of sources analysed range from
structured texts (e.g. Web pages) to semi-structured texts (e.g. rental
ads) to free texts (e.g. newspaper articles). The ML techniques used range
from symbolic methods (e.g. inductive logic programming,
transformation-based learning) to numerical methods (e.g. naive Bayes,
maximum entropy). However, the current situation is characterised by
isolated experiments in which individual ML techniques are applied to
specific IE tasks. What is lacking is a unifying view of the issue of
adopting ML techniques for IE.

The proposed workshop aims to establish a forum for discussing current and
future trends in the application of ML to IE, with a specific focus on the
identification of a unifying view of the issue. The workshop has the
following goals:

   * to bring together communities of researchers that address the ML for
     IE problem from different perspectives (e.g., natural language
     processing, information retrieval, machine learning, information
     integration);
   * to deepen the European IE community's understanding of the state of
     the art;
   * to identify further IE-related problems for which ML techniques might
     be appropriate.

Particularly welcomed are contributions concerning:

   * descriptions of techniques adaptable for different languages, tasks
     and/or text typologies;
   * proposals of unifying views on the current or future application of ML
     to IE.

In the interest of promoting as much discussion as possible, the number of
paper presentations will be limited in favour of panels and posters. A
final panel will discuss the research agenda for the coming years.
Attendance will be limited to 30 participants.


Important dates

   Submission deadline:                            12 March 2000
   Notification of acceptance:                     7 May 2000
   Camera-ready versions of accepted papers due:   7 June 2000

For further information, please contact Fabio Ciravegna (cirave@irst.itc.it).