LAST CALL FOR PAPERS

        WEB CONTENT MINING WITH HUMAN LANGUAGE TECHNOLOGIES
                http://orestes.ii.uam.es/workshop/

workshop to be held at the 5th International Semantic Web Conference,
               Athens, GA, U.S.A. November 5-9 2006
                 http://iswc2006.semanticweb.org/


MOTIVATION, AIM AND SCOPE

With the rapid growth of the information stored on the World Wide Web,
tools for the automatic or semi-automatic analysis of web data have
become necessary. Hence, a large effort has been invested in recent
years in developing techniques for extracting
patterns and implicit information from the web, a task that is usually
known as Web Mining. Web Mining itself can be divided into three
subtasks according to the kind of data that is collected: web
structure, web usage and web content.

Web content mining consists of automatically mining, from textual web
documents, data that can be represented with machine-readable semantic
formalisms. Initially, most web content mining systems used wrappers
to map documents to other data structures, but this approach is highly
dependent on the layout and formatting instructions inside web
pages. Therefore, alternative approaches that make use of Natural
Language Processing-based techniques are increasingly used.

While more traditional approaches to Information Extraction from text,
such as those applied to the Message Understanding Conferences during
the nineties, relied on small collections of documents with many
semantic annotations, the characteristics of the web (its size,
redundancy and the lack of semantic annotations in most texts) favor
efficient algorithms able to learn from unannotated data. Furthermore,
new types of web content such as web forums, blogs and wikis, some of
them included in the so-called Web 2.0, are also a source of textual
information that contains an underlying structure from which specialist
systems can benefit. The workshop will give special emphasis to how
existing techniques can benefit from these kinds of content.

This workshop aims at bringing together researchers from the Semantic
Web, the Natural Language Processing and the Text Mining
communities. The web constitutes a unique source of information to
train and exploit systems for tasks such as Named Entity
Identification and Classification, Term Identification, Relationship
Extraction, Ontology Learning and Population from text, and Text
Mining. The Semantic Web community can contribute by providing
semantic formalisms and tools for knowledge representation and
reasoning to exploit the extracted metadata. The goal of the workshop
is to establish communication among all these communities.

TOPICS OF INTEREST

Topics of interest include, but are not limited to:

    * Term Identification for specialist domains using web corpora, as
      an initial step for ontology construction.

    * Extracting taxonomic and non-taxonomic relationships from the
      web.

    * Automatic ontology-based semantic annotation and Information
      Extraction of web content.

    * Mining semantic information from blogs, forums or news sources.

    * Automatic annotation in Semantic Wikis.

    * Integrating mined information with semantic resources.

    * Semantic annotation of multilingual web sources.

    * Burst detection from web sources.

    * Multi-webpage Named Entity Coreference.

    * Usage scenarios for the combination of the Semantic Web, Human
      Language Technologies, Text Mining, decision support, etc.

IMPORTANT DATES

 1 August 2006    - Paper submission

 5 September 2006 - Acceptance notification

18 September 2006 - Camera-ready papers

 6 November 2006  - Electronic version of the proceedings available

SUBMISSIONS

Paper submissions must follow the Springer format for the Lecture
Notes in Computer Science series and be submitted as PDF documents.
We accept two kinds of papers:

    * Full papers, with a length limit of 10 pages.
    * Short position papers, with a length limit of 5 pages. 

In both cases, the names of the authors should not appear in the
paper, in order to ensure a blind review process.

At least one author must register for each accepted submission in
order for the paper to appear in the workshop proceedings.


ORGANISING COMMITTEE
(alphabetical ordering)

Enrique Alfonseca - Universidad Autonoma de Madrid, 
      Tokyo Institute of Technology.
Thierry Declerck - DFKI GmbH, Germany.
Manabu Okumura - Tokyo Institute of Technology.
Satoshi Sekine - New York University.
Hiroya Takamura - Tokyo Institute of Technology. 

PROGRAM COMMITTEE
(alphabetical ordering)

Eneko Agirre (University of the Basque Country)
Roberto Basili (University of Rome, Tor Vergata)
Paul Buitelaar (DFKI, Germany)
Philipp Cimiano (University of Karlsruhe)
Nigel Collier (National Institute of Informatics)
Hamish Cunningham (University of Sheffield)
Julio Gonzalo (UNED)
Ralph Grishman (New York University)
Siegfried Handschuh (DERI, Ireland)
Dimitrios Kokkinakis (University of Gothenburg)
Bernardo Magnini (ITC-IRST)
Gideon Mann (Johns Hopkins University)
Antonio Moreno-Sandoval (Universidad Autonoma de Madrid)
Nicolas Nicolov (Umbria Inc, USA)
Viktor Pekar (University of Wolverhampton)
Deepak Ravichandran (Google Inc.)
German Rigau (University of the Basque Country)
Steffen Staab (University of Koblenz-Landau)
Vojtech Svatek (University of Economics, Prague, Czech Republic)
Felisa Verdejo (UNED)