============================CALL FOR PAPERS============================

ELECTRA Workshop on
Methodologies and Evaluation of Lexical Cohesion Techniques
in Real-world Applications
(Beyond Bag of Words)

In association with the 28th Annual International ACM SIGIR Conference
on Research and Development in Information Retrieval (SIGIR 2005)

Sponsored by Yahoo! Research Labs

Pestana Bahia, Salvador, Brazil
August 19, 2005

http://research.yahoo.com/workshops/electra2005/

============================CALL FOR PAPERS============================

GUIDELINES:

[1] Description
[2] Target Audience
[3] Areas of Interest
[4] Important Dates
[5] Paper Submission
[6] Organising Committee
[7] Program Committee
[8] Contact

----------------
[1] Description:
----------------

Lexical cohesion can be subdivided into two distinct areas: (1) lexical associations, which embody a wide spectrum of language phenomena such as named entities, multiword units, collocations and word co-occurrences, and (2) lexical relations, which provide evidence of the semantic and discourse structure of text through relations between terms over large distances.

The central goal of this workshop is to bring together researchers in NLP and IR to discuss the use of lexical cohesion in text applications, such as document and passage retrieval, question answering, topic segmentation and text summarization. Indeed, despite the fact that both communities are working with the same material (human language), collaboration between them has so far been limited.

In this workshop we are interested in identifying the successes and failures of the integration of lexical cohesion in real-world IR applications. On the one hand, lexical cohesion has received much attention in Information Retrieval research throughout its more than 30-year history, but so far with mixed results.
On the other hand, a considerable amount of research has been devoted to this subject, both in terms of theory and practice, by the Natural Language Processing community, but with limited evaluation in real-world applications. It is clear that we are at a point where both communities should meet in order to discuss related issues. This is the objective of this workshop. In particular, we will address two questions that are of great importance for real-world IR applications.

1) Efficient methodologies for Lexical Cohesion identification

Lexical cohesion has received attention in IR research since its outset. We can point to (a) the identification and the use of multiword units for indexing and search, and (b) the extraction of long-distance lexical relations for tasks such as passage retrieval, topic segmentation or text summarization.

On the one hand, the interest in multiword units (or phrases) can be partially attributed to the fact that phrases typically have a higher information content and specificity than single words, and therefore represent the concepts expressed in text more accurately than single terms. On the other hand, interest in long-distance lexical relations in text has been motivated in IR research by the realization of the limitations of most IR models that assume term independence in text. As a consequence, a number of techniques have been developed to improve term independence models, such as passage retrieval and query expansion techniques.

The choice of the methodologies and techniques for these tasks has always been restricted by the problem of efficiency that is critical for real-world IR applications. Indeed, real-world IR applications are constrained by variables such as processing time and memory space. Identifying and extracting lexical associations and lexical relations is a computationally intensive process.
In recent years new algorithms and new technologies have been proposed to introduce lexical cohesion techniques in large-scale applications, thus avoiding previously intractable implementations. Previous workshops on lexical cohesion have mainly focused on the unconstrained extraction process. In this workshop, we would like to focus on the comparison of different factors that can influence the scalability of the treatment of lexical cohesion in real-world applications, namely data structures, algorithms, and parallel, distributed or grid computing. We would also be interested in new methodologies for lexical cohesion that may easily scale to real-world applications based on complexity measurements.

2) Evaluation of the benefits of Lexical Cohesion in IR applications

Contiguous lexical associations have often been used in experimental IR systems. Different techniques have been studied for this purpose: (a) statistical methods based on co-occurrence statistics or n-gram language modeling techniques, (b) hybrid techniques based on simple statistics and shallow linguistic techniques such as part-of-speech tagging and noun-phrase chunking, and (c) knowledge-based techniques. However, the importance of the contribution of phrase matching has not been systematically quantified. Moreover, the evaluation of such techniques is difficult in IR applications, as the number of environment variables is very large and each system combines a variety of indexing and matching techniques. Therefore, a more focused and systematic approach towards analyzing the uses of lexical associations in IR and their evaluation is needed.

This workshop will provide a framework for such analysis, and will present for discussion a number of challenging questions regarding the use of lexical associations in text. In particular, we will ask questions such as: How should multiword units be incorporated into IR models designed for single terms? What weighting models can be used for them?
How should they be matched against their lexical-syntactic variants in text? How should we handle non-contiguous lexical associations? How can we avoid over-weighting a phrase occurrence in a document matching more than one phrase in the query? These are only a few of the questions in a large field of research full of unsolved problems.

In contrast with contiguous lexical units, relations between non-contiguous lexical units are important building blocks of the text, forming its lexical cohesion. Indeed, the complete meaning of a word in text can only be realized when it is interpreted in combination with the surrounding words, forming lexical cohesive ties with them. These lexical relations have been used for a number of IR tasks, for example query expansion, passage retrieval, topic segmentation and text summarization. However, most of the techniques do not use deep semantic or discourse structure information in identifying such relations, instead relying on statistical evidence, i.e. their co-occurrence patterns. In fact, very little work has explored the use of NLP techniques such as lexical chaining or discourse analysis, which make use of semantic and discourse structure within text, to improve the performance of IR applications.

One of the main objections to the use of such techniques has been the claim that they are more computationally demanding than statistical co-occurrence techniques. However, with the development of more efficient algorithms by the NLP community, it will be interesting to further explore the use of such techniques in IR applications.

As a consequence, we would like to gather people who use lexical relations in different subfields of IR. Non-trivial questions are addressed here. What types of lexical relations prove useful for different IR tasks? What statistical models are most effective for the identification of lexical relations for different IR tasks?
Can linguistic techniques for identifying lexical relations in text, such as lexical chaining or discourse analysis techniques, be useful for any IR tasks? How can contiguous or non-contiguous lexical cohesive relations be identified in text? How can we reliably evaluate and compare these techniques?

--------------------
[2] Target Audience:
--------------------

This workshop is intended to bring together IR and NLP researchers working on all areas of information retrieval and using lexical associations in information retrieval applications. The objective is to discuss what has been achieved in this area, to establish common themes between different approaches, and to discuss future research directions.

----------------------
[3] Areas of Interest:
----------------------

Papers are invited on, but not limited to, the following topics:

* Efficient Techniques for Lexical Cohesion identification
* Scalable Algorithms for Lexical Cohesion identification
* Lexical Associations and Lexical Relations Resources
* Document Representation and Lexical Associations
* Document Ranking and Lexical Associations
* Single-Term and Phrase Information Retrieval
* Passage Retrieval and Lexical Cohesion
* Query Expansion and Lexical Associations
* Local and Global Context Analysis
* Ontology-based Query Expansion
* Question Answering and Lexical Relations
* Web Search and Lexical Cohesion
* Topic Segmentation and Lexical Cohesion
* Text Summarization and Lexical Cohesion
* Evaluation Standards and Benchmarks
* Qualitative and Quantitative Evaluations

Papers can cover one or more of these areas.

--------------------
[4] Important Dates:
--------------------

Paper submission deadline: May 15th, 2005
Notification:              June 15th, 2005
Camera-ready papers:       July 1st, 2005
Workshop:                  August 19th, 2005

---------------------
[5] Paper Submission:
---------------------

Papers should follow SIGIR 2005 instructions (http://www.dcc.ufmg.br/eventos/sigir2005/).
Papers should be submitted electronically, in PDF format only, to Rosie Jones [jonesr@yahoo-inc.com]. PostScript files can be converted to PDF at http://www.ps2pdf.com/. The subject line should be "SIGIR 2005 ELECTRA WORKSHOP PAPER SUBMISSION".

Because reviewing is blind, no author information should be included as part of the paper (i.e. the names of the authors and references that could identify the authors). An identification page must be sent in a separate email with the subject line "SIGIR 2005 ELECTRA WORKSHOP ID PAGE" and must include title, author(s), keywords, page number, and the name and email of the contact author.

Late submissions will not be accepted. Notification of receipt will be emailed to the contact author shortly after receipt.

-------------------------
[6] Organising Committee:
-------------------------

Rosie Jones (Yahoo! Inc, United States of America)
Olga Vechtomova (University of Waterloo, Canada)
Gaël Harry Dias (University of Beira Interior, Portugal)

----------------------
[7] Program Committee:
----------------------

Brigitte Grau - (LIMSI, France)
Bruce Croft - (University of Massachusetts, USA)
Charlie Clarke - (University of Waterloo, Canada)
Diana Inkpen - (University of Ottawa, Canada)
Dunja Mladenic - (Jožef Stefan Institute, Slovenia)
Patrick Pantel - (University of Southern California, USA)
Egidio Terra - (Pontifícia Univ. Católica do Rio Grande do Sul, Brazil)
Gabriel Lopes - (New University of Lisbon, Portugal)
Graeme Hirst - (University of Toronto, Canada)
Hal Daume - (University of Southern California, USA)
Helena Ahonen-Myka - (University of Helsinki, Finland)
Murat Karamuftuoglu - (Bilkent University, Turkey)
Nicola Stokes - (University College Dublin, Ireland)
Peter Turney - (National Research Council Canada, Canada)
Rafael Muñoz - (University of Alicante, Spain)

------------
[8] Contact:
------------

Rosie Jones
Yahoo! Overture Matching Sciences
Yahoo! Inc
74 N. Pasadena Ave, 3F
Pasadena, CA 91103
United States of America
email: jonesr@yahoo-inc.com