First call for papers for a COLING/ACL 2006 Workshop on

How can Computational Linguistics improve Information Retrieval?

Organisers:
John Tait, University of Sunderland, UK
Michael Oakes, University of Sunderland, UK

It is striking how rarely techniques from computational linguistics have been demonstrated to be helpful in performing the conventional Information Retrieval (IR) task. By the conventional IR task we mean a search in which a short query, often in the form of a list of keywords, is provided, with the desired result being a list of documents ranked in terms of their relevance to the need underlying the query. This, of course, is the IR task as encapsulated by internet search engines.

Although there have been one or two examples where techniques like Word Sense Disambiguation or deeper syntactic or semantic analysis have been shown to be useful for indexing documents in large scale classic Information Retrieval experiments (for example Strzalkowski and colleagues at TREC-2, Pirkola and Jarvelin's 1996 IP&M paper or Stokoe, Oakes and Tait in SIGIR 2003), information retrieval techniques using ever more sophisticated statistical models (which have demonstrated a 40% improvement in effectiveness since TREC began in 1992) have almost always outperformed approaches which are more linguistically motivated.

Of course, in some more specialised tasks, especially question answering and summarising, techniques from computational linguistics have proven their worth; but even here the best performing systems frequently combine statistical techniques with more linguistically motivated ones.

The workshop will explore why this is the case, and to what extent more appropriate and better performing computational linguistic techniques can improve the performance of text information retrieval systems.
In particular we are calling for position and discussion papers on the following topics:

o Is the conventional information retrieval task formulated in a way which prevents or obstructs computational linguistics contributing?
o Does statistical information retrieval in fact capture the relevant properties of language but in a form which is inaccessible or hidden?
o Are assumptions made in computational linguistics about the nature of lexical semantics and the structural properties of well formed running text in some way ill founded, at least for the information retrieval task?
o Is there some property of language (for example semantic redundancy) which means that the relatively crude statistical techniques capture enough information to obtain the available improvements in performance?
o Is the problem that computational linguistic techniques are too unreliable or narrowly applicable, so improved performance on some documents or queries is masked by worse performance on others?

Papers will also be accepted on closely related topics.

A major outcome of the day will be a research agenda for increased contribution to information retrieval from computational linguistics and an enhanced dialogue between the two disciplines, following up on the Electra workshop held at SIGIR 2005. It is also hoped to produce a journal special issue or a book based on selected and extended workshop submissions.

Paper Submission

Submissions should follow the two-column format of ACL proceedings and should not exceed eight (8) pages, including references. We strongly recommend the use of the LaTeX style files or Microsoft Word document template that will be made available on the COLING-ACL main conference Web site (http://www.acl2006.mq.edu.au/).

As reviewing will be blind, the paper should not include the authors' names and affiliations. Furthermore, self-references that reveal the author's identity, e.g., "We previously showed (Smith, 1991) ...", should be avoided.
Instead, use citations such as "Smith previously showed (Smith, 1991) ...".

Submission will be electronic, using the START paper submission system, and papers must be in Adobe PDF format. Papers must be submitted no later than March 24, 2006. Papers submitted after that time will not be reviewed. For details of the submission procedure, please consult the submission webpage reachable via the workshop website.

Outline Program

09:00 Opening and scene setting
09:30 Invited talk - Jaime Callan, CMU (tbc)
10:15 Submitted Papers
11:00 Morning Tea
11:15 Submitted Papers
12:30 Lunch
13:30 Submitted Papers
15:00 Afternoon Tea
15:15 Submitted Papers
16:00 Discussion Panel
17:00 Close

Programme Committee

John Tait, University of Sunderland, UK (Chair)
Michael Oakes, University of Sunderland, UK (Co-Chair)
Branimir Boguraev, IBM, USA
Bruce Croft, UMass Amherst, USA
Gaël Dias, University of Beira Interior, Portugal
Hang Cui, National University of Singapore
Noriko Kando, NII, Japan
Rob Gaizauskas, University of Sheffield, UK
Mark Sanderson, University of Sheffield, UK
Alexander Gelbukh, National Polytechnic Institute, Mexico
Tomek Strzalkowski, University at Albany, USA
Karen Sparck Jones, University of Cambridge, UK
Rosie Jones, Yahoo, USA
Liz Liddy, Syracuse University, USA
Lucia Rino, UFSCAR, Brazil
Chris Stokoe, University of Sunderland, UK
Simone Teufel, University of Cambridge, UK
Olga Vetchimova, University of Waterloo, Canada
Mirella Lapata, University of Edinburgh, UK
Stephen Clark, University of Oxford, UK

Key Dates

Deadline for Submission: 24 March 2006
Decisions to Authors: 8 May 2006
Final Copy of Accepted Papers: Friday 19 May 2006
Workshop: Sunday 23rd July 2006

Workshop Contact Details

http://www.cet.sunderland.ac.uk/cliir/
cliir@sunderland.ac.uk