EACL 2006 Workshop: 2nd WEB AS CORPUS
April 4 2006, Trento, Italy
http://sslmit.unibo.it/~baroni/web_as_corpus_eacl06.html

The EACL 2006 Workshop on the Web as Corpus will be hosted in conjunction with the 11th Conference of the European Chapter of the Association for Computational Linguistics, which will take place April 3-7, 2006, in Trento, Italy.

* TOPICS

A growing body of work has shown that the World Wide Web is a mine of language data of unprecedented richness and ease of access (see, e.g., the papers collected in Kilgarriff and Grefenstette, 2003). Even so, many fundamental issues about the viability and exploitation of the Web as a linguistic corpus are only starting to be tackled, ranging from Web frequency distributions and registers, to efficient handling of massive data sets, to copyright. Research on the Web as corpus is currently at a very exciting stage: increasing evidence points to the enormous potential of the Internet as a source of linguistic data, but we are still far from a working, fully-fledged linguists' search engine.

We invite submissions which:

  * describe Web corpus collection projects, or modules for one part of the process (crawling, filtering, language-id, tokenising, lemmatising, POS-tagging, indexing, ...)
  * explore characteristics of Web data, from a linguistics/NLP perspective
  * use crawled Web data for NLP purposes.

Preference will be given to projects where Web data are downloaded and processed directly, rather than via search engine interfaces.

* SUBMISSIONS

Authors are invited to submit full papers on original, unpublished work in the topic area of this workshop. Submissions should follow the two-column format of ACL proceedings and should not exceed eight (8) pages, including references.
We strongly recommend the use of the ACL LaTeX or Microsoft Word style files tailored for this year's conference, available at http://eacl06.itc.it/submission/submission.htm

Papers must conform to the official EACL-06 style guidelines, and we reserve the right to reject submissions that do not conform to these styles, including font size restrictions. Submissions should be in PDF format and must include all fonts, so that the paper will print (not just view) anywhere.

Please submit your paper no later than January 6, 2006. Details on the submission procedure will be available soon on the workshop website.

Each submission will be reviewed by at least two members of the programme committee. Accepted papers will be published in the workshop proceedings.

Dual submissions to the main EACL 2006 conference and this workshop are allowed. If you submit to the main session, please indicate this when you submit to the workshop, and specify your EACL submission reference number, for administrative ease. If your paper is accepted for the main session, you should withdraw it from the workshop upon notification by the main session.

* REGISTRATION

Information on registration and registration fees will be provided at the conference web page.

* IMPORTANT DATES

January 6, 2006 - Deadline for workshop papers
January 27, 2006 - Notification of acceptance
February 10, 2006 - Camera-ready papers due
April 4, 2006 - Workshop

As the schedule is extremely tight, deadline extensions are NOT possible.

* PROGRAMME COMMITTEE

Marco Baroni (co-chair)
Silvia Bernardini
Massimiliano Ciaramita
Stefan Evert
William H. Fletcher
Gregory Grefenstette
Frank Keller
Adam Kilgarriff (co-chair)
Mirella Lapata
Anke Luedeling
Philip Resnik
Serge Sharoff

* FURTHER INFORMATION

Workshop web page
http://sslmit.unibo.it/~baroni/web_as_corpus_eacl06.html

Conference web page
http://eacl06.itc.it/

EACL 2006 Workshops site
http://www.science.uva.nl/~mdr/EACL2006Workshops/

* CONTACT INFORMATION

Adam Kilgarriff
Lexical Computing Ltd
71 Freshfield Road, Brighton BN2 0BL, UK
+44 7971 867845
adam@lexmasterclass.com