1st CALL FOR PAPERS

SIGIR'07 Workshop PAN

Plagiarism Analysis, Authorship Identification, and Near-Duplicate
Detection

-- http://www.aisearch.de/pan-07 --

In conjunction with the 30th Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval, Amsterdam, 23-27 July
2007.

----------------------------------------------------------

ABOUT THIS WORKSHOP:

The workshop shall bring together experts and prospective researchers
around the exciting and future-oriented topics of plagiarism analysis,
authorship identification, and high similarity search. These topics receive
increasing attention, in part because information about nearly any subject
can be found on the World Wide Web. At first sight, plagiarism, authorship,
and near-duplicates may pose very different challenges; however, they are
closely related in several technical respects.

Plagiarism analysis is a collective term for computer-based methods to
identify a plagiarism offense. In connection with text documents we
distinguish between corpus-based and intrinsic analysis: the former
compares suspicious documents against a set of potential original
documents, the latter identifies potentially plagiarized passages by
analyzing the suspicious document with respect to changes in writing
style.

Authorship identification divides into so-called attribution and
verification problems. In the authorship attribution problem, one is given
examples of the writing of a number of authors and is asked to determine
which of them authored given anonymous texts. In the authorship
verification problem, one is given examples of the writing of a single
author and is asked to determine if given texts were or were not written
by this author. Authorship verification and intrinsic plagiarism analysis
represent two sides of the same coin.
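
The difference between the two problem settings can be illustrated with a
minimal sketch. It assumes character trigram profiles compared by cosine
similarity, which is only one of many possible style models; the function
names and the threshold value are hypothetical and chosen for illustration.

```python
from collections import Counter
import math


def profile(text, n=3):
    """Character n-gram frequency profile of a text (whitespace normalized)."""
    t = " ".join(text.lower().split())
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))


def cosine(p, q):
    """Cosine similarity between two frequency profiles."""
    dot = sum(p[g] * q[g] for g in p.keys() & q.keys())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0


def attribute(anonymous, samples):
    """Attribution: pick the candidate author whose sample is most similar.

    `samples` maps an author name to example text by that author.
    """
    target = profile(anonymous)
    return max(samples, key=lambda a: cosine(target, profile(samples[a])))


def verify(anonymous, sample, threshold=0.5):
    """Verification: accept or reject a single author.

    The threshold is an arbitrary assumption and would need tuning.
    """
    return cosine(profile(anonymous), profile(sample)) >= threshold
```

Attribution is a closed-set choice among candidates, whereas verification
must decide without any counterexamples, which is why it needs a threshold
or a comparable decision rule.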

Near-duplicate detection is mainly a problem of the World Wide Web:
duplicate Web pages increase the index storage space of search engines,
slow down result serving, and decrease the retrieval
precision. Near-duplicate detection relates directly to plagiarism
analysis: at the document level, near-duplicate detection and plagiarism
analysis likewise represent two sides of the same coin. For a plagiarism
analysis at the paragraph level, the same specialized document models
(e.g. shingling, fingerprinting, hashing) can be applied, where a key
problem is the selection of useful chunks from a document.
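
As a rough illustration of these document models, the following sketch
builds word-level shingles, hashes them into a fingerprint, and compares
two fingerprints by their set overlap. The shingle length, the use of MD5,
and the "hash mod m" chunk selection rule are assumptions made for this
example, not methods prescribed by the workshop.

```python
import hashlib


def word_shingles(text, k=3):
    """Set of all k-word shingles (contiguous word sequences) of a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def fingerprint(text, k=3, modulus=1):
    """Hash each shingle and keep those whose hash is divisible by `modulus`.

    With modulus=1 every shingle is kept; larger values select a smaller
    subset of chunks, trading accuracy for storage.
    """
    hashes = set()
    for shingle in word_shingles(text, k):
        h = int(hashlib.md5(shingle.encode("utf-8")).hexdigest(), 16)
        if h % modulus == 0:
            hashes.add(h)
    return hashes


def resemblance(fp_a, fp_b):
    """Jaccard coefficient of two fingerprints (0.0 if both are empty)."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


a = "the quick brown fox jumps over the lazy dog"
b = "the quick brown fox leaps over the lazy dog"
print(resemblance(fingerprint(a), fingerprint(b)))  # 4 shared of 10 shingles
```

Changing a single word alters every shingle that covers it, so the two
sentences above share only 4 of their 10 distinct shingles.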

The development of new solutions for the outlined problems may benefit
from the combination of existing technologies, and in this sense the
workshop provides a platform that spans different views and
approaches. The following list gives examples of topics from the outlined
field for which contributions are welcome; submissions are not restricted
to these topics:

- retrieval models for plagiarism analysis, authorship
  identification, and style analysis
- software plagiarism, cross-language plagiarism, plagiarism in Web
  communities and social networks
- NLP technologies for authorship identification and style analysis
- knowledge-based methods for plagiarism analysis and authorship
  identification
- handling proper citation

- methods for identifying near-duplicate and versioned documents
  (for all kinds of contents, including text, source code, image, and
  music documents)
- shingling, fingerprinting, and similarity hashing
- hash-based search, high-dimensional search, approximate nearest
  neighbor search
- efficiency issues and performance tradeoffs

- tailored indexes for plagiarism analysis and near-duplicate detection
- plagiarism analysis and near-duplicate detection on the Web
- evaluation, building of test collections, experimental design and
  user studies

IMPORTANT DATES:

Deadline for paper submission    May 27, 2007
Notification to authors          June 24, 2007
Camera-ready copy due            July 1, 2007
Workshop opens                   July 27, 2007

Contributions will be peer-reviewed by experts in the field.

WORKSHOP ORGANIZATION:

Benno Stein, Bauhaus University Weimar
Moshe Koppel, Bar-Ilan University, Israel
Efstathios Stamatatos, University of the Aegean

Contact: pan-07@aisearch.de
URL: http://www.aisearch.de/pan-07

PROGRAM COMMITTEE:

Shlomo Argamon, Illinois Institute of Technology
Yaniv Bernstein, Google Switzerland
Dennis Fetterly, Microsoft Research
Graeme Hirst, University of Toronto
Timothy Hoad, Microsoft
Heiko Holzheuer, Lycos Europe
Jussi Karlgren, Swedish Institute of Computer Science
Hans Kleine Büning, University of Paderborn
Moshe Koppel, Bar-Ilan University, Israel
Hermann Maurer, University of Technology Graz
Sven Meyer zu Eissen, Bauhaus University Weimar
Efstathios Stamatatos, University of the Aegean
Benno Stein, Bauhaus University Weimar
Özlem Uzuner, State University of New York
Debora Weber-Wulff, University of Applied Sciences Berlin
Justin Zobel, RMIT University