IPM Special Issue on Large-Scale Distributed Systems for Information Retrieval
We are pleased to announce that we are preparing a special issue on the workshop topics, which will be published by Elsevier in the journal Information Processing and Management. The call for papers is available on the journal webpage and on this site.
Submissions are open to any contribution in the field of large-scale distributed systems for information retrieval. Given the high quality of the papers presented at this year's edition of the LSDS-IR workshop, we particularly invite extended versions of those works.
Contributions must be submitted by February 1st, 2010 at the following url: http://ees.elsevier.com/ipm/default.asp.
News Updates
2009-11-30 We are organizing a special issue on the workshop topics to be published in Information Processing and Management by Elsevier.
2009-07-14 The workshop proceedings have been published on-line by CEUR-WS.
2009-07-10 The workshop proceedings are available on-line.
2009-07-07 The technical program is now available. The workshop will host two invited talks.
2009-07-02 A sample copyright box for camera-ready papers is available for download.
2009-06-28 Notifications were sent to authors. Soon, we will publish the workshop program.
2009-05-08 The Information Processing & Management Journal will publish a Special Issue on LSDS-IR.
2009-04-28 The submission website is now open. Submit your paper!
2009-04-16 Call for papers available in pdf and txt formats.
2009-03-11 The workshop site goes online.
Workshop Evaluation
We thank the invited speakers and all the participants, who made the workshop a success!
If you participated in the workshop, please fill in the evaluation form at http://tiny.cc/sigir2009ws.
Summary
The Web is continuously growing. Currently, there are more than 20 billion pages (some sources suggest 100 billion), compared to fewer than 1 billion documents in 1998. Traditionally, Web-scale search engines employ large, highly replicated systems operating on computer clusters in one or a few data centers. Coping with the increasing number of user requests and indexable pages requires adding more resources. However, data centers cannot grow indefinitely. Scalability problems in information retrieval will have to be addressed in the near future, and new distributed applications are likely to change the way people use the Web. Distributed IR is the point at which these two directions converge. This workshop will provide a forum for researchers to discuss these problems and to define new directions for work on distributed information retrieval.
Program
Each regular paper has a 30-minute slot for the presentation, including 5-10 minutes for questions. Short papers have 15 minutes for the presentation plus 5 minutes for questions. The workshop proceedings, including all the accepted papers, are available online.
09:00 - 09:10 | Welcome
Session I: "YouTube", chaired by Wai Gen Yee.
09:10 - 10:10 | Keynote: "The Youtube Video Delivery System" by Leonidas Kontothanassis (Google, Boston, USA).
10:10 - 10:30 | "Are Web User Comments Useful for Search?" by Wai Gen Yee, Andrew Yates, Shizhu Liu, Ophir Frieder (Illinois Institute of Technology).
Special guest: Martin Potthast (Bauhaus-Universität Weimar) enthusiastically shared with us his ideas and findings on "Measuring the Descriptiveness of Web Comments".
10:30 - 11:00 | Coffee break
Session II: "Search", chaired by Claudio Lucchese.
11:00 - 11:30 | "PP-Index: Using Permutation Prefixes for Efficient and Scalable Approximate Similarity Search" by Andrea Esuli (ISTI-CNR, Pisa, Italy).
11:30 - 12:00 | "Collection Selection with Highly Discriminative Keys" by Sander Bockting (Avanade), Djoerd Hiemstra (U of Twente).
12:00 - 12:20 | "Peer-to-Peer clustering of Web-browsing users" by Patrizio Dazzi, Matteo Mordacchini, Raffaele Perego (ISTI-CNR, Pisa, Italy), Pascal Felber, Lorenzo Leonini (U of Neuchatel), Martin Rajman (EPFL), Etienne Riviere (NTNU).
12:20 - 13:30 | Lunch break
Session III: "Large Scale", chaired by Wai Gen Yee.
13:30 - 14:30 | Keynote: "DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language" by Dennis Fetterly (Microsoft Research, Silicon Valley, USA).
14:30 - 15:00 | "Strong Ties vs. Weak Ties: Studying the Clustering Paradox for Decentralized Search" by Weimao Ke, Javed Mostafa (U of North Carolina).
15:00 - 15:30 | Coffee break
Session IV: "Large Scale cont.", chaired by Gleb Skobeltsyn.
15:30 - 16:00 | "Sorting using BItonic netwoRk wIth CUDA" by Gabriele Capannini, Fabrizio Silvestri, Ranieri Baraglia, Franco Maria Nardini (ISTI-CNR, Pisa, Italy).
16:00 - 16:30 | "Static Index Pruning for Information Retrieval Systems: A Posting-Based Approach" by Linh Nguyen (Illinois Institute of Technology).
16:30 - 17:00 | "Comparing Distributed Indexing: To MapReduce or Not?" by Richard McCreadie, Craig Macdonald, Iadh Ounis (U of Glasgow).
17:00 - 17:20 | "The Curse of Zipf and Limits to Parallelization: A Look at the Stragglers Problem in MapReduce" by Jimmy Lin (U of Maryland).
17:20 - 17:30 | Concluding Remarks
Keynote speakers
Speaker: Leonidas Kontothanassis. He joined Google in 2006, immediately
started working on networking and video delivery issues, and has
worked on them ever since. He currently manages the teams working in
these areas. Previously he worked in areas such as computer
architecture, parallel programming, and content delivery at several
companies in the Kendall/MIT area, including DEC/HP/Intel Labs and
Akamai. He received a PhD in computer architecture in 1996 and has
served as a committee member or organizer for academic conferences and
for research funding organizations such as NSF.
Title: "The Youtube Video Delivery System". This talk will cover the
YouTube video delivery system. It will discuss access patterns and
trends for both video uploads and downloads, and describe the storage
and delivery mechanisms for popular and unpopular content as well as
the impact YouTube has on the network storage infrastructure for
Google. It will also discuss the networking impact on ISPs around the
world.
Speaker: Dennis Fetterly. He is a Research
Software Development Engineer in Microsoft Research's Silicon Valley
lab, which he joined in May 2003. His research interests include a
wide variety of topics, including web crawling, the evolution and
similarity of pages on the web, identifying spam web pages, and
large-scale distributed systems. He is currently working on DryadLINQ,
TidyFS, and a project evaluating policies for corpus
selection. Interesting past projects include the MSRBot web crawler;
Dryad; the Your Desktop and Your Keychain project, which uses
flash memory devices to enable users to carry their desktop PC state
with them from machine to machine; and PageTurner, a large-scale study
of the evolution of web pages.
Title: "DryadLINQ: A system for general-purpose
distributed data-parallel computing using a high-level language".
The goal of DryadLINQ is to make distributed computing on large
compute clusters simple. DryadLINQ combines two important pieces of
technology: the Dryad distributed execution engine and the .NET
Language Integrated Query (LINQ). Dryad provides reliable, distributed
computing on thousands of servers for large-scale data-parallel
applications. LINQ enables developers to write and debug their
applications in a SQL-like query language, relying on the entire .NET
library and using Visual Studio. DryadLINQ is a simple, powerful, and
elegant programming environment for writing large-scale data-parallel
applications running on large PC clusters. This talk will also
describe the experience of using DryadLINQ for a series of information
retrieval experiments.
Workshop Activities and Goals
The workshop aims to bring together researchers from the domains of IR and databases working on peer-to-peer information systems and to foster closer collaboration that could have a large impact on future research directions in the area of distributed and P2P IR.
This workshop continues the efforts of the following previous workshops:
- CIKM'08: Workshop on Large-Scale Distributed Systems for Information Retrieval - LSDS-IR'08
- SIGIR'07: Workshop on Large-Scale Distributed Systems for Information Retrieval - LSDS-IR'07
- CIKM'06: Workshop on Information Retrieval in Peer-to-Peer Networks - P2PIR'06
- CIKM'05: Workshop on Information Retrieval in Peer-to-Peer Networks - P2PIR'05
- SIGIR'05: Workshop on Heterogeneous and Distributed Information Retrieval - HDIR'05
- SIGIR'04: Workshop on Information Retrieval in Peer-to-Peer Networks - P2PIR'04
Topics
Topics of interest include, but are not limited to:
- Novel P2P IR applications/architectures
- Retrieval models for distributed and P2P IR
- Heterogeneous distributed data integration and organization
- Query routing in distributed environments
- Distributed ranking
- Distributed index structures
- Efficiency in distributed and P2P IR
- Scalability of distributed indexing and retrieval
- Multimedia retrieval for distributed and P2P IR
- Distributed indexing and retrieval of dynamic information sources
- Standards / benchmarks for P2P IR
- Grid IR
- Decentralized reputation management
- Distributed data provenance
- Tools for distributed IR
Workshop chairs
- Claudio Lucchese, National Research Council - ISTI, Italy
- Gleb Skobeltsyn, EPFL/Google, Switzerland
- Wai Gen Yee, Illinois Institute of Technology, Chicago, USA
Steering Committee
- Flavio Junqueira, Yahoo! Research Barcelona, Spain
- Fabrizio Silvestri, ISTI-CNR, Italy
- Ivana Podnar Zarko, University of Zagreb, Croatia
Program Committee
- Karl Aberer, Ecole Polytechnique Federale de Lausanne, Switzerland
- Ricardo Baeza-Yates, Yahoo! Research Barcelona, Spain
- Gregory Buehrer, Microsoft Live Labs, USA
- Roi Blanco, University of A Coruna, Spain
- Fabrizio Falchi, ISTI-CNR, Italy
- Ophir Frieder, Illinois Institute of Technology, Chicago, USA
- Flavio Junqueira, Yahoo! Research Barcelona, Spain
- Claudio Lucchese, ISTI-CNR, Italy
- Sebastian Michel, Ecole Polytechnique Federale de Lausanne, Switzerland
- Wolfgang Nejdl, University of Hannover, Germany
- Kjetil Norvag, Norwegian University of Science and Technology, Norway
- Salvatore Orlando, University of Venice, Italy
- Josiane Xavier Parreira, Max-Planck-Institut Informatik, Germany
- Raffaele Perego, ISTI-CNR, Italy
- Diego Puppin, Google, USA
- Martin Rajman, Ecole Polytechnique Federale de Lausanne, Switzerland
- Fabrizio Silvestri, ISTI-CNR, Italy
- Gleb Skobeltsyn, EPFL, Switzerland
- Torsten Suel, Polytechnic University, USA
- Christos Tryfonopoulos, Max-Planck-Institut Informatik, Germany
- Wai Gen Yee, Illinois Institute of Technology, USA
- Ivana Podnar Zarko, University of Zagreb, Croatia
- Pavel Zezula, Masaryk University of Brno, Czech Republic
Workshop format
The workshop solicits scientific papers that address problems specific to IR in heterogeneous and distributed environments. Additionally, position papers outlining interesting new research domains and approaches are welcome. The selection of papers is based primarily on their potential to influence future research. Papers have to present original contributions not concurrently submitted elsewhere.
Paper Submission
Papers should not exceed 8 pages, double column, including figures, tables and references in the standard ACM Conference style (for LaTeX, use the "Option 2" style). Papers have to present original research contributions not concurrently submitted elsewhere, and must be submitted electronically in printable PDF format (other formats will be rejected) via the online submission system. Submitted papers will undergo a peer review process by at least three members of the program committee. Submission is not blind.
At least one author of an accepted paper must register for the workshop. Registration must be completed when the author submits the camera-ready copy of the paper. A sample copyright box for camera-ready papers is available on this site. Further instructions are available through the online submission system.
Best papers will be invited to submit an extended version to the Special Issue on "Large-Scale Distributed Systems for Information Retrieval", published by the Information Processing & Management Journal.
Important Dates
Paper submission: | |
Notification: | |
Camera-ready papers: | |
Workshop date: | July 23, 2009 |