CIKM'13 Tutorial on

Large Scale Machine Learning for Information Retrieval

Bo Long and Liang Zhang
LinkedIn Inc.

Topic Overview

The success of data-driven solutions to challenging problems, together with exponential data growth in modern information retrieval systems, has led to rapidly growing interest in large scale machine learning.
The objective of this tutorial is to provide an in-depth and systematic introduction to large scale machine learning challenges, algorithms, and architectures, with a focus on information retrieval applications. First, we will introduce the fundamental aspects of large scale machine learning and the typical challenges of large scale learning in information retrieval systems. Second, we will present the principal algorithmic frameworks for large scale learning. These include traditional distributed learning frameworks, such as distributed gradient descent, and state-of-the-art approaches, such as the alternating direction method of multipliers (ADMM) and Bayesian distributed learning. Third, we will introduce commonly used large scale machine learning algorithms for information retrieval, which fall into two categories: large scale supervised learning algorithms, such as classification, ranking, and regression, and large scale unsupervised learning algorithms, such as matrix factorization and clustering. In this part, we will discuss the general aspects of each category as well as practical implementations of representative algorithms, such as large scale logistic regression, large scale gradient boosted trees, and large scale latent factor learning. Fourth, we will discuss and compare different architectures for large scale machine learning, such as Hadoop and Spark. Throughout the tutorial, we will provide concrete examples of large scale machine learning and case studies from real-world applications, such as ads recommendation and Web search, for illustration and discussion.

Learning Objective

This tutorial aims to provide an in-depth and systematic introduction to large scale machine learning challenges, algorithms, and architectures, with a focus on information retrieval applications.

Intended Audience

This tutorial is appropriate for all CIKM 2013 attendees. No prior knowledge of large scale machine learning is required. We assume only basic knowledge of machine learning methods and information retrieval systems.

Topics and scope

The tutorial will cover the following topics, addressing both the practical and the theoretical aspects of large scale machine learning for information retrieval.




Bo Long is a Staff Applied Researcher at LinkedIn Inc. and was formerly a Senior Research Scientist at Yahoo! Labs. His research interests lie in data mining and machine learning, with applications to Web search, recommendation, and social network analysis. He holds eight innovations and has published peer-reviewed papers in top conferences and journals, including ICML, KDD, ICDM, AAAI, SDM, CIKM, and KAIS. He has served as a reviewer, workshop co-organizer, conference organizing committee member, and area chair for multiple conferences, including KDD, NIPS, SIGIR, ICML, SDM, CIKM, and JSM.

Liang Zhang is a Staff Applied Researcher at LinkedIn Inc. He obtained his Ph.D. from the Department of Statistical Science, Duke University, in 2008. He worked at Yahoo! Inc. as a Scientist from 2008 to March 2012. Liang has done extensive work and published several papers on applying statistical approaches to real-world Internet applications, which typically involve massive data. He also has years of experience using MapReduce and Hadoop in his own statistical research. Liang's research interests include recommender systems, computational advertising, and statistical modeling and analysis of large scale data.