Exploiting Multimodal Synergy for Large Scale and Diverse Image Retrieval in Digital Archives
Sponsor: National Science Foundation
This project is a four-year research and education program (2006 - 2010)
focused on developing a revolutionary approach to large scale and
diverse image retrieval in digital archives. Almost all digital
archives today contain not just traditional structured data but also
multimedia data, and with the rapid development of technology,
multimedia data has become increasingly dominant in these archives.
Within multimedia data, imagery is probably the most popular modality
after text. Image retrieval has consequently become an important
research area and a focus in the development of effective and
efficient Multimedia Information Retrieval (MIR) technologies for
digital archives.
For this reason, image retrieval has been studied for over a decade
under the name Content Based Image Retrieval (CBIR) and has become a
major focus of MIR research. Current research in image retrieval
exhibits two well-known bottlenecks.
(1) The semantic gap: the majority of the existing methods in the
literature retrieve images using low-level image features, and it is
well established that image features alone are usually insufficient
for finding similar images, because of the gap between the image
features and the semantic concepts carried in the image. The gap
persists because semantic concepts are very difficult to represent
and use directly in image retrieval.
(2) Scalability: the existing methods in the literature are
demonstrated only on very clean data sets (e.g., the Corel data) and
very small data sets (typically below 10,000 images). There are three
reasons: (a) most of the proposed methods are not scalable in nature
(e.g., they rely on linear search); (b) beyond their complexity, many
existing methods are sensitive to the diversity of image content and
quality, so experiments are reported on very clean data such as the
Corel collection; and (c) the image retrieval community does not yet
have a standard benchmark collection comparable to those in the text
retrieval community, so each research group typically uses data sets
that it collected itself or shares with other research groups, and
these are typically small in scale.
Note that scalability here refers both to the diversity of image
content and quality and to the size of the image databases. This
observation is supported by recent literature in the area, which
notes that the data sets used in most recent automatic image
annotation and image retrieval systems fail to capture the
difficulties inherent in many real image databases.
On the other hand, it is well observed that imagery data often does
not exist in isolation; in many applications, rich collateral
information co-exists with the image data. Examples include the Web,
many domain-archived image databases (in which images carry
annotations), and even consumer photo collections. To reduce the
semantic gap, multimodal approaches to image retrieval have recently
been proposed in the literature; these explicitly exploit the
redundancy between the images and their collateral information.
Besides improved retrieval accuracy, multimodal approaches offer a
further benefit, multiple query modalities: users may query image
databases by image, by a collateral information modality (e.g.,
text), or by any combination of these.
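The multiple query modalities described above can be illustrated with a minimal sketch. It is not the method developed in this project; it assumes a simple weighted (late fusion) combination of an image-feature score and a text score, and the names search, cosine, jaccard, alpha, and the toy database are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a, b):
    """Jaccard overlap of two word collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def search(database, image_query=None, text_query=None, alpha=0.5):
    """Rank item ids by an image query, a text query, or both.

    alpha is an assumed fusion weight for the image score; it is used
    only when both query modalities are supplied.
    """
    if image_query is None and text_query is None:
        raise ValueError("provide an image query, a text query, or both")
    scored = []
    for item in database:
        if text_query is None:
            score = cosine(image_query, item["features"])
        elif image_query is None:
            score = jaccard(text_query, item["words"])
        else:
            score = (alpha * cosine(image_query, item["features"])
                     + (1 - alpha) * jaccard(text_query, item["words"]))
        scored.append((score, item["id"]))
    # Highest score first; ties are broken by id for a stable result.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in scored]


# Toy data: made-up feature vectors and annotation words, for illustration only.
database = [
    {"id": "beach", "features": [0.9, 0.1, 0.0], "words": ["sea", "sand", "sky"]},
    {"id": "forest", "features": [0.1, 0.9, 0.1], "words": ["tree", "green"]},
    {"id": "sunset", "features": [0.8, 0.2, 0.3], "words": ["sky", "sun"]},
]

by_image = search(database, image_query=[1.0, 0.0, 0.0])
by_text = search(database, text_query=["sky", "sun"])
by_both = search(database, image_query=[1.0, 0.0, 0.0], text_query=["sky", "sun"])
```

Note that this sketch scans every item for each query, which is exactly the linear search that limits scalability; a large archive would need an index instead.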
This project focuses on developing a novel multimodal approach to
image retrieval that explicitly exploits the synergy among the
multimodal data to address the two bottlenecks simultaneously.
Ultimately, the project aims to revolutionize image retrieval
research and to develop and advance proven, working technologies for
large scale and diverse image retrieval in digital archives.
Specifically, as an integrated research and education program, this
project pursues three objectives to be achieved synergistically:
(1) to develop a new theory and the related methodology for a
multimodal approach to large scale and diverse image retrieval that
addresses the semantic gap and the scalability issues simultaneously;
(2) to extensively evaluate the theory and the methodology using
truly large scale and diverse multimodal data; and (3) to develop and
evaluate innovative community outreach activities, through the
existing research collaborations in this project, to further promote
knowledge dissemination.
The intellectual merit of this project lies in a new understanding of
image retrieval in the multimodal context and in the expected
breakthrough in effective and efficient image retrieval, which is
expected to advance the CBIR and MIR literature and to have an impact
on related areas including pattern recognition, data mining, and
computer vision.
The broader impact of this project is twofold. Educationally, the
development, implementation, and evaluation of the innovative
community outreach activities in this project are intended to promote
timely and effective dissemination of knowledge about multimodal
image retrieval and to enrich the pedagogical literature; the
knowledge disseminated to the collaborating organizations, especially
the non-profit organizations, is intended to advance and enhance
their research and their services to society. Technologically, the
expected breakthrough in image retrieval is intended to benefit a
wide range of applications, notably Web search engines, digital
libraries, and K-12 learning tools.
NSF Project Manager: Dr. Maria Zemankova
Project Personnel:
PI: Prof. Zhongfei (Mark) Zhang
PhD students:
Bo Long
Zhen Guo
Master's student:
Tianbing Xu
Publications:
Zhongfei (Mark) Zhang and Ruofei Zhang, Multimedia Data Mining -- A Systematic Introduction to Concepts and Theory, Taylor & Francis Group/CRC Press, 2008, ISBN: 9781584889663
Ruofei Zhang and Zhongfei (Mark) Zhang, Solving Small and Asymmetric Sampling Problem in the Context of Image Retrieval, in Artificial Intelligence for Maximizing Content Based Image Retrieval, Edited by Zongmin Ma, Idea Group Inc., 2008
Jian Yao, Zhongfei (Mark) Zhang, Sameer Antani, Rodney Long, and George Thoma, Automatic Medical Image Annotation and Retrieval, Neurocomputing, Elsevier Science Press, Volume 71/10-12, 2008, pp 2012-2022
Xiao-Bing Xue, Zhi-Hua Zhou, and Zhongfei (Mark) Zhang, Improving Web Search Using Image Snippets, ACM Transactions on Internet Technology, ACM Press, in press, 2008
Zhongfei (Mark) Zhang, Haroon Khan, and Mark A. Robertson, A Holistic, In-Compression Approach to Video Segmentation for Independent Motion Detection, EURASIP Journal on Advances in Signal Processing, Hindawi Publishing Co., Article ID 738158, 9 pages, doi:10.1155/2008/738158, Volume 2008, 2008
Zhongfei (Mark) Zhang, Florent Masseglia, Ramesh Jain, and Alberto Del Bimbo, Editorial: Introduction to the Special Issue on Multimedia Data Mining, IEEE Transactions on Multimedia, IEEE Computer Society Press, Volume 10, Number 2, 2008, pp 165 -- 166
Tianbing Xu, Zhongfei Zhang, Philip S. Yu, and Bo Long, Evolutionary Clustering by Hierarchical Dirichlet Process with Hidden Markov State, Proc. IEEE International Conference on Data Mining, Pisa, Italy, December, 2008, (9.6% acceptance rate)
Tianbing Xu, Zhongfei Zhang, Philip S. Yu, and Bo Long, Dirichlet Process Based Evolutionary Clustering, Proc. IEEE International Conference on Data Mining, Pisa, Italy, December, 2008, (9.6% acceptance rate)
Xi Li, Zhongfei Zhang, Yanguo Wang, and Weiming Hu, Multiclass Spectral Clustering Based on Discriminant Analysis, Proc. International Conference on Pattern Recognition, Tampa, FL, USA, December, 2008, (20.0% acceptance rate)
Xi Li, Weiming Hu, Zhongfei Zhang, and Xiaoqin Zhang, Robust Visual Tracking Based on An Effective Appearance Model, Proc. European Computer Vision Conference, Marseille, France, October, 2008, (23.3% acceptance rate)
Xi Li, Weiming Hu, Zhongfei Zhang, and Xiaoqin Zhang, Robust Foreground Segmentation Based on Two Effective Background Models, Proc. ACM International Conference on Multimedia Information and Retrieval, Vancouver, Canada, October, 2008, (20.0% acceptance rate)
Xi Li, Weiming Hu, Zhongfei Zhang, Xiaoqin Zhang, and Guan Luo, Trajectory-Based Video Retrieval Using Dirichlet Process Mixture Models, Proc. British Machine Vision Conference, Leeds, UK, September, 2008, (12.5% acceptance rate)
Bo Long, Zhongfei (Mark) Zhang, and Tianbing Xu, Clustering on Complex Graphs, Proc. 23rd Conference on Artificial Intelligence (AAAI 2008), Chicago, IL, USA, July, 2008, (24% acceptance rate)
Xi Li, Weiming Hu, Zhongfei Zhang, Xiaoqin Zhang, Mingliang Zhu, Jian Cheng, and Guan Luo, Visual tracking via incremental log-Euclidean Riemannian subspace learning, Proc. IEEE Computer Vision and Pattern Recognition, Anchorage, Alaska, USA, June 2008, (27.9% acceptance rate)
Ming Li, Zhongfei (Mark) Zhang, and Zhi-Hua Zhou, Mining Bulletin Board Systems Using Community Generation, Proc. Pacific and Asia Knowledge Discovery and Data Mining Conference, Osaka, Japan, May 2008, (11.9% acceptance rate)
Bo Long, Philip S. Yu and Zhongfei (Mark) Zhang, A general model for multiple view unsupervised learning, Proc. the SIAM International Conference on Data Mining, Atlanta, GA, 2008, (14% acceptance rate)
Zhen Guo, Zhongfei (Mark) Zhang, Eric P. Xing, and Christos Faloutsos, Semi-supervised learning based on semiparametric regularization, Proc. the SIAM International Conference on Data Mining, Atlanta, GA, 2008, (14% acceptance rate)
Xi Li, Weiming Hu, Zhongfei (Mark) Zhang, Xiaoqin Zhang, and Guan Luo, Robust Visual Tracking Based on Incremental Tensor Subspace Learning, Proc. the IEEE International Conference on Computer Vision, Rio de Janeiro, Brazil, October, 2007
Bo Long, Xiaoyun Wu, Zhongfei (Mark) Zhang, and Philip S. Yu, Community Learning by Graph Approximation, Proc. the IEEE International Conference on Data Mining, Omaha, NE, USA, October, 2007
Xi Li, Weiming Hu, and Zhongfei (Mark) Zhang, Corner Detection of Contour Images Using Spectral Clustering, Proc. the 14th IEEE International Conference on Image Processing, San Antonio, TX, USA, September, 2007
Zhongfei Zhang, Zhen Guo, Christos Faloutsos, Eric P. Xing, and Jia-Yu (Tim) Pan, On the scalability and adaptability for multimodal image retrieval and image annotation, Proc. International Workshop on Visual and Multimedia Digital Libraries, Modena, Palazzo Ducale, Italy, September, 2007
Bo Long, Zhongfei (Mark) Zhang, and Philip S. Yu, A Probabilistic Framework for Relational Clustering, Proc. the 13th ACM International Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA, August, 2007
Zhen Guo, Zhongfei (Mark) Zhang, Eric P. Xing, and Christos Faloutsos, Enhanced Max Margin Learning on Multimodal Data Mining in a Multimedia Database, Proc. the 13th ACM International Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA, August, 2007
Bo Long, Zhongfei (Mark) Zhang, and Philip S. Yu, Graph Partitioning Based on Link Distribution, Proc. the 22nd Annual Conference on Artificial Intelligence (AAAI-07), Vancouver, British Columbia, Canada, July, 2007
Zhen Guo, Zhongfei (Mark) Zhang, Eric P. Xing, and Christos Faloutsos, A Max Margin Framework on Image Annotation and Multimodal Image Retrieval, Proc. the IEEE Annual International Conference on Multimedia and Expo, Beijing, China, July, 2007
Bo Long, Zhongfei (Mark) Zhang, and Philip S. Yu, Relational Clustering by Symmetric Convex Coding, Proc. the 24th Annual International Conference on Machine Learning, Oregon State University, OR, USA, June, 2007
Zhongfei (Mark) Zhang, Florent Masseglia, Ramesh Jain, and Alberto Del Bimbo, KDD/MDM 2006: The 7th KDD Multimedia Data Mining Workshop Report, ACM KDD Explorations, accepted, 2006
Ruofei Zhang and Zhongfei (Mark) Zhang, Effective Image Retrieval Based on Hidden Concept Discovery in Image Database, IEEE Transactions on Image Processing, Volume 16, Number 2, February, 2006, pp 562 -- 572
Arif Ghafoor, Zhongfei (Mark) Zhang, Michael S. Lew, and Zhi-Hua Zhou, Guest Editors' Introduction to Machine Learning Approaches to Multimedia Information Retrieval, ACM Multimedia Systems Journal, Springer, 2006
Zhongfei (Mark) Zhang, Querying Non-Uniform Image Databases for Biometrics-Related Identification Applications, Sensor Review, Emerald Publishers, Volume 26, Number 2, April, 2006, pp 122-126
Ruofei Zhang and Zhongfei (Mark) Zhang, Empirical Bayesian Learning in the Relevance Feedback of Image Retrieval, Image and Vision Computing, Elsevier Science, Volume 24, Issue 3, March, 2006, pp 211-223
Ruofei Zhang, Zhongfei (Mark) Zhang, Mingjing Li, Wei-Ying Ma, and Hong-Jiang Zhang, A Probabilistic Semantic Model for Image Annotation and Multi-Modal Image Retrieval, ACM Multimedia Systems Journal, the special issue of Using Machine Learning Approaches to Multimedia Information Retrieval, Springer, 2006
Jian Yao and Zhongfei (Mark) Zhang, Hierarchical Shadow Detection for Color Aerial Images, Computer Vision and Image Understanding, Elsevier Science, Volume 102, Issue 1, April, 2006, pp 60-69
Bo Long, Xiaoyun Wu, Zhongfei (Mark) Zhang, and Philip S. Yu, Unsupervised Learning on K-partite Graphs, Proc. ACM International Conference on Knowledge Discovery and Data Mining, ACM Press, Philadelphia, PA, USA, August, 2006
Bo Long, Zhongfei (Mark) Zhang, Xiaoyun Wu, and Philip S. Yu, Spectral Clustering for Multi-Type Relational Data, Proc. International Conference on Machine Learning, ACM Press, Pittsburgh, PA, USA, June, 2006
Xiao-Bing Xue, Zhi-Hua Zhou, and Zhongfei (Mark) Zhang, Improve Web Search Using Image Snippets, Proc. the 21st National Conference on Artificial Intelligence, AAAI Press, Boston, MA, USA, July, 2006
Jian Yao, Zhongfei (Mark) Zhang, Sameer Antani, Rodney Long, and George Thoma, Automatic Medical Image Annotation and Retrieval Using SEMI-SECC, Proc. IEEE International Conference on Multimedia and Expo, IEEE Computer Society Press, Toronto, Canada, July, 2006
Jian Yao, Sameer Antani, Rodney Long, George Thoma, and Zhongfei (Mark) Zhang, Automatic Medical Image Annotation and Retrieval Using SECC, Proc. IEEE International Symposium on Computer Based Medical Systems, IEEE Computer Society Press, Salt Lake City, Utah, USA, June, 2006
Code Release:
· EMML code (based on paper: Zhen Guo, Zhongfei (Mark) Zhang, Eric P. Xing, and Christos Faloutsos, Enhanced Max Margin Learning on Multimodal Data Mining in a Multimedia Database, Proc. the 13th ACM International Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA, August, 2007)
Partners:
This material is based upon work supported by the National Science Foundation under Award No. 0535162.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.