Exploiting Multimodal Synergy for Large Scale and Diverse Image Retrieval in Digital Archives

Sponsor: National Science Foundation

This project is a four-year research and education program (2006-2010) focused on developing a revolutionary approach to large-scale and diverse image retrieval in digital archives. Today almost all digital archives contain not only traditional structured data but also multimedia data, and with rapid technological development, multimedia data is becoming increasingly dominant in these archives. Among the multimedia modalities typically present in digital archives, imagery is arguably the most popular, second perhaps only to text. Consequently, image retrieval has become an important research area and a central focus in the development of effective and efficient Multimedia Information Retrieval (MIR) technologies for digital archives.

For this reason, image retrieval has been studied for over a decade as an emerging area called Content-Based Image Retrieval (CBIR), and it has become a major focus of MIR research. The current state of image retrieval research exhibits two notorious bottlenecks:

(1) The semantic gap. Most existing methods in the literature retrieve images using low-level image features alone. It is well established that such features are usually insufficient for finding similar images, because of the gap between low-level features and the semantic concepts an image carries; semantic concepts have proven very difficult to represent directly and use in retrieval.

(2) Scalability. Existing methods in the literature have been demonstrated only on very clean data sets (e.g., the Corel data) and very small data sets (typically fewer than 10,000 images). There are three reasons for this: (a) most proposed methods are inherently unscalable (e.g., their complexity is that of a linear search); (b) beyond their inherent complexity, many existing methods are sensitive to the diversity of image content and quality, which leads researchers to report experiments on very clean data such as the Corel collection; and (c) the image retrieval community does not yet have a standard benchmark collection comparable to those in the text retrieval community, so each research group typically uses data sets that it collected itself or that were shared by other groups, and these are usually small. Note that scalability here refers both to the diversity of image content and quality and to the size of the image database.

This observation is supported by recent research in this area, which has noted that the data sets used in most recent automatic image annotation and image retrieval systems fail to capture the difficulties inherent in many real image databases.

On the other hand, image data rarely exists in isolation; in many applications, rich collateral information co-exists with the images. Examples include the Web, many domain-specific image archives (in which images are annotated), and even consumer photo collections. To reduce the semantic gap, multimodal approaches to image retrieval have recently been proposed that explicitly exploit the redundancy between images and their collateral information. In addition to improved retrieval accuracy, multimodal approaches offer another benefit: multiple query modalities. Users may query an image database by image, by a collateral information modality (e.g., text), or by any combination of these.

This project develops a novel multimodal approach to image retrieval that explicitly exploits the synergy among multimodal data to address both bottlenecks simultaneously. Ultimately, the project aims to revolutionize image retrieval research and to develop and advance proven, working technologies for large-scale and diverse image retrieval in digital archives.

Specifically, as an integrated research and education program, this project pursues three objectives to be achieved synergistically: (1) to develop a novel theory and an associated methodology for a multimodal approach to large-scale and diverse image retrieval that addresses the semantic gap and the scalability issue simultaneously; (2) to evaluate the theory and methodology extensively using truly large-scale and diverse multimodal data; and (3) to develop and evaluate innovative community outreach activities, through the research partnerships already established in this project, to further promote knowledge dissemination.

The intellectual merit of this project includes a fundamentally new understanding of image retrieval in the multimodal context, as well as the expected breakthrough in effective and efficient image retrieval, which should advance the CBIR and MIR literature and have a profound impact on related areas including pattern recognition, data mining, and computer vision.

The broader impact of this project is twofold. Educationally, the development, implementation, and evaluation of innovative community outreach activities will promote timely and effective dissemination of knowledge about multimodal image retrieval and further enrich the pedagogical literature; the knowledge disseminated to the collaborating organizations, especially non-profit organizations, will further advance their research and the services they provide to society. Technologically, the expected breakthrough in image retrieval should usher in a new era for a wide range of applications, notably Web search engines, digital libraries, and K-12 learning tools.

NSF Project Manager: Dr. Maria Zemankova

Project Personnel:

PI: Prof. Zhongfei (Mark) Zhang

PhD students:

Bo Long
Zhen Guo

Master's student:

Tianbing Xu

Publications:

Zhongfei (Mark) Zhang and Ruofei Zhang, Multimedia Data Mining -- A Systematic Introduction to Concepts and Theory, Taylor & Francis Group/CRC Press, 2008, ISBN: 9781584889663

Ruofei Zhang and Zhongfei (Mark) Zhang, Solving Small and Asymmetric Sampling Problem in the Context of Image Retrieval, in Artificial Intelligence for Maximizing Content Based Image Retrieval, Edited by Zongmin Ma, Idea Group Inc., 2008

Jian Yao, Zhongfei (Mark) Zhang, Sameer Antani, Rodney Long, and George Thoma, Automatic Medical Image Annotation and Retrieval, Neurocomputing, Elsevier Science Press, Volume 71/10-12, 2008, pp 2012-2022

Xiao-Bing Xue, Zhi-Hua Zhou, and Zhongfei (Mark) Zhang, Improving Web Search Using Image Snippets, ACM Transactions on Internet Technology, ACM Press, in press, 2008

Zhongfei (Mark) Zhang, Haroon Khan, and Mark A. Robertson, A Holistic, In-Compression Approach to Video Segmentation for Independent Motion Detection, EURASIP Journal on Advances in Signal Processing, Hindawi Publishing Co., Article ID 738158, 9 pages, doi:10.1155/2008/738158, Volume 2008, 2008

Zhongfei (Mark) Zhang, Florent Masseglia, Ramesh Jain, and Alberto Del Bimbo, Editorial: Introduction to the Special Issue on Multimedia Data Mining, IEEE Transactions on Multimedia, IEEE Computer Society Press, Volume 10, Number 2, 2008, pp 165-166

Tianbing Xu, Zhongfei Zhang, Philip S. Yu, and Bo Long, Evolutionary Clustering by Hierarchical Dirichlet Process with Hidden Markov State, Proc. IEEE International Conference on Data Mining, Pisa, Italy, December, 2008, (9.6% acceptance rate)

Tianbing Xu, Zhongfei Zhang, Philip S. Yu, and Bo Long, Dirichlet Process Based Evolutionary Clustering, Proc. IEEE International Conference on Data Mining, Pisa, Italy, December, 2008, (9.6% acceptance rate)

Xi Li, Zhongfei Zhang, Yanguo Wang, and Weiming Hu, Multiclass Spectral Clustering Based on Discriminant Analysis, Proc. International Conference on Pattern Recognition, Tampa, FL, USA, December, 2008, (20.0% acceptance rate)

Xi Li, Weiming Hu, Zhongfei Zhang, and Xiaoqin Zhang, Robust Visual Tracking Based on An Effective Appearance Model, Proc. European Conference on Computer Vision, Marseille, France, October, 2008, (23.3% acceptance rate)

Xi Li, Weiming Hu, Zhongfei Zhang, and Xiaoqin Zhang, Robust Foreground Segmentation Based on Two Effective Background Models, Proc. ACM International Conference on Multimedia Information Retrieval, Vancouver, Canada, October, 2008, (20.0% acceptance rate)

Xi Li, Weiming Hu, Zhongfei Zhang, Xiaoqin Zhang, and Guan Luo, Trajectory-Based Video Retrieval Using Dirichlet Process Mixture Models, Proc. British Machine Vision Conference, Leeds, UK, September, 2008, (12.5% acceptance rate)

Bo Long, Zhongfei (Mark) Zhang, and Tianbing Xu, Clustering on Complex Graphs, Proc. 23rd AAAI Conference on Artificial Intelligence (AAAI 2008), Chicago, IL, USA, July, 2008, (24% acceptance rate)

Xi Li, Weiming Hu, Zhongfei Zhang, Xiaoqin Zhang, Mingliang Zhu, Jian Cheng, and Guan Luo, Visual tracking via incremental log-Euclidean Riemannian subspace learning, Proc. IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, Alaska, USA, June, 2008, (27.9% acceptance rate)

Ming Li, Zhongfei (Mark) Zhang, and Zhi-Hua Zhou, Mining Bulletin Board Systems Using Community Generation, Proc. Pacific-Asia Conference on Knowledge Discovery and Data Mining, Osaka, Japan, May, 2008, (11.9% acceptance rate)

Bo Long, Philip S. Yu, and Zhongfei (Mark) Zhang, A general model for multiple view unsupervised learning, Proc. the SIAM International Conference on Data Mining, Atlanta, GA, 2008, (14% acceptance rate)

Zhen Guo, Zhongfei (Mark) Zhang, Eric P. Xing, and Christos Faloutsos, Semi-supervised learning based on semiparametric regularization, Proc. the SIAM International Conference on Data Mining, Atlanta, GA, 2008, (14% acceptance rate)

Xi Li, Weiming Hu, Zhongfei (Mark) Zhang, Xiaoqin Zhang, and Guan Luo, Robust Visual Tracking Based on Incremental Tensor Subspace Learning, Proc. the IEEE International Conference on Computer Vision, Rio de Janeiro, Brazil, October, 2007

Bo Long, Xiaoyun Wu, Zhongfei (Mark) Zhang, and Philip S. Yu, Community Learning by Graph Approximation, Proc. the IEEE International Conference on Data Mining, Omaha, NE, USA, October, 2007

Xi Li, Weiming Hu, and Zhongfei (Mark) Zhang, Corner Detection of Contour Images Using Spectral Clustering, Proc. the 14th IEEE International Conference on Image Processing, San Antonio, TX, USA, September, 2007

Zhongfei Zhang, Zhen Guo, Christos Faloutsos, Eric P. Xing, and Jia-Yu (Tim) Pan, On the scalability and adaptability for multimodal image retrieval and image annotation, Proc. International Workshop on Visual and Multimedia Digital Libraries, Modena, Palazzo Ducale, Italy, September, 2007

Bo Long, Zhongfei (Mark) Zhang, and Philip S. Yu, A Probabilistic Framework for Relational Clustering, Proc. the 13th ACM International Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA, August, 2007

Zhen Guo, Zhongfei (Mark) Zhang, Eric P. Xing, and Christos Faloutsos, Enhanced Max Margin Learning on Multimodal Data Mining in a Multimedia Database, Proc. the 13th ACM International Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA, August, 2007

Bo Long, Zhongfei (Mark) Zhang, and Philip S. Yu, Graph Partitioning Based on Link Distribution, Proc. the 22nd Annual Conference on Artificial Intelligence (AAAI-07), Vancouver, British Columbia, Canada, July, 2007

Zhen Guo, Zhongfei (Mark) Zhang, Eric P. Xing, and Christos Faloutsos, A Max Margin Framework on Image Annotation and Multimodal Image Retrieval, Proc. the IEEE Annual International Conference on Multimedia and Expo, Beijing, China, July, 2007

Bo Long, Zhongfei (Mark) Zhang, and Philip S. Yu, Relational Clustering by Symmetric Convex Coding, Proc. the 24th Annual International Conference on Machine Learning, Oregon State University, OR, USA, June, 2007

Zhongfei (Mark) Zhang, Florent Masseglia, Ramesh Jain, and Alberto Del Bimbo, KDD/MDM 2006: The 7th KDD Multimedia Data Mining Workshop Report, ACM KDD Explorations, accepted, 2006

Ruofei Zhang and Zhongfei (Mark) Zhang, Effective Image Retrieval Based on Hidden Concept Discovery in Image Database, IEEE Transactions on Image Processing, Volume 16, Number 2, February, 2006, pp 562-572

Arif Ghafoor, Zhongfei (Mark) Zhang, Michael S. Lew, and Zhi-Hua Zhou, Guest Editors' Introduction to Machine Learning Approaches to Multimedia Information Retrieval, ACM Multimedia Systems Journal, Springer, 2006

Zhongfei (Mark) Zhang, Querying Non-Uniform Image Databases for Biometrics-Related Identification Applications, Sensor Review, Emerald Publishers, Volume 26, Number 2, April, 2006, pp 122-126

Ruofei Zhang and Zhongfei (Mark) Zhang, Empirical Bayesian Learning in the Relevance Feedback of Image Retrieval, Image and Vision Computing, Elsevier Science, Volume 24, Issue 3, March, 2006, pp 211-223

Ruofei Zhang, Zhongfei (Mark) Zhang, Mingjing Li, Wei-Ying Ma, and Hong-Jiang Zhang, A Probabilistic Semantic Model for Image Annotation and Multi-Modal Image Retrieval, ACM Multimedia Systems Journal, the special issue of Using Machine Learning Approaches to Multimedia Information Retrieval, Springer, 2006

Jian Yao and Zhongfei (Mark) Zhang, Hierarchical Shadow Detection for Color Aerial Images, Computer Vision and Image Understanding, Elsevier Science, Volume 102, Issue 1, April, 2006, pp 60-69

Bo Long, Xiaoyun Wu, Zhongfei (Mark) Zhang, and Philip S. Yu, Unsupervised Learning on K-partite Graphs, Proc. ACM International Conference on Knowledge Discovery and Data Mining, ACM Press, Philadelphia, PA, USA, August, 2006

Bo Long, Zhongfei (Mark) Zhang, Xiaoyun Wu, and Philip S. Yu, Spectral Clustering for Multi-Type Relational Data, Proc. International Conference on Machine Learning, ACM Press, Pittsburgh, PA, USA, June, 2006

Xiao-Bing Xue, Zhi-Hua Zhou, and Zhongfei (Mark) Zhang, Improve Web Search Using Image Snippets, Proc. the 21st National Conference on Artificial Intelligence, AAAI Press, Boston, MA, USA, July, 2006

Jian Yao, Zhongfei (Mark) Zhang, Sameer Antani, Rodney Long, and George Thoma, Automatic Medical Image Annotation and Retrieval Using SEMI-SECC, Proc. IEEE International Conference on Multimedia and Expo, IEEE Computer Society Press, Toronto, Canada, July, 2006

Jian Yao, Sameer Antani, Rodney Long, George Thoma, and Zhongfei (Mark) Zhang, Automatic Medical Image Annotation and Retrieval Using SECC, Proc. IEEE International Symposium on Computer Based Medical Systems, IEEE Computer Society Press, Salt Lake City, Utah, USA, June, 2006

Code Release:

· EMML code (based on the paper: Zhen Guo, Zhongfei (Mark) Zhang, Eric P. Xing, and Christos Faloutsos, Enhanced Max Margin Learning on Multimodal Data Mining in a Multimedia Database, Proc. the 13th ACM International Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA, August, 2007)

Partners:

Institute of Automation, Chinese Academy of Sciences

This material is based upon work supported by the National Science Foundation under Award No. 0535162.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
