Summary and categorization of some papers

Region Based:


VisualSEEK: a fully automated content-based image query system (Columbia)

Based on color set and its back-projection to extract regions, then use color set and spatial properties (size, absolute position, relative relationships of regions, using 2D strings) to describe similarities between regions (images)

SIMPLicity: Semantis-Sensitive Integrated Matching for Picture Libraries (Stanford)

Clustering based on color and texture(wavelet) and their shape(inertia). Integrate the region matching of two images to get similarity between images.

IRM: Integrated Region Matching for Image Retrieval (Stanford)

Same as above

Soft Computing Approach for Content-Based Image Retrieval (PSU)

Using fuzzy set to represent clusters got by color and texture and define the distance between regions by the distance between  fuzzy set and come up with a integrated region matching approach.

Textual combined with somewhat color

Chabot: Retrieval from a Relational Database of Images (UC Berkeley)

Textual combined with somewhat color (or color concept)based image retrieval in a relational DB:

Color Feature based:


Image Indexing Using Color Correlograms (Cornell)

Using color spatial relationship(block distance between pixels) to define distance between images

Histogram Refinement for Content-Based Image Retrieval (Cornell)

Refine the histogram approach by splitting the pixels in a given bucket into severl classes based upon spatial coherence (color coherence vector).

Geometric Histogram: A Distribution of Geometric Configurations of Color Subsets (SUNY Buffalo)

Refine the histogram approach by count the color histogram with certain spatial (geometric) relationships between pixels to improve the histogram comparison with spatial information.

Embedding Fuzzy Logic in Content Based Image Retrieval (INRIA, France)

Improve the color histogram by using fuzzy logic to describe the fuzzy of personal perception and quantization of the color components. Define each color as a fuzzy feature and then define the fuzzy histogram(as a fuzzy set) based on the similarity between fuzzy feature. Also define two distance (dissimilarity or resemblance) between two fuzzy sets (fuzzy histograms).

Color-based image retrieval using spatial-chromatic histogram (Italy)

Another histogram refinement approach by recording the average poison of each color histogram and their standard deviation to add some kind of spatial information on traditional histogram approach, called Spatial-Chromatic Histogram. It also provides a new color quantization method.

Classification Based:


Applying perceptual grouping to content-based image retrieval (UT-Austin)

By using image processing techniques to extract line, junction, parallel groups and other structure information to classify an image to building or not-building classes. It applies Bayesian rule to classify the images by training the classifier with supervised  methodology.

Appearance based:


Retrieval Images by Appearance (UMASS)

Using a description of the image intensity surface. Gaussian derivative filters at several scales are applied to the image and low order 2D differential invariants are computed to be features compared between images. User select appropriate regions to submit a query. The invariant vectors corresponding to these regions are matched with the database counterparts both in feature and coordinate space to yield a match score per image. Very time consuming.

Geometrical Distribution based:


A Signature for Content-Based Image Retrieval Using a Geometrical Transform (Univ. of Sydney, Australia)

Using Radon transform (one kind of geometrical transform) to be signature of an image. This signature has a property to reflect the geometrical distribution of the image intensities without a sophisticated segmentation.

Medical Image Retrieval


Content-based 3D Neuroradiologic Image Retrieval: Preliminary Results (CMU)

Classification-Driven Medical Image Retrieval (CMU)

Image Representation and Matching Iconic Indexing by 2-D Strings (Shi-Kuo Chang, Qingyun Shi    PAMI 9. No.3 May 1987)

Describe a way of representing a symbolic picture by a 2-D strings. This 2-D string describes the relative position relationship between objects(symbols). The problem of pictorial information retrieval then becomes a problem of 2-D subsequence matching. Three type of string subsequence (type-i 2-D subsequence) are defined based on the exact(approximate) relative position relationships between symbols. Thus different level of structure similarity between pictures are described.

Survey Paper Indexing pictorial documents by their content: a survey of current techniques (M.De Marsicoi Italy)

Content-Based Image Retrieval at the End of the Early Years (PAMI, Vol. 22, No. 12, Dec. 2000)

Techniques and Systems for Image and Video Retrieval (Clement T. Yu, IEEE Trans. on Knowledge and Data Engineering)

Object Detection and Recognition A Statistical Method for 3D Object Detection Applied to Faces and Cars (CVPR 2000, Henry Schneiderman and Takeo Kanade, Robotics Institute, CMU)

Based on Bayes decision rule P(image/object)/P(image/non-object)>P(non-object)/P(object). But the distribution of P(image/object) and P(image/non-object) is not defined (known), then the probability of these distribution is estimated by the histogram of 17 attributes(patterns) of a training set for object images and non-object images. Each attribute is a 8 efficient set of 3 level wavelet transform. We get the probability for each bin in each histogram(attribute) from the data training. Then by applying same process in a query image, we just compute the above formula by looking up the histogram we got with the attribute value (8 value set) to get the probability of the attribute as image/object and image/non-object. The related geometric position of the attribute is also incorporated to compute the product of histograms. The final form of the detector is given by:



A Histogram-Based Method for Detection of Faces and Cars (ICCP 2000, Henry Schneiderman and Takeo Kanade, Robotics Institute, CMU)

Same as above paper

Relevance Feedback Methodology
  • re-weighting approach, to associate larger weights with more important dimensions and small weights with unimportant ones. For a set of relevant documents and non-relevant documents given by the user, the new query is moved towards good example points and away from bad example ones.
  • probability approach, use Bayesian learning to incorporate user's feedback to update the probability distribution of all the images in the database. They consider the positive examples as a sequence of independent query and try to minimize the retrieval error by Bayesian rules.
  • Assuming (1) that all positive examples in one retrieval iteration belong to the same semantic class with common semantic object(s) or meaning(s) and (2) that the features from the same semantic class follows the Gaussian distribution, we can use all positive feedbacks of this query iteration to calculate and update the parameters of the corresponding semantic Gaussian class. Then we use a Bayesian classifier to rerank the images in database by posterior probabilities.

Extraction of Feature Subspace for Content-based Retrieval Using Relevance Feedback (ACM MM 2001, Zhong Su, Tsinghua Univ. Stan Li, Hongjiang Zhang, Microsoft Research China)

Multiple types of features are used. Principal Component Analysis is incorporated into a Bayesian feedback process to extract efficient features of reduced dimensionalities. The method is an improvement of the third approach above. Apart from updating the Gaussian parameters of each feature type, the number of PCA feature dimensionalities is updated either. In other word, the RF process plays two roles: providing information for updating the Gaussian parameters in the Bayesian feedback, and providing evidence for the adjustment of features (e. g., color and texture) are used, a method is proposed to allow different subspace dimensionalities for different features. This accounts for differences between individual feature subspace as based on evidence obtained so far from recent RF. The importance of each type of features is described as following to each iteration of RF:


, good feature will has small distance from its positive examples to the the query class center comparing to other images in database. The Mi is sorted, the higher ranked features will be given more PCA components to than lower ranked features.

Support Vector Machine Active Learning for Image Retrieval (Simon Tong, Stanford; Edward Chang, UCSB)

Using relevance feedback to determine the query concept of each query request.
performs the following for each round of relevance feedback:

  • Learn an SVM on the current labeled data
  • If this is the first feedback round, ask the user to label twenty randomly selected images. Otherwise, ask the user to label the twenty pool images closest to the SVM boundary (active learning) to minimize the version space of the parameter space W

After the relevance feedback rounds have been performed SVM retrieves the top-k most relevant images:

  • Learn a final SVM on the labeled data.
  • The final SVM boundary separates "relevant" images from irrelevant ones. Display the k "relevant" images that are farthest from the SMV boundary

In addition, the image retrieval system employs a multi-resolution image representation scheme. In this scheme, they characterize images by two main features: color and texture. The rationality comes from that for some visual tasks, our eyes may select filters to obtain coarse image features; for others, they require finer features. In color and texture, both coarse, medium and fine representation are included.

An active Learning Framework for Content-based information retrieval (Cha Zhang, IEEE Trans. on Multimedia, Vol.4, No.2 2002)

Combine the subjective annotation (semantic) distance and low feature distance to calculate the similarity between a query and objects in database. Active learning (uncertainty and probability density of objects based) are proposed to select the most useful unannotated sample to sample.

Inductive VS Transductive Learning

Inductive learning: generalize for any future test set

Transductive learning: predict the classes for a specific test set, in transduction we use information from the given test set

AI Issues Challenge: Where is the Impact of Bayesian Network in Learning? (Nir Friedman et al.)

A Bayesian network consists of two components. The first is a directed acyclic graph in which each vertex corresponds to a random variable. This graph represents a set of conditional independence properties of the represented distribution: each variable is probabilistically independent of its non-descendants in the graph given the state of its parents. The second component is a collection of local interaction models that describe the conditional probability

   of each variable given its parents

Together, these two components represent a unique joint probability distribution over the complete set of variables X. The joint distribution is given by the following equation:

It is often difficult and time-consuming to construct Bayesian networks from expert knowledge alone, particularly because of the need to provide numerical parameters. This observation, together with the fact that data is becoming increasing available and cheaper to acquire has led to a growing interest in using data to learn both the structure and probabilities of a Bayesian network.


Efficiency design and data structure Techniques and data structure for efficient multimedia retrieval based on similarity (Guojun Lu, IEEE Trans. on Multimedia, Vol. 4, No.3)

1. Filtering processes for reducing search space

     a) Filtering with classification, structured attributes, and keywords

     b) Methods based on the triangle inequality (for metric distance)

     c) Methods specific to color histogram-based retrieval, first use histogram with very few bins (or average of each color components) to select potential retrieval candidates, and then use the full histograms to calculate accurate distance between  the query and the potential candidates to get real results.

    d) Latent semantic indexing for vector space-based text IR. it's based on SVD decomposition. Any M (documetns) * N (terms in the collection) real matrix A can be expressed as . In the context of text document retrieval, the rank r of A is equal to the number of concepts. U can be thought of as the document-to-concept similarity matrix, while V is the term-to-concept similarity matrix. For example, u(2,3)=0.6 means that concept 3 has weight 0.6 in document 2 and v(1,2)=0.4 means that the similarity between term 1 and concept 2 is 0.4. Based on the SVD, we can store matrices U, S and V instead of A, reducing the storage requirement significantly. During retrieval, the query document similarity is calculated as follows. The query vector q in term-space is translated into qc in concept-space by multiplying by Vt as follows: . The similarity between the query and each of the documents is calculated as the dot product (cosine distance) between qc and each of the rows of U. Therefore, using LSI, we manipulate r-dimensional vectors instead of N-dimensional vectors during the similarity calculation.

2. Multi-dimensional data structures

     a) B+ tree and B tree for 1-D key searching

     b) Clustering

     c) Multidimensional B+ tree. The only difference is that the tree is organized based on the ordering of the regions instead of key values and each region corresponding to a list of feature vectors instead of a data record.

     d) K-d tree, the extension of the binary tree. The k-d tree is constructed as follows: at level 1, the tree is branched based on the value of the first dimension of the k-d vector; at level 2 the tree is branched based on the value of the second dimension of the k-d vector, and so on. this process continues until all dimensions have a turn and we then start from the first dimension again.

     e) Grid files, an extension of the fixed-grid access structure, effective when feature vectors are evenly distributed in the feature space. However, when feature vectors are unevenly distributed in the feature space, some grids will be empty or almost empty while others will be too full.

     f) R tree family , follow the rules of B+ tree with bounding rectangles, each bounding rectangle is represented by the coordinates of its lower-left and upper-right corners. Minimization of both coverage and overlap is crucial to the performance of the R-tree. R* tree is an R-tree improved by minimizing coverage. Based on the fact that the clustering of rectangles with low variance in the lengths of the edges tends to reduce the area of the cluster's covering rectangles. R+ tree is to overcome the overlap problem of internal nodes of the R-tree. SS tree and SS+ tree. The minimum bounding region used in an SS-tree is a sphere. The SS-tree split algorithm finds the dimension with highest variance in feature vector values and chooses the split location to minimize the sum of the variance on each side of the split. To insert a new feature vector, the branch or subtree which has the centroid closest to the new feature vector is chosen. The SS+ tree is similar to the SS-tree, the main difference is that the SS+ tree uses a tighter bounding sphere for each node which is an approximation to the smallest enclosing sphere, and a different split algorithm.

     g) TV tree

Some of the most of often discussed properties of a R-tree based multidimensional access structure are splitting heuristics, the shape of the bounding envelope of each node, the criteria used to choose the subtree to insert the new data, and the heuristics used to reduce overlapping between the nodes and bad decisions made earlier. These properties are not independent of each other. The splitting heuristic and substree selection criteria will determine the shape of each node and hence the goodness of the bounding envelope. The goodness of the bounding envelope will also determine the extent of the overlapping between nodes. The heuristics used to reduce overlappings and early bad decisions will also affect the shape of the nodes.

Distance (similarity) Calculation Similarity Estimation Techniques from Rounding Algorithms (Moses S. Charikar, Proceedings of the Symposium on Theory of Computing,2002)

Properties of Embedding Methods for Similarity Searching in Metric Spaces (Gisli R. hjaltason, hanan Samet, IEEE Trans. on PMAI, Vol. 25, No. 5, 2003)

Searching in Metric Spaces (Edgar Chavez et al, ACM Comp. Survey Vol. 33)

(1) Applying devised data structures which make extensive use of coordinate information to group and classify points in the vector space, e.g. R-tree, KD-tree, X-tree.

(2) Embedding the data objects in a vector space so that the distances of the embedded objects approximates the actual distance. Thus, queries can be performed (for the most part) on the embedded objects. Many use dimension reduction methods for Euclidean space. These methods need to satisfy contractiveness property to get 100% recall (no false dismissals).

(3)Applying approximation algorithms to approximate the distance between objects (sets). A locality sensitive hashing scheme is a distribution on a family F of hash functions operating on a collection of object, such that for two objects x, y,


is some similarity function defined on the collection of objects. Such a scheme leads to a compact representation of objects so that similarity of objects can be estimated from their compact sketches, and also leads to efficient algorithms for approximate nearest neighbor search and clustering. The problem is to construct locality sensitive hash functions from the similarity function. A set of constructed hash functions are applied to the objects to get a sketch vector. Applications of locality sensitive hash functions to solving nearest neighbor queries typically reduce the problem to the Hamming space (space  based on Hamming distance: A measure of the difference between two messages, each consisting of a finite string of character S, expressed by the number of characters that need to be changed to obtain one from the other. E.g., 0101 and 0110 has a Hamming distance of two whereas "Butter" and "ladder" are four characters apart

Intensity Thresholding 1. Iterative threshold selection: select the mean value as initial value and repeat iteratively

2. Otsu's method: select the lowest point between two classes (peaks): analysis of standard deviation

3. Entropy based thresholding: select the biggest threshold which generates the largest entropy of two classes

4. Moment preservation

5. Minimum Error: assumes the histogram is composed of two normally distributed classes of intensities

Statistical Analyis 1. Methods to estimate the probability density function

Estimate the probability density of a data point (position) without knowing their distribution. Method 1:

kernel density estimator: choose the kernel as an isotropic Gaussian function (assume the features has been normalized). The window of the estimation is a hyper-sphere centered at the concerned point (position) Oi,. Let the radius of the super-sphere be ri, which was named the bandwidth of the kernel density estimator in the literature. Normally, ri=r, for i=1,2,.. N, where r is a constant bandwidth. let xi be the feature vector of object Oi. The density estimation at the position where Oi locates is given by: , for i=1,2,...,N.

For simplicity, we choose a constant bandwidth r based on the maximum distance from any abject to its closet neighbor D, i.e.

2. Methods to approximate value of a sample based on all known values of other samples, biased kernel regression

, the weights is ,

where rm is the bandwidth for sample xm.

This method isn't like Gaussian model parametrical estimation, we don't assume a Gaussian model on the distribution of data. it imposes local  structure assumption on the feature space to fit the samples and it's better than what a global structure does.

3. Unsupervisored learning:

  •         K-mean clustering
  • Expectation-Maximization (EM) algorithm to suppose each cluster is a Gaussian distribution and calculate the parameters (weight for each cluster, mean value and covariance matrix for each cluster) by iteration to maximize the likelihood. So for each sample, we have a probability for each clusters for containing it. then we can assign the cluster which has the highest probability to this sample. To decide the number of clusters, apply the Minimum Description Length (MDL) principle to select among values of K (number of clusters).  See detail on pp. 1031 on paper "Blobworld: Image segmentation using expectation-maximization and ita application to image querying", IEEE trans. on PAMI, vol. 24, no.8,  2002

Affine Transform An Affine transformation is a geometrical transformation which is known to preserve the parallelism of lines but not lengths and angles.

rotation, scale, translation, shear (skew)

Optimization methods solve constrained optimization problems with Lagrangian function. Considering a problem of minimizing a function

, where is linear constraints that can be expressed as


if L is differentiable, then the solution to the optimization problem is readily obtained.

Basic Probability Equations Probability independent

Incomplete (missed) data Bayesian rule

Expectation of conditional probability:

Color Quantization and Color Map; Color histogram distance
  • Uniform quantization (independent on image itself)
  • Populosity Algorithm (histogram based, create different image map for different images)
  • Median Cut (choose median color in the longest dimension of color space, create different image map for different images)
  • Optimization Algorithm (results in the lowest quantization error, dependent on each image)
  • The distance between two color histograms incorporating the similarities between color bins:
    , where is a symmetric matrix of weights between 0 and 1 representing the similarity between the bin centers; neighboring bins have a weight 0.5. This distance measure allows us to give a high score to two regions with similar colors, even if the colors fall in different histogram bins.

      last modified: 6/20/03, Ruofei Zhang