
  • Pattern change discovery between high dimensional datasets
  • Problem Description

    Problem with the norm-based distance measurements

    Norm-based distance measurements, such as the l-1 norm or the Euclidean distance, are unable to distinguish subspace change from magnitude change. This becomes an issue when one attempts to measure the change between two high-dimensional data matrices. Fig. 1 demonstrates the limitation of the Euclidean distance. Other norm-based distances share the same issue.

    Fig. 1   The Euclidean distance between v1 and v2 is the same as that between v1' and v2', so it cannot distinguish a change of magnitude from a rotation of the vector.
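
    The limitation can be reproduced numerically. The sketch below is only an illustration: the concrete vectors are assumed values chosen for this example, not the ones drawn in Fig. 1, and the helper names euclidean and angle_deg are hypothetical. One pair differs only in magnitude, the other only in direction, yet both pairs have the same Euclidean distance.

```python
import math

def euclidean(a, b):
    """Euclidean (l-2) distance between two vectors given as sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def angle_deg(a, b):
    """Angle between two non-zero vectors, in degrees."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_theta))

# Assumed example vectors (not taken from Fig. 1).
# Pair 1: pure magnitude change, same direction.
v1, v2 = (1.0, 0.0), (2.0, 0.0)
# Pair 2: pure rotation by 60 degrees, same length.
w1 = (1.0, 0.0)
w2 = (math.cos(math.radians(60)), math.sin(math.radians(60)))

dist_magnitude = euclidean(v1, v2)
dist_rotation = euclidean(w1, w2)
angle_magnitude = angle_deg(v1, v2)
angle_rotation = angle_deg(w1, w2)

print("distance, magnitude change:", dist_magnitude)
print("distance, rotation:        ", dist_rotation)
print("angle, magnitude change:   ", angle_magnitude)
print("angle, rotation:           ", angle_rotation)
```

    The two distances agree up to floating-point rounding, while the angle is zero for the magnitude change and 60 degrees for the rotation.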

    Consequences in real-world applications

    In quite a few real-world applications, the change of interest in high-dimensional data is not a change in the magnitude of the data vectors but a new combination of a certain subset of the features. For example, we do not intend to conclude that the difference between a human baby and an adult is the same as that between the baby and a little monkey; a banker is interested not in the volume of financial news but in newly emerged keywords; to examine the mutation of a DNA sequence, a biologist needs to find the new combination of adenine and guanine rather than the change in DNA data size. In these cases, the change of feature subspace should not be confused with the change of the data's magnitude.

    Why do principal angles fit?

    To measure the subspace difference between two high-dimensional data sets, we apply the notion of principal angles between subspaces, first introduced by Golub. Principal angles between two subspaces are a natural generalization of the angle between two vectors as the rank goes from one to n, where n ≥ 1. Like the angle between two vectors, principal angles isolate subspace change from magnitude change.
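
    As a rough illustration of this property, the sketch below computes principal angles with the standard QR-plus-SVD recipe: orthonormalize each matrix, then take the arccosine of the singular values of the product of the two bases. It is not the downloadable Matlab code or the algorithm from the paper. It assumes NumPy is available, that both matrices have full column rank, and that the hypothetical helper principal_angles and the example matrices are for demonstration only. The cosine-based formula also loses accuracy for very small angles.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (radians, ascending) between the column spaces of A and B.

    Assumes A and B have the same number of rows and full column rank.
    """
    # Orthonormal bases for the two column spaces (reduced QR).
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    # The singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    # Clip to [-1, 1] so rounding error cannot push arccos out of its domain.
    return np.arccos(np.clip(cosines, -1.0, 1.0))

# Assumed example data: A spans the x-y plane in R^3.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

# Magnitude change only: same subspace, columns scaled by 5.
A_scaled = 5.0 * A

# Subspace change: the second basis vector is tilted 30 degrees out of the plane.
t = np.radians(30.0)
B = np.array([[1.0, 0.0],
              [0.0, np.cos(t)],
              [0.0, np.sin(t)]])

angles_scaled = np.degrees(principal_angles(A, A_scaled))
angles_rotated = np.degrees(principal_angles(A, B))

print("scaled copy  :", angles_scaled)
print("tilted plane :", angles_rotated)
```

    Scaling the data leaves every principal angle at zero (up to rounding), whereas tilting the plane shows up as a non-zero angle, which is the behavior the paragraph above describes.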

    What can our algorithm do?

  • Detect new patterns in time-evolving data
  • Detect new event topics from news data streams
  • Detect abnormal events from video streams
  • Experiment results and demos can be found here

    Resource download

  • Matlab code and data download (tested under Matlab 2012a)
  • Paper download