Relational Data Community Discovery and Learning

Sponsor: National Science Foundation

This project is a three-year integrated research and education program. It undertakes in-depth research on a series of fundamental, open, and important issues in relational data community discovery and learning, building on our existing strength in state-of-the-art research on this topic.


The intellectual merit of this project lies in a substantially improved understanding of unsupervised clustering and learning on general relational data, and in the expected breakthroughs in community discovery and learning methodologies. These are expected to advance the data mining and machine learning literature and to have a significant impact on related areas.


The broader impacts of this project are twofold. Educationally, the development, implementation, and evaluation of the innovative community outreach activities proposed in this project will promote timely and effective dissemination of knowledge about relational data mining and machine learning and will enrich the pedagogical literature. The knowledge disseminated to the collaborating parties, especially the collaborating high school, will strengthen high school education services and syllabi and provide a model for high schools' research and service to society. Technologically, the expected breakthroughs in the theory of relational data community discovery and learning are expected to enable new technology across a wide range of applications and, in particular, to benefit the collaborating organizations in advancing their domain expertise in social network mining in general and Web data mining in particular.


The world is full of data, and that data is highly interrelated across different types of data objects such as people, organizations, and events. Many applications aim to discover the hidden structures formed by relationships among different types of data objects, in addition to "clusters" of data objects of the same type. For example, in financial services, it is often necessary to identify potential fraud hidden among normal transactions that involve people and financial institutions. In commercial sales, it is often necessary to link customer purchase patterns to potential sales promotion strategies, identifying which kinds of customers are related to which kinds of products through which kinds of service providers. In the Web search industry, it is highly desirable to identify which kinds of users visit which kinds of Web pages and are strongly influenced by which kinds of advertisements related to which kinds of industries.

At the same time, training data with ground truth is often unavailable for knowledge discovery. Unsupervised relational data learning is therefore needed in all of these situations.


In this research, we focus on the most general scenario of relational data: the data objects may have attributes, homogeneous relations (among data objects of the same type), and/or heterogeneous relations (between data objects of different types). Practical situations can be treated as special cases of this general scenario, so the unified theory and the related methodologies we aim to develop are intended to apply to any real-world relational data knowledge discovery problem. This generality is what distinguishes the proposed work from the existing literature. Accordingly, we define a relational data community in a broad sense that includes not only the local clusters of data objects of the same type but also, more importantly, the global, hidden structures that incorporate relationships among different types of data objects.
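As an illustration only, the general scenario described above (attributes, homogeneous relations, and heterogeneous relations) could be held in a container like the one below. This is a minimal sketch under our own assumptions and not the project's implementation: the class name `RelationalData`, its fields and methods, and the example objects (`alice`, `bob`, `acme_bank`) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]


@dataclass
class RelationalData:
    """Hypothetical container for general relational data (illustration only)."""

    # attributes[object_type][object_id] -> feature vector (may be empty)
    attributes: Dict[str, Dict[str, List[float]]] = field(default_factory=dict)
    # homogeneous[object_type][(id_a, id_b)] -> weight, same-type relations
    homogeneous: Dict[str, Dict[Pair, float]] = field(default_factory=dict)
    # heterogeneous[(type_a, type_b)][(id_a, id_b)] -> weight, cross-type relations
    heterogeneous: Dict[Pair, Dict[Pair, float]] = field(default_factory=dict)

    def add_object(self, obj_type: str, obj_id: str,
                   features: Optional[List[float]] = None) -> None:
        """Register an object; attributes are optional."""
        self.attributes.setdefault(obj_type, {})[obj_id] = list(features or [])

    def relate(self, type_a: str, a: str, type_b: str, b: str,
               weight: float = 1.0) -> None:
        """Store a relation, routed by whether the two types match."""
        if type_a == type_b:
            self.homogeneous.setdefault(type_a, {})[(a, b)] = weight
        else:
            self.heterogeneous.setdefault((type_a, type_b), {})[(a, b)] = weight


# Hypothetical example: two people and one organization.
data = RelationalData()
data.add_object("person", "alice", [34.0])
data.add_object("person", "bob", [41.0])
data.add_object("organization", "acme_bank")  # object without attributes
data.relate("person", "alice", "person", "bob")  # homogeneous relation
data.relate("person", "alice", "organization", "acme_bank", weight=3.0)  # heterogeneous
```

Under this layout, attribute-only clustering, single-type graph clustering, and bipartite co-clustering each correspond to populating only one of the three fields, which is one way to read the claim that practical situations are special cases of the general scenario.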


Relational data community discovery and learning is a fairly new area in which many challenging and fundamentally new issues remain completely open.

Solutions to these issues may lead to technology development with significant societal impact.

The work in this project builds on innovative preliminary research and addresses a set of fundamentally new problems with fundamentally new solutions. These aim not only at a deeper understanding of the field but also, more importantly, at technology development with significant societal impact. Specifically, this project pursues three objectives synergistically: (1) to address a series of challenging, fundamentally new, and important issues in relational data community discovery and learning, leading to a unified, fundamentally new theory on this topic; (2) to extensively evaluate the resulting theory and methodologies in collaboration with domain experts in the Web search industry, as a specific application of social network mining; and (3) to develop and evaluate innovative community outreach and education activities through the existing partnership with a local high school, further promoting the dissemination of knowledge from this research.

NSF Project Manager: Dr. Maria Zemankova

Project Personnel:

PI: Prof. Zhongfei (Mark) Zhang

PhD students:

NSF REU Students:



Code Release:

Data Release:


This material is based upon the work supported by the National Science Foundation under Award No. 0812114.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
