Consensus Group Stable Feature Selection
Overview
Stability is an important issue in feature selection from high-dimensional,
small-sample data. CGS is a feature selection algorithm developed under a novel
framework for stable feature selection. The framework first identifies consensus
feature groups by subsampling the training samples, and then performs feature
selection by treating each consensus feature group as a single entity.
Experiments on both synthetic and real-world data sets show that the CGS
algorithm is effective at alleviating the problem of small sample size. It leads
to more stable feature selection results, with generalization performance
comparable to or better than that of state-of-the-art feature selection
algorithms.
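The two-stage idea described above can be illustrated with a minimal Java sketch. It is not the CGS implementation shipped in the package and does not reproduce the authors' algorithm. As stand-in assumptions, features are linked when their absolute Pearson correlation on a subsample exceeds a threshold, consensus groups are taken as connected components of the resulting co-association graph, and each group is scored by the correlation between its averaged feature and the class label. The class name, method names, and thresholds are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Random;

public class ConsensusGroupSketch {

    /** Absolute Pearson correlation of columns a and b of x over the given rows. */
    public static double absCorr(double[][] x, List<Integer> rows, int a, int b) {
        double ma = 0, mb = 0;
        for (int r : rows) { ma += x[r][a]; mb += x[r][b]; }
        ma /= rows.size();
        mb /= rows.size();
        double cov = 0, va = 0, vb = 0;
        for (int r : rows) {
            double da = x[r][a] - ma, db = x[r][b] - mb;
            cov += da * db; va += da * da; vb += db * db;
        }
        // A constant column has no defined correlation; treat it as unrelated.
        return (va == 0 || vb == 0) ? 0.0 : Math.abs(cov) / Math.sqrt(va * vb);
    }

    /** Fraction of subsamples in which each feature pair is strongly correlated. */
    public static double[][] coAssociation(double[][] x, int rounds, double fraction,
                                           double corrThreshold, long seed) {
        int n = x.length, d = x[0].length;
        int m = Math.min(n, Math.max(2, (int) Math.round(fraction * n)));
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) rows.add(i);
        Random rng = new Random(seed);
        double[][] co = new double[d][d];
        for (int t = 0; t < rounds; t++) {
            Collections.shuffle(rows, rng);           // subsample without replacement
            List<Integer> sub = rows.subList(0, m);
            for (int i = 0; i < d; i++)
                for (int j = i + 1; j < d; j++)
                    if (absCorr(x, sub, i, j) >= corrThreshold) { co[i][j]++; co[j][i]++; }
        }
        for (double[] row : co)
            for (int j = 0; j < d; j++) row[j] /= rounds;  // counts -> fractions
        return co;
    }

    /** Consensus groups: connected components of the graph with co[i][j] >= minAgreement. */
    public static List<List<Integer>> consensusGroups(double[][] co, double minAgreement) {
        int d = co.length;
        boolean[] seen = new boolean[d];
        List<List<Integer>> groups = new ArrayList<>();
        for (int s = 0; s < d; s++) {
            if (seen[s]) continue;
            List<Integer> group = new ArrayList<>();
            Deque<Integer> stack = new ArrayDeque<>();
            stack.push(s);
            seen[s] = true;
            while (!stack.isEmpty()) {
                int i = stack.pop();
                group.add(i);
                for (int j = 0; j < d; j++)
                    if (!seen[j] && co[i][j] >= minAgreement) { seen[j] = true; stack.push(j); }
            }
            Collections.sort(group);
            groups.add(group);
        }
        return groups;
    }

    /** Treats a group as one entity: |corr| between its averaged feature and the label. */
    public static double groupRelevance(double[][] x, double[] y, List<Integer> group) {
        double[][] z = new double[x.length][2];
        List<Integer> rows = new ArrayList<>();
        for (int r = 0; r < x.length; r++) {
            for (int f : group) z[r][0] += x[r][f] / group.size();
            z[r][1] = y[r];
            rows.add(r);
        }
        return absCorr(z, rows, 0, 1);
    }

    public static void main(String[] args) {
        // Toy data: features 0 and 1 move together; feature 2 is meant as noise.
        double[][] x = {
            {1.0, 2.1, 0.3}, {2.0, 3.9, 0.9}, {3.0, 6.2, 0.1},
            {4.0, 7.8, 0.7}, {5.0, 10.1, 0.2}, {6.0, 12.0, 0.8}
        };
        double[] y = {0, 0, 0, 1, 1, 1};
        double[][] co = coAssociation(x, 50, 0.67, 0.9, 7L);
        for (List<Integer> g : consensusGroups(co, 0.5))
            System.out.println(g + " relevance=" + groupRelevance(x, y, g));
    }
}
```

Ranking the groups by `groupRelevance` and keeping the top few would complete the selection step. Because each group is built from agreement across many subsamples, a single unusual subsample has limited influence on which groups appear, which is the intuition behind the stability claim.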
CGS Software
This software package is written in Java. It is provided free of charge to the
research community as an academic software package, with no commitment to
support or maintenance.
- The Java package and sample data sets can be downloaded from here.
- The README explains how to run the code, how to prepare the input data, and
how to read the results.
- Two synthetic data sets described in the KDD-09 paper below are also provided.
- The source code of the software package is available on request from Lei Yu.
People
- Data Mining Research Lab, Binghamton University: Dr. Lei Yu, Steven Loscalzo
- University of Texas at Arlington: Dr. Chris Ding
References
- Steven Loscalzo, Lei Yu, and Chris Ding. "Consensus Group Based Stable
Feature Selection". In Proceedings of the 15th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining (KDD-09), pages 567-576,
Paris, France, June 2009.
- Lei Yu, Chris Ding, and Steven Loscalzo. "Stable Feature Selection via Dense
Feature Groups". In Proceedings of the 14th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining (KDD-08), pages 803-811,
Las Vegas, NV, August 2008.