CS 435/535 Introduction to Data Mining     Fall 2012


This course introduces the concepts, algorithms, techniques, and applications of data mining. Topics include the background of data mining, data preprocessing, classification, clustering, and association rule mining. The course is designed for CS graduate students; senior CS undergraduate students interested in the field are also encouraged to take it.

Class Schedule: T R 2:50 PM - 4:15 PM

Classroom:  LN G335

Instructor: Dr. Lei Yu

Telephone: (607) 777-6250

Email: lyu AT cs DOT binghamton DOT edu

Office Location: G16, Engineering Building

Office Hours: T R 1:30 PM - 2:20 PM or by appointment

TA: Peng Liu

Email: pliu3 AT binghamton DOT edu

Office Location: N23, Engineering Building

Office Hours: M W 3:00 PM - 4:00 PM


  • Required courses: CS 333 (Algorithms) and MATH 327 (Probability with Statistical Methods), or equivalents
  • Programming: course projects can be implemented in any popular programming language, such as C, C++, or Java. No programming-specific issues will be covered in this course.


  • Background of knowledge discovery and data mining
  • Data preprocessing  (e.g., data cleaning, transformation, dimensionality reduction, instance selection)
  • Classification (e.g., decision trees, Bayesian classifiers, instance-based classifiers, rule-based classifiers, support vector machines)
  • Clustering (e.g., K-means, hierarchical clustering, density-based clustering)
  • Mining association rules (e.g., Apriori, FP-growth)


  • Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Addison-Wesley, April 2005.


There will be 4 assignments in the form of written exercises on key concepts and algorithms.

Project (required only for graduate students):

There will be a group project (two students per group) involving the implementation of a decision tree algorithm and a standard model selection procedure.

Presentation (required only for graduate students):

Each student will be required to give one presentation on a topic selected from a list provided by the instructor.


There will be several quizzes and two exams in class.


For undergraduate students: final grades will be based on quizzes (10%), homework (4 assignments, 40%), Exam I (25%), Exam II (25%), project (5% bonus), presentation (5% bonus). 

For graduate students: final grades will be based on quizzes (10%), homework (4 assignments, 20%), project (15%), presentation (15%), Exam I (20%), Exam II (20%).
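
As an illustration only (this is not an official grade calculator), the weights above can be combined as in the following Python sketch. It assumes that each component is scored as a percentage from 0 to 100 and that the undergraduate bonus items are added on top of the 100% base; the dictionary and function names are hypothetical.

```python
# Illustrative sketch only: the names and the 0-100 scoring scale are assumptions,
# not part of the official course policy.
UNDERGRAD_WEIGHTS = {"quizzes": 0.10, "homework": 0.40, "exam1": 0.25, "exam2": 0.25}
UNDERGRAD_BONUS = {"project": 0.05, "presentation": 0.05}
GRAD_WEIGHTS = {
    "quizzes": 0.10, "homework": 0.20, "project": 0.15,
    "presentation": 0.15, "exam1": 0.20, "exam2": 0.20,
}


def final_grade(scores, weights, bonus=None):
    """Weighted sum of component scores (each 0-100); missing components count as 0."""
    total = sum(w * scores.get(name, 0.0) for name, w in weights.items())
    # Bonus components (undergraduates only) are added on top of the base total.
    for name, w in (bonus or {}).items():
        total += w * scores.get(name, 0.0)
    return total
```

Under these assumptions, an undergraduate who scores 90 on every required component and 100 on the optional project would finish with 90 + 5 = 95.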

Academic Integrity:

Discussion of general concepts and questions concerning the homework assignments among students is encouraged. However, each of you is expected to work on the homework solutions on your own. Sharing any part of a solution is prohibited. If you are unclear about the policy, please consult with the instructor before you act. Suspected cases of academic misconduct will be pursued fully in accordance with the Student Academic Honesty Code of the Thomas J. Watson School of Engineering and Applied Science, Binghamton University.

Late Policy:

Each assignment is due at the beginning of class on the due date. Any assignment received within 24 hours after the deadline will be penalized by 20% of the full credit; any assignment received between 24 and 48 hours past the deadline will be penalized by 50% of the full credit; no assignment will be accepted more than 48 hours past the deadline. Rare exceptions to this policy may be made at the discretion of the instructor under demonstrably exceptional circumstances.
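
One possible reading of the penalty schedule is sketched below in Python, for illustration only. It assumes that the penalty is subtracted from the earned score as a fraction of the assignment's full credit, that submissions at exactly 24 or 48 hours fall into the earlier bracket, that a score cannot drop below zero, and that an assignment that is not accepted earns zero. The function names are hypothetical; the instructor's interpretation governs.

```python
from typing import Optional


def late_penalty_fraction(hours_late: float) -> Optional[float]:
    """Fraction of full credit deducted, or None if the assignment is not accepted."""
    if hours_late <= 0:
        return 0.0   # on time
    if hours_late <= 24:
        return 0.20  # within 24 hours after the deadline
    if hours_late <= 48:
        return 0.50  # between 24 and 48 hours after the deadline
    return None      # more than 48 hours late: not accepted


def score_after_penalty(raw_score: float, full_credit: float, hours_late: float) -> float:
    """Earned score minus the late penalty (assumed floor of zero)."""
    penalty = late_penalty_fraction(hours_late)
    if penalty is None:
        return 0.0   # assumption: an unaccepted assignment earns no credit
    return max(0.0, raw_score - penalty * full_credit)
```

Under this reading, an assignment worth 100 points that earns 90 and arrives 10 hours late would receive 90 - 20 = 70 points.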