CSE652: Knowledge Discovery and Data Mining

Project #2

Due Date/Presentation: May 05, 2010

  • Implement the following clustering algorithms:
    • K-Means
    • Kohonen Self-Organizing Map
    • Adaptive Resonance Theory
  • Using the Mushroom data set, the Diabetes data set, or any other data set, compare the performance of your implementations with that of the clustering techniques available in KNIME, Weka, RapidMiner, and SQL Server.
  • You may assume that all attributes are categorical. If they are not, use either your Project 1 implementation or Weka to discretize the continuous attributes.
  • Use the unsupervised feature selection implementation from your Project 1 to remove irrelevant columns, and analyze the performance of your clustering algorithms both before and after this removal.
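The K-Means item above can be sketched in plain Python as follows. This is a minimal sketch, not a reference implementation: it assumes the categorical attributes are first one-hot encoded (one 0/1 column per attribute value) so that centroids can be computed as ordinary means, uses squared Euclidean distance, and initializes centroids from randomly sampled rows. The helper names `one_hot_encode` and `k_means` are hypothetical. For purely categorical data, K-Modes (Hamming distance with per-attribute modes) is a common alternative, but it is a different algorithm from the one named in the assignment.

```python
import random
from typing import Dict, List, Sequence, Tuple


def one_hot_encode(rows: Sequence[Sequence[str]]) -> List[List[float]]:
    """Map each categorical row to a 0/1 vector with one slot per (column, value) pair."""
    n_cols = len(rows[0])
    # Distinct values per column, sorted so the encoding is deterministic.
    values = [sorted({row[c] for row in rows}) for c in range(n_cols)]
    index: Dict[Tuple[int, str], int] = {}
    for c, vals in enumerate(values):
        for v in vals:
            index[(c, v)] = len(index)
    encoded = []
    for row in rows:
        vec = [0.0] * len(index)
        for c, v in enumerate(row):
            vec[index[(c, v)]] = 1.0
        encoded.append(vec)
    return encoded


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points: List[List[float]], k: int, max_iter: int = 100,
            seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Lloyd-style K-Means: alternate assignment and update steps until labels stop changing."""
    rng = random.Random(seed)
    # Initial centroids: k rows sampled at random without replacement.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (ties go to the lower index).
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Because the assignment step is deterministic, identical records always end up in the same cluster; with hypothetical Mushroom-style rows such as `("x", "s", "n")`, a call would look like `k_means(one_hot_encode(rows), k=2)`.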
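A Kohonen Self-Organizing Map can be sketched in the same style. This is a minimal, assumption-laden version: an online (one record at a time) rectangular-grid SOM with a Gaussian neighborhood, a learning rate and neighborhood radius that both decay linearly, and random initial weights. The decay schedules and defaults are illustrative choices, the inputs are assumed numeric (for example, one-hot encoded categorical attributes), and `train_som`, `best_matching_unit`, and `map_to_nodes` are hypothetical names. Each grid node acts as one cluster, and a record's cluster is the node whose weight vector is closest to it (its best-matching unit).

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Node = Tuple[int, int]


def best_matching_unit(weights: Dict[Node, List[float]], x: Sequence[float]) -> Node:
    """Return the grid node whose weight vector is closest (squared Euclidean) to x."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(points: List[Sequence[float]], rows: int, cols: int, epochs: int = 50,
              lr0: float = 0.5, seed: int = 0) -> Dict[Node, List[float]]:
    """Online training of a rows x cols Kohonen map with a Gaussian neighborhood."""
    rng = random.Random(seed)
    dim = len(points[0])
    radius0 = max(rows, cols) / 2.0
    # Random initial weights in [0, 1).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(points)
    step = 0
    for _ in range(epochs):
        order = list(range(len(points)))
        rng.shuffle(order)  # present records in a fresh random order each epoch
        for i in order:
            x = points[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)  # learning rate decays linearly toward 0
            radius = max(radius0 * (1.0 - frac), 0.5)  # radius shrinks to a floor of 0.5 grid units
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                # Neighborhood weight: 1.0 at the BMU, smaller for nodes farther away on the grid.
                h = math.exp(-grid_d2 / (2.0 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])  # pull the node toward the input
            step += 1
    return weights


def map_to_nodes(weights: Dict[Node, List[float]], points: List[Sequence[float]]) -> List[Node]:
    """Cluster assignment: each record goes to its best-matching unit."""
    return [best_matching_unit(weights, x) for x in points]
```

The grid size fixes the maximum number of clusters (`rows * cols`); unlike K-Means, neighboring nodes are also updated, so nodes that are close on the grid tend to represent similar records.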
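Adaptive Resonance Theory has several variants; since one-hot encoded categorical attributes are binary, ART1 is a natural fit. The sketch below is a simplified fast-learning ART1 under these assumptions: existing categories are ranked by the choice function |x ∧ w| / (β + |w|); a category is accepted when it passes the vigilance test |x ∧ w| / |x| ≥ ρ; an accepted prototype is replaced by its intersection with the input; and a new category is created when no existing one passes. The function name `art1` and the default values of `vigilance` (ρ) and `beta` (β) are illustrative assumptions, not requirements of the assignment.

```python
from typing import List, Sequence, Tuple


def art1(points: Sequence[Sequence[int]], vigilance: float = 0.7,
         beta: float = 1.0) -> Tuple[List[int], List[List[int]]]:
    """Simplified fast-learning ART1 for binary (0/1) input vectors."""
    prototypes: List[List[int]] = []
    labels: List[int] = []
    for x in points:
        norm_x = sum(x)
        if norm_x == 0:
            raise ValueError("each input needs at least one 1 bit")

        def overlap(j: int) -> int:
            # |x AND w_j|: number of positions where both input and prototype are 1
            return sum(a & b for a, b in zip(x, prototypes[j]))

        # Rank existing categories by the choice function |x AND w| / (beta + |w|).
        ranked = sorted(range(len(prototypes)),
                        key=lambda j: overlap(j) / (beta + sum(prototypes[j])),
                        reverse=True)
        chosen = -1
        for j in ranked:
            if overlap(j) / norm_x >= vigilance:  # vigilance test passed: resonance
                # Fast learning: the prototype shrinks to its intersection with the input.
                prototypes[j] = [a & b for a, b in zip(x, prototypes[j])]
                chosen = j
                break
            # Otherwise this category is reset and the next-ranked one is tried.
        if chosen == -1:
            # No existing category matched closely enough: commit a new one.
            prototypes.append(list(x))
            chosen = len(prototypes) - 1
        labels.append(chosen)
    return labels, prototypes
```

Raising `vigilance` makes the match test stricter and therefore tends to produce more, smaller clusters. Unlike K-Means, the result also depends on the order in which records are presented.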
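To illustrate what discretizing a continuous attribute involves, here is a minimal equal-width binning sketch, one common unsupervised discretization. It assumes a fixed number of equal-width intervals per column, with the column maximum clamped into the last bin; `equal_width_bins` and its `binN` labels are hypothetical, and this is not necessarily the method used by your Project 1 implementation or by Weka.

```python
from typing import List, Sequence


def equal_width_bins(values: Sequence[float], n_bins: int = 3) -> List[str]:
    """Discretize one continuous column into n_bins equal-width intervals labeled 'bin0', 'bin1', ..."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    labels = []
    for v in values:
        if width == 0:
            idx = 0  # constant column: every value falls in one bin
        else:
            # Clamp so the column maximum lands in the last bin instead of bin n_bins.
            idx = min(int((v - lo) / width), n_bins - 1)
        labels.append(f"bin{idx}")
    return labels
```

For example, `equal_width_bins([1.0, 2.0, 3.0, 4.0], n_bins=3)` returns `["bin0", "bin1", "bin2", "bin2"]`; the resulting labels can then be treated like any other categorical attribute.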