CSE652: Knowledge Discovery and Data Mining

Project #1

Due Date/Presentation: March 6, 2010

  • Write a routine to normalize continuous data.
  • Implement the following feature ranking algorithms:
    • Supervised Mean-Variance-based Feature Ranking
    • Unsupervised Entropy-based Feature Ranking
  • Implement the following feature discretization algorithms:
    • Supervised Chi-Square-based Feature Discretization
    • Unsupervised Feature Discretization
  • Using KNIME, Weka, and your own implementations of normalization, feature ranking, and feature discretization (the parts above), work on the Brazilian data set used in the PAKDD 2009 competition. Report the best subset of features along with the best discretization of the continuous data.
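
As a starting point for the implementation tasks listed above, the following is a minimal Python sketch, not a reference solution. It makes several assumptions that this handout does not state: min-max scaling stands in for the normalization routine, the mean-variance score is the common two-class form |mean(A) - mean(B)| / sqrt(var(A)/n_A + var(B)/n_B), and equal-width binning stands in for the unsupervised discretization. The function names (min_max_normalize, mean_variance_score, equal_width_bins) are hypothetical, and the entropy-based ranking and chi-square-based discretization are not covered here.

```python
import math
from statistics import mean, variance


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly into [new_min, new_max] (assumed method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to new_min.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


def mean_variance_score(values, labels):
    """Two-class score: |mean(A) - mean(B)| / sqrt(var(A)/n_A + var(B)/n_B).

    Requires at least two samples per class, because the sample
    variance is undefined for a single observation.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    a = [v for v, y in zip(values, labels) if y == classes[0]]
    b = [v for v, y in zip(values, labels) if y == classes[1]]
    diff = abs(mean(a) - mean(b))
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Both classes are constant: perfectly separated or identical.
        return float("inf") if diff > 0 else 0.0
    return diff / se


def equal_width_bins(values, n_bins):
    """Map each value to a bin index in 0..n_bins-1 (equal-width intervals)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


if __name__ == "__main__":
    feature = [1, 2, 3, 7, 8, 9]
    labels = [0, 0, 0, 1, 1, 1]
    print(min_max_normalize(feature))
    print(mean_variance_score(feature, labels))
    print(equal_width_bins(feature, 2))
```

Features can then be ranked by sorting them on their score in descending order. Any cut-off threshold on the score is a choice left to the implementer and is not specified here.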