Data Mining and Machine Learning Tools

This summer I am trying to improve my understanding of various approaches, tools, and methods in machine learning. Here is a list of data-mining tools I have come across. What struck me is that I had not heard of any of them before. In subsequent posts, I plan to cover some of them in greater detail.

  1. Weka 3: Weka is a collection of machine learning algorithms for data mining tasks, written in Java. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.
  2. scikit-learn: Simple and efficient tools for data mining and data analysis, written in Python.
  3. sofia-ml: The suite of fast incremental algorithms for machine learning (sofia-ml) can be used for training models for classification, regression, ranking, or combined regression and ranking. Several different techniques are available. This release is intended to aid researchers and practitioners who require fast methods for classification and ranking on large, sparse data sets.
  4. Vowpal Wabbit: There are two ways to have a fast learning algorithm: (a) start with a slow algorithm and speed it up, or (b) build an intrinsically fast learning algorithm. This project is about approach (b), and it’s reached a state where it may be useful to others as a platform for research and experimentation. The Vowpal Wabbit (VW) project is a fast out-of-core learning system sponsored by Microsoft Research and (previously) Yahoo! Research.
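
To get a feel for what "incremental" and "out-of-core" mean in the last two entries, here is a minimal sketch in plain Python. It trains a logistic regression model with stochastic gradient descent, reading one sparse example at a time so that only the weight vector stays in memory. This is my own illustration of the general idea, not how sofia-ml or Vowpal Wabbit are actually implemented; the names `sgd_logistic` and `toy_stream` and the toy data are made up for the example.

```python
import math
import random


def sgd_logistic(stream, n_features, learning_rate=0.1):
    """Train logistic regression in a single pass over a stream of (x, y) pairs.

    x is a sparse dict {feature_index: value}; y is 0 or 1.
    Only the weight vector is held in memory, never the full dataset.
    """
    weights = [0.0] * n_features
    for x, y in stream:
        # Sparse dot product: only the non-zero features are visited.
        z = sum(weights[i] * v for i, v in x.items())
        z = max(-30.0, min(30.0, z))  # clamp so math.exp cannot overflow
        p = 1.0 / (1.0 + math.exp(-z))
        # Gradient step on the log loss, again touching only non-zero features.
        for i, v in x.items():
            weights[i] -= learning_rate * (p - y) * v
    return weights


def toy_stream(n_examples, seed=0):
    """Hypothetical data source: the label is 1 when feature 0 is active, else 0."""
    rng = random.Random(seed)
    for _ in range(n_examples):
        active = rng.randrange(2)  # exactly one of feature 0 or feature 1 is on
        yield {active: 1.0}, 1 if active == 0 else 0


if __name__ == "__main__":
    weights = sgd_logistic(toy_stream(2000), n_features=2)
    print(weights)  # weight 0 ends up positive, weight 1 negative
```

Because `toy_stream` is a generator, the examples could just as well come from a file read line by line, which is what lets this style of learner handle data sets that do not fit in memory.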