When to use what for Data Mining.
It has been a while since MVP Miguel Egea (@miguelEgea) introduced me to data mining in SQL Server. Back in 2005 I was finishing my graduate project on Artificial Intelligence using WEKA; I found the product really interesting and decided to learn everything about it.
We (Solidq.com) have been delivering solutions based on SQL Server Analysis Services data mining since then. For many years that, together with the Excel/Visio add-in that came with SQL 2005, was pretty much all we had from Microsoft.
Today I was delivering a session about Big Data Mining with Mahout at SQL Saturday Nashville when someone asked me: where do I start? What would be the right Microsoft data mining tool for learning?
Wow, it is true, there has been a lot of news in the last two years, so…
Here is what we have:
If you are starting right now, go for Azure Machine Learning. As of 01/2014 you can go to https://studio.azureml.net/ and start running experiments for free. It is well documented and comes with a lot of samples.
You can also read this book for a great introduction: http://www.amazon.com/Predictive-Analytics-Microsoft-Machine-Learning
Azure Machine Learning is my first choice for POCs and real projects. There will always be pieces that can only be solved with customization (T-SQL, SSIS, Hive, Pig, SSAS, Mahout), but it is my default data mining tool at this point.
There is a 10GB limit for training sets. 10GB is already a big number, and there are always ways to divide and conquer if a larger dataset must be used for training (rare…).
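To give an idea of what "divide and conquer" could look like in practice, here is a minimal Python sketch that splits a stream of text rows into consecutive chunks that each stay under a byte budget. The function name `split_by_size` and the tiny budget in the usage lines are my own illustration, not anything provided by Azure Machine Learning; how you train on the pieces and combine the results is a separate decision.

```python
from typing import Iterable, Iterator, List


def split_by_size(rows: Iterable[str], max_bytes: int) -> Iterator[List[str]]:
    """Yield consecutive groups of rows whose UTF-8 size stays within max_bytes.

    Hypothetical helper for illustration only. A single row larger than
    max_bytes is still emitted, alone, in its own group.
    """
    chunk: List[str] = []
    size = 0
    for row in rows:
        row_bytes = len(row.encode("utf-8"))
        # Close the current group before it would go over the budget.
        if chunk and size + row_bytes > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(row)
        size += row_bytes
    if chunk:
        yield chunk


# Toy usage: three 5-byte rows with a 10-byte budget.
sample_rows = ["aaaa\n", "bbbb\n", "cccc\n"]
chunks = list(split_by_size(sample_rows, max_bytes=10))
```

With a real file you would pass the open file object as `rows` and a budget a bit under the limit, then write each group out as its own training set.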
Paco Gonzalez —