Data Mining & Predictive Analytics

Tom Khabaza

Nine Laws of Data Mining

by Tom Khabaza

This page will be extended during the first quarter of 2010 to include and explain all nine laws. If you prefer brevity, see my tweets: twitter.com/tomkhabaza.

Data mining as a field of practice came into existence in the 1990s, aided by the emergence of workbenches that packaged data mining algorithms in a form suitable for business analysts. Perhaps because of its origins in practice rather than in theory, relatively little attention has been paid to understanding the nature of the data mining process. The development of the CRISP-DM methodology in the late 1990s was a substantial step towards a standardised description of the process, one that had already been found successful and that was, and still is, followed by most practising data miners.

Although CRISP-DM describes how data mining is performed, it does not explain what data mining is or why the process has the properties that it does. Here I propose nine maxims or “laws” of data mining (most of which are well known to practitioners), together with explanations where known. The aim is to begin a theory that explains (and not merely describes) the data mining process.

It is not my purpose to criticise CRISP-DM; many of the concepts introduced by CRISP-DM are crucial to the understanding of data mining outlined here, and I also depend on CRISP-DM’s common terminology. This is merely the next step in the process that started with CRISP-DM.

——————————————————————————————————