Data Mining & Predictive Analytics

Tom Khabaza


Nine Laws of Data Mining—Part 3

by Tom Khabaza


Continued from “9 Laws of Data Mining” & “9 Laws of Data Mining—Part 2”






The 9 Laws of Data Mining are simple truths about data mining.  Most of the 9 laws are already well known to data miners, although some are expressed in an unfamiliar way (for example, the 5th, 6th and 7th laws).  Most of the new ideas associated with the 9 laws are in the explanations, which attempt to explain why the data mining process takes its well-known form.


Why should we care why the data mining process takes the form that it does?  In addition to the simple appeal of knowledge and understanding, there is a practical reason to pursue these questions. 


The data mining process came into being in the form that exists today because of technological developments – the widespread availability of machine learning algorithms, and the development of workbenches which integrated these algorithms with other techniques and made them accessible to users with a business-oriented outlook.  Should we expect technological change to change the data mining process?  Eventually it must, but if we understand the reasons for the form of the process, then we can distinguish between technology which might change it and technology which cannot.


Several technological developments have been hailed as revolutions in predictive analytics, for example the advent of automated data preparation and model re-building, and the integration of business rules with predictive models in deployment frameworks.  The 9 laws of data mining suggest, and their explanations demonstrate, that these developments will not change the nature of the process.  Beyond their educational value for data miners, the 9 laws, and further development of these ideas, should be used to judge any future claims to revolutionise the data mining process.


I would like to thank Chris Thornton and David Watkins, who supplied the insights which inspired this work.  I would also like to thank all those who have contributed to the LinkedIn “9 Laws of Data Mining” discussion group, which has provided invaluable food for thought.




Copyright (c) Tom Khabaza 2010-11.