In a world of big data, insight is king. And whether it’s detecting the likelihood of fraud from millions of credit card transactions, giving businesses useful insights into their customers, or pinpointing distant galaxies from a mass of astronomical data, one IT discipline is becoming an increasingly bright star: data mining.


The concept first came to commercial prominence about 15 years ago. For several years, you couldn’t attend an IT conference without at least one of the speakers wheeling out the urban myth of giant US retailer Walmart’s use of the technology to uncover previously unnoticed patterns in shoppers’ purchasing habits.

Most famously, it is said, the company found out it could sell more alcohol in the evenings if it displayed a selection of beer next to its baby products. The reason given is that fathers of young children were routinely being asked by their wives to pick up nappies on the way home from work. If dad saw a crate of beer while he was in the supermarket, he’d be far more likely to grab one.

Back then, data mining was largely the preserve of very large corporations with huge stores of data. But with the exponential growth of mobile technology, social networking and cloud computing in the intervening years, not only is there a lot more unstructured data out there that could potentially be hiding valuable insights, but a lot more businesses can afford to make use of it.

Today, many organisations are trying to develop ‘big data’ strategies. And while there’s less hype around the concept of data mining than there once was, it remains a critical, and growing, discipline for any data scientist. But what exactly is it, and how does it differ from business analytics?

Essentially, data mining is a subset of analytics that uses mathematical algorithms to examine vast datasets and uncover previously unseen patterns and correlations. It differs from other types of statistical analysis in that, rather than testing a set hypothesis, it slices and dices data in many different ways until it spots something interesting.
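To make the idea of searching without a fixed hypothesis concrete, here is a minimal sketch in Python. It is not a description of any retailer's actual system: the baskets are invented for illustration, and the function name `pair_lifts` is hypothetical. It scores every pair of items that appears together using "lift", a common measure of how much more often two items co-occur than they would if shoppers chose them independently.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets, invented purely for illustration.
BASKETS = [
    {"nappies", "beer", "crisps"},
    {"nappies", "beer"},
    {"nappies", "beer", "milk"},
    {"milk", "bread"},
    {"bread", "crisps"},
    {"milk", "bread", "crisps"},
]


def pair_lifts(baskets):
    """Return (pair, lift) tuples for every co-occurring item pair, highest lift first."""
    n = len(baskets)
    # How many baskets contain each single item.
    item_counts = Counter(item for basket in baskets for item in basket)
    # How many baskets contain each pair of items (pairs stored in sorted order).
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    results = []
    for (a, b), both in pair_counts.items():
        # lift = P(a and b) / (P(a) * P(b)); a value above 1 means the pair
        # turns up together more often than independence would predict.
        lift = (both / n) / ((item_counts[a] / n) * (item_counts[b] / n))
        results.append(((a, b), lift))
    return sorted(results, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    for pair, lift in pair_lifts(BASKETS):
        print(pair, round(lift, 2))
```

No pair is singled out in advance; the code simply ranks everything it finds. Real tools apply the same principle to far larger datasets and longer item combinations, typically with pruning strategies this sketch omits.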

Correlation and causality

Yet while data mining techniques can throw up new and fascinating patterns in data, it still takes the skills of an experienced data scientist to weed out what’s genuinely useful from what’s merely interesting. Automated pattern recognition can identify a bunch of things in data that are correlated, but that doesn’t necessarily mean there is any causal link between X and Y.
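The gap between correlation and causation can be shown with a small simulation. This is a sketch under stated assumptions, not a general test for causality: the data is synthetic, the names `pearson` and `simulate` are hypothetical, and the hidden factor is assumed to be known so that it can be subtracted out. X and Y never influence each other here; both are driven by the same unseen third variable.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def simulate(n=2000, seed=0):
    """Return (raw, adjusted) correlations between X and Y.

    Assumption: X and Y each equal a shared hidden factor plus
    independent noise, so neither one causes the other.
    """
    rng = random.Random(seed)
    hidden = [rng.gauss(0, 1) for _ in range(n)]
    x = [h + rng.gauss(0, 0.5) for h in hidden]
    y = [h + rng.gauss(0, 0.5) for h in hidden]
    raw = pearson(x, y)
    # Remove the hidden factor's contribution, then correlate what is left.
    adjusted = pearson(
        [a - h for a, h in zip(x, hidden)],
        [b - h for b, h in zip(y, hidden)],
    )
    return raw, adjusted


if __name__ == "__main__":
    raw, adjusted = simulate()
    print("raw correlation:", round(raw, 2))
    print("after removing hidden factor:", round(adjusted, 2))
```

An automated search would flag X and Y as strongly related, yet once the shared driver is accounted for the relationship all but disappears. In practice the hidden factor is rarely handed to the analyst, which is where the data scientist's judgement comes in.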

Source: www.computerweekly.com