Applied Data Labs · Data Strategy

What is Data Mining?

Data mining from supermarket basket analysis to large language models -- the core concepts survived even as everything else changed.


---
title: "What is Data Mining?"
slug: "What-Data-mining"
description: "Data mining from supermarket basket analysis to large language models -- the core concepts survived even as everything else changed."
datePublished: "2012-09-15"
dateModified: "2026-03-15"
category: "Data Strategy"
tags: ["data mining", "machine learning", "deep learning", "LLMs"]
tier: 3
originalUrl: "http://www.applieddatalabs.com/How-to-data-science/What-Data-mining"
waybackUrl: "https://web.archive.org/web/20120915093637/http://www.applieddatalabs.com:80/How-to-data-science/What-Data-mining"
---

In 2012, I explained data mining using a supermarket example. Beer and diapers on Thursday. Young fathers stocking up for the weekend. Put chips between the two aisles and watch sales climb. That example was already a cliché when I used it, and it's ancient history now. But the underlying concept -- finding patterns in data that humans can't see -- didn't just survive. It became the foundation of a multi-trillion-dollar industry.

What We Wrote in 2012

We defined data mining as "the identification of correlations and patterns hidden in data that provide insight into decisions and help companies understand their businesses better." We broke the concept into three levels: data (raw facts), information (patterns and correlations), and understanding (actionable knowledge that drives outcomes).

The framework was simple enough that anyone could follow it. Data about grocery purchases becomes information when you spot the beer-and-diapers correlation. Information becomes understanding when you realize why it happens and use that knowledge to increase revenue. We noted that "recent advancements in technology have made it significantly easier and better" and that companies were using these tools "to reduce costs and/or increase profits."
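The beer-and-diapers correlation is the classic output of association-rule mining, which quantifies "spot the pattern" with three numbers: support, confidence, and lift. Here is a minimal sketch in plain Python -- the basket data is invented for illustration, not taken from any real study:

```python
# Toy basket data -- illustrative only, not real supermarket transactions.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"bread", "milk"},
    {"beer", "diapers"},
    {"milk", "eggs"},
]

def support(itemset):
    """Fraction of baskets that contain every item in the set."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent):
    """P(consequent | antecedent): how often the rule holds when it can."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to the consequent's base rate; > 1 means
    the antecedent genuinely raises the odds of the consequent."""
    return confidence(antecedent, consequent) / support(consequent)

print(support({"beer", "diapers"}))            # 0.6  -- 3 of 5 baskets
print(confidence({"diapers"}, {"beer"}))       # 1.0  -- every diaper basket has beer
print(lift({"diapers"}, {"beer"}))             # ~1.67 -- well above 1, so correlated
```

The lift value is the "information" step of the framework: a lift well above 1 says the pairing is real, not just two independently popular items. The "understanding" step -- why young fathers buy both -- is still a human judgment.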

That framing -- data, information, understanding -- still holds. What's changed beyond recognition is the technology stack sitting between raw data and actionable insight.

We used to explain data mining with beer and diapers. Now data mining explains itself -- and writes its own code to do it.

From Data Mining to Machine Learning to LLMs

The term "data mining" has mostly disappeared from serious technical conversation. It evolved through several stages, each one absorbing the previous.

First came the machine learning rebrand. Around 2015, the industry started calling the same techniques -- clustering, classification, regression, association rules -- "machine learning" instead of "data mining." The math didn't change much. The tooling improved dramatically. Scikit-learn, XGBoost, and cloud ML services made it possible to build models that used to require PhD-level expertise.
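The techniques that got rebranded are simple at their core. As a rough sketch, here is a nearest-neighbor classifier in plain Python -- a toy stand-in for what a library classifier such as scikit-learn's `KNeighborsClassifier` does at scale (the data points and labels are made up):

```python
import math

# Toy labeled data: (weekly beer units, weekly diaper units) -> shopper segment.
# Values and labels are illustrative, not drawn from any real dataset.
train = [
    ((6.0, 5.0), "young-parent"),
    ((5.0, 6.0), "young-parent"),
    ((1.0, 0.0), "other"),
    ((0.0, 1.0), "other"),
]

def classify(point):
    """1-nearest-neighbor: label a point by its closest training example."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])
    _, label = min(train, key=lambda example: dist(example[0], point))
    return label

print(classify((5.5, 5.5)))  # young-parent
print(classify((0.5, 0.5)))  # other
```

The math here is the same distance-based reasoning data miners used in the 1990s; what changed around 2015 was tooling that handled millions of points and hundreds of dimensions without a PhD.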

Then deep learning arrived and actually changed the math. Neural networks with many layers could find patterns in images, text, and audio that traditional algorithms couldn't touch. Computer vision, speech recognition, and natural language processing went from research projects to production services between 2015 and 2020. The supermarket wasn't just correlating transactions anymore -- it was recognizing shoppers on camera and predicting what they'd buy before they reached the shelf.

Now we're in the LLM era, and the shift is even more fundamental. GPT-4, Claude, and their successors don't just find patterns in structured data. They understand language, reason about problems, and generate new content. A modern data analyst doesn't write SQL queries to find correlations. They ask an AI assistant to analyze a dataset and explain what it found, in plain English.

But here's what I find most interesting: the three-level framework from 2012 still applies. LLMs consume data (training corpus), extract information (learned representations), and produce understanding (useful outputs). The abstraction layers have multiplied, but the core concept is identical. We're still mining data. We just stopped calling it that.

The tools are wildly different, though. In 2012, data mining meant SAS, SPSS, or maybe R if you were adventurous. Now it means PyTorch, Hugging Face, and API calls to foundation models. The barrier to entry collapsed and then rebuilt itself at a different level -- you don't need to understand statistics to use AI, but you need to understand AI to use it well.

The Operational AI Connection

The evolution from data mining to AI proves that core analytical thinking outlasts any specific tool. Organizations that built strategic foundations around data-driven decision making adapted to each wave of technology. Those that bet on specific tools -- SAS licenses, Hadoop clusters -- had to rebuild from scratch. The lesson for today's AI adoption is the same: invest in organizational capability, not just technology. Operational AI is about building the muscle, not buying the equipment.