What is Big Data?
The rise and quiet death of 'Big Data' as a buzzword -- and how its core ideas evolved into modern data engineering and AI.
title: "What is Big Data?"
slug: "What-is-Big-Data"
description: "The rise and quiet death of 'Big Data' as a buzzword -- and how its core ideas evolved into modern data engineering and AI."
datePublished: "2012-09-15"
dateModified: "2026-03-15"
category: "Data Strategy"
tags: ["big data", "data engineering", "AI training data", "cloud"]
tier: 3
originalUrl: "http://www.applieddatalabs.com/How-to-data-science/What-is-Big-Data"
waybackUrl: "https://web.archive.org/web/20120915094235/http://www.applieddatalabs.com:80/How-to-data-science/What-is-Big-Data"
I'll give myself credit for one thing: in 2012, I called "Big Data" a buzzword right in the opening sentence. "What is Big Data? It is a buzzword plain and simple." I went on to explain what it actually meant, but at least I didn't pretend the term itself was precise. Fourteen years later, nobody says "Big Data" anymore. The phrase died so completely that using it in a meeting would get you odd looks. But the concepts it described? They took over everything.
What We Wrote in 2012
We described Big Data as data that is "significantly large" -- big data is big data -- and then spent the rest of the piece explaining why that mattered. Eric Schmidt, then CEO of Google, had said we created as much data every two days as existed from the dawn of civilization through 2003. We talked about the Quantified Self movement, RFID tags making everyday objects trackable, and sensor networks in mobile devices.
We made a point about data that I still think is underappreciated: "With large amounts of data, 1 + 1 = 4." Combining two datasets doesn't just give you the sum of the parts. It gives you a return on information far greater than that sum, because you see connections between the datasets that were invisible when they sat in silos. Data marketplaces were starting to appear -- Microsoft Azure, Infochimps -- selling demographic data that became valuable only when you joined it with your own.
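To make "1 + 1 = 4" concrete, here is a minimal Python sketch. It assumes a hypothetical customer table of our own and a hypothetical purchased demographic table keyed by ZIP code; all field names and figures are invented for illustration. Neither table alone can tell you which basic-plan customers live in high-income areas, but the join can.

```python
# Sketch of "1 + 1 = 4": joining our own records with purchased data.
# All field names and values are hypothetical.

own_customers = [
    {"customer_id": 1, "zip": "10001", "plan": "basic"},
    {"customer_id": 2, "zip": "94105", "plan": "premium"},
    {"customer_id": 3, "zip": "94105", "plan": "basic"},
    {"customer_id": 4, "zip": "60601", "plan": "basic"},
]

# Purchased demographic data keyed by ZIP code (hypothetical figures).
demographics = {
    "10001": {"median_income": 90_000},
    "94105": {"median_income": 150_000},
    "60601": {"median_income": 70_000},
}


def join_on_zip(customers, demo):
    """Inner join: attach demographic fields to each customer row by ZIP."""
    joined = []
    for row in customers:
        extra = demo.get(row["zip"])
        if extra is not None:
            joined.append({**row, **extra})
    return joined


def upsell_targets(joined, income_threshold=100_000):
    """Basic-plan customers in ZIP codes at or above the income threshold."""
    return [
        r["customer_id"]
        for r in joined
        if r["plan"] == "basic" and r["median_income"] >= income_threshold
    ]


joined = join_on_zip(own_customers, demographics)
print(upsell_targets(joined))  # [3]
```

The insight (customer 3 is an upsell candidate) exists in neither source on its own; it only appears once the two pools of data are mixed.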
We also identified the real bottleneck: traditional business intelligence tools couldn't handle the integration problem. "They may have prettier interfaces than they used to, but they aren't able to find the right data to show you. They can't mix two pools of data together."
Nobody says "Big Data" at conferences anymore. But every company's AI strategy depends on the exact problems that term was trying to describe.
Big Data Died. Its Problems Won.
The phrase "Big Data" peaked in Google Trends around 2014 and has been declining ever since. By 2020, it had been replaced by more specific terms: data engineering, MLOps, AI training data, data mesh, lakehouse architecture. Each of these solved a piece of what "Big Data" had vaguely gestured at.
The volume problem got solved by cloud storage. When S3 costs pennies per gigabyte per month, the idea that data is "too big" stops making sense. You just store it.
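A back-of-the-envelope calculation shows why "too big" stopped being the binding constraint. The sketch below assumes a flat rate of $0.023 per GB-month, in the range of S3 Standard's published list price; treat that number as an assumption and check current pricing, since real bills are tiered by volume and also include request and data-transfer charges.

```python
# Back-of-the-envelope monthly storage cost.
# ASSUMPTION: flat $0.023 per GB-month, roughly an S3 Standard list price.
# Real pricing is tiered and adds request and transfer charges.
ASSUMED_PRICE_PER_GB_MONTH = 0.023  # USD, hypothetical flat rate


def monthly_storage_cost(terabytes, price_per_gb=ASSUMED_PRICE_PER_GB_MONTH):
    """Monthly cost in USD for storing `terabytes` at a flat per-GB rate."""
    gigabytes = terabytes * 1024
    return gigabytes * price_per_gb


for tb in (1, 10, 100):
    print(f"{tb:>4} TB: ${monthly_storage_cost(tb):,.2f}/month")
```

Under that assumed rate, a terabyte costs about $24 a month and 100 TB about $2,355, which is why most teams now default to keeping data rather than deciding what to throw away.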
The velocity problem got solved by stream processing. Apache Kafka, Amazon Kinesis, and modern event architectures handle real-time data flows that would have overwhelmed any 2012 system. The sensor networks and mobile devices we mentioned have multiplied many times over, but the infrastructure caught up.
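The core idea behind stream processing is to aggregate events as they arrive instead of loading everything into a table first. The following is a toy sketch in plain Python, not the Kafka or Kinesis API: it computes per-sensor counts over fixed, non-overlapping ("tumbling") time windows and emits each window as soon as it closes. The event fields (`sensor`, `ts`) are hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds=60):
    """Yield (window_start, counts_by_sensor) each time a window closes.

    Assumes events arrive in timestamp order; production engines handle
    late and out-of-order events with mechanisms such as watermarks.
    `events` can be any iterator, including an unbounded one.
    """
    current_window = None
    counts = defaultdict(int)
    for event in events:
        window_start = event["ts"] - event["ts"] % window_seconds
        if current_window is not None and window_start != current_window:
            # A new window started, so the previous one is complete.
            yield current_window, dict(counts)
            counts = defaultdict(int)
        current_window = window_start
        counts[event["sensor"]] += 1
    if current_window is not None:
        # Flush the last open window when a finite stream ends.
        yield current_window, dict(counts)


stream = [
    {"sensor": "a", "ts": 0},
    {"sensor": "a", "ts": 30},
    {"sensor": "b", "ts": 45},
    {"sensor": "a", "ts": 61},
    {"sensor": "b", "ts": 125},
]
for window_start, counts in tumbling_window_counts(stream):
    print(window_start, counts)
```

Because the function is a generator, memory use depends on the number of sensors in the current window, not on the total volume of the stream.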
The variety problem -- the hardest one -- got partially solved by data lakes and then lakehouses. Dumping structured and unstructured data into one system and querying across it is now standard practice. But the messy work of entity resolution and schema alignment that we described in 2012 is still genuinely hard. LLMs are finally making a dent here, using language understanding to match records that share no common keys.
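For a sense of why entity resolution is hard, here is a deliberately simple baseline in Python: match records from two sources that share no common key by comparing normalized company names with the standard library's `difflib.SequenceMatcher`. This is a classic string-similarity heuristic, not the LLM-based matching described above; the records, the suffix list, and the 0.85 threshold are all hypothetical choices.

```python
from difflib import SequenceMatcher

# Hypothetical list of corporate suffixes to ignore when comparing names.
SUFFIXES = {"inc", "incorporated", "corp", "corporation", "llc", "ltd", "co"}


def normalize(name):
    """Lowercase, drop punctuation, and strip common corporate suffixes."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(tok for tok in cleaned.split() if tok not in SUFFIXES)


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_records(left, right, threshold=0.85):
    """Pair each left record with its best right record above the threshold."""
    matches = []
    for rec in left:
        scored = [(similarity(rec["name"], cand["company"]), cand) for cand in right]
        score, best = max(scored, key=lambda pair: pair[0])
        if score >= threshold:
            matches.append((rec["id"], best["vendor_code"], round(score, 2)))
    return matches


crm = [{"id": 1, "name": "Acme Corp."}, {"id": 2, "name": "Globex Corporation"}]
billing = [
    {"vendor_code": "V-9", "company": "ACME Corporation"},
    {"vendor_code": "V-7", "company": "Initech LLC"},
]
print(match_records(crm, billing))  # [(1, 'V-9', 1.0)]
```

Heuristics like this break down quickly on abbreviations, renamed companies, or names in different languages, which is exactly the gap that language-model-based matching is starting to close.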
The most profound change is what happened to all that data. In 2012, we talked about data as a resource for finding information. We couldn't have predicted that data would become the raw material for training AI systems that can reason, write, and code. The biggest consumer of "Big Data" today isn't a dashboard or a business analyst. It's a training pipeline feeding tokens into a language model.
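As a rough illustration of what "feeding tokens into a language model" means in practice, the sketch below turns raw documents into token IDs and packs them into fixed-length training sequences. It is a toy under loudly stated assumptions: real pipelines use subword tokenizers (such as BPE) and heavy filtering and deduplication, whereas the whitespace "tokenizer", the vocabulary scheme, and the `EOS_ID` value here are hypothetical simplifications.

```python
# Toy sketch of sequence packing in an LLM training-data pipeline.
# The whitespace tokenizer and EOS_ID are hypothetical simplifications.
EOS_ID = 0  # hypothetical end-of-document token id


def build_vocab(docs):
    """Assign each distinct whitespace-separated word an id (0 is reserved)."""
    vocab = {}
    for doc in docs:
        for word in doc.split():
            vocab.setdefault(word, len(vocab) + 1)
    return vocab


def pack_sequences(docs, vocab, seq_len):
    """Concatenate tokenized docs separated by EOS, then cut into seq_len chunks.

    The trailing partial chunk is dropped, a common simplification.
    """
    stream = []
    for doc in docs:
        stream.extend(vocab[word] for word in doc.split())
        stream.append(EOS_ID)
    return [stream[i:i + seq_len] for i in range(0, len(stream) - seq_len + 1, seq_len)]


docs = ["big data is big", "data feeds models"]
vocab = build_vocab(docs)
print(pack_sequences(docs, vocab, seq_len=4))  # [[1, 2, 3, 1], [0, 2, 4, 5]]
```

Packing documents end to end keeps every training sequence full, so no compute is wasted on padding; the cost is that one sequence can span a document boundary, marked here by the EOS token.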
Our prediction that "when analytics truly gets hold of big data, the incredible power of this information will change the face of society" was more right than we knew. We just couldn't have guessed the mechanism would be neural networks rather than traditional analytics.
The Operational AI Connection
The death of "Big Data" as a term and its resurrection as a set of real engineering problems is exactly why strategic thinking about data matters more than following buzzwords. Every AI system depends on data quality, data volume, and data integration -- the exact problems Big Data described. The organizations winning at AI today are the ones that built strong data infrastructure over the past decade. Operational AI success starts with solving those foundational data problems.