

title: "Humanizing Data" slug: "humanizing-data" description: "The art of making data meaningful — from visualization to AI-powered storytelling." datePublished: "2012-09-15" dateModified: "2026-03-15" category: "Data Strategy" tags: ["data storytelling", "visualization", "humanization", "UX"] tier: 3 originalUrl: "http://www.applieddatalabs.com/content/humanizing-data" waybackUrl: "https://web.archive.org/web/20120915093954/http://www.applieddatalabs.com:80/content/humanizing-data"

Humanizing Data

I started Applied Data Labs with a conviction that data, handled well, could help us understand people better. Not "people" in the aggregate, statistical sense, but actual humans with complicated lives and messy problems. In 2012, the data industry was drunk on scale. Bigger datasets. Faster processing. More variables. The assumption was that if you could just collect enough data, understanding would follow automatically.

That assumption was wrong then, and it's spectacularly wrong now in the age of AI. More data doesn't automatically create more understanding. Sometimes it does the opposite. And the tension between efficiency and humanity has become the defining challenge for anyone building AI systems that affect real people.

What We Argued in 2012

Our original piece made a simple case: data about people should serve people. Every row in a database represents a human being with preferences, fears, habits, and contradictions that no schema can fully capture. When organizations treat data as an abstraction disconnected from the humans it represents, they build systems that are technically precise and humanly wrong.

We gave examples from healthcare, where patient data systems optimized for billing efficiency made it harder for doctors to actually understand their patients. From education, where standardized testing data told you a student's score but nothing about why they were struggling. From marketing, where customer segmentation models grouped people into neat categories that ignored the actual complexity of how people make decisions.

The argument wasn't anti-data. It was pro-context. Data without context is numbers. Data with context is insight. And context almost always requires human judgment.

Every row in a database represents a human being. AI systems that forget this don't just fail technically. They fail morally.

AI Made the Problem Bigger

Large language models can process more data about people than any human analyst could review in a lifetime. GPT-4 was trained on hundreds of billions of words of text. These models can generate plausible-sounding insights about customer segments, patient populations, and employee cohorts in seconds. And that speed is exactly where the danger lies.

When a human analyst spent weeks studying customer data, they developed an intuitive feel for the people behind the numbers. They'd notice weird outliers and dig into them. They'd talk to actual customers to validate what the data suggested. The process was slow, but the slowness created understanding.

When an AI generates a customer insight in 30 seconds, nobody develops that intuition. The output looks authoritative because it's articulate and confident. But the model has no actual understanding of the people it's describing. It's pattern-matching on text, not comprehending human experience. I've seen AI-generated customer analyses that were grammatically perfect, internally consistent, and completely wrong about why people were actually behaving the way they were.

The AI ethics community has been talking about this under various labels. "Responsible AI," "human-centered AI design," "AI fairness." These are important conversations, but they sometimes get lost in abstract principles. The practical version is simpler: does your AI system make things better for the actual humans it affects? If you can't answer that question specifically, with evidence, you've built a system that optimizes for metrics, not people.

The Tension Between Efficiency and Humanity

Here's where it gets uncomfortable. AI systems that prioritize efficiency often produce worse outcomes for people, and the efficiency gains look so good on paper that organizations implement them anyway.

Healthcare is the sharpest example. Roughly 90% of the denials issued by UnitedHealthcare's AI system for determining medical necessity were overturned when patients appealed, according to a 2023 report. The AI was efficient at processing claims. It was terrible at understanding whether actual patients needed actual care. The system saved money in the short term by denying claims. It cost lives by denying treatment.

In hiring, resume screening AI reduced the time recruiters spent on initial review from minutes per resume to seconds. But Amazon famously scrapped its AI recruiting tool in 2018 after discovering it had learned to penalize resumes that contained the word "women's" (as in "women's chess club captain") because the training data reflected a decade of male-dominated hiring decisions. The AI was efficient. It was also perpetuating bias at scale.

Content moderation AI on social media platforms can review millions of posts per hour. But it regularly fails at understanding context, sarcasm, cultural nuance, and the difference between a news report about violence and an incitement to violence. Meta's own reports show that its AI moderation systems still struggle with languages other than English, meaning billions of users in the global south get worse moderation, which translates to more harassment and more exposure to harmful content.

Building AI That Actually Serves People

The organizations I've seen get this right share a common trait: they treat human judgment as a feature, not a bottleneck. They don't try to remove humans from the loop. They use AI to make humans better at understanding other humans.

This is what AI readiness actually looks like in practice. It's not just about having clean data and modern infrastructure. It's about designing systems that keep human understanding at the center. That means building feedback loops where the people affected by AI decisions can report when something goes wrong. It means validating AI outputs against real-world outcomes, not just model accuracy metrics. It means having governance frameworks that ask "who does this help?" and "who could this hurt?" before deployment.
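To make that concrete, here is a minimal sketch of what an outcome-level feedback loop might look like in Python. Everything in it is hypothetical and illustrative, not a description of any real system: the DecisionRecord fields, the appeal_reversal_rate helper, and the 25% review threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical record of a single AI-assisted decision and what later
# happened to the person it affected. Field names are illustrative.
@dataclass
class DecisionRecord:
    decision_id: str
    model_output: str      # e.g. "deny" or "approve"
    appealed: bool         # did the affected person push back?
    appeal_upheld: bool    # was the original decision confirmed on appeal?


def appeal_reversal_rate(records: List[DecisionRecord]) -> float:
    """Share of appealed decisions that were overturned.

    A rising reversal rate is an outcome-level signal that the model is
    wrong about real people, even if offline accuracy metrics look fine.
    """
    appealed = [r for r in records if r.appealed]
    if not appealed:
        return 0.0
    overturned = [r for r in appealed if not r.appeal_upheld]
    return len(overturned) / len(appealed)


def needs_governance_review(records: List[DecisionRecord],
                            threshold: float = 0.25) -> bool:
    """Flag the system for human review when too many appealed
    decisions are being reversed (threshold is an arbitrary example)."""
    return appeal_reversal_rate(records) > threshold
```

The specifics matter less than what the metric measures: not how accurate the model looked in testing, but what happened to real people after the decision, which is the kind of evidence the "who does this help?" question actually requires.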

The promise I saw in 2012, that data could help us understand people better, is still true. But it only works if we insist on it. Left to its own devices, the AI industry will optimize for speed, scale, and cost reduction. Humanizing data requires an active, deliberate choice to prioritize understanding over efficiency. That choice has to be built into the system design, not bolted on after complaints start rolling in.