

---
title: "Andreas Weigend and the Future of Social Data"
slug: "andreas-weigend-future-social-data"
description: "How social data evolved into the AI training data economy."
datePublished: "2013-05-02"
dateModified: "2026-03-15"
category: "Data Strategy"
tags: ["social data", "data economy", "AI training"]
tier: 3
originalUrl: "http://www.applieddatalabs.com/content/andreas-weigend-future-social-data"
waybackUrl: "https://web.archive.org/web/20130502232837/http://applieddatalabs.com/content/andreas-weigend-future-social-data"
---

Andreas Weigend and the Future of Social Data

In 2013, I interviewed Dr. Andreas Weigend about where social data was headed. He was already one of the most credible people in the data world: Stanford professor, director of the Stanford Social Data Lab, and former Chief Scientist at Amazon. His answers were prescient in ways that still surprise me.

What Weigend Told Us

Weigend's dream for social data was transparency. He argued that companies once made money by creating information asymmetry, citing the used-car salesman as his classic example. The new model was the opposite: companies making money by removing information asymmetry. His vision was a society where "because of transparency more than big data, people will be more comfortable, there will be less corruption, and there will be fewer human rights violations."

He was clear-eyed about the risks, though. "Every technology that can be used for good can also be used for evil," he said, pointing out that the same social data tools behind the Arab Spring were also used to track and target individuals.

When asked what he wanted to see in five years, Weigend predicted that data production rates would double every half year, with social data making up the majority. He pushed companies to "tightly integrate social data into their product design." He described a progression from "dataset to toolset to skillset" and ultimately to "mindset," the framing he taught at Stanford and Berkeley. His hope was that people would "make their tradeoffs much more consciously and deliberately" about the data they create and share.
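It is worth pausing on what "doubling every half year" actually implies. A quick back-of-the-envelope sketch (the baseline of 1.0 is an arbitrary unit, purely for illustration):

```python
def projected_volume(years: float, baseline: float = 1.0) -> float:
    """Data volume after `years`, assuming it doubles every 0.5 years.

    The baseline is in arbitrary units; only the growth factor matters.
    """
    return baseline * 2 ** (years / 0.5)

# Five years of doubling every six months is ten doublings:
# a roughly thousandfold (2^10 = 1024x) increase.
print(projected_volume(5))  # 1024.0
```

Whether or not the precise rate held, the exponential shape of the claim is what made the later scramble for training data predictable.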

Weigend wanted us to think carefully about the data we share. Instead, we handed it all to AI companies and barely read the terms of service.

Social Data Became AI Training Data

Weigend was right about the volume. Social data did explode. But he probably didn't anticipate the specific way it would become valuable. Every Reddit comment, every tweet, every blog post, every Stack Overflow answer became training data for large language models. OpenAI trained GPT on massive scrapes of internet text. Meta used public Facebook and Instagram posts. Google ingested YouTube transcripts.

This created a consent crisis that Weigend's warnings about dual-use technology foreshadowed. People shared thoughts, opinions, and personal stories expecting human audiences. Those posts now live inside AI models that generate text, code, and images. Reddit signed a $60 million annual deal with Google for API access to its content. The writers and commenters who created that content got nothing.

The "mindset" shift Weigend hoped for never happened. Instead of people becoming more deliberate about sharing, the incentive structures of social platforms pushed maximum sharing. Likes, followers, and algorithmic amplification rewarded oversharing. By the time AI training became a concern, billions of posts were already baked into models.

Weigend's transparency dream partially materialized in product reviews and social proof, exactly as he predicted with his hotel review example. But the bigger story is how social data went from something people generated for each other to raw material for AI systems built by corporations. That wasn't the transparent, empowering future he described.

Operational AI and the Data Economy

The social data story raises hard questions about data strategy. Organizations building AI systems need to think carefully about where their training data comes from and whether its use is defensible. AI governance now has to account for the provenance and licensing of every dataset. Weigend's advice about mindset applies to enterprises too: understanding the ethical dimensions of your AI program isn't optional anymore.
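One concrete way to operationalize provenance tracking is to attach a structured record to every dataset before it enters a training pipeline. A minimal sketch; the field names and the acceptance rule are my own illustration, not an industry standard:

```python
from dataclasses import dataclass, field

@dataclass
class DatasetProvenance:
    """Minimal provenance record for a training dataset (illustrative)."""
    name: str
    source_url: str
    license: str                 # e.g. "CC-BY-4.0", "proprietary"
    collected_at: str            # ISO 8601 date of collection
    consent_basis: str           # e.g. "explicit opt-in", "terms-of-service"
    known_restrictions: list = field(default_factory=list)

    def is_defensible(self) -> bool:
        # A crude gate: a dataset with no recorded license or no
        # consent basis should not enter the training pipeline.
        return bool(self.license) and bool(self.consent_basis)

record = DatasetProvenance(
    name="forum-posts-2024",
    source_url="https://example.com/dump",   # hypothetical source
    license="CC-BY-4.0",
    collected_at="2024-06-01",
    consent_basis="terms-of-service",
)
print(record.is_defensible())  # True
```

A real governance program would go much further (audit trails, per-record consent, takedown handling), but even a record this simple forces the question Weigend kept asking: where did this data come from, and on what terms?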