Cover Image

Did DeepSeek use OpenAI's outputs to train its model? Probably, but the real issue is much bigger than that.

The recent buzz around DeepSeek potentially using OpenAI's model outputs to train R1 has sparked debates on what constitutes "stealing" a proprietary LLM maker's intellectual property.

But this misses the point.

The real issue is that we’re witnessing a dangerous trend: the privatization of humanity’s collective knowledge for the sake of profit. It’s time we said enough is enough.

We live in the age of information, yet a select few companies are racing to build proprietary large language models (LLMs) by vacuuming up vast swaths of publicly available data—your data, my data, our data—without compensation or even proper attribution.

They then lock this information behind paywalls and proprietary interfaces, effectively creating walled gardens of human thought, shaped by the company's and its home country's biases. This is not just unfair; it's fundamentally wrong and poses a significant threat to the future of knowledge and innovation.

Picture a world where a handful of large corporations control access to the sum of human knowledge, where groundbreaking research, artistic expression, and insightful perspectives are trapped behind subscription models or exchanged for behavioral data sold to advertisers.

This isn’t a dystopian fantasy; it’s the end game if we continue to allow the unchecked development and deployment of proprietary LLMs built on scraped public data.

The alternative is a radical shift towards open-source LLMs and open knowledge ecosystems. This isn’t about naive idealism; it’s about building a more robust, equitable, and, ultimately, more innovative future.

Here’s why:

Public scrutiny leads to more transparent models instead of black boxes.

Instead of concentrating power in the hands of a privileged few, open-source models empower a global network of innovators to build upon existing knowledge, fostering a vibrant ecosystem of diverse applications and perspectives.

Open-source models, built on ethically sourced and shared datasets, help protect and expand our public domain and common heritage, enabling us to use AI to help all of humanity advance.

Imagine a future where humanity’s combined knowledge and cognitive power are harnessed to solve the world’s most pressing challenges. Open-source LLMs, trained on a vast, open repository of human knowledge, can unlock this potential.

They can serve as powerful tools for collaborative problem-solving, scientific discovery, and artistic creation, accelerating human progress in unprecedented ways.

Would you agree?

P.S. The image is real. I asked ChatGPT this very question, and it refused to answer. Censorship is not only a Chinese thing.