Module 3 of 6

AI me te raupatu matihiko

AI and digital extraction — tracing the colonial logic inside machine learning.

Tohu o tēnei akoranga — Module symbol

Niho taniwha

taniwha teeth — resistance, warrior strength against extraction

This module carries niho taniwha — the teeth of the taniwha. The taniwha is not simply a monster; it is a guardian of boundaries. When data is extracted without consent, the taniwha teeth are the challenge: who has the right to cross this threshold?

He kupu whakataki — Introduction

Raupatu — dispossession — is one of the defining words of Māori history. It describes the large-scale confiscation of Māori land under the New Zealand Settlements Act 1863. Today, a new form of raupatu is underway: the extraction of Māori data, cultural knowledge, and linguistic heritage by AI systems built and owned by corporations with no relationship to Māori communities.

This module examines the structural logic of AI development and asks: in what ways does the current AI economy reproduce the extractive patterns of colonialism?

"The model is trained on your words, your stories, your knowledge. The profit goes to Silicon Valley. This is colonialism with better graphics."

How AI systems extract cultural data

Large language models (LLMs) like GPT-4 and its successors are trained on massive datasets assembled largely from the internet. These datasets include te reo Māori text, Indigenous knowledge, oral traditions transcribed and published online, and community-generated content. This material is typically gathered without the consent of the communities it comes from, and it is used to build commercial products.

Web collection at scale — AI companies collect publicly accessible text from across the internet, including iwi websites, academic archives, and digital heritage collections.
Language model training — this data is used to train models that can generate text and perform translations in te reo Māori — a capability that has commercial value.
No attribution, no consent, no benefit-sharing — the communities whose knowledge made these capabilities possible receive nothing.

The myth of "public data"

AI companies defend these practices by pointing to the "public" status of the data they collect. This argument fails on multiple grounds:

Māori communities often publish cultural content online for community benefit, not for extraction by corporations.
The concept of "public" in Western law does not map onto tikanga frameworks, which distinguish between knowledge that is freely shareable and knowledge that carries restrictions (tapu).
Legal availability does not imply ethical permission — a principle recognised even within mainstream research ethics.
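
The distinction drawn in the points above, between material a crawler can reach and material a community has agreed to share for a given purpose, can be made concrete with a small sketch. It is illustrative only: the `Record` fields (`publicly_accessible`, `restricted`, `consented_uses`), the example sources, and both filter functions are hypothetical and do not describe any real crawler, dataset, or licence. Treating restriction as a single true/false flag is also a simplifying assumption; tikanga is not reducible to a boolean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One piece of online content. All fields are hypothetical."""
    source: str
    publicly_accessible: bool               # can a crawler reach it?
    restricted: bool = False                # does it carry restrictions (e.g. tapu)?
    consented_uses: frozenset = frozenset()  # purposes the community has agreed to


def public_data_filter(records):
    """The 'public data' logic: keep anything a crawler can reach."""
    return [r for r in records if r.publicly_accessible]


def consent_filter(records, purpose):
    """Keep only unrestricted records whose community agreed to this purpose."""
    return [
        r for r in records
        if not r.restricted and purpose in r.consented_uses
    ]


# Made-up example records; the sources are placeholders, not real sites.
records = [
    Record("iwi-site.example/history", True,
           consented_uses=frozenset({"community_education"})),
    Record("archive.example/waiata", True, restricted=True),
    Record("project.example/open-corpus", True,
           consented_uses=frozenset({"community_education", "ai_training"})),
]

print(len(public_data_filter(records)))
# prints 3: everything reachable is taken
print([r.source for r in consent_filter(records, "ai_training")])
# prints ['project.example/open-corpus']
```

In this sketch all three records pass the "public" test, but only one was shared with AI training in mind. The gap between the two results is the gap between legal availability and permission.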

"Just because you can take something does not mean you have the right to take it. Māori knew this long before intellectual property law existed."

AI, power, and political economy

The AI economy is highly concentrated. A small number of US corporations, namely OpenAI (backed by Microsoft), Google, Meta, and Amazon, dominate foundational AI development. These corporations operate under US law, answer to US shareholders, and are not subject to tikanga Māori, Treaty of Waitangi obligations, or New Zealand sovereignty claims over Māori cultural data.

This concentration of AI capability mirrors the concentration of colonial land ownership in the 19th century. The mechanisms differ, but the underlying dynamic — the capture of collective resources by private actors — is structurally similar.

Key concepts

Raupatu matihiko: Digital dispossession; extraction of Māori data without consent
Large language model: An AI system trained on large quantities of text data
Training data: The dataset used to build an AI model
Tapu: The sacred or restricted status of certain knowledge and cultural material

Pātai — Discussion questions

Is the training of AI models on publicly available Māori language data a form of raupatu? Defend your answer.
How does the concentration of AI development in US corporations create particular risks for Indigenous communities globally?
What practical mechanisms could give Māori communities meaningful control over how their data is used in AI training?
Is there a version of AI development that would be acceptable from a Māori data sovereignty perspective? What would it look like?