Module 3 of 6
AI me te raupatu matihiko
AI and digital extraction — tracing the colonial logic inside machine learning.
Niho taniwha (taniwha teeth): resistance, warrior strength against extraction
This module carries niho taniwha — the teeth of the taniwha. The taniwha is not simply a monster; it is a guardian of boundaries. When data is extracted without consent, the taniwha teeth are the challenge: who has the right to cross this threshold?
He kupu whakataki — Introduction
Raupatu — dispossession — is one of the defining words of Māori history. It describes the large-scale confiscation of Māori land under the New Zealand Settlements Act 1863. Today, a new form of raupatu is underway: the extraction of Māori data, cultural knowledge, and linguistic heritage by AI systems built and owned by corporations with no relationship to Māori communities.
This module examines the structural logic of AI development and asks: in what ways does the current AI economy reproduce the extractive patterns of colonialism?
How AI systems extract cultural data
Large language models (LLMs) such as GPT-4 and its successors are trained on massive datasets assembled largely from the internet. These datasets include large quantities of te reo Māori text, Indigenous knowledge, oral traditions transcribed and published online, and community-generated content. All of it is gathered without consent, and all of it is used to build commercial products.
- Web collection at scale — AI companies collect publicly accessible text from across the internet, including iwi websites, academic archives, and digital heritage collections.
- Language model training — this data is used to train models that can generate text and perform translations in te reo Māori — a capability that has commercial value.
- No attribution, no consent, no benefit-sharing — the communities whose knowledge made these capabilities possible receive nothing.
The myth of "public data"
AI companies defend these practices by pointing to the "public" status of the data they collect. This argument fails on at least three grounds:
- Māori communities often publish cultural content online for community benefit, not for extraction by corporations.
- The concept of "public" in Western law does not map onto tikanga frameworks, which distinguish between knowledge that is freely shareable and knowledge that carries restrictions (tapu).
- Legal availability does not imply ethical permission — a principle recognised even within mainstream research ethics.
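The gap between "accessible" and "permitted" can be made concrete with a small sketch. It is an illustration only, and everything in it is an assumption: the `Record` type, the `publicly_accessible` and `community_permission` fields, and the sample sources are hypothetical and do not describe any real dataset or pipeline. A single yes/no flag is also a deliberate simplification, since decisions under tikanga rest with the community and cannot be reduced to a metadata field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A hypothetical item of text found online (illustrative only)."""
    source: str
    publicly_accessible: bool    # can be fetched without a login
    community_permission: bool   # assumed flag: the source community agreed to this use


def accessible_only(records):
    """Selection rule implied by the 'public data' argument: reachable means usable."""
    return [r for r in records if r.publicly_accessible]


def permitted(records):
    """Selection rule that also requires permission from the source community."""
    return [r for r in records if r.publicly_accessible and r.community_permission]


# Hypothetical sample data, invented for this sketch.
records = [
    Record("iwi website: community notices", True, False),
    Record("digital heritage collection: transcribed oral history", True, False),
    Record("glossary released by its authors for reuse", True, True),
]

# Everything the first rule admits but the second rule excludes.
allowed = permitted(records)
taken_without_permission = [r for r in accessible_only(records) if r not in allowed]

for r in taken_without_permission:
    print(r.source)
```

In this toy data, all three records pass the accessibility test but only one passes the permission test. The two records in between are the material the "public data" argument treats as fair game.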
AI, power, and political economy
The AI economy is highly concentrated. A small number of US corporations, including OpenAI (backed by Microsoft), Google, Meta, and Amazon, dominate the development of foundational AI models. These corporations operate under US law, answer to US shareholders, and are not bound by tikanga Māori, by obligations under Te Tiriti o Waitangi (the Treaty of Waitangi), or by Māori claims to sovereignty over Māori cultural data.
This concentration of AI capability mirrors the concentration of colonial land ownership in the 19th century. The mechanisms differ, but the underlying dynamic — the capture of collective resources by private actors — is structurally similar.
Key concepts
- Raupatu matihiko: Digital dispossession; extraction of Māori data without consent
- Large language model: An AI system trained on large quantities of text data
- Training data: The dataset used to build an AI model
- Tapu: The sacred or restricted status of certain knowledge and cultural material
Pātai — Discussion questions
- Is the training of AI models on publicly available Māori language data a form of raupatu? Defend your answer.
- How does the concentration of AI development in US corporations create particular risks for Indigenous communities globally?
- What practical mechanisms could give Māori communities meaningful control over how their data is used in AI training?
- Is there a version of AI development that would be acceptable from a Māori data sovereignty perspective? What would it look like?