
Understanding KIP Protocol, the AI-focused decentralized Web3 infrastructure protocol (2)
RAG is an innovative technique used in generative AI that involves three key value creators in AI: app developers, model makers, and data owners.
Author: KIP Protocol
KIP Protocol is the world's first protocol supporting decentralized RAG, effectively providing a foundational framework to decentralize all AI—this is the first step toward breaking free from monopolies held by AI giants.
1) Overview of RAG
AI models are trained by being fed vast amounts of data. They learn patterns from this data and adjust internal weights to make predictions or decisions based on new inputs. The model then answers user queries using this acquired "raw" knowledge.
However, this training process requires exposing the entire dataset to the model, which essentially means the data gets "absorbed" into the model. If the data contains confidential or copyrighted information, the model might reproduce such content verbatim at some point in the future.
So what if you don’t want your data exposed to such risks?
This is where RAG (Retrieval-Augmented Generation) comes in.
RAG is a technique that lets an AI model answer questions that fall outside its training data by retrieving relevant data and information from external knowledge bases and databases at query time.
It works like an intelligent assistant who doesn’t inherently know the answer to your question but can expertly locate it from external sources.

1. User Query Input:
First, the user puts a question to a chatbot running a RAG system.
For example, “What are the symptoms of COVID-19?”
2. Retrieval from External Databases:
To begin the retrieval phase, the system searches the external knowledge bases and databases it is connected to (such as medical journals, health websites, and clinical databases) and pulls only the data relevant to the user’s query.
3. Data Processing, Filtering, and Answer Generation:
The retrieved data is processed and filtered to extract key information and eliminate irrelevant content. The AI model integrates this data with the context of the user’s query to generate a response.
In the case of a query about COVID-19 symptoms, RAG might generate a response listing common symptoms such as fever, cough, and shortness of breath—and could even include findings from recent medical research papers not present during the model’s original training, resulting in a higher-quality answer.
4. Delivering the Response:
The generated response is delivered back to the user through the chatbot interface.
Thus, RAG allows AI queries to be answered using external data without requiring the model to first "absorb" that data through the training process.
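The four steps above can be sketched in a few lines of Python. This is only an illustrative toy, not KIP's implementation: the knowledge base, the keyword-overlap scoring, and the names `retrieve`, `build_prompt`, `generate`, and `answer` are all assumptions made for the example, and `generate` is a placeholder where a real system would call a language model.

```python
import re
from dataclasses import dataclass


@dataclass
class Document:
    source: str  # hypothetical label for where the text came from
    text: str


# Toy stand-in for the external knowledge bases (assumption for this sketch).
KNOWLEDGE_BASE = [
    Document("health-site", "Common symptoms of COVID-19 include fever, cough, and shortness of breath."),
    Document("clinic-db", "Seasonal flu vaccines are updated every year."),
    Document("cooking-blog", "Sourdough bread needs a long, slow fermentation."),
]

STOPWORDS = {"what", "are", "the", "of", "a", "an", "is", "and", "to", "in"}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and keep the words that are not stopwords."""
    words = re.findall(r"[a-z0-9\-]+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def retrieve(query: str, kb: list[Document], top_k: int = 2) -> list[Document]:
    """Steps 2 and 3a: score documents by word overlap and drop irrelevant ones."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc.text)), doc) for doc in kb]
    relevant = [(score, doc) for score, doc in scored if score > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in relevant[:top_k]]


def build_prompt(query: str, docs: list[Document]) -> str:
    """Step 3b: combine the retrieved text with the user's question."""
    context = "\n".join(f"[{doc.source}] {doc.text}" for doc in docs)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


def generate(prompt: str) -> str:
    """Placeholder for the model call; a real system would query an LLM here."""
    return f"(model output for a prompt of {len(prompt)} characters)"


def answer(query: str) -> str:
    """Steps 1 to 4: take the query, retrieve, generate, and return the response."""
    docs = retrieve(query, KNOWLEDGE_BASE)
    return generate(build_prompt(query, docs))
```

Calling `answer("What are the symptoms of COVID-19?")` passes only the health-site passage to the model. The other two documents never reach the prompt, and none of the documents had to be part of the model's training data.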
RAG technology is maturing rapidly. In our research paper, we demonstrate that the quality of answers provided by RAG can surpass that of answers from well-trained models: https://arxiv.org/pdf/2311.05903.pdf
2) The Importance of RAG
RAG will become increasingly important because:
- Training models is highly technical, specialized, and often very costly. The skills and resources required are not accessible to everyone.
- Owners of many types of data (confidential, proprietary, etc.) may be unwilling to expose their data to models they do not fully own or control.
You may also notice a critical issue:
Under the RAG framework, app developers, model creators, and data owners can collaborate, each contributing to answering user queries.
Therefore, in a fair ecosystem, each party should receive equitable compensation for their contribution.

Yet currently, there is no simple way to achieve this without compromising the independence or ownership rights of any party. (Incidentally, this exact challenge is what inspired us to start building KIP over a year ago.)
This is the “revenue distribution problem.”
3) The “Revenue Problem” of RAG and Centralized AI
Let’s imagine a scenario where one entity owns all three elements of AI value creation: there would be no need to redistribute payments collected from users across parties, as accounting could be done internally.
Conversely, if we refuse to accept a single entity controlling all three aspects of AI value creation (app developers, model creators, and data owners), we must solve how revenue is distributed among different contributors in the AI ecosystem.
Without solving the “revenue problem,” app developers, model creators, and data owners cannot maintain their independence and freedom to transact.
Yet monopolization in the AI industry has already begun.
Here’s our view on OpenAI’s monopoly:
- OpenAI clearly possesses some of the most powerful models, including closed-source ones like GPT-4, trained on knowledge and content we’ve collectively published online over years. These models power their apps (like ChatGPT) and user-created GPTs.
- Through copyright protection policies (e.g., promising to cover legal fees for anyone found uploading copyrighted material on their platform), they encourage users to freely upload data to their closed platform without fear of legal consequences.
- Given that OpenAI is a centralized, closed-source web2 platform, we must ask ourselves: does data uploaded by users (whether via ChatGPT or GPT apps) still belong to the uploader?
- Therefore, given their existing models, unrestricted “scraping” of all user data, copyright safeguards, and massive financial reserves, OpenAI appears to be the greediest “data vacuum” in history, continuously sucking in data and resources to feed their models.
Combining all these factors—including the $7 billion they’ve raised for hardware—it becomes clear that unless countermeasures are taken, complete AI industry domination by one or a few companies is inevitable.
Based on the reasons we’ve shared, we firmly believe that AI monopolies are detrimental to humanity—and we are actively working on solutions to break free from them.
4) The Significance of Decentralized RAG
RAG involves all three core components of AI value creation: app developers, model creators, and data owners.
Therefore, by establishing a decentralized RAG framework, KIP is essentially creating a decentralized governance structure for AI value creation—one that provides a level playing field for all contributors and breaks the grip of AI monopolies.
We enable AI to function efficiently as the collective effort of millions of small and large creators, without any single corporation controlling every core component.
To achieve this, we will first address the three foundational challenges blocking RAG decentralization:
1. Ownership:
Ensure that app developers, model creators, and data owners can easily and securely publish content to web3 by creating Web3 “transaction entities” in the form of ERC-3525 semi-fungible tokens (SFTs), enabling them to prove digital ownership on-chain.
2. On-chain/Off-chain Connectivity:
Ensure seamless interaction between off-chain and on-chain environments, providing an open space where app developers, model creators, and data owners can freely connect with one another.
3. Monetization:
Provide a universal framework to record and account for each AI contributor’s input, enabling automated revenue sharing and withdrawals.
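To make the monetization idea concrete, here is a minimal off-chain sketch of recording each contributor's share of a query fee and allowing withdrawals. The article does not describe KIP's actual accounting rules, so the 30/30/40 split, the `RevenueLedger` class, and the rounding rule are all hypothetical, and an on-chain version would live in a smart contract rather than in Python. Amounts are integers in the smallest currency unit so that no value is lost to floating-point rounding.

```python
from collections import defaultdict

# Hypothetical revenue split in percent; not taken from the KIP Protocol.
SHARES = {"app_developer": 30, "model_creator": 30, "data_owner": 40}


class RevenueLedger:
    """Toy ledger that credits contributors per query and tracks withdrawals."""

    def __init__(self) -> None:
        self.balances = defaultdict(int)  # account -> withdrawable balance
        self.records = []                 # one entry per paid query

    def record_query(self, fee: int, contributors: dict) -> dict:
        """Split `fee` across the accounts given as a role -> account mapping."""
        total_weight = sum(SHARES.values())
        payouts = defaultdict(int)
        distributed = 0
        for role, weight in SHARES.items():
            amount = fee * weight // total_weight
            payouts[contributors[role]] += amount
            distributed += amount
        # Arbitrary choice for this sketch: the rounding remainder goes to the data owner.
        payouts[contributors["data_owner"]] += fee - distributed
        for account, amount in payouts.items():
            self.balances[account] += amount
        self.records.append({"fee": fee, "payouts": dict(payouts)})
        return dict(payouts)

    def withdraw(self, account: str, amount: int) -> int:
        """Reduce the account's balance, rejecting overdrafts and non-positive amounts."""
        if amount <= 0 or amount > self.balances[account]:
            raise ValueError("invalid withdrawal amount")
        self.balances[account] -= amount
        return amount
```

For a fee of 101 units paid for a query served by three different parties, the ledger credits 30, 30, and 41 units, so the full fee is always accounted for. Each party can then withdraw up to its own balance without any of them having to trust a single operator's internal bookkeeping, which is the property the framework above is aiming for.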

By enabling decentralized RAG (d/RAG), KIP is drawing up the first crucial blueprint for escaping AI monopolies.
Unlocking digital ownership for every AI value creator and enabling independent yet interconnected transactions stands in direct opposition to the goals of big tech companies in web2.
The KIP Protocol will provide AI creators with the tools necessary to break free from AI monopolies.