Translated by AI
Why Access Control is Essential for Generative AI & RAG: An Introduction to Best Practices Using Vector Databases
Started with RAG? Next, Think About Access Control
Introduction
When a Large Language Model (LLM), such as the one behind ChatGPT, is combined with Retrieval-Augmented Generation (RAG), a technique that augments generation with search, it can instantly summarize internal documents and knowledge bases and answer questions about them. In effect, it becomes an "internal AI Google."
However, RAG hides a ticking time bomb: accidentally leaking confidential information. Suppose the retrieval step hands the AI a PDF that the asking user is not authorized to see. The LLM will generate an answer from it anyway, because it has no notion of who should or shouldn't read that content.
In this article, I focus on access control, which is essential when implementing RAG. I explain how to technically enforce the principle "What I cannot see, the AI cannot see." Using the vector databases Pinecone, Weaviate, and Qdrant as examples, I introduce two typical implementation patterns (index partitioning and metadata filtering) along with design tips. Failure examples and code snippets keep things beginner-friendly.
Who is this article for?
- Engineers who want to deploy LLM / RAG in production
- Those who are new to or have just started using Vector DBs
- People who worry: "What if the AI starts leaking sensitive data?"
This article reflects research as of July 2025. Details may change as each service updates its specifications.
🎯 Goals
- Fully understand why "What I cannot see, the AI cannot see" is the fundamental premise.
- Understand two styles of implementing access control in Vector DBs (Pinecone, Weaviate, Qdrant, etc.).
- Take home sample code and design tips to start an internal PoC tomorrow.
0. RAG: A Very Quick Review
RAG (Retrieval-Augmented Generation) follows a two-step process:
1️⃣ Retrieval: Fetching relevant documents from a Vector DB
2️⃣ Generation: Passing the fetched documents to an LLM to generate a response
If unauthorized documents are mixed in during the retrieval phase, there is a risk that the LLM will cite them and cause a leak.
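To make the two steps concrete, here is a minimal sketch in plain Python. It is not any vector DB's API. The in-memory DOCS list, the allowed_groups field, and the retrieve/build_prompt helpers are all hypothetical, and the "embeddings" are hand-written 3-dimensional vectors. What matters is where the permission check sits: inside retrieval, before anything reaches the LLM.

```python
import math

# Hypothetical in-memory corpus: each document lists the groups allowed to read it.
DOCS = [
    {"id": "budget-2026", "text": "Next term's budget plan (confidential)",
     "vec": [0.9, 0.1, 0.0], "allowed_groups": {"finance"}},
    {"id": "product-x", "text": "New Product X development status (unreleased)",
     "vec": [0.1, 0.9, 0.0], "allowed_groups": {"dev"}},
    {"id": "handbook", "text": "Company handbook (public)",
     "vec": [0.5, 0.5, 0.1], "allowed_groups": {"all"}},
]

def cosine(a, b):
    """Cosine similarity of two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve(query_vec, user_groups, top_k=2):
    """Step 1, Retrieval: filter by permission FIRST, then rank by similarity."""
    readable = set(user_groups) | {"all"}  # everyone may read documents tagged "all"
    candidates = [d for d in DOCS if d["allowed_groups"] & readable]
    candidates.sort(key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return candidates[:top_k]

def build_prompt(question, docs):
    """Step 2, Generation: the LLM call itself is omitted; this builds the prompt it would see."""
    context = "\n".join(f"[{d['id']}] {d['text']}" for d in docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

budget_query = [1.0, 0.0, 0.0]  # toy embedding of "Summarize next term's budget plan"
for user, groups in [("Accounting A", {"finance"}), ("Sales B", {"sales"})]:
    print(user, [d["id"] for d in retrieve(budget_query, groups)])
# Accounting A ['budget-2026', 'handbook']
# Sales B ['handbook']
```

The filter runs before ranking. Filtering after generation is too late, because the LLM has already read the document.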
1. Why is "What I cannot see, the AI cannot see" Mandatory?
1-1. Examples of Failure
| Employee | Question | If access control is weak... |
|---|---|---|
| Accounting A | "Summarize next term's budget plan" | ✅ Normal |
| Sales B | "Summarize next term's budget plan" | ❌ Confidential budget is leaked |
| Intern C | "What is the development status of New Product X?" | ❌ Unreleased features are disclosed |
1-2. Actual Damage
- Information Leakage: violations of NDAs or laws.
- Loss of Trust: employees come to believe that "the AI will reveal anything if you ask."
- Compliance Risk: Sensitive Information Disclosure is listed as LLM02 in the OWASP Top 10 for LLM Applications (2025), which marks it as a recognized major risk.
2. Two Main Approaches to Protecting Permissions in Vector DBs
| Approach | Overview | Pros | Cons | Best Use Case |
|---|---|---|---|---|
| 🅰️ Index Partitioning | Separate index or Namespace per tenant/user | • Zero data mixing • Fast at small scale | • Number of indexes can explode | B2B SaaS etc. with few customers |
| 🅱️ Single Index + Metadata Filtering | Store all data in one index and separate it using filters | • Zero data duplication • Scales to large numbers of users | • Increased filtering overhead | Consumer apps etc. with many users |
🅰️ Index Partitioning
- Pinecone: Following the guide "Implement Multitenancy using Namespaces", isolation is achieved by writing each tenant's data to its own namespace, e.g. index.upsert(..., namespace="tenantA"), and passing the same namespace at query time.
- Weaviate: As introduced in "Multi-Tenancy Vector Search with millions of tenants", native multi-tenancy has been officially available since v1.20 and supports 50,000+ tenants per node. See the Weaviate 1.20 release notes for details.
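Whichever product you use, the critical rule is that the server, not the client, chooses the namespace. Below is a minimal sketch under stated assumptions. NamespacedIndex is a hypothetical in-memory stand-in (not the Pinecone SDK) whose methods loosely mirror the namespace argument mentioned above. SESSIONS stands in for your real authentication layer. With the actual Pinecone client, the same idea means passing namespace= to both index.upsert and index.query.

```python
from collections import defaultdict

class NamespacedIndex:
    """Hypothetical in-memory stand-in: one isolated bucket of records per namespace."""

    def __init__(self):
        self._data = defaultdict(dict)  # namespace -> {record_id: text}

    def upsert(self, records, namespace):
        for record_id, text in records:
            self._data[namespace][record_id] = text

    def query(self, namespace):
        # A real vector DB would rank by similarity; returning every record id
        # in the namespace is enough to show the isolation.
        return sorted(self._data.get(namespace, {}))

    def delete_namespace(self, namespace):
        # Offboarding a tenant = dropping its whole namespace.
        self._data.pop(namespace, None)

# Hypothetical session store: session token -> tenant, filled in after authentication.
SESSIONS = {"token-a": "tenantA", "token-b": "tenantB"}

def search_for(session_token, index):
    # The namespace comes from the authenticated session, never from request input.
    # An unknown token raises KeyError, i.e. the lookup fails closed.
    tenant = SESSIONS[session_token]
    return index.query(namespace=tenant)

index = NamespacedIndex()
index.upsert([("a-contract", "Tenant A contract"), ("a-faq", "Tenant A FAQ")], namespace="tenantA")
index.upsert([("b-pricing", "Tenant B pricing")], namespace="tenantB")
print(search_for("token-a", index))  # ['a-contract', 'a-faq']
print(search_for("token-b", index))  # ['b-pricing']
```

Because the tenant is resolved from the session, a crafted request cannot point a query at another tenant's namespace.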
🅱️ Single Index + Metadata Filtering
Example for Qdrant:
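# Minimal runnable setup (assumption): qdrant-client in local in-memory mode, toy 4-dim vectors
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, FieldCondition, Filter, MatchAny, PointStruct, VectorParams,
)

qdrant = QdrantClient(":memory:")  # use QdrantClient(url=...) for a real server
qdrant.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)
v1, v2, qvec = [0.9, 0.1, 0.0, 0.0], [0.1, 0.9, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]

# Registration: tag each point with the department allowed to read it
qdrant.upsert(
    collection_name="docs",
    points=[
        PointStruct(id=1, vector=v1, payload={"dept": "finance"}),
        PointStruct(id=2, vector=v2, payload={"dept": "all"}),
    ],
)
# Search for a finance user: admit "finance" AND the shared "all" tag.
# The filter is applied during the vector search; query_points() supersedes search().
hits = qdrant.query_points(
    collection_name="docs",
    query=qvec,
    query_filter=Filter(
        must=[FieldCondition(key="dept", match=MatchAny(any=["finance", "all"]))]
    ),
).points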
- Creating a payload index on the filtered field (dept here) speeds up filtered searches → Payload Indexing – Qdrant Docs
- A Complete Guide to Filtering in Vector Search provides a detailed explanation of optimizing filtered searches.
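The example above hard-codes the allowed department values. In practice they come from the user's identity. The sketch below assumes hypothetical USER_ROLES and ROLE_DEPTS tables and does three things:
- It resolves permissions per role rather than per user.
- It always adds the public all tag.
- It emits the filter in the JSON shape of Qdrant's REST API (must + match any).

In the Python client, the same condition is FieldCondition(key="dept", match=MatchAny(any=[...])). The matches() helper is a simplified local evaluator written only so the behavior can be checked without a server. It is not part of Qdrant.

```python
import json

# Hypothetical directory data: permissions are granted per role, not per user.
USER_ROLES = {"alice": ["accountant"], "bob": ["sales_rep"]}
ROLE_DEPTS = {"accountant": ["finance"], "sales_rep": ["sales"], "cfo": ["finance", "exec"]}

def allowed_depts(user):
    """Default deny: everyone gets the public 'all' tag, plus only what their roles grant."""
    depts = {"all"}
    for role in USER_ROLES.get(user, []):  # unknown users get no extra access
        depts.update(ROLE_DEPTS.get(role, []))
    return sorted(depts)

def build_acl_filter(user):
    """Qdrant REST-style filter: the 'dept' payload must equal one of the allowed values."""
    return {"must": [{"key": "dept", "match": {"any": allowed_depts(user)}}]}

def matches(payload, flt):
    """Hypothetical local evaluator for 'must' + 'match any' on scalar values only.
    A payload without the key does not match, so untagged documents stay hidden."""
    for cond in flt["must"]:
        value = payload.get(cond["key"])
        if value is None or value not in cond["match"]["any"]:
            return False
    return True

print(json.dumps(build_acl_filter("alice")))
# {"must": [{"key": "dept", "match": {"any": ["all", "finance"]}}]}

payloads = {1: {"dept": "finance"}, 2: {"dept": "all"}, 3: {"dept": "sales"}, 4: {}}
for user in ["alice", "bob", "mallory"]:
    flt = build_acl_filter(user)
    print(user, [pid for pid, p in payloads.items() if matches(p, flt)])
# alice [1, 2]
# bob [2, 3]
# mallory [2]
```

Because the tags live on roles, moving an employee to another team only changes USER_ROLES. The document payloads stay untouched.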
3. Design Q&A (For Beginners)
| 💬 FAQ | ✅ Best Practices |
|---|---|
| Isn't per-user metadata heavy? | Start by aggregating permissions per department/role, and refine only when finer-grained requirements emerge. |
| What is "Default Deny"? | A Zero Trust principle: by default the AI sees nothing and can only read documents carrying a tag the user is permitted to see, such as dept=all. |
| How should I benchmark? | Load dummy data up to the expected maximum volume → run QPS tests with permission-restricted queries → check P95 latency. |
| Deleted documents keep showing up... | With the Namespace method, drop the whole namespace; with a single index, use delete(filter=...) and also exclude the data from snapshots. |
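The benchmarking steps in the table can be scripted. This is a rough, single-client sketch under stated assumptions. make_dummy_docs and fake_search are hypothetical stand-ins (a brute-force scan in plain Python), so the numbers it prints say nothing about any real vector DB. To use it:
- Swap fake_search for your actual filtered query call.
- Scale the dummy corpus to your expected maximum volume.
- Run several clients concurrently to get a realistic QPS figure.

P95 here uses the nearest-rank definition.

```python
import math
import random
import time

def p95(latencies):
    """95th percentile by the nearest-rank method."""
    ordered = sorted(latencies)
    rank = math.ceil(95 * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

def make_dummy_docs(n, depts, dim=8, seed=0):
    """Hypothetical dummy corpus: random vectors, each tagged with one department."""
    rng = random.Random(seed)
    return [{"vec": [rng.random() for _ in range(dim)], "dept": rng.choice(depts)} for _ in range(n)]

def fake_search(docs, query, allowed, top_k=5):
    """Hypothetical stand-in for the real vector DB call: brute-force dot product over permitted docs."""
    permitted = [d for d in docs if d["dept"] in allowed]
    permitted.sort(key=lambda d: sum(a * b for a, b in zip(query, d["vec"])), reverse=True)
    return permitted[:top_k]

def benchmark(docs, queries, allowed):
    """Time each permission-restricted query sequentially; return (QPS, P95 latency in ms)."""
    latencies = []
    start = time.perf_counter()
    for q in queries:
        t0 = time.perf_counter()
        fake_search(docs, q, allowed)
        latencies.append((time.perf_counter() - t0) * 1000)
    elapsed = time.perf_counter() - start
    return len(queries) / elapsed, p95(latencies)

docs = make_dummy_docs(2_000, ["finance", "sales", "dev", "all"])
rng = random.Random(1)
queries = [[rng.random() for _ in range(8)] for _ in range(50)]
qps, p95_ms = benchmark(docs, queries, allowed={"sales", "all"})
print(f"QPS={qps:.1f}  P95={p95_ms:.2f} ms")
```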
4. Summary
- RAG is "Retrieval + Generation," and access control at the retrieval stage is its lifeline.
- Choose index partitioning or metadata filtering based on your number of tenants and users.
- Guarantee "What I cannot see, the AI cannot see" in code.
Reference Links
- Pinecone Docs: Implement Multitenancy using Namespaces (https://docs.pinecone.io/guides/index-data/implement-multitenancy)
- Weaviate Blog: Multi-Tenancy Vector Search with millions of tenants (https://weaviate.io/blog/multi-tenancy-vector-search)
- Qdrant Docs: Payload / Filtering (https://qdrant.tech/documentation/concepts/payload/)
- OWASP Top 10 for LLM Applications: LLM02 Sensitive Information Disclosure (https://genai.owasp.org/llmrisk/llm022025-sensitive-information-disclosure/)
- Auth0 Blog: Build Trustworthy AI: Access Control for RAG (https://auth0.com/blog/rag-and-access-control-where-do-you-start/)