Why Access Control is Essential for Generative AI & RAG: An Introduction to Best Practices Using Vector Databases

Starting with RAG and Thinking About Access Control in the Next Step

Introduction

Combining a Large Language Model (LLM) like ChatGPT with a search augmentation technique called Retrieval-Augmented Generation (RAG) lets you build an "internal AI Google" that instantly summarizes internal documents and knowledge bases and answers questions about them.
However, RAG hides a ticking bomb: accidentally leaking confidential information. If the retrieval step can read a PDF that the person asking has no access to, the LLM will generate text based on it, because it has no notion of who is allowed to see what.

In this article, focusing on access control, which is essential for implementing RAG, I will explain how to technically implement the principle: "What I cannot see, the AI cannot see." Specifically, using vector databases like Pinecone, Weaviate, and Qdrant as examples, I will introduce two typical implementation patterns (index partitioning and metadata filtering) and design tips. I'll proceed with failure examples and code snippets to make it easy for beginners to understand.

Who is this article for?

  • Engineers who want to deploy LLM / RAG in production
  • Those who are new to or have just started using Vector DBs
  • People worried about, "What if the AI starts leaking sensitive data?"

This information is based on research as of July 2025. Each service's specifications may change, so details may differ from what is described here.


🎯 Goals

  1. Fully understand why "What I cannot see, the AI cannot see" is the fundamental premise.
  2. Understand two styles of implementing access control in Vector DBs (Pinecone, Weaviate, Qdrant, etc.).
  3. Take home sample code and design tips to start an internal PoC tomorrow.

0. RAG: A Very Quick Review

RAG (Retrieval-Augmented Generation) follows a two-step process:

1️⃣ Retrieval: Fetching relevant documents from a Vector DB
2️⃣ Generation: Passing the fetched documents to an LLM to generate a response

If unauthorized documents are mixed in during the retrieval phase, there is a risk that the LLM will cite them and cause a leak.
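The two steps above can be sketched in a few lines. This is a toy, in-memory illustration rather than a real Vector DB client: the `Doc` class, the `dept` tag, and the `retrieve` / `build_prompt` helpers are all hypothetical names introduced here. The point it tries to show is that the permission check belongs in step 1, before anything is handed to the LLM.

```python
import math
from dataclasses import dataclass


@dataclass
class Doc:
    text: str
    vector: list[float]
    dept: str  # hypothetical access tag: a department name, or "all"


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(store, query_vec, allowed_depts, top_k=2):
    # Step 1 (Retrieval): drop documents the caller may not read BEFORE ranking.
    visible = [d for d in store if d.dept in allowed_depts]
    ranked = sorted(visible, key=lambda d: cosine(d.vector, query_vec), reverse=True)
    return ranked[:top_k]


def build_prompt(question, docs):
    # Step 2 (Generation): only the retrieved text ever reaches the LLM.
    context = "\n".join(f"- {d.text}" for d in docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


store = [
    Doc("Next term's budget plan (confidential)", [1.0, 0.0], "finance"),
    Doc("Office holiday calendar", [0.9, 0.1], "all"),
]

# A sales employee asks about the budget; the finance document is never retrieved.
sales_docs = retrieve(store, [1.0, 0.0], allowed_depts={"sales", "all"})
prompt = build_prompt("Summarize next term's budget plan", sales_docs)
```

Because the filter runs before ranking, the confidential document cannot appear in the prompt no matter how similar it is to the question.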


1. Why is "What I cannot see, the AI cannot see" Mandatory?

1-1. Examples of Failure

| Employee | Question | If access control is weak... |
|---|---|---|
| Accounting A | "Summarize next term's budget plan" | ✅ Normal |
| Sales B | "Summarize next term's budget plan" | ❌ Confidential budget is leaked |
| Intern C | "What is the development status of New Product X?" | ❌ Unreleased features are disclosed |

1-2. Actual Damage


2. Two Main Approaches to Protecting Permissions in Vector DBs

| Approach | Overview | Pros | Cons | Best Use Case |
|---|---|---|---|---|
| 🅰️ Index Partitioning | Separate index or Namespace per tenant/user | Zero data mixing; fast for small scales | Number of indexes can explode | B2B SaaS etc. with few customers |
| 🅱️ Single Index + Metadata Filtering | Store all data in one index and separate using a filter | Zero data duplication; scales to a large number of users | Increased filtering overhead | Consumer apps etc. with many users |

🅰️ Index Partitioning
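As a rough illustration of the partitioning idea from the comparison above, the following sketch uses a toy in-memory class instead of a real Vector DB client. `PartitionedStore` and its methods are hypothetical names made up for this example. Real services expose their own namespace or collection APIs, so treat this only as a picture of the data layout.

```python
from collections import defaultdict


class PartitionedStore:
    """Toy stand-in for a vector DB that keeps one namespace per tenant."""

    def __init__(self):
        self._namespaces = defaultdict(list)

    def upsert(self, namespace, doc_id, vector, text):
        # Each tenant's vectors live in their own bucket.
        self._namespaces[namespace].append((doc_id, vector, text))

    def query(self, namespace, vector, top_k=3):
        # Only the caller's namespace is searched; other tenants are unreachable.
        candidates = self._namespaces.get(namespace, [])
        scored = sorted(
            candidates,
            key=lambda item: sum(a * b for a, b in zip(item[1], vector)),
            reverse=True,
        )
        return [(doc_id, text) for doc_id, _, text in scored[:top_k]]

    def drop_namespace(self, namespace):
        # Removing a tenant means dropping its whole bucket.
        self._namespaces.pop(namespace, None)


store = PartitionedStore()
store.upsert("tenant-a", 1, [1.0, 0.0], "Tenant A contract")
store.upsert("tenant-b", 2, [1.0, 0.0], "Tenant B price list")

# The namespace must come from the authenticated session, never from user input.
results_a = store.query("tenant-a", [1.0, 0.0])
```

The isolation comes from the lookup key itself, which is why data mixing cannot happen here, and also why the number of buckets grows with every new tenant.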

🅱️ Single Index + Metadata Filtering

Example for Qdrant:

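# Assumes qdrant-client is installed, the "docs" collection already exists,
# and v1, v2, qvec are embedding vectors computed elsewhere.
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchAny, PointStruct

qdrant = QdrantClient(url="http://localhost:6333")  # assumed local instance

# Registration: tag every point with the department allowed to read it
qdrant.upsert(
  collection_name="docs",
  points=[
    PointStruct(id=1, vector=v1, payload={"dept": "finance"}),
    PointStruct(id=2, vector=v2, payload={"dept": "all"}),
  ],
)

# Search as a finance user: match the user's own department plus the
# company-wide "all" tag, so shared documents are not filtered out.
# (Newer client versions also provide query_points for the same purpose.)
hits = qdrant.search(
  collection_name="docs",
  query_vector=qvec,
  query_filter=Filter(
    must=[FieldCondition(key="dept", match=MatchAny(any=["finance", "all"]))]
  ),
)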
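The search call above hard-codes the department in its filter. In a real service, the allowed tags would be derived on the server from the authenticated user. Below is a minimal sketch of that step, assuming a hypothetical `ROLE_TO_DEPTS` table (in practice this would come from your identity provider or HR system). The returned dict follows the shape of Qdrant's JSON filter syntax (`must` / `key` / `match.any`), and the helper names are made up for this example.

```python
# Hypothetical role table; in practice this comes from your identity provider.
ROLE_TO_DEPTS = {
    "accounting": ["finance"],
    "sales": ["sales"],
}
PUBLIC_TAG = "all"  # tag for documents everyone may read


def allowed_depts(role):
    """Default deny: an unknown role gets nothing beyond the public tag."""
    return sorted(set(ROLE_TO_DEPTS.get(role, [])) | {PUBLIC_TAG})


def build_dept_filter(role):
    """Build a filter dict in the shape of Qdrant's JSON filter syntax."""
    return {"must": [{"key": "dept", "match": {"any": allowed_depts(role)}}]}


accounting_filter = build_dept_filter("accounting")
intern_filter = build_dept_filter("intern")  # role not in the table
```

Keeping this mapping in one server-side function means a client can never widen its own filter by sending a different department name.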

3. Design Q&A (For Beginners)

| 💬 FAQ | ✅ Best Practices |
|---|---|
| Is per-user metadata heavy? | First, aggregate by department/role, and refine if more granular requirements emerge. |
| What is "Default Deny"? | Zero Trust. The AI initially sees nothing and can only view documents with allowed tags like dept=all. |
| Benchmarking method? | Inject dummy data up to the expected maximum volume → Perform QPS tests with permission-restricted queries → Check P95 latency. |
| Deleted documents keep appearing... | For the Namespace method, drop the entire thing; for Single Index, use delete(filter=...) + exclude from snapshots. |
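The benchmarking advice above (run permission-restricted queries, then check P95 latency) could be measured with a small harness like the one below. This is only a sketch: `fake_filtered_search` is a placeholder standing in for a real filtered Vector DB query, and `measure_p95_ms` is a hypothetical helper name. The resulting numbers depend entirely on your environment.

```python
import statistics
import time


def measure_p95_ms(search_fn, queries):
    """Run search_fn once per query and return the P95 latency in milliseconds."""
    latencies = []
    for q in queries:
        start = time.perf_counter()
        search_fn(q)
        latencies.append((time.perf_counter() - start) * 1000)
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(latencies, n=20)[-1]


def fake_filtered_search(query):
    # Placeholder workload; replace with a real permission-filtered search call.
    return [doc for doc in range(1000) if doc % 7 == 0]


p95 = measure_p95_ms(fake_filtered_search, range(200))
print(f"P95 latency: {p95:.3f} ms")
```

Running the same harness with and without the permission filter gives a rough view of the filtering overhead at your expected data volume.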

4. Summary

  • RAG is “Retrieval + Generation”. Access control at the retrieval stage is the lifeline.
  • Use index partitioning or metadata filtering depending on the situation.
  • Let's guarantee “What I cannot see, the AI cannot see” through code.
