Azure AI Search: A Comprehensive Guide to Data Storage and Filtering

About This Article

Azure AI Search (formerly Azure Cognitive Search) is a service that can cause significant headaches during production design if you dismiss it as just a "database that can do vector search." Especially if you leave the following three points ambiguous, you are likely to face rework or operational accidents.

  • Where is the data stored? Is a copy of the source data created?
  • How are data additions, updates, and deletions handled? What is automated, and what must be done manually?
  • How flexibly can the search scope be narrowed? How do you handle access control?

This article outlines the big picture of Azure AI Search and its relationship with other Azure services while clarifying these three points based on official documentation.


1. What is Azure AI Search? (Positioning and Key Components)

Azure AI Search is a fully managed cloud search service. It is designed to handle a wide range of scenarios centered on search within a single service, from classic full-text search to modern RAG (Retrieval-Augmented Generation) and agentic retrieval[^1].

Key Components (Remember just these)

| Name | Role |
| --- | --- |
| Search service | The unit of provisioning and billing. Created by selecting a SKU (Tier) and region |
| Index | The actual search target. A collection of JSON documents + schema, stored in internal service storage |
| Document | A single record within an index. Internally treated as JSON |
| Data source | Definition of external data connections (Blob, Cosmos DB, SharePoint, etc.) referenced by Indexers |
| Indexer | A crawler (pull model) that pulls data from the Data source into the Index. It drives skillsets |
| Skillset | A collection of AI processing steps during indexing (chunking, embedding generation, OCR, translation, etc.) |
| Knowledge source / Knowledge base | Management objects for agentic retrieval. A Knowledge source is an abstraction of an individual data source such as Blob or SharePoint, and a Knowledge base is a container that aggregates multiple sources. It serves as the unit for permission control and cross-source search (introduced/redesigned in 2025-11-01-preview) |

Two Operational Phases

Azure AI Search operates simply through two phases: Indexing (write) and Querying (read)[^1].

  • Indexing: JSON documents are ingested into an index. Text is stored in inverted indexes, and vectors are stored in vector indexes internally.
  • Querying: Clients send search requests and receive results.

2. Relationship Map with Other Azure Services

Azure AI Search is not a self-contained service; it is designed to be combined with other Azure services[^2][^3]. Here is a summary of the main integrations.

Data Sources (What to "Search")

Primary data sources that can be pulled via Indexers:

  • Azure Blob Storage / Azure Data Lake Storage Gen2
  • Azure Cosmos DB (NoSQL / MongoDB preview / Gremlin preview; Cassandra is not supported)
  • Microsoft OneLake
  • Microsoft SharePoint in Microsoft 365 (preview)
  • Azure SQL Database / Azure MySQL (MySQL is preview)
  • Logic Apps connectors (preview): Fetching data via a wide range of connectors other than the above

AI Capabilities (How to make it "Smart")

  • Azure OpenAI: Used for embedding models to generate vectors, and for LLMs to handle query decomposition in agentic retrieval
  • Azure AI Foundry: Utilizes Azure AI Search as a knowledge layer for Foundry IQ[^4]
  • Built-in Skills: OCR, translation, entity extraction, etc. Integrated into skillsets
  • Custom Skills: Allows you to plug in your own code, such as Azure Functions, as a skill

Security (How to "Protect")

  • Microsoft Entra ID: Authentication (keyless operation without API keys), ACL metadata inheritance
  • Azure Private Link / Private Endpoint: Private access for both inbound and outbound traffic
  • Azure Key Vault: Key management for CMK (Customer-Managed Keys)

In a nutshell, Azure AI Search is structured to "pull from external sources → store in internal indexes → return via queries," supported on both sides by AI capabilities (embeddings/LLM) and security features (Entra/Private Link/Key Vault).


3. [Caution ①] Where is the Data Stored?

This is the point most prone to confusion. The short answer is as follows.

What is stored inside the service?

According to official Microsoft documentation, Azure AI Search maintains the following in its internal storage and encrypts everything[^5]:

  • Indexes
  • Synonym maps
  • Definitions of indexers, data sources, and skillsets
  • Data on temporary disks

In other words, copies of inverted indexes, vector indexes, and the actual values of searchable fields exist on the service side. Even after being ingested into Azure AI Search, your original data sources (such as PDFs in Blob storage or documents in Cosmos DB) naturally remain in their original locations. There is no automated relationship where deleting one automatically deletes the other.

Encryption is automatic; however, CMK affects performance

| Layer | Encryption | Notes |
| --- | --- | --- |
| At rest | AES-256, Microsoft-managed, FIPS 140-2 compliant | Automatic for all Tiers and regions. No configuration required[^5] |
| In transit | TLS 1.2 or higher | Includes communication between internal services |
| CMK (optional) | Double encryption using keys managed in Azure Key Vault | Applies to indexes and synonym maps |
| Temp disk | CMK | Supported only for services created on or after 2021-05-13 |

Storage Limits per Tier (Excerpt)

| Tier | Overview | Characteristics |
| --- | --- | --- |
| Free | Shared with other tenants, 50 MB | No SLA; may be deleted if unused for a long time |
| Basic | Dedicated resources | For small-scale production; can meet SLA with 3 replicas |
| Standard S1 / S2 / S3 | Dedicated machines, tiered storage/performance | General production |
| S3 HD | Multi-tenant type for many small indexes | For multi-index scenarios |
| Storage Optimized L1 / L2 | Lower TB cost | For large indexes with infrequent updates; higher query latency[^7] |

Data Residency

Data is stored in the region chosen at creation. The search endpoint, metadata, and index content all remain in that region and never leave it[^5].


4. [Caution ②] How to Add, Update, and Delete Data

"So, if I just put a PDF in a Blob, it will be reflected automatically, right?" is half-correct and half-dangerous. To be precise, you must explicitly choose from the following two models[^3].

Two Ingestion Models

| Model | Mechanism | Suitable Cases |
| --- | --- | --- |
| Push Model | Send upload, merge, mergeOrUpload, or delete actions collectively from the client via SDK/REST (IndexDocumentsBatch)[^9] | When you need freshness of under 5 minutes, want immediate reflection via event-driven triggers, need to fully control synchronization logic from the app side, or are using data sources that Indexers do not support |
| Pull Model (Indexer) | The Indexer on the Azure AI Search side crawls and ingests from the data source[^3] | Cases where supported data sources exist and periodic scheduled updates are sufficient for freshness |

In the Push model, you can mix upload, merge, and delete actions in a single batch from a search client (such as the .NET SearchClient). mergeOrUpload behaves like an UPSERT: "update partially if it exists, otherwise create new."
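
As a rough illustration of the Push model, the sketch below builds the JSON body for a mixed batch using the `@search.action` property of the REST index-documents request. The helper names (`make_action`, `to_batches`) and the sample documents are hypothetical, and the 1,000-documents-per-request cap is taken from the service limits as I understand them, so verify it for your tier and API version.

```python
import json
from typing import Iterator

# Assumed per-request cap on documents; confirm against the current service limits.
MAX_DOCS_PER_BATCH = 1000

ALLOWED_ACTIONS = {"upload", "merge", "mergeOrUpload", "delete"}


def make_action(action: str, doc: dict) -> dict:
    """Attach an indexing action to a document (hypothetical helper)."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return {"@search.action": action, **doc}


def to_batches(actions: list[dict], size: int = MAX_DOCS_PER_BATCH) -> Iterator[dict]:
    """Split a list of actions into request bodies of at most `size` documents."""
    for start in range(0, len(actions), size):
        yield {"value": actions[start:start + size]}


# One batch may mix action types; mergeOrUpload acts as an upsert.
actions = [
    make_action("mergeOrUpload", {"id": "doc-1", "title": "Telework policy"}),
    make_action("merge", {"id": "doc-2", "department": "HR"}),
    make_action("delete", {"id": "doc-3"}),
]
batches = list(to_batches(actions))
print(json.dumps(batches[0], indent=2))
```

Each resulting body would be sent to the index's document-indexing endpoint; the HTTP call itself is left out so the sketch stays independent of any SDK.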

Indexer Schedule

  • On-demand or scheduled execution.
  • The minimum schedule interval is 5 minutes. If you need fresher data than that, the Push model is mandatory[^3].
  • 1 Indexer = 1 Data source → 1 Index (other combinations are not supported).
  • However, "writing to the same index from multiple indexers" is possible. This is the configuration to use when building a single index from multiple sources.
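
The 5-minute floor can be checked before an indexer definition is deployed. The following is a minimal sketch, assuming the schedule interval is written as a simple ISO 8601 duration such as `PT15M`; the function names are hypothetical, and only the minimum stated above is enforced.

```python
import re
from datetime import timedelta

MIN_INTERVAL = timedelta(minutes=5)  # minimum indexer schedule interval stated above

# Handles simple durations such as PT15M, PT2H, P1D, PT1H30M (no months or years).
_DURATION = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")


def parse_interval(value: str) -> timedelta:
    """Parse a simple ISO 8601 duration into a timedelta."""
    match = _DURATION.match(value)
    if not match or value in ("P", "PT"):
        raise ValueError(f"not a supported duration: {value!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def validate_schedule(interval: str) -> timedelta:
    """Reject intervals shorter than the indexer minimum."""
    parsed = parse_interval(interval)
    if parsed < MIN_INTERVAL:
        raise ValueError(f"{interval} is below the 5-minute minimum; consider the Push model")
    return parsed


def needs_push_model(required_freshness: timedelta) -> bool:
    """True when the freshness target cannot be met by a scheduled indexer."""
    return required_freshness < MIN_INTERVAL


print(validate_schedule("PT15M"))
print(needs_push_model(timedelta(minutes=1)))
```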

Change Detection and Deletion Detection (The Pitfall)

When data changes on the data source side, how the Indexer catches up varies by data source.

The recommended countermeasure is a Soft delete strategy.

  1. Enable "Native blob soft delete" on the Blob side, and set "Track deletions / Native blob soft delete" on the Indexer data source side.
  2. On the application side, for deletions, do not perform a direct physical deletion; first set a soft-delete flag.
  3. The Indexer detects the soft delete and removes the document from the search index, after which you can perform the physical deletion.
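
To make step 1 concrete, here is a hedged sketch of a Blob data source definition that carries a deletion detection policy. The two `@odata.type` values are the policy types I know from the REST API (native blob soft delete and a soft-delete marker on a metadata property); the function name, the `IsDeleted` marker name, and the placeholder connection string are assumptions, and you should check which API version supports the native policy before relying on it.

```python
import json


def blob_data_source(
    name: str,
    container: str,
    connection_string: str,
    use_native_soft_delete: bool = True,
    marker_name: str = "IsDeleted",   # hypothetical metadata property name
    marker_value: str = "true",
) -> dict:
    """Build a data source definition with a deletion detection policy."""
    if use_native_soft_delete:
        policy = {
            "@odata.type": "#Microsoft.Azure.Search.NativeBlobSoftDeleteDeletionDetectionPolicy"
        }
    else:
        policy = {
            "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
            "softDeleteColumnName": marker_name,
            "softDeleteMarkerValue": marker_value,
        }
    return {
        "name": name,
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
        "dataDeletionDetectionPolicy": policy,
    }


definition = blob_data_source("internal-docs-blob", "policies", "<connection-string>")
print(json.dumps(definition, indent=2))
```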

Integrating AI Processing into the Pipeline with Skillsets

By attaching a skillset to an Indexer, you can run AI processing during ingestion[^2]:

  • Integrated vectorization: Split text into chunks and call an embedding model (like Azure OpenAI) to generate vectors automatically.
  • OCR: Extract text from images within Blobs.
  • Language detection, translation, and entity extraction.
  • Custom Skill: Call Azure Functions, etc.

The ability to "search PDFs on Azure Blob storage as-is" is made possible by this combination of skillset and indexer.


5. [Caution ③] How Flexibly Can You Narrow Down the Search Range?

Search flexibility is the most critical point that determines user experience. Azure AI Search offers several levels of filtering, but there are several items you must decide during the design phase, or you will run into a dead end later.

5.1 The Starting Point: Field Attributes

For an index, you declare "which fields to use and how" via field attributes[^11].

| Attribute | Meaning |
| --- | --- |
| searchable | Subject to full-text search (tokenized) |
| filterable | Can be used for narrowing down with $filter |
| sortable | Can be used as a sort key |
| facetable | Can be used for faceted navigation |
| retrievable | Can be included in search responses |
| key | Document primary key (only one field allowed) |

Additionally, if you enable any of filterable, sortable, or facetable for an Edm.String field, the value of a single field must be 32 KB or less. It cannot be used for long text.

5.2 Filters (OData $filter)

You can perform precise filtering using OData syntax. It supports equality and range comparisons, the logical operators and, or, and not, as well as string functions, collection functions, and geospatial functions[^12].

Notes on size[^13]:

  • GET requests: The URL cannot exceed 8 KB. This is sufficient for most use cases, but may be insufficient for security trimming where filters can become massive.
  • POST requests: Allows up to approximately 16 MB. Send large filters via POST.
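
The size limits above suggest deciding between GET and POST from the encoded request size rather than by habit. The sketch below is one way to do that, assuming the usual `/indexes/{index}/docs` (GET) and `/indexes/{index}/docs/search` (POST) routes; the endpoint, the API version string, and the `plan_request` name are placeholders.

```python
from urllib.parse import urlencode

MAX_GET_URL_BYTES = 8 * 1024  # GET URL limit described above


def plan_request(endpoint: str, index: str, api_version: str,
                 search: str, filter_expr: str) -> dict:
    """Choose GET when the encoded URL fits in 8 KB, otherwise fall back to POST."""
    base = f"{endpoint}/indexes/{index}/docs"
    params = {"api-version": api_version, "search": search, "$filter": filter_expr}
    get_url = f"{base}?{urlencode(params, safe='$')}"
    if len(get_url.encode("utf-8")) <= MAX_GET_URL_BYTES:
        return {"method": "GET", "url": get_url, "body": None}
    return {
        "method": "POST",
        "url": f"{base}/search?api-version={api_version}",
        "body": {"search": search, "filter": filter_expr},
    }


endpoint = "https://example.search.windows.net"  # placeholder service name
small = plan_request(endpoint, "internal-docs", "2024-07-01", "telework", "department eq 'HR'")
print(small["method"])
```

In practice, a security-trimming filter that lists hundreds of group IDs will exceed the GET limit quickly, which is why defaulting to POST is usually the simpler choice.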

5.3 Faceted Navigation

This feature is for building UIs like sidebars on e-commerce sites (e.g., "filter by category" or "filter by brand"). If you pass the facets parameter to a field with facetable: true, it returns the categories and counts contained in the current search results[^14].

Implementation notes:

  • Fields used for facets should also have filterable enabled (for use in filter queries after facet selection).
  • Facets are created dynamically from the "current query results." If you want a static list of all possible facets, retrieve them via a separate query.
  • Preview APIs also support hierarchical facets, facet filters, and facet aggregation.

5.4 Scoring and Ranking (Weighting rather than Narrowing)

If you want to "put important items at the top" rather than "narrowing down," you can use the following features[^2]:

  • Scoring profiles: Prioritize newer documents, etc.
  • Synonym maps: Make "cars" hit "automobiles" as well.
  • Semantic ranker: Use machine learning to rank items that are semantically closer.
  • Highlighting / Autocomplete / Suggestions.

5.5 Access Control (Document Level)

This is particularly important for RAG. To change the documents visible to each user, you must take one of the following approaches:

Pattern A: Native ACL Integration (As of 2025, supported for SharePoint / Azure Storage)

In this pattern, the Indexer ingests ACL/RBAC metadata from the original data store into the index and automatically trims results during querying using user tokens[^15]. Data in Azure Storage can inherit Microsoft Entra ID permission metadata.

Pattern B: Security Filters (Security Trimming)

For data sources that do not support native ACLs, you must design fields yourself[^16].

Overview of the procedure:

  1. Prepare a filterable field for each document, such as an "array of group_ids permitted to view."
  2. Set that value during ingestion.
  3. Add a filter to your query during search, such as group_ids/any(g: search.in(g, 'g1,g2,g3')). Because group_ids is a collection, search.in must be wrapped in any().

Example: a request body whose filter returns only documents tied to specific groups.

{
  "search": "*",
  "filter": "group_ids/any(g: search.in(g, 'group_id1,group_id2'))"
}
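
Because the filter string is assembled from user-specific data, it is worth building it through a small function instead of string concatenation scattered across the application. The sketch below is one possible shape, assuming group IDs never contain the delimiter or a single quote; `build_group_filter` is a hypothetical name, and the explicit third argument to `search.in` is the delimiter parameter of that function.

```python
def build_group_filter(field: str, group_ids: list[str], delimiter: str = ",") -> str:
    """Build a security-trimming filter for a Collection(Edm.String) field."""
    unique_ids = list(dict.fromkeys(group_ids))  # de-duplicate, keep order
    if not unique_ids:
        # Fail closed: a user with no groups should not reach the search call at all.
        raise ValueError("user has no groups; deny the request instead of searching")
    for group_id in unique_ids:
        if delimiter in group_id or "'" in group_id:
            raise ValueError(f"group id contains a reserved character: {group_id!r}")
    joined = delimiter.join(unique_ids)
    return f"{field}/any(g: search.in(g, '{joined}', '{delimiter}'))"


print(build_group_filter("group_ids", ["group_id1", "group_id2"]))
```

Raising on an empty group list is a deliberate choice here: silently sending a query without the filter would expose every document.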

Pattern C: Knowledge-source Permissions in Agentic Retrieval

Agentic retrieval in the 2025-11 preview supports access control at the Knowledge source level (centrally managed by a parent Knowledge base object that aggregates multiple sources) and inheritance of SharePoint permissions and Entra ID metadata[^4][^15]. This is likely to become the preferred approach in the future.


6. Vector Search, Hybrid Search, and Agentic Retrieval (Simplified)

Assuming you are already familiar with the concepts of vector search, I will summarize only how it is handled within Azure AI Search.

  • Vector Field Definition: Create a field of type Collection(Edm.Single), set its dimensions, and associate it with a vector profile through vectorSearchProfile (older preview APIs used vectorSearchConfiguration instead)[^11].
  • Integrated Vectorization: You can automatically vectorize text during indexing using an embedding model (such as Azure OpenAI). You can also use a vectorizer during querying to send natural language directly[^2].
  • Hybrid Search: Integrates results from BM25 keyword search and vector similarity search. This is the standard method for balancing precision and recall.
  • Semantic Ranker: Re-ranks hybrid search results using a machine learning model. As of November 2025, it is available in some regions even for the Free Tier[^8].
  • Agentic Retrieval (2025-11-01-preview): The LLM decomposes information needs from chat history, executes multiple sub-queries in parallel, and returns a structured response[^4]. It performs search and access control at the Knowledge base object level, which aggregates multiple Knowledge sources.
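
Hybrid search merges the keyword and vector result lists with Reciprocal Rank Fusion (RRF), as noted in section 7.4. The following is a plain-Python illustration of the RRF idea, not the service's internal implementation: each document scores 1 / (k + rank) in every list it appears in, and the scores are summed. The constant k = 60 is the value commonly used for RRF and is an assumption here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_results = ["a", "b", "c"]     # keyword ranking (illustrative IDs)
vector_results = ["c", "a", "d"]   # vector ranking (illustrative IDs)
fused = reciprocal_rank_fusion([bm25_results, vector_results])
print([doc_id for doc_id, _ in fused])
```

Documents that rank reasonably well in both lists ("a" and "c" here) rise above documents that appear in only one, which is the behavior that makes hybrid search robust to weaknesses in either method.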


7. Practical Example: Building an Internal Document RAG System

Combining the design points covered so far, I will demonstrate how this applies to an actual RAG system using an "Internal Document RAG" scenario. I will focus on the design of the Azure AI Search side, keeping references to Azure OpenAI calls and the UI layer to a minimum.

We assume the following "Internal Document Search Assistant" scenario:

  • Company Scale: Mid-sized company with 3,000 employees and about 10 departments (HR / Legal / Engineering / Sales, etc.).
  • Data: Tens of thousands of internal policies and technical documents (PDF / Word) stored in Blob Storage.
  • Access Control: Determined by Microsoft Entra ID security groups (e.g., grp-hr, grp-eng, grp-all-employees). HR policies are for HR + all employees, development guidelines are for the engineering department only, etc.
  • User Experience: When an employee asks a natural language question like "What is the procedure for telework application?" to a Teams chatbot, a RAG-based answer is returned using only documents the user is authorized to view.

Below, we deal only with the design of the Azure AI Search side based on these premises.

7.1 Overall Architecture

| Component | Role |
| --- | --- |
| Azure Blob Storage | Stores originals such as PDF / Word files. Pre-assign the "array of viewable Entra ID group IDs" and "owning department" to the metadata of each Blob |
| Azure AI Search (Indexer + Skillset) | Pulls Blobs, performs chunk splitting, generates embeddings via Integrated vectorization, and injects ACL metadata into the index |
| Azure AI Search (Index) | Stores body chunks, vectors, metadata, and entra_groups |
| Azure OpenAI | Uses text-embedding-3-large for embeddings and gpt-4o for answer generation (details omitted) |
| Application Layer | Retrieves the list of group IDs for the user from Entra ID and attaches a filter to the query (details omitted) |

Daily Usage Flow

  1. Document Administrator (e.g., Management Department): Uploads original PDFs to Blob Storage and attaches viewable groups and owning department to the custom Blob metadata, such as entra_groups=grp-hr,grp-all-employees and department=HR.
  2. Indexer (Automatic execution every 15 minutes): Detects new/updated Blobs → splits into chunks → generates embeddings → injects into the index.
  3. End User (General Employee): Asks the internal Teams bot a question in Japanese, such as "Who do I submit my telework application to?".
  4. Application Layer: Retrieves the list of the user's group memberships (e.g., grp-hr, grp-all-employees) from Entra ID → sends "Question + User's group list" to Azure AI Search.
  5. Azure AI Search: Performs security trimming using the user's groups, then returns the top 8 chunks via hybrid search (BM25 + vector) and semantic ranker.
  6. Application Layer: Passes the top chunks to gpt-4o as context and returns a natural language answer to the user.

In this setup, Azure AI Search is directly responsible only for step 5. The separation of responsibility where "who belongs to which group" and "how to incorporate the retrieved chunks into the prompt" are the jobs of the application layer is key to this scenario.

Other possible use cases (when reusing the same index for different UIs):

  • An employee in the management department searches for documents from a sidebar filter in a web portal: "Department = Legal AND Updated Date within last 1 year" (Utilizing department facets and updated_at sort/filter).
  • An auditor browses a list of "Recently updated HR policies" using department eq 'HR' + updated_at desc.
  • A mobile FAQ bot fixes the department filter to search only a subset for a specific department.

7.2 Index Schema

First, let's organize the information we want to hold in the index for this scenario.

  • Chunk body + vectors (Main actors of RAG)
  • Key to identify the original (To add "source links" to answers)
  • Owning department (e.g., HR / Legal / Engineering) — Used for filtering/sidebar narrowing in the UI
  • Updated date (Used for excluding old policies and sorting)
  • Array of viewable Entra ID groups (For security trimming)
  • Soft delete flag (For consistency with the logical deletion operations in Chapter 4)

Based on this, here is an example schema. Comments provide the intent for each key.

{
  "name": "internal-docs",
  "fields": [
    // --- Identification ---
    { "name": "id",        "type": "Edm.String", "key": true, "retrievable": true },
    // Key linking the original Blob (parent) and the chunk (child). Automatically set by indexProjections
    { "name": "parent_id", "type": "Edm.String", "filterable": true, "retrievable": true },

    // --- Display / Search Targets ---
    { "name": "title", "type": "Edm.String", "searchable": true, "retrievable": true },
    // Text body for 1 chunk. Target for BM25 full-text search
    { "name": "chunk", "type": "Edm.String", "searchable": true, "retrievable": true },
    // For vector search. Matches the 3072 dimensions of text-embedding-3-large
    {
      "name": "chunk_vector",
      "type": "Collection(Edm.Single)",
      "searchable": true,
      "retrievable": false,              // No need to return. Prevents response bloating
      "dimensions": 3072,
      "vectorSearchProfile": "hnsw-profile"
    },

    // --- Metadata for Filtering / Sorting ---
    // For "I want to see only HR policies" or "Show document counts per department in the sidebar"
    { "name": "department",  "type": "Edm.String",           "filterable": true, "facetable": true, "retrievable": true },
    { "name": "updated_at",  "type": "Edm.DateTimeOffset",   "filterable": true, "sortable": true,  "retrievable": true },
    { "name": "source_path", "type": "Edm.String",           "retrievable": true },

    // --- Access Control / Deletion Management ---
    // For security trimming. Array of Entra ID group IDs permitted to view
    { "name": "entra_groups", "type": "Collection(Edm.String)", "filterable": true, "retrievable": false },
    // Logical deletion flag. Always include `is_deleted eq false` on the query side
    { "name": "is_deleted",   "type": "Edm.Boolean",            "filterable": true }
  ],

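  // Vector search configuration block.
  // Declare the ANN algorithm (HNSW) in algorithms,
  // name it in profiles, and reference it from the chunk_vector's vectorSearchProfile.
  // The vectorizer is what allows "kind": "text" vector queries (7.4). Its name is an example,
  // and the property names follow the 2024-07-01 REST API, so check them against your API version
  "vectorSearch": {
    "algorithms":  [{ "name": "hnsw-algo", "kind": "hnsw" }],
    "vectorizers": [{
      "name": "aoai-vectorizer",
      "kind": "azureOpenAI",
      "azureOpenAIParameters": {
        "resourceUri":  "https://<your-aoai>.openai.azure.com",
        "deploymentId": "text-embedding-3-large",
        "modelName":    "text-embedding-3-large"
      }
    }],
    "profiles": [{ "name": "hnsw-profile", "algorithm": "hnsw-algo", "vectorizer": "aoai-vectorizer" }]
  },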

  // Configuration for semantic ranker.
  // Instruct here which fields to weigh as "title / body"
  "semantic": {
    "configurations": [{
      "name": "default-semantic",
      "prioritizedFields": {
        "titleField":    { "fieldName": "title" },
        "contentFields": [{ "fieldName": "chunk" }]
      }
    }]
  }
}

Why this field configuration?

| Field | Role / Premise |
| --- | --- |
| department | Required by the 7.1 use case to narrow down by "HR policies only" or "Legal documents only". Since it's used for faceted navigation (sidebar) in the UI, set both filterable and facetable |
| entra_groups | For security trimming (Pattern B in 5.5). The Indexer pulls values pre-assigned to the Blob's custom metadata |
| is_deleted | For the soft delete operation mentioned in Chapter 4. Use query-side filtering to negate "ghosts" between original deletion and Indexer reflection |
| updated_at | Used for "I only want to see the latest policy" or "Sort by updated date descending". Both filterable + sortable enabled |
| parent_id | Required since 1 Blob is split into multiple chunks; provides a unique key to point back to the original Blob for linking in answers |

How to read the vectorSearch / semantic blocks

  • vectorSearch.algorithms: Defines the algorithm for vector Approximate Nearest Neighbor search (HNSW in this example). A two-tier structure where you give names in profiles and reference them from the vectorSearchProfile of each vector field.
  • semantic.configurations: Instructions for the semantic ranker. Called at query time with queryType: "semantic" + semanticConfiguration: "default-semantic", it re-ranks using fields specified in prioritizedFields as titles/bodies.

Design considerations

  • chunk_vector should generally be retrievable: false. Including arrays with thousands of dimensions in the response will bloat the payload.
  • entra_groups should also be retrievable: false. Do not return authorization metadata to users.
  • Changing the embedding model (and thus the dimensions) requires an index rebuild. Settling on text-embedding-3-large (3072 dimensions) from the start helps avoid a full re-ingestion later.

7.3 Indexer + Skillset (Integrated vectorization)

The Skillset constitutes a single pipeline to "chunk splitting → embedding generation via Azure OpenAI → index ingestion per chunk". Comments describe what each step performs.

{
  "name": "internal-docs-skillset",
  "skills": [
    // Step 1: Split long PDF/Word documents every 2000 characters. 500-character overlap to prevent breaking meaning at chunk boundaries
    {
      "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
      "context": "/document",
      "textSplitMode": "pages",
      "maximumPageLength": 2000,
      "pageOverlapLength": 500,
      "inputs":  [{ "name": "text",      "source": "/document/content" }],
      "outputs": [{ "name": "textItems", "targetName": "pages" }]
    },
    // Step 2: Send each split chunk to Azure OpenAI to generate embeddings (Integrated vectorization)
    {
      "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
      "context": "/document/pages/*",                 // Executed "for each chunk"
      "resourceUri":  "https://<your-aoai>.openai.azure.com",
      "deploymentId": "text-embedding-3-large",
      "modelName":    "text-embedding-3-large",
      "dimensions":   3072,                           // Stated explicitly so the output matches the chunk_vector field in 7.2
      "inputs":  [{ "name": "text",      "source": "/document/pages/*" }],
      "outputs": [{ "name": "embedding", "targetName": "vector" }]
    }
    // Other skills like OCR, entity extraction, or translation can be added here (omitted)
  ],

  // Step 3: Ingest per chunk. Rules to expand 1 Blob into multiple chunk documents
  "indexProjections": {
    "selectors": [{
      "targetIndexName":    "internal-docs",
      "parentKeyFieldName": "parent_id",            // Automatically set the original Blob key here
      "sourceContext":      "/document/pages/*",    // Generate 1 document per chunk
      "mappings": [
        // Index projection mappings use "name" (target index field) and "source" (enrichment path)
        { "name": "chunk",        "source": "/document/pages/*" },
        { "name": "chunk_vector", "source": "/document/pages/*/vector" },
        { "name": "title",        "source": "/document/metadata_storage_name" },
        { "name": "source_path",  "source": "/document/metadata_storage_path" },
        // Inherit ACL info and department tags from Blob custom metadata.
        // Blob metadata values are plain strings, so a comma-separated entra_groups value
        // must be converted to a string collection before it can populate Collection(Edm.String)
        { "name": "entra_groups", "source": "/document/entra_groups" },
        { "name": "department",   "source": "/document/department" }
      ]
    }]
  }
}

Summary of the 3-step Skillset structure

  • SplitSkill: Splits a single long document into chunks (most important step affecting RAG accuracy).
  • AzureOpenAIEmbeddingSkill: Vectorizes each chunk. Since Azure OpenAI is called during indexing, embedding API costs are incurred based on document volume.
  • indexProjections: Instructions to expand into "1 chunk = 1 index document" rather than "1 Blob = 1 index document". Maintains parent-child relationships with parentKeyFieldName.
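
To get a feel for what maximumPageLength and pageOverlapLength do to chunk counts, the sketch below splits text with a plain sliding window. This is a deliberate simplification: it cuts at fixed character offsets and makes no attempt to respect sentence boundaries, so it should be read as a way to reason about overlap and embedding volume, not as a reproduction of the SplitSkill.

```python
def split_with_overlap(text: str, max_len: int = 2000, overlap: int = 500) -> list[str]:
    """Split text into fixed-size chunks where consecutive chunks share `overlap` characters."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be non-negative and smaller than max_len")
    step = max_len - overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_len])
        if start + max_len >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


document = "".join(str(i % 10) for i in range(5000))  # stand-in for extracted text
chunks = split_with_overlap(document)
print(len(chunks), [len(c) for c in chunks])
```

With a 500-character overlap on 2000-character chunks, each new chunk advances only 1500 characters, so roughly a third more text is embedded than the document contains. That overhead feeds directly into the embedding cost discussed in 7.5.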

The Indexer itself defines only the schedule and ingestion scope.

{
  "name": "internal-docs-indexer",
  "dataSourceName":  "internal-docs-blob",    // Data source definition sets Blob connection and soft-delete detection (omitted)
  "targetIndexName": "internal-docs",
  "skillsetName":    "internal-docs-skillset",
  "schedule": { "interval": "PT15M" },        // Incremental ingestion every 15 minutes. Minimum is 5 minutes
  "parameters": {
    "configuration": {
      "dataToExtract": "contentAndMetadata", // Extract body + custom metadata (entra_groups, etc.)
      "parsingMode":   "default"             // Automatically parse PDF/Office documents
    }
  }
}

Design points

  • Chunk sizes (2000 / 500) are only a starting point. Optimal values differ between documents with clear sectioning (like company policies) and flowing text (like meeting minutes). Be sure to run an evaluation loop over the top N results from the semantic ranker.
  • Embedding model selection directly affects whether the index must be rebuilt, because vectors with different dimensions are not compatible. Settling on text-embedding-3-large (3072 dimensions) from the start helps avoid a full re-ingestion caused by a later dimension change.
  • Include entra_groups in Blob custom metadata for the Indexer to pull. If using Azure Storage native ACLs, consider switching to Pattern A (5.5).

7.4 Hybrid search query (with security trimming)

Imagine a scenario where a user asks a chatbot, "What are the application procedures for the telework policy?" The application layer has already retrieved from Entra ID that this user belongs to grp-hr and grp-all-employees, and sends the following request to POST /indexes/internal-docs/docs/search.

{
  // Query string for BM25 (full-text search)
  "search": "テレワーク規程の申請手順",

  // Vector search query. "kind": "text" means
  // "vectorize this text automatically using a vectorizer before searching"
  // (Possible if a vectorizer is linked to the vectorSearchProfile in 7.2)
  "vectorQueries": [
    {
      "kind":   "text",
      "text":   "テレワーク規程の申請手順",
      "fields": "chunk_vector",
      "k":      50   // Retrieve top 50 on the vector side, integrate with BM25 results using RRF
    }
  ],

  // OData filter: Exclude soft-deleted + only documents that belong to any of the user's groups
  "filter": "is_deleted eq false and entra_groups/any(g: search.in(g, 'grp-hr,grp-all-employees'))",

  // Enable semantic ranker (references the setting defined in 7.2)
  "queryType":             "semantic",
  "semanticConfiguration": "default-semantic",

  "select": "id,parent_id,title,chunk,source_path,updated_at", // Explicitly list return fields (do not include vector/ACL)
  "top":    8
}

Points to note in this request:

  • Hybrid search (Keyword + Vector): Specifying both search and vectorQueries allows Azure AI Search to integrate them using RRF (Reciprocal Rank Fusion).
  • Security trimming: search.in(...) on entra_groups is key. Provide all groups the user belongs to.
  • Soft delete: Always exclude logically deleted items using is_deleted eq false.
  • Why use POST: GET is prone to hitting the 8 KB URL limit. For organizations with many groups, assume POST usage from the start.

The subsequent flow—passing the top N chunks from the search results as context to Azure OpenAI's gpt-4o to generate an answer—is standard RAG prompt design and is omitted here.

7.5 Pitfalls unique to this scenario

  1. Authorization change lag. If you change which groups may view a document (by updating the Blob metadata), entra_groups in the index is not updated until the next Indexer run. Changes to a user's group membership, by contrast, take effect only as quickly as the application refreshes the group list it passes at query time. For documents containing sensitive information like company policies, use a two-pronged approach: shorten the schedule (5-15 mins) and fetch the latest group membership in the application layer for every query.
  2. Deletion reflection window. Changes are not reflected in search the moment the original file is deleted. The safest strategy is to combine the soft-delete operation in Chapter 4 with a is_deleted eq false filter on the query side.
  3. Integrated vectorization cost. Azure OpenAI is called every time a document is added or updated. If you don't calculate "monthly document count × average chunk count × embedding unit price" beforehand, you may be surprised by the bill even at the PoC stage. Avoid running periodic re-indexing on documents that are rarely updated.
  4. Meaning breakage at chunk boundaries. You may need to increase pageOverlapLength or design for semantic ranker to rescue chunks that are contextually close. Creating an evaluation query set before starting operations is highly recommended.
  5. Audit logs are insufficient with Azure AI Search alone. You must log "who searched for what and viewed what" separately in the application layer. If there are compliance requirements, design to record this per search call from the beginning.
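
The cost formula in pitfall 3 can be turned into a small back-of-the-envelope calculation. In this sketch the token count per chunk and the unit price are input parameters: the sample numbers below are placeholders chosen for illustration, not actual Azure OpenAI prices, so substitute the current price for your model and region.

```python
def monthly_embedding_cost(docs_per_month: int, avg_chunks_per_doc: float,
                           avg_tokens_per_chunk: float, price_per_1k_tokens: float) -> float:
    """Estimate monthly embedding spend: documents x chunks x tokens x unit price."""
    total_tokens = docs_per_month * avg_chunks_per_doc * avg_tokens_per_chunk
    return total_tokens / 1000 * price_per_1k_tokens


# Placeholder inputs: 10,000 new or updated documents, 20 chunks each,
# 500 tokens per chunk, and a hypothetical unit price per 1,000 tokens.
estimate = monthly_embedding_cost(10_000, 20, 500, price_per_1k_tokens=0.00013)
print(f"estimated monthly embedding cost: {estimate:.2f}")
```

Running the same calculation with "all documents" instead of "changed documents" shows why a periodic full re-index of rarely updated content is worth avoiding.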

8. Tier, Region, and Preview Feature Handling

General guidelines for Tier selection

| Use Case | Recommended |
| --- | --- |
| Personal tutorial / Validation | Free (subject to 50 MB limit, potential for deletion if inactive for long periods) |
| Small-scale production (SLA required) | Basic (3 replicas) |
| Typical production workloads | S1–S3 |
| Many small indexes (Multi-tenant) | S3 HD |
| Large indexes with low update frequency | L1 / L2 (higher latency) [^7] |

Tier changes are now possible

Previously, changing the Tier after creation was impossible, requiring the index to be recreated. Since the February 2025 preview, upgrading/downgrading between Basic and Standard (S1/S2/S3) is possible via the portal and the Update Service (2025-02-01-preview) API [^8]. While this significantly reduces the pressure of initial Tier selection, it is still in preview, so check the latest GA status before applying it to production. Converting to/from the L series remains unsupported.

Feature availability in Free Tier (2025-11)

In November 2025, the semantic ranker and agentic retrieval became available in the Free Tier in selected regions (subject to query limits) [^8]. This lowers the validation cost, making it easier to start RAG proofs-of-concept.

Handling preview features

Below are representative examples currently in preview. Always verify the GA status before production deployment:

  • Logic Apps connectors (broad data source integration)
  • Knowledge base / Knowledge source (core of agentic retrieval)
  • SharePoint in Microsoft 365 indexer
  • Service upgrade / pricing tier change
  • 2025-11-01-preview REST API

9. Conclusion: Pre-deployment Checklist

Here is a summary of points you should not leave ambiguous during design:

  • Tier and region determined (verified storage limits, SLA, and need for CMK)
  • Location of original data organized (design costs and compliance assuming indices are copies stored in internal storage)
  • Push vs. Pull chosen (is sub-5-minute freshness required, and is the data source supported?)
  • Indexer schedule and deletion detection strategy (soft delete) finalized
  • Replica count and indexer schedule windows chosen with the query-performance impact of indexer runs in mind
  • Field attributes (searchable / filterable / sortable / facetable / retrievable) pre-designed
  • Assumed POST-based requests if filters are likely to become large
  • Access control method (Native ACL / security trimming / agentic retrieval) finalized
  • Vectorization division of responsibility determined (pre-vectorization during Push vs. Integrated vectorization). For documents containing PII, verified the transmission path, permissions, and data retention policy for Azure OpenAI
  • Verified GA schedule and API version compatibility if using preview features

Azure AI Search has a broader scope than just a "search DB," and it is evolving significantly in 2025, especially around agentic retrieval. By keeping the points above in mind, you should be able to avoid the common pitfall of "it worked in the beginning, but failed in production design."

