AWS Bedrock Knowledge Bases: Vector vs. Structured Type Comparison


Hi, I’m Dang, an AI engineer at Knowledgelabo, Inc. We provide a service called "Manageboard", which supports our clients in aggregating, analyzing, and managing scattered internal business data. We plan to enhance Manageboard's AI capabilities going forward.

In this article, I’ll share some insights from our R&D efforts, focusing on how to use AWS Knowledge Bases effectively.

Introduction

AWS Bedrock is a managed service that allows seamless access to multiple large language models (LLMs), such as Claude and Nova.
Among its key features is the Knowledge Base, which enables LLMs to interact with internal documents and structured data.

There are 3 types of knowledge bases available in Bedrock:

  • Vector store-based (for document-like, unstructured data)
  • Structured data-based (SQL-based integration with Redshift)
  • Kendra GenAI Index-based (uses Kendra’s high-performance search)

Note: This verification was conducted in the Tokyo region, where Kendra GenAI Index is currently unavailable. Therefore, only the Vector Store and Structured Data types were tested.

Vector Store-Based Knowledge Base

This type of knowledge base uses document embeddings to retrieve semantically similar paragraphs through vector similarity search.

How It Works

  1. Input data (PDF, TXT, etc.) is split into chunks (chunking)
  2. Each chunk is converted into an embedding vector
  3. User queries are also embedded and used to retrieve the most similar chunks
  4. Retrieved chunks are injected into the prompt and sent to the LLM
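
As a concrete sketch of step 3, the Retrieve API embeds the query and returns the most similar chunks along with their relevance scores. This is a minimal boto3 example, assuming the Tokyo region used in this verification; the knowledge base ID and question are placeholders.

import boto3

# Runtime client for querying knowledge bases (Tokyo region, as in this verification)
agent_runtime = boto3.client("bedrock-agent-runtime", region_name="ap-northeast-1")

# Step 3: the query text is embedded and the most similar chunks are returned,
# each with a relevance score. Step 4 (prompt injection + generation) is done by
# your application or by the RetrieveAndGenerate API.
response = agent_runtime.retrieve(
    knowledgeBaseId="YOUR_KB_ID",  # placeholder
    retrievalQuery={"text": "How do I apply for paid leave?"},  # placeholder question
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5}},
)

for result in response["retrievalResults"]:
    print(result["score"], result["content"]["text"][:80])

Inspecting these scores is also a quick way to diagnose the "low relevance" cases mentioned in the limitations below.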

Setup Steps

  1. In the Bedrock console, go to “Knowledge Bases” → “Create” → Choose “Knowledge Base with vector store”
  2. Specify service role and data source (e.g., S3)
  3. Configure the data source location (S3)
  4. Choose the embedding model (e.g., Titan Text Embeddings)
  5. Create and sync the knowledge base
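
The sync in step 5 can also be triggered programmatically. A minimal sketch, assuming you already have the knowledge base ID and data source ID from the console (both placeholders):

import boto3

# Control-plane client for managing knowledge bases
agent = boto3.client("bedrock-agent", region_name="ap-northeast-1")

# Start an ingestion job: documents in the S3 data source are chunked,
# embedded with the selected model, and written to the vector store.
job = agent.start_ingestion_job(
    knowledgeBaseId="YOUR_KB_ID",       # placeholder
    dataSourceId="YOUR_DATASOURCE_ID",  # placeholder
)
print(job["ingestionJob"]["status"])    # e.g. STARTING / IN_PROGRESS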

Advantages

  • Ideal for FAQs, manuals, and internal documents
  • Easy to use: retrieved text is injected into the prompt automatically
  • Chunking strategies can be adjusted to improve retrieval accuracy
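
As a sketch of that last point, the chunking strategy is part of the data source configuration. The example below sets fixed-size chunking via boto3; the bucket ARN, IDs, and the 300-token / 20% values are illustrative assumptions, not recommendations.

import boto3

agent = boto3.client("bedrock-agent", region_name="ap-northeast-1")

# Create the S3 data source with an explicit chunking strategy.
# Chunk size and overlap strongly influence retrieval accuracy.
agent.create_data_source(
    knowledgeBaseId="YOUR_KB_ID",  # placeholder
    name="internal-docs",          # placeholder
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": "arn:aws:s3:::your-docs-bucket"},  # placeholder
    },
    vectorIngestionConfiguration={
        "chunkingConfiguration": {
            "chunkingStrategy": "FIXED_SIZE",
            "fixedSizeChunkingConfiguration": {
                "maxTokens": 300,         # chunk size (illustrative)
                "overlapPercentage": 20,  # overlap between neighboring chunks (illustrative)
            },
        }
    },
)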

Limitations

  • Accuracy may drop with images, tables, or diagrams
  • Low relevance scores can reduce answer quality
  • Paragraph design and splitting rules are crucial for better performance

Structured Data-Based Knowledge Base

In this type, the LLM translates natural language queries into SQL, then executes them on Redshift and uses the result to form an answer.

How It Works

  1. LLM generates SQL from natural language
  2. SQL is executed on Redshift
  3. Result is passed into the prompt and sent to the LLM
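
This whole flow can be exercised through the RetrieveAndGenerate runtime API (the same API also works for vector knowledge bases); the difference is what happens behind the scenes, where Bedrock generates SQL, runs it on Redshift, and answers from the result. A minimal sketch with placeholder IDs and an aggregation-style question:

import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="ap-northeast-1")

# Behind this single call, Bedrock generates SQL from the question, executes it
# on Redshift, and lets the model phrase an answer from the query result.
response = agent_runtime.retrieve_and_generate(
    input={"text": "What was the total order amount per month in 2024?"},  # placeholder question
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "YOUR_STRUCTURED_KB_ID",  # placeholder
            "modelArn": "arn:aws:bedrock:ap-northeast-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",  # example model
        },
    },
)
print(response["output"]["text"])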

Setup Steps

  1. In the Bedrock console, go to “Knowledge Bases” → “Create” → Choose “Knowledge Base with structured data store”
  2. Configure the service role
  3. In the query engine settings, connect to your Redshift cluster and database (We verified with Redshift Serverless)
  4. Create the knowledge base
  5. On Redshift, grant permissions to the service role:
-- Create a database user mapped to the knowledge base's IAM service role (no password login)
CREATE USER "IAMR:[role name]" WITH PASSWORD DISABLE;
-- Let that role see the schema and read the tables it needs to query
GRANT USAGE ON SCHEMA public TO "IAMR:[role name]";
GRANT SELECT ON [table name] TO "IAMR:[role name]";
  6. Sync the knowledge base

Tip: Use CloudWatch to monitor SQL results and execution times.

Advantages

  • Strong for aggregation, filtering, and numerical analysis
  • Handles millions of rows without large token usage
  • Can reflect the latest data from Redshift

Limitations

  • Only Redshift is supported as a data source
  • Accuracy decreases with complex SQL schemas

Practical Comparison: Vector vs. Structured

Aspect        | Vector Store-Based                          | Structured Data-Based
Data Type     | Unstructured documents                      | Structured data in Redshift
Accuracy      | Depends on similarity scores                | Depends on SQL generation accuracy
Latency       | Short                                       | Depends on SQL execution time
Setup         | Requires paragraph design and splitting rules | Requires schema descriptions
Best Use Case | FAQs, manuals, internal docs                | Aggregation, analytics

Conclusion

AWS Bedrock’s knowledge base feature is powerful, but accuracy depends heavily on configuration and use case:

  • Use Vector Store for document-based RAG (retrieval-augmented generation)
  • Use Structured Data (Redshift) for numerical analysis and real-time aggregation

In the next article, I’ll dive into how to improve the accuracy of structured knowledge bases by providing schema descriptions.
