AWS Bedrock Knowledge Bases: Vector vs. Structured Type Comparison
Hi, I’m Dang, an AI engineer at Knowledgelabo, Inc. We provide a service called "Manageboard", which helps our clients aggregate, analyze, and manage scattered internal business data, and we plan to expand its AI capabilities going forward.
In this article, I’ll share some insights from our R&D work, focusing on how to use Bedrock Knowledge Bases effectively.
Introduction
AWS Bedrock is a managed service that allows seamless access to multiple large language models (LLMs), such as Claude and Nova.
Among its key features is the Knowledge Base, which enables LLMs to interact with internal documents and structured data.
There are three types of knowledge bases available in Bedrock:
- Vector store-based (for document-like, unstructured data)
- Structured data-based (SQL-based integration with Redshift)
- Kendra GenAI Index-based (uses Kendra’s high-performance search)
Note: This verification was conducted in the Tokyo region, where Kendra GenAI Index is currently unavailable. Therefore, only the Vector Store and Structured Data types were tested.
Vector Store-Based Knowledge Base
This type of knowledge base uses document embeddings to retrieve semantically similar paragraphs through vector similarity search.
How It Works
- Input data (PDF, TXT, etc.) is split into chunks (chunking)
- Each chunk is converted into an embedding vector
- User queries are also embedded and used to retrieve the most similar chunks
- Retrieved chunks are injected into the prompt and sent to the LLM (see the sketch below)
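To make this flow concrete, here is a minimal boto3 sketch of the retrieval step. The region, knowledge base ID, and query text are placeholders rather than values from our environment; the Retrieve API returns each chunk together with its relevance score.

```python
import boto3

# Runtime client for querying knowledge bases (Tokyo region as an example).
client = boto3.client("bedrock-agent-runtime", region_name="ap-northeast-1")

# "KBID12345" is a placeholder knowledge base ID.
response = client.retrieve(
    knowledgeBaseId="KBID12345",
    retrievalQuery={"text": "What is the expense reimbursement procedure?"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {"numberOfResults": 5},
    },
)

# Each result carries the chunk text, its source location, and a relevance score.
for result in response["retrievalResults"]:
    print(round(result["score"], 3), result["content"]["text"][:80])
```

You can inject the retrieved text into your own prompt, or let the RetrieveAndGenerate API (shown later for the structured type) perform retrieval and answer generation in a single call.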
Setup Steps
- In the Bedrock console, go to “Knowledge Bases” → “Create” → Choose “Knowledge Base with vector store”
- Specify service role and data source (e.g., S3)
- Configure the data source location (S3)
- Choose the embedding model (e.g., Titan Text Embeddings)
- Create the knowledge base and sync the data source (a programmatic sync example follows below)
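Creation and syncing can also be scripted. The sketch below triggers a sync (an ingestion job) through the bedrock-agent API; the knowledge base and data source IDs are placeholders.

```python
import boto3

# Control-plane client for managing knowledge bases (Tokyo region as an example).
agent = boto3.client("bedrock-agent", region_name="ap-northeast-1")

# Placeholder IDs; use the knowledge base and data source created above.
job = agent.start_ingestion_job(
    knowledgeBaseId="KBID12345",
    dataSourceId="DSID12345",
)

# The job runs asynchronously; poll get_ingestion_job to follow its progress.
print(job["ingestionJob"]["status"])
```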
Advantages
- Ideal for FAQs, manuals, and internal documents
- Easy to use with prompt injection
- Chunking strategies can be adjusted to improve retrieval accuracy
Limitations
- Accuracy may drop with images, tables, or diagrams
- Low relevance scores can reduce answer quality
- Paragraph design and splitting rules are crucial for better performance (see the chunking configuration sketch below)
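To illustrate the chunking and splitting points above, here is a hedged sketch of how chunking is configured when registering an S3 data source. The names, ARNs, and chunk sizes are illustrative assumptions, not tuned recommendations.

```python
import boto3

agent = boto3.client("bedrock-agent", region_name="ap-northeast-1")

# Placeholder knowledge base ID, data source name, and bucket ARN.
agent.create_data_source(
    knowledgeBaseId="KBID12345",
    name="internal-docs",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": "arn:aws:s3:::example-docs-bucket"},
    },
    # Fixed-size chunking; tune maxTokens/overlapPercentage to your documents.
    vectorIngestionConfiguration={
        "chunkingConfiguration": {
            "chunkingStrategy": "FIXED_SIZE",
            "fixedSizeChunkingConfiguration": {
                "maxTokens": 300,
                "overlapPercentage": 20,
            },
        },
    },
)
```

Hierarchical and semantic chunking strategies are also available and are worth comparing against fixed-size chunking on your own documents.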
Structured Data-Based Knowledge Base
In this type, natural language queries are translated into SQL by the LLM, executed on Redshift, and the query results are used to form the answer.
How It Works
- LLM generates SQL from natural language
- SQL is executed on Redshift
- The result is passed into the prompt and sent to the LLM (see the sketch below)
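Querying the structured type uses the same runtime API as the vector type. In the hedged sketch below, the knowledge base ID and model ARN are placeholders; Bedrock generates the SQL, runs it on Redshift, and returns a natural-language answer built from the results.

```python
import boto3

client = boto3.client("bedrock-agent-runtime", region_name="ap-northeast-1")

# Placeholder knowledge base ID and foundation model ARN.
response = client.retrieve_and_generate(
    input={"text": "What was the total sales amount per month in 2024?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KBID67890",
            "modelArn": "arn:aws:bedrock:ap-northeast-1::foundation-model/"
                        "anthropic.claude-3-5-sonnet-20240620-v1:0",
        },
    },
)

# The final answer synthesized from the SQL results.
print(response["output"]["text"])
```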
Setup Steps
- In the Bedrock console, go to “Knowledge Bases” → “Create” → Choose “Knowledge Base with structured data store”
- Configure the service role
- In the query engine settings, connect to your Redshift cluster and database (We verified with Redshift Serverless)
- Create the knowledge base
- On Redshift, grant access to the service role by creating a database user mapped to the IAM role (the "IAMR:" prefix) and granting it read permissions:
CREATE USER "IAMR:[role name]" WITH PASSWORD DISABLE;
GRANT USAGE ON SCHEMA public TO "IAMR:[role name]";
GRANT SELECT ON [table name] TO "IAMR:[role name]";
- Sync the knowledge base
Tip: Use CloudWatch to monitor SQL results and execution times.
Advantages
- Strong for aggregation, filtering, and numerical analysis
- Handles millions of rows without large token usage
- Can reflect the latest data from Redshift
Limitations
- Only Redshift is supported as a data source
- Accuracy decreases with complex SQL schemas
Practical Comparison: Vector vs. Structured
| Aspect | Vector Store-Based | Structured Data-Based |
|---|---|---|
| Data Type | Unstructured documents | Structured data in Redshift |
| Accuracy | Depends on similarity score | Depends on SQL generation accuracy |
| Latency | Short | Depends on SQL execution time |
| Setup | Requires paragraph design and splitting rules | Requires schema descriptions |
| Best Use Case | FAQs, manuals, internal docs | Aggregation, analytics |
Conclusion
AWS Bedrock’s knowledge base feature is powerful, but accuracy depends heavily on configuration and use case:
- Use Vector Store for document-based RAG (retrieval-augmented generation)
- Use Structured Data (Redshift) for numerical analysis and real-time aggregation
In the next article, I’ll dive into how to improve the accuracy of structured knowledge bases by providing schema descriptions.