The Value of Multimodal Capabilities in Snowflake's COMPLETE Function
Introduction
In Snowflake's generative AI feature suite, Cortex AI, Cortex COMPLETE Multimodal entered public preview on April 14, 2025.
LLMs have been able to read images for some time, and now Snowflake's general-purpose LLM function, COMPLETE, can perform image analysis as well. This is not just about making multimodal LLMs easier and more secure to use; it means new value can be added to existing data workloads. In this article, I will introduce the multimodal capabilities of the COMPLETE function.
What is Cortex COMPLETE Multimodal?
Cortex COMPLETE Multimodal is a feature that allows you to analyze images using the COMPLETE function. With this capability, the following processes can be achieved using only SQL or Python:
- Image comparison
- Image caption generation
- Image classification
- Entity extraction from images
- Answering questions based on data from graphs and charts
Previously, if image processing was required, you had to either call external APIs and services or implement complex processing with libraries such as OpenCV in Python. With this feature, however, you can perform image processing directly within the flow of data utilization, for example in SQL queries. This simplifies data pipelines and makes it easy to attach image-processing steps to existing data workloads.
Available Models
Currently, the following models are available for the multimodal capabilities of the COMPLETE function:
Anthropic Claude Series
Claude 4 Opus: Anthropic's highest-performing flagship model.
- Model name to set: claude-4-opus
- Context window: 200,000 tokens
- Supported file types: .jpg .jpeg .png .webp .gif
- Maximum file size: 3.75 MB
- Maximum images per prompt: 20
Claude 4 Sonnet: A model with an excellent balance between high performance and cost.
- Model name to set: claude-4-sonnet
- Context window: 200,000 tokens
- Supported file types: .jpg .jpeg .png .webp .gif
- Maximum file size: 3.75 MB
- Maximum images per prompt: 20
Claude 3.7 Sonnet: An improved successor to Claude 3.5 Sonnet.
- Model name to set: claude-3-7-sonnet
- Context window: 200,000 tokens
- Supported file types: .jpg .jpeg .png .webp .gif
- Maximum file size: 3.75 MB
- Maximum images per prompt: 20
Claude 3.5 Sonnet: Anthropic's multimodal model with advanced visual processing and language understanding capabilities, which has already been adopted by many companies.
- Model name to set: claude-3-5-sonnet
- Context window: 200,000 tokens
- Supported file types: .jpg .jpeg .png .webp .gif
- Maximum file size: 3.75 MB
- Maximum images per prompt: 20
Mistral AI Series
Pixtral Large: A model by Mistral AI that excels at visual reasoning tasks and supports multiple languages, including Japanese.
- Model name to set: pixtral-large
- Context window: 128,000 tokens
- Supported file types: .jpg .jpeg .png .webp .gif .bmp
- Maximum file size: 10 MB
- Maximum images per prompt: 1
Meta Llama Series
Llama 4 Maverick: Meta's latest multimodal model.
- Model name to set: llama4-maverick
- Context window: 128,000 tokens
- Supported file types: .jpg .jpeg .png .webp .gif .bmp
- Maximum file size: 10 MB
- Maximum images per prompt: 10
Llama 4 Scout: An efficient model from the Llama series.
- Model name to set: llama4-scout
- Context window: 128,000 tokens
- Supported file types: .jpg .jpeg .png .webp .gif
- Maximum file size: 10 MB
- Maximum images per prompt: 10
Model Performance Benchmarks
The performance of each model is evaluated using the following benchmarks:
| Model | MMMU | MathVista | ChartQA | DocVQA | VQAv2 |
|---|---|---|---|---|---|
| llama4-maverick | 73.4 | 73.7 | 90.0 | 94.4 | - |
| llama4-scout | 69.4 | 70.7 | 88.8 | 94.4 | - |
| claude-3-5-sonnet | 68.0 | 64.4 | 87.6 | 90.3 | 70.7 |
| pixtral-large | 64.0 | 69.4 | 88.1 | 85.7 | 67.0 |
Regional Availability
Model availability for Cortex COMPLETE Multimodal varies by cloud and region, but by enabling the Cross-Region Inference feature you can use it without being limited by your account's cloud or region.
For detailed availability status, please check the official documentation.
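As one concrete step, enabling cross-region inference is a single account-level setting. The parameter name below is taken from Snowflake's cross-region inference documentation; please verify it against the current docs before running:

```sql
-- Allow Cortex requests to be routed to another region when the model
-- is not available in the account's home region (requires ACCOUNTADMIN)
ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = 'ANY_REGION';
```

The documentation also describes values that restrict routing to specific region groups instead of 'ANY_REGION'.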
Preparation: Creating a Stage for Images
First, create a stage to store the images. Server-side encryption and the directory table must be enabled. (The steps are written as SQL here, but it is perfectly fine to do this through the Snowsight GUI instead.)
-- Create an internal stage
CREATE OR REPLACE STAGE image_stage
DIRECTORY = ( ENABLE = true )
ENCRYPTION = ( TYPE = 'SNOWFLAKE_SSE' );
Next, upload the images to the stage. Supported image formats are .jpg, .jpeg, .png, .webp, .gif, and .bmp. (.bmp is only supported by Pixtral Large and Llama 4 Maverick.)
-- Uploading an image
PUT file:///path/to/your/image.jpg @image_stage
AUTO_COMPRESS = FALSE;
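If you want to confirm the upload before querying, the directory table you enabled on the stage can be used. This is a small verification step, not required by the examples below:

```sql
-- Refresh the directory table and list the uploaded files
ALTER STAGE image_stage REFRESH;
SELECT RELATIVE_PATH, SIZE, LAST_MODIFIED
FROM DIRECTORY(@image_stage);
```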
Practical Examples: Image Analysis
Example 1: Image Caption Generation
This is an example of generating a caption that describes the content of an image. The query structure is very simple; you just add one argument to the existing COMPLETE function and pass the image file on the stage.
-- Image caption generation using Cortex COMPLETE Multimodal
SELECT SNOWFLAKE.CORTEX.COMPLETE('<model_name>',
'Please describe this image briefly in Japanese.',
TO_FILE('@<stage_name>', '<image_filename>'));

The output results for the image above (an image of me singing at work) are as follows. It's great to see that both models understand and output in Japanese!
-- Output from claude-3-5-sonnet
It shows a person in a gray turtleneck sweater singing with a microphone. In the background, an orange and white striped wall is visible. It looks like a scene from a karaoke or live performance.
-- Output from pixtral-large
This image depicts a person in a green knit sweater holding a black camera to their eye. In the background, there are orange and white stripes with green plants visible within them. The person seems to be taking a photo with the camera, and their expression looks focused.
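Because TO_FILE accepts a column as the file name, the same call scales from one image to a whole stage. The following is a sketch, untested against a live account and assuming the image_stage created in the preparation section, that captions every image listed in the directory table:

```sql
-- Generate a caption for each image currently in the stage
SELECT
    RELATIVE_PATH,
    SNOWFLAKE.CORTEX.COMPLETE(
        'claude-3-5-sonnet',
        'Please describe this image briefly.',
        TO_FILE('@image_stage', RELATIVE_PATH)
    ) AS caption
FROM DIRECTORY(@image_stage);
```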
Example 2: Comparing Multiple Images
This example compares two images and explains their relationship. Note that the query structure is slightly different from Example 1: multiple image files are passed through the PROMPT function (up to 100 files per call, though each model also has its own per-prompt image limit, listed above). Currently, the Claude and Llama series support multi-image processing, while pixtral-large accepts only one image.
Also, the PROMPT function requires a placeholder for each image file. In the query below, {0} and {1} refer to the first and second image files, respectively.
SELECT SNOWFLAKE.CORTEX.COMPLETE('claude-3-5-sonnet',
PROMPT('Extract similarities between the two images {0} and {1}.',
TO_FILE('@<stage_name>', '<image_filename>'),
TO_FILE('@<stage_name>', '<image_filename>')));


The output results for the two images above (an image of me singing at work and an image of me standing at a booth at an exhibition) are as follows. The analysis is much more thorough than expected, correctly identifying that both are the same person.
-- Output from claude-3-5-sonnet
These are the main similarities between these two images:
1. The same person is pictured, with a distinctive hairstyle (black hair with layers and bangs).
2. Both were taken indoors, in what appears to be commercial spaces like exhibition halls or event venues.
3. Exhibition walls or displays are visible in the background (the first is orange and white stripes, the second is a Snowflake booth).
4. The person is dressed in business-casual attire (the first is a gray turtleneck sweater, the second is a professional uniform or business wear).
5. Both photos look like they captured moments at a tech or business-related event or exhibition.
Example 3: Extracting Information from Images
Let's try extracting information from a hand-drawn diagram I just drew in 10 seconds.
SELECT SNOWFLAKE.CORTEX.COMPLETE('<model_name>',
'This image is a hand-drawn architecture diagram. Please explain the configuration in detail.',
TO_FILE('@<stage_name>', '<image_filename>'));

The output results for the image above (a very rough hand-drawn architecture diagram) are as follows. It understands and explains the image with very high accuracy.
-- Output from claude-3-5-sonnet
This diagram shows an architecture for integrating AWS and Snowflake. The main components are as follows:
1. AWS Side:
- An S3 bucket is placed, functioning as the data source.
2. Integration:
- Data transfer is performed using Snowpipe.
- This is an automated data loading feature from AWS S3 to Snowflake.
3. Snowflake Side:
- There is a staging table that receives data from S3.
- There is a description "Copy Info", which likely refers to the process of loading data using the COPY INTO command.
This architecture represents a common data pipeline configuration that uses Snowpipe to load data stored in AWS S3 into Snowflake in near real-time. By using Snowpipe, the mechanism is set up so that loading into Snowflake starts automatically once data is placed in S3.
-- Output from pixtral-large
This image shows an architecture for moving data between AWS and Snowflake. Here is a detailed explanation:
1. **AWS**:
- The AWS section includes a bucket labeled "S3". This represents an Amazon Simple Storage Service (S3) bucket.
2. **Snowflake**:
- The Snowflake section includes buckets labeled "Stage" and "Table".
- "Stage" represents a temporary storage area within Snowflake.
- "Table" represents a table within the Snowflake database.
3. **Snowpipe**:
- There is an arrow labeled "Snowpipe" between the AWS S3 bucket and the Snowflake stage.
- Snowpipe is a Snowflake service for automatically loading data from an S3 bucket to a Snowflake stage.
4. **Data Flow**:
- Data moves from the AWS S3 bucket to the Snowflake stage via Snowpipe.
- From the Snowflake stage, data is loaded into the Snowflake table through a "Copy Into" operation.
In summary, this architecture shows the process for moving data from an AWS S3 bucket to a Snowflake stage and then loading it into a Snowflake table. This process is automated using Snowpipe and uses Snowflake's "Copy Into" command to load data into the table.
Cost Considerations
Fees are charged based on the number of tokens processed. The number of tokens per image varies depending on the vision model's architecture:
- Anthropic (Claude) models: Number of tokens = (image width in pixels × image height in pixels) / 750
- Mistral (Pixtral) models: The image is divided into 16×16-pixel patches, and each patch is converted into 1 token. Total tokens = (image width in pixels / 16) × (image height in pixels / 16)
- Meta (Llama) models: The image is divided into square tiles. Depending on the image's aspect ratio and size, it uses up to 16 tiles, with each tile being approximately 153 tokens.
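These formulas can be turned into a rough cost estimator. The sketch below implements them exactly as written in the bullets above; the function name, the ceiling rounding, and the worst-case assumption for Llama are my own illustrative choices, not part of Snowflake's billing API:

```python
import math

def estimate_image_tokens(width: int, height: int, model_family: str) -> int:
    """Rough per-image token estimate based on the formulas described above.

    This is an approximation for cost planning, not an official billing API;
    actual counts may differ slightly (e.g. rounding of partial patches).
    """
    if model_family == "claude":
        # Anthropic: tokens ~= (width * height) / 750
        return math.ceil(width * height / 750)
    if model_family == "pixtral":
        # Mistral: one token per 16x16-pixel patch
        return math.ceil(width / 16) * math.ceil(height / 16)
    if model_family == "llama":
        # Meta: up to 16 tiles of ~153 tokens each; worst-case upper bound
        return 16 * 153
    raise ValueError(f"unknown model family: {model_family}")

# Example: a 1536x1152 image
print(estimate_image_tokens(1536, 1152, "claude"))   # 2360 (1,769,472 / 750, rounded up)
print(estimate_image_tokens(1536, 1152, "pixtral"))  # 6912 (96 x 72 patches)
```

As the example shows, the same image can cost very different token counts depending on the model family, which is worth factoring into model selection.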
To manage costs efficiently, it is important to select the appropriate model based on the complexity of the task and the size of the images being processed.
As a practical technique, you can also reduce costs and improve processing speed by resizing (downscaling) images before passing them to the function.
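As a companion to that resizing tip, here is a small illustrative helper (my own, not a Snowflake API) that computes the largest proportional dimensions at which an image stays under a chosen token budget, using the Claude-style pixels / 750 estimate:

```python
import math

def target_size_for_budget(width: int, height: int, token_budget: int) -> tuple[int, int]:
    """Largest proportional (width, height) whose Claude-style token
    estimate, pixels / 750, does not exceed token_budget."""
    estimated = width * height / 750
    if estimated <= token_budget:
        return width, height  # already within budget; no resize needed
    scale = math.sqrt(token_budget / estimated)
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))

# A 3000x2000 photo (~8000 tokens) shrunk to fit a 1000-token budget
print(target_size_for_budget(3000, 2000, 1000))  # (1060, 707)
```

You would then downscale the image to these dimensions with your image tool of choice before uploading it to the stage.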
Business Ideas
Multimodal capabilities can be utilized in business scenarios such as the following:
- E-commerce Product Image Management: Automatically generate descriptions and tags from product images.
- Real Estate Photo Analysis: Automatically extract floor plans and features from real estate photos.
- Data Extraction from Document Images: Retrieve structured data from images of invoices, contracts, etc.
- Medical Image Organization and Search: Automatically assign metadata to medical images.
- Social Media Image Analysis: Analyze the content of social media images for marketing purposes.
Furthermore, by combining this with existing Snowflake features such as Streamlit in Snowflake, the possibilities for data applications expand even further. I encourage you to bring your ideas to life using Cortex COMPLETE Multimodal.
Final Thoughts
The multimodal functionality of the COMPLETE function is quite powerful, and it is undoubtedly a positive development that such advanced image processing can be achieved using only standard built-in functions. A key point is that it can be integrated into existing Snowflake workflows to derive even more value from your data. While this article only introduced the basic features, I will be sharing applied examples using Cortex COMPLETE Multimodal soon, so please look forward to it.
Promotion
Virtual Hands-on Delivered at SNOWFLAKE DISCOVER!
I delivered a practical virtual hands-on session titled "Next-Generation VoC (Voice of Customer) Analysis with Snowflake Cortex AI" at SNOWFLAKE DISCOVER Part 2, a large-scale webinar for Snowflake engineers held on July 8-9, 2025. You can experience how to analyze customer voices relevant across many industries using Snowflake's latest features. If you are looking for hints on analyzing unstructured data, I would be delighted if you could watch it!
You can watch it immediately on-demand by registering via the link below.
Spoke at SNOWFLAKE DISCOVER!
I spoke in the very first session, "Snowflake from Scratch: Building a Modern Data & AI Platform," at SNOWFLAKE DISCOVER, a large-scale webinar for Snowflake engineers held on April 24-25, 2025. I explained everything from an overview of Snowflake to the latest updates as clearly as possible, so I would be happy if you could use it to catch up!
You can watch it immediately on-demand by registering via the link below.

Spoke at a Webinar by Generative AI Conf!
I gave a Lightning Talk (LT) on the theme of Data & AI as a Snowflake employee at a webinar titled "Platforms Supporting the Generative AI Era," alongside NVIDIA and my former employer, AWS! The video archive is available below, so please feel free to watch it!
Delivering Snowflake's "What's New" on X
I am delivering update information for Snowflake's "What's New" on X, so please feel free to follow me.
Japanese Version
Snowflake's What's New Bot (Japanese Version)
English Version
Snowflake What's New Bot (English Version)
Change Log
(2025/04/15) New post
(2025/04/16) Added information on supported clouds and regions
(2025/04/21) Fixed supported file sizes and added notes on the PROMPT function
(2025/05/08) Updated promotion section
(2025/06/01) Updated available models (added Claude 4 and Llama 4 series), added model performance benchmark table, and detailed cost considerations
(2025/06/29) Updated promotion section
(2025/09/27) Updated promotion section