
Foundry Local: Easy Access to Local LLMs via OpenAI-Compatible APIs beyond the CLI (including C# SDK support)


Overview

Foundry Local is a convenient mechanism that allows you to easily use local LLMs on Windows with just a few commands and shared caching. I also touched on this in the following article.

https://zenn.dev/suusanex/articles/a1a3dae3da02ca

In that article, I approached it from the angle of how easy the CLI is to use, which led me to a workaround for forcing multi-line prompts through. If you need to go that far, though, it is more convenient to write a program in C# or similar.

While Foundry Local's convenience of instant CLI chatting stands out in demos and such, that is not all it offers. It also exposes an OpenAI API-compatible endpoint, making it easy to call from programs like C#.

When I tried implementing it in C#, it was just as easy to use as the CLI and very convenient, so in this article, I will introduce that implementation example.

  • Note: Foundry Local is still in preview as of the time of this post.

Summary of Conclusions

I have placed a simple app on GitHub that uses the SDK for .NET (Microsoft.AI.Foundry.Local) to perform model selection and loading, and then calls it via Semantic Kernel for chatting. If you want to see the code without the explanation, please check here.

https://github.com/suusanex/sample_azure_ai_foundry_local_chat

Also, the MS Learn explanation can be found on the following page:

https://learn.microsoft.com/ja-jp/azure/ai-foundry/foundry-local/get-started?wt.mc_id=MVP_452751

Explanation

Configuration Explanation

First, when you chat through the Foundry Local CLI, the CLI is not actually running standalone; the pieces operate in the configuration shown in the diagram below.

@startuml

component "Foundry Local Windows Service" as service
component "OpenAI API-compatible endpoint" as ep

component "CLI (Chat on command prompt)" as cli

actor "User" as user

service --> ep : Exposes endpoint
cli -u-> ep : Communicate with endpoint to chat
user --> cli : Chat via CLI

@enduml

Therefore, the CLI is not mandatory. As long as the Foundry Local service is running and has loaded a model, you can call the Foundry Local local LLM using the same code you would use to call the cloud-based OpenAI API.
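To illustrate that point, here is a minimal sketch using the official OpenAI .NET client. The model name and port below are placeholders of my own; in practice you would take the real endpoint from `FoundryLocalManager.Endpoint`, since Foundry Local assigns its port dynamically.

```csharp
using System.ClientModel;
using OpenAI;
using OpenAI.Chat;

// Placeholder endpoint; get the real one from FoundryLocalManager.Endpoint.
var options = new OpenAIClientOptions
{
    Endpoint = new Uri("http://localhost:5273/v1")
};

// Foundry Local does not check the API key, so any placeholder string works.
var client = new ChatClient("phi-3.5-mini", new ApiKeyCredential("unused"), options);

ChatCompletion completion = client.CompleteChat("Hello!");
Console.WriteLine(completion.Content[0].Text);
```

The only Foundry Local-specific part is the endpoint URL; everything else is the same code you would write against the cloud OpenAI API.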

Operating Foundry Local with the SDK for .NET

Add Microsoft.AI.Foundry.Local via NuGet. This is the SDK for .NET.

The API usage is almost identical to that of the CLI. To give an example of the CLI usage, you would run the following commands:

  • Installation
    • winget install Microsoft.FoundryLocal
  • List available models
    • foundry model list
  • Run a specific model
    • foundry model run <model_name>

Installation is something you perform separately, but listing and running models can be done in much the same way from C#.

However, while the CLI's "run" command handles everything from downloading the cache to loading the model automatically, in C#, you need to use different APIs for each step. It looks like this:

var manager = new FoundryLocalManager();

// Get a list of available models
var catalogModels = await manager.ListCatalogModelsAsync();

// Using the first model as an example here
var model = catalogModels.First();

// Download the model (not necessary if already cached)
await manager.DownloadModelAsync(model.ModelId);

// Load the model
await manager.LoadModelAsync(model.ModelId);

Once these processes are called, the Foundry Local service will be in a state where the model is loaded and the endpoint is exposed. It's just as simple to use as the CLI.

Supplementary notes on omitted parts

  1. If the service is stopped, you must start the service first. You can start it using FoundryLocalManager.StartServiceAsync() without needing to use SCM or similar tools.
  2. The logic for determining whether a model is cached has been omitted in this article.
  3. Once a model is loaded, the Foundry Local service may begin processing using the CPU or GPU. Since Foundry Local seems to automatically determine whether this is necessary, you will need to restart the service to stop it definitively.

Calling the Foundry Local Endpoint

By using the variables manager and model created in the explanation above, you can call the OpenAI API-compatible endpoint.

In the case of the Semantic Kernel example, you use them as follows:


var builder = Kernel.CreateBuilder();
builder.AddOpenAIChatCompletion(
    model.ModelId,
    manager.Endpoint,
    "unused"
);
var kernel = builder.Build();

In the example, "unused" is passed as the third argument for the API key, but since the Foundry Local endpoint doesn't require authentication, anything is fine here.

Now that you can make calls, you don't need to be conscious that the endpoint is Foundry Local when building chat or other processes. If you have an existing app calling the OpenAI API, you can switch it to call Foundry Local just by changing the parameters.
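For instance, a chat turn with Semantic Kernel is written exactly as it would be against the cloud; nothing below is Foundry Local-specific (the prompt text is just an example):

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// `kernel` is the instance built above with AddOpenAIChatCompletion.
var chat = kernel.GetRequiredService<IChatCompletionService>();

var history = new ChatHistory();
history.AddUserMessage("Summarize what Foundry Local does in one sentence.");

// This call goes to the local endpoint, but the code is identical to the cloud case.
var reply = await chat.GetChatMessageContentAsync(history);
Console.WriteLine(reply.Content);
```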

Since the implementation beyond this point is not related to Foundry Local, it is omitted in this article.

Code Sample

A working code sample including the above content is available in the following repository. (It is the same one mentioned at the beginning of the article.)

https://github.com/suusanex/sample_azure_ai_foundry_local_chat

In this sample, the Foundry Local model is loaded from a WinUI 3 screen, and chatting is implemented using Semantic Kernel.

Summary

I think calling Foundry Local from C# code can be written just as easily as with the CLI.

With local LLMs, you can use them immediately without contracts, billing, or obtaining API keys. If local LLMs can be used this easily, I think they can be used for various purposes, such as quickly trying out things you've thought of.

Let's use them conveniently by balancing them with cloud-based LLMs!
