Source Generator Tips (#1): Use IndentedStringBuilder for Code Formatting

I plan to write a collection of tips and tricks for Source Generators from time to time.
This time, I'll share a small technique for code generation.

TL;DR

Create a StringBuilder wrapper with code formatting functionality and use it for code generation.

The Issue

As the name suggests, a Source Generator is a mechanism for generating source code programmatically.
Therefore, the simplest form is as follows: [1]

// Write the source code directly as a string
var source = $$"""
    namespace GeneratedNamespace;

    public class GeneratedClass
    {
        public void GeneratedMethod()
        {
            // Some processing
        }
    }
    """;

context.AddSource("GeneratedFile.g.cs", source);

However, in reality, you often want to do more complex things.

An Example

For example, in the previous snippet, the code was output to GeneratedNamespace, but let's consider matching this to the namespace of the user's code.

A naive implementation would look like this:

var namespaceName = GetNamespaceFromUserCode(parseRecordObject);

var source = $$"""
    namespace {{namespaceName}}
    {
        public class GeneratedClass
        {
            public void GeneratedMethod()
            {
                // Some processing
            }
        }
    }
    """;

In this case, if there is no namespace (i.e., at the global level), it results in namespace { ... }, which causes a compilation error.
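To make the failure concrete, here is a minimal sketch of the naive template wrapped in a method. `NaiveTemplate` and `Render` are hypothetical names used only for illustration, and the class body is shortened.

```csharp
// Hypothetical helper wrapping the naive template (class body shortened).
internal static class NaiveTemplate
{
    // $$ means {{...}} is an interpolation hole and single braces are literal text.
    public static string Render(string namespaceName) => $$"""
        namespace {{namespaceName}}
        {
            public class GeneratedClass { }
        }
        """;
}
```

Calling `NaiveTemplate.Render("MyApp.Models")` produces a valid block-scoped namespace. Calling `NaiveTemplate.Render("")` produces a first line that is just `namespace ` followed by `{`, which the C# compiler rejects because the namespace name is missing.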

So, let's try to fix it.

var namespaceName = GetNamespaceFromUserCode(parseRecordObject);
var sourceBase = $$"""
    public class GeneratedClass
    {
        public void GeneratedMethod()
        {
            // Some processing
        }
    }
    """;
string generatedCode;
if (!string.IsNullOrEmpty(namespaceName))
{
    // With $$, {{...}} is an interpolation hole and single braces are literal text
    generatedCode = $$"""
        namespace {{namespaceName}}
        {
        {{sourceBase}}
        }
        """;
}
else
{
    generatedCode = sourceBase;
}

With this approach, the generated code looks like this:

// Example of generated code
namespace MyApp.Models
{
public class GeneratedClass
{
    public void GeneratedMethod()
    {
        // Some processing
    }
}
}

It's not technically wrong, and it compiles and runs without any issues, but the indentation is broken.
If you were to generate internal logic in a separate function, the indentation would get even messier.

// Example of generated code
namespace MyApp.Models
{
public class GeneratedClass
{
    public void GeneratedMethod()
    {
var source = "Some processing";
if(CallAnotherMethod())
{
source += " More processing";
}
Console.WriteLine(source);
    }
}
}

You might say it's fine since it's auto-generated, but it's still a bit bothersome, isn't it?

Solutions

Option 1: Pre-embedding indentation

It looks something like this:

// Generate code with indentation pre-embedded
var generatedMethod = $$"""
        public void GeneratedMethod()
        {
            // Some processing
        }
    """;
var source = $$"""
    public class GeneratedClass
    {
    {{generatedMethod}}
    }
    """;

While this works, maintenance becomes painful. Furthermore, it becomes even harder when the indentation changes dynamically, as in the namespace example mentioned earlier.

Option 2: Applying indentation to function results each time

It looks something like this:

// Generate code normally (e.g., from an external function)
var generatedMethod = $$"""
    public void GeneratedMethod()
    {
        // Some processing
    }
    """;
// Apply (or remove) indentation to the generated code
var generatedMethodWithIndent = Indent(generatedMethod, 1);
// Use the result
var source = $$"""
    public class GeneratedClass
    {
    {{generatedMethodWithIndent}}
    }
    """;

// Example of an indentation function
string Indent(string code, int indentLevel)
{
    const int spacesPerIndent = 4;
    var indent = new string(' ', indentLevel * spacesPerIndent);
    var lines = code.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None);
    for (int i = 0; i < lines.Length; i++)
    {
        lines[i] = indent + lines[i];
    }
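    // Note that Environment.NewLine cannot be used here (it is flagged as a banned API in Source Generators).
    // A string separator is used because string.Join(char, ...) is not available on netstandard2.0.
    return string.Join("\n", lines);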
}

This method works, but having to call it every time is a drawback. Also, the method described later is easier to read, so I recommend that one.

Option 3: Using NormalizeWhitespace

The Roslyn API has a code formatting feature called NormalizeWhitespace.

var tree = CSharpSyntaxTree.ParseText(generatedCode);
var formattedCode = tree.GetRoot().NormalizeWhitespace().ToFullString();

While this looks like the perfect tool for the job, it should not be used in Source Generators.

https://github.com/dotnet/roslyn/issues/52914

As discussed in the issue above, it significantly degrades Source Generator performance. Furthermore, Roslyn does not provide a fast alternative that does the same job.

Option 4: Using CSharpier

Alternatively, you might think of using an external code formatting tool. CSharpier is a code formatter for .NET that can also be used as a library.

So, it might seem like you could just write it like this...

var formattedCode = CSharpFormatter.Format(generatedCode).Code;

However, using external libraries in Source Generators is extremely tedious. Specifically, you need to include all the package's dependency DLLs in the output, and the configuration is very complex.

By the way, CSharpier itself has a number of dependencies of its own, and you would need to include all of those transitive dependencies as well.

Therefore, this approach is also not ideal.

Option 5: Using IndentedStringBuilder

This is the method proposed by maintainer Cyrus Najmabadi in the issue mentioned above.

The mechanism is very simple: create a StringBuilder wrapper with indentation management capabilities and use it for code generation. The implementation is as follows:

using System;
using System.Text;

/// <summary>
/// Utility class for efficiently building indented strings.
/// </summary>
internal class IndentedStringBuilder(StringBuilder stringBuilder, int indentLevel = 0)
{
    public const int IndentSize = 4;

    public IndentedStringBuilder(int indentLevel = 0)
        : this(new StringBuilder(), indentLevel) { }

    public int IndentLevel { get; private set; } = indentLevel;

    public string Indent => new(' ', IndentLevel * IndentSize);

    public void IncreaseIndent() => IndentLevel += 1;

    public void DecreaseIndent() => IndentLevel = Math.Max(0, IndentLevel - 1);

    public void AppendLine(string text)
    {
        var lines = text.Split(["\r\n", "\r", "\n"], StringSplitOptions.None);
        foreach (var line in lines)
        {
            stringBuilder.AppendLine(Indent + line);
        }
    }

    public override string ToString() => stringBuilder.ToString();
}

It has AppendLine and ToString just like a regular StringBuilder, but with the addition of IncreaseIndent/DecreaseIndent. This small addition makes managing indentation during code generation significantly easier.
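As a quick illustration of the behavior, the sketch below shows that a multi-line argument to AppendLine is split and every resulting line receives the current indent. `MiniIndentedStringBuilder` is a hypothetical, condensed copy of the class above, repeated only so that the sample compiles on its own; `IndentDemo` is likewise just an illustrative name.

```csharp
using System;
using System.Text;

// Hypothetical condensed copy of IndentedStringBuilder, kept minimal for the demo.
internal class MiniIndentedStringBuilder
{
    private const int IndentSize = 4;
    private readonly StringBuilder _sb = new();
    private int _level;

    public void IncreaseIndent() => _level++;
    public void DecreaseIndent() => _level = Math.Max(0, _level - 1);

    public void AppendLine(string text)
    {
        var indent = new string(' ', _level * IndentSize);
        // Split on any newline style so each line gets the current indent.
        foreach (var line in text.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None))
        {
            _sb.AppendLine(indent + line);
        }
    }

    public override string ToString() => _sb.ToString();
}

internal static class IndentDemo
{
    public static string Build()
    {
        var builder = new MiniIndentedStringBuilder();
        builder.AppendLine("class C");
        builder.AppendLine("{");
        builder.IncreaseIndent();
        // A two-line argument: both lines are indented by one level.
        builder.AppendLine("void M()\n{ }");
        builder.DecreaseIndent();
        builder.AppendLine("}");
        return builder.ToString();
    }
}
```

In the string returned by `IndentDemo.Build()`, the lines `void M()` and `{ }` are each prefixed with four spaces, while `class C` and the outer braces stay at column zero.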

For example, in a case like "Option 2":

public string Generate()
{
    var builder = new IndentedStringBuilder();
    builder.AppendLine("public class GeneratedClass {");
    builder.IncreaseIndent();
    GenerateMethodCode(builder);
    builder.DecreaseIndent();
    builder.AppendLine("}");
    // Return the accumulated source text
    return builder.ToString();
}

public void GenerateMethodCode(IndentedStringBuilder builder)
{
    builder.AppendLine("public void GeneratedMethod() {");
    builder.IncreaseIndent();
    GenerateAnotherMethod(builder);
    builder.DecreaseIndent();
    builder.AppendLine("}");
}

public void GenerateAnotherMethod(IndentedStringBuilder builder)
{
    builder.AppendLine("var source = \"Some processing\";");
    builder.AppendLine("if(CallAnotherMethod()) {");
    builder.IncreaseIndent();
    builder.AppendLine("source += \" More processing\";");
    builder.DecreaseIndent();
    builder.AppendLine("}");
}

In this way, even when delegating code generation to another function, you can maintain the indentation by passing the builder. Additionally, unlike Option 2, you don't need to receive the result as a return value every time, which makes the code look much cleaner.

Further Improvements

Let's try to generate the namespace-aware code mentioned at the beginning using this IndentedStringBuilder.

var builder = new IndentedStringBuilder();
// namespace 
var namespaceName = GetNamespaceFromUserCode(parseRecordObject);
// If a namespace exists, add the namespace declaration and enter the indentation
if (!string.IsNullOrEmpty(namespaceName))
{
    builder.AppendLine($"namespace {namespaceName}");
    builder.AppendLine("{");
    builder.IncreaseIndent();
}
builder.AppendLine($$"""
    public class GeneratedClass
    {
        public void GeneratedMethod()
        {
            // Some processing
        }
    }
    """);
// When finished, restore the indentation and add the closing brace
if (!string.IsNullOrEmpty(namespaceName))
{
    builder.DecreaseIndent();
    builder.AppendLine("}");
}

It's a bit bothersome that the same if condition appears in two places.
So, let's add the following helpers:

internal class IndentedStringBuilder(StringBuilder stringBuilder, int indentLevel = 0)
{
    public IndentedStringBuilder(int indentLevel = 0)
        : this(new StringBuilder(), indentLevel) { }

    public const int IndentSize = 4;

    public int IndentLevel { get; private set; } = indentLevel;

    public string Indent => new(' ', IndentLevel * IndentSize);

    public void IncreaseIndent() => IndentLevel += 1;

    public void DecreaseIndent() => IndentLevel = Math.Max(0, IndentLevel - 1);

    public void AppendLine(string text)
    {
        var lines = text.Split(["\r\n", "\r", "\n"], StringSplitOptions.None);
        foreach (var line in lines)
        {
            stringBuilder.AppendLine(Indent + line);
        }
    }

    public override string ToString() => stringBuilder.ToString();

    // ----------------------
    // Additions below

    public void AppendLineIf(bool condition, string text)
    {
        if (condition)
        {
            AppendLine(text);
        }
    }

    public IDisposable? IndentScopeWithBraceIf(bool condition, string open = "{", string close = "}")
    {
        if (condition)
        {
            AppendLine(open);
            IncreaseIndent();
            return new IndentScopeDisposable(this, close);
        }
        return null;
    }

    private sealed class IndentScopeDisposable(IndentedStringBuilder builder, string? closeBraceText = null) : IDisposable
    {
        public void Dispose()
        {
            builder.DecreaseIndent();
            if (closeBraceText is not null)
            {
                builder.AppendLine(closeBraceText);
            }
        }
    }
}

AppendLineIf is a method for adding a line conditionally.
IndentScopeWithBraceIf is a method for managing an indentation scope in combination with a using statement.
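Returning null from IndentScopeWithBraceIf is safe because a using statement accepts a null resource and simply skips the Dispose call. The following is a minimal sketch of that language behavior on its own, with hypothetical names (`NullUsingDemo`, `Scope`, `ScopeIf`) and assuming C# 12 for the primary constructor:

```csharp
#nullable enable
using System;

internal static class NullUsingDemo
{
    // Hypothetical scope object that runs a callback when disposed.
    private sealed class Scope(Action onDispose) : IDisposable
    {
        public void Dispose() => onDispose();
    }

    // Returns a scope only when the condition holds, otherwise null.
    private static IDisposable? ScopeIf(bool condition, Action onDispose)
        => condition ? new Scope(onDispose) : null;

    // Returns how many times Dispose was invoked.
    public static int Run(bool condition)
    {
        var disposeCount = 0;
        using (ScopeIf(condition, () => disposeCount++))
        {
            // The body runs in both cases.
        }
        return disposeCount;
    }
}
```

Here `NullUsingDemo.Run(true)` returns 1 and `NullUsingDemo.Run(false)` returns 0, and neither call throws. The same mechanism lets the builder emit the closing brace only when the opening brace was written.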

Using these, the namespace-aware code can be written as follows:

var builder = new IndentedStringBuilder();
// namespace
var namespaceName = GetNamespaceFromUserCode(parseRecordObject);
var hasNamespace = !string.IsNullOrEmpty(namespaceName);
builder.AppendLineIf(hasNamespace, $"namespace {namespaceName}");
using (builder.IndentScopeWithBraceIf(hasNamespace))
{
    builder.AppendLine($$"""
        public class GeneratedClass
        {
            public void GeneratedMethod()
            {
                // Some processing
            }
        }
        """);
}

What do you think? Doesn't it look much cleaner now?

When you actually run it, the following code is generated.

// With the namespace "test"
namespace test
{
    public class GeneratedClass
    {
        public void GeneratedMethod()
        {
            // Some processing
        }
    }
}

// With no namespace
public class GeneratedClass
{
    public void GeneratedMethod()
    {
        // Some processing
    }
}

Summary

In this article, I introduced a method for managing indentation when generating code with Source Generators. By avoiding the use of NormalizeWhitespace and using a wrapper with indentation management functionality like IndentedStringBuilder, you can perform efficient and readable code generation.

Footnotes
  1. When generating multi-line source code like this, raw string literals such as $$"""...""" are convenient.
