iTranslated by AI

The content below is an AI-generated translation. This is an experimental feature, and may contain errors. View original article
💤

Implementing IEnumerable in ZLinq

に公開
6

Due to various circumstances, I will be migrating to Zenn starting this year.
I have written various articles on Qiita as well, so please check them out.

Introduction

ZLinq v1 was released recently, but as stated in the README, it does not implement IEnumerable<T>.

https://github.com/Cysharp/ZLinq

This post is about trying to do something about that!

  • The following benchmark results are from v1. ZLinq v1.1 addresses the issue where the combination of Order + Take, which appears frequently in this post, was slower than standard LINQ.

Review of LINQ (IEnumerable)

Before discussing the implementation of IEnumerable in ZLinq, I'd like to review the benefits of LINQ (IEnumerable).

There are two major benefits:

  • Deferred execution
  • Memory efficiency (!)

Memory Efficiency!?

ZLinq is a set of LINQ methods aiming for zero allocation. In comparison, standard LINQ appears to have unnecessary allocations, but there is a reason for this.

Cases where standard LINQ is superior

In cases like the following, where you select 100 users who meet a specific condition, standard LINQ produces better results.

[Benchmark]
public int WhereTakeLinq()
{
    var ranking = arr.Where(x => x >= 0).Take(100);
    return UploadToDatabase(ranking);
}

[Benchmark]
public int WhereTakeZLinq()
{
    var ranking = arr.AsValueEnumerable().Where(x => x >= 0).Take(100).ToArray();  // 👈 The crux of this post: ToArray()
    return UploadToDatabase(ranking);
}
UploadToDatabase (IEnumerable<int>)

External service APIs often take IEnumerable<T> as an argument, so it is important that no "re-filling" (conversion) is required.

public int UploadToDatabase(IEnumerable<int> ranking)
{
    // 👇 This is just some dummy code for testing.
    int i = 0;
    foreach (var item in ranking)
    {
        i++;
    }
    return i;
}

Benchmark Results

Standard LINQ wins in both allocation and processing speed.

Method N Mean Error StdDev Gen0 Gen1 Gen2 Allocated
WhereTakeLinq 10 53.12 ns 14.528 ns 0.796 ns 0.0124 - - 104 B
WhereTakeZLinq 10 50.66 ns 6.689 ns 0.367 ns 0.0114 - - 96 B
WhereTakeLinq 100 308.07 ns 13.692 ns 0.751 ns 0.0124 - - 104 B
WhereTakeZLinq 100 332.61 ns 33.038 ns 1.811 ns 0.0544 - - 456 B
WhereTakeLinq 1000 311.63 ns 97.130 ns 5.324 ns 0.0124 - - 104 B
WhereTakeZLinq 1000 334.84 ns 33.794 ns 1.852 ns 0.0544 - - 456 B
WhereTakeLinq 10000 308.82 ns 13.124 ns 0.719 ns 0.0124 - - 104 B
WhereTakeZLinq 10000 335.64 ns 44.383 ns 2.433 ns 0.0544 - - 456 B
WhereTakeLinq 10000000 308.31 ns 6.594 ns 0.361 ns 0.0124 - - 104 B
WhereTakeZLinq 10000000 332.69 ns 38.366 ns 2.103 ns 0.0544 - - 456 B

Why?

This result is due to the deferred execution of IEnumerable<T>. Standard LINQ creates an object representing the "aggregation method" (= IEnumerable) and can pass it to methods that need the "aggregation result."

A method that has been passed the aggregation "method" can obtain as many "results" as needed from the aggregation method when they are actually required. (Deferred execution)

Furthermore, there is no need to hold the aggregation results. (Memory efficiency)

// Create an aggregation "method",
var ranking = arr.Where(x => x >= 0).Take(100);

// and pass it to a method that needs the aggregation "result."
UploadToDatabase(ranking);

In contrast, since ZLinq does not implement IEnumerable, it is necessary to pre-determine the aggregation "result" using methods like ToArray before passing it to a method. Of course, allocation is required to store the aggregation results.

var ranking = arr.AsValueEnumerable()
                 .Where(x => x >= 0)
                 .Take(100)
                 .ToArray();  // ZLinq must pass the "aggregation result" instead of the "aggregation method."

// The "aggregation result" is compatible with the "aggregation method," so it can be passed.
UploadToDatabase(ranking);

In addition to memory consumption, since the aggregation is performed all at once, there is also the issue of heavy CPU usage at that specific timing. (Spikes)

LINQ Always Consumes a Constant Amount of Memory

While the aggregation "method" is not affected by the source element count (N) or the number of aggregation results (M), the aggregation "result" is heavily impacted by M. Memory consumption increases as the number of results grows.

This issue, which nullifies ZLinq's zero-allocation benefits, is the primary reason why I want to implement IEnumerable<T>.

Method N M Mean Error StdDev Gen0 Gen1 Allocated
WhereTakeLinq 1000 100 299.1 ns 43.31 ns 2.37 ns 0.0124 - 104 B
WhereTakeZLinq 1000 100 335.3 ns 9.05 ns 0.50 ns 0.0544 - 456 B
WhereTakeLinq 1000 1000 2,633.7 ns 786.79 ns 43.13 ns 0.0114 - 104 B
WhereTakeZLinq 1000 1000 3,118.8 ns 181.92 ns 9.97 ns 0.4807 0.0038 4056 B
WhereTakeLinq 10000 100 309.8 ns 22.35 ns 1.23 ns 0.0124 - 104 B
WhereTakeZLinq 10000 100 339.7 ns 78.87 ns 4.32 ns 0.0544 - 456 B
WhereTakeLinq 10000 1000 2,803.5 ns 507.97 ns 27.84 ns 0.0114 - 104 B
WhereTakeZLinq 10000 1000 3,098.9 ns 195.46 ns 10.71 ns 0.4807 0.0038 4056 B

Cases Where ZLinq is Superior

Not all LINQ operations are executed lazily. As shown below, Order is executed immediately and must hold the sorted results, so ZLinq comes out on top.

Ex) Extracting the top 100 players with the highest scores (illustrative)

[Benchmark]
public int SortTakeLinq()
{
    var ranking = arr.OrderDescending().Take(100);
    return UploadToDatabase(ranking);
}

[Benchmark]
public int SortTakeZLinq()
{
    var ranking = arr.AsValueEnumerable().OrderDescending().Take(100).ToArray();
    return UploadToDatabase(ranking);
}

Benchmark Results

ZLinq is more memory-efficient, but it is a mystery why the processing speed becomes significantly slower. (Sorting algorithm?)

Method N Mean Error StdDev Gen0 Gen1 Gen2 Allocated
SortTakeLinq 10 147.85 ns 6.008 ns 0.329 ns 0.0362 - - 304 B
SortTakeZLinq 10 95.27 ns 12.460 ns 0.683 ns 0.0248 - - 208 B
SortTakeLinq 100 1,603.65 ns 213.537 ns 11.705 ns 0.1221 - - 1024 B
SortTakeZLinq 100 844.46 ns 205.519 ns 11.265 ns 0.0677 - - 568 B
SortTakeLinq 1000 5,903.77 ns 1,111.586 ns 60.930 ns 0.9766 0.0076 - 8224 B
SortTakeZLinq 1000 10,376.86 ns 358.087 ns 19.628 ns 0.0610 - - 600 B
SortTakeLinq 10000 45,274.88 ns 4,608.351 ns 252.599 ns 9.5215 1.1597 - 80224 B
SortTakeZLinq 10000 147,498.03 ns 13,780.986 ns 755.382 ns - - - 600 B
SortTakeLinq 10000000 61,013,314.81 ns 6,759,436.720 ns 370,507.537 ns 444.4444 444.4444 444.4444 80000423 B
SortTakeZLinq 10000000 280,530,650.00 ns 12,023,278.588 ns 659,036.473 ns - - - 968 B

Implementation of IEnumerable<T>

Getting back to the topic, regarding the implementation of IEnumerable in ZLinq, the conclusion is: "It's impossible with .NET 9, but possible with .NET 8."

If you remove net9.0 from the target platforms of the Benchmarks and ZLinq projects downloaded from the official ZLinq repository and add the following code, you will be able to use AsEnumerable().

  • The purpose of removing net9.0 was to quickly eliminate NET9_0_OR_GREATER (i.e., to remove allows ref struct). It's not that .NET 9 itself is bad, but rather that the allows ref struct type constraint is simply too restrictive to make this work!
public struct ZLinqEnumerable<T>
    : IEnumerable<T>
    , IEnumerator<T>
    , IAsyncEnumerator<T>  // It might be better not to implement async
    , IAsyncEnumerable<T>  // Same as above
{
    // Remove readonly if using concrete types
    readonly IValueEnumerator<T> enumerator;
    T current;

    public ZLinqEnumerable(IValueEnumerator<T> enumerator)
    {
        this.enumerator = enumerator;
        this.current = (((default)))!;
    }

    IEnumerator<T> IEnumerable<T>.GetEnumerator() => this;
    IEnumerator IEnumerable.GetEnumerator() => this;
    IAsyncEnumerator<T> IAsyncEnumerable<T>.GetAsyncEnumerator(CancellationToken cancellationToken = default) => this;

    public T Current => this.current;
    object IEnumerator.Current => this.current;

    public bool MoveNext() => enumerator.TryGetNext(out current);

    public void Dispose()
    {
        current = (((default)))!;
        enumerator.Dispose();
    }

    void IEnumerator.Reset() => throw new NotSupportedException();

    public ValueTask<bool> MoveNextAsync()
    {
        var ret = enumerator.TryGetNext(out current);
        return new ValueTask<bool>(ret);
    }

    public ValueTask DisposeAsync()
    {
        Dispose();
        return new(Task.CompletedTask);
    }
}

Extension method.

public static class ZLinqEnumerable
{
    public static ZLinqEnumerable<T> AsEnumerable<T>(this IValueEnumerator<T> enumerator)
    {
        return new(enumerator);
    }
}

Test Code and Benchmark Results

Test code using the above ZLinqEnumerable<T>.

[Benchmark]
public int Linq()
{
    var ranking = arr.OrderDescending().Take(M);
    return UploadToDatabase(ranking);
}

[Benchmark]
public int ZLinq()
{
    var ranking = arr.AsValueEnumerable().OrderDescending().Take(M).ToArray();
    return UploadToDatabase(ranking);
}

[Benchmark]
public int ZLinqEnum()
{
    var ranking = arr.AsValueEnumerable().OrderDescending().Take(M).Enumerator.AsEnumerable();
    return UploadToDatabase(ranking);
}


public int UploadToDatabase(IEnumerable<int> ranking)
{
    int i = 0;
    foreach (var item in ranking)
    {
        i++;
    }
    return i;
}

The execution results are as follows.

As intended, it is no longer affected by the number of "aggregation results." (It's a mystery why the processing speed is slower)

Method N M Mean Error StdDev Gen0 Gen1 Gen2 Allocated
Linq 1000 100 6.945 us 1.7554 us 0.0962 us 0.9766 0.0076 - 8224 B
ZLinq 1000 100 10.863 us 0.1562 us 0.0086 us 0.0610 - - 600 B
ZLinqEnum 1000 100 10.962 us 1.5831 us 0.0868 us 0.0305 - - 296 B
Linq 1000 1000 19.769 us 1.8344 us 0.1006 us 0.9766 - - 8224 B
ZLinq 1000 1000 12.332 us 0.7399 us 0.0406 us 0.4883 - - 4168 B
ZLinqEnum 1000 1000 12.794 us 0.8273 us 0.0453 us 0.0305 - - 296 B
Linq 10000 100 45.892 us 3.2693 us 0.1792 us 9.5215 1.1597 - 80224 B
ZLinq 10000 100 148.827 us 14.4081 us 0.7898 us - - - 600 B
ZLinqEnum 10000 100 150.096 us 2.0773 us 0.1139 us - - - 296 B
Linq 10000 1000 60.922 us 7.0835 us 0.3883 us 9.5215 1.1597 - 80224 B
ZLinq 10000 1000 149.630 us 2.2578 us 0.1238 us 0.4883 - - 4200 B
ZLinqEnum 10000 1000 148.851 us 6.3202 us 0.3464 us - - - 296 B
Linq 10000000 100 52,678.207 us 5,928.5015 us 324.9612 us 400.0000 400.0000 400.0000 80001637 B
ZLinq 10000000 100 266,093.017 us 15,235.8069 us 835.1260 us - - - 968 B
ZLinqEnum 10000000 100 271,718.600 us 100,678.2548 us 5,518.5149 us - - - 664 B
Linq 10000000 1000 60,983.478 us 4,386.5359 us 240.4408 us 444.4444 444.4444 444.4444 80000423 B
ZLinq 10000000 1000 278,373.350 us 14,932.8319 us 818.5189 us - - - 4568 B
ZLinqEnum 10000000 1000 271,367.583 us 15,597.0636 us 854.9277 us - - - 664 B

* Since the processing speed was very slow anyway, here are the results of adding .NET 9 back to the target platforms, restarting, rebuilding, and executing the following.

Update: This has been addressed in the latest version of ZLinq. See the comments section of this post for details.

// This method will only work with ZLinq, but it's unavoidable.
[Benchmark]
public int ZLinqDedicated()
{
    var ranking = arr.AsValueEnumerable().OrderDescending().Take(M);

    int i = 0;
    foreach (var item in ranking)
    {
        i++;
    }
    return i;
}

I feel like I might have made a mistake somewhere, but the execution speed results showed that standard LINQ was faster.

(Is it the Intel CPU? However, since ReadMeBenchmark is faster with ZLinq, it's possible that the combination of LINQ methods in the test code is simply slow.)

In terms of memory efficiency, it exceeds ZLinqEnumerable. Regarding processing speed, they are almost the same, so it seems that method calls via interfaces—which are often said to be slow—had almost no impact.

Method N M Mean Error StdDev Gen0 Allocated
ZLinqDedicated 1000 100 10.89 us 2.980 us 0.163 us 0.0153 144 B
ZLinqDedicated 1000 1000 12.53 us 5.929 us 0.325 us 0.0153 144 B
ZLinqDedicated 10000 100 148.66 us 40.993 us 2.247 us - 144 B
ZLinqDedicated 10000 1000 152.31 us 34.979 us 1.917 us - 144 B
ZLinqDedicated 10000000 100 268,417.80 us 29,667.028 us 1,626.150 us - 512 B
ZLinqDedicated 10000000 1000 271,838.93 us 18,617.717 us 1,020.500 us - 512 B

Postscript: Classification Table for Standard LINQ

  • It seems only a Visual Basic version exists.

https://learn.microsoft.com/ja-jp/dotnet/visual-basic/programming-guide/concepts/linq/classification-of-standard-query-operators-by-manner-of-execution#classification-table

Conclusion

Without an implementation of IEnumerable<T>, ZLinq is essentially just a set of array-generating methods. Furthermore (assuming there were no errors in the ZLinqDedicated() benchmark), it implies that while ref struct contributes to memory reduction, it provides no benefit in terms of processing speed. Therefore, intentionally sticking with .NET 8 might be a viable option.

(Ideally, I'd like a ZLINQ_DISABLE_ALLOWS_REF_STRUCT preprocessor symbol.)

The Fundamental Problem

The root cause is that GetEnumerator() for IEnumerable<T> (which requires casting to an interface) cannot be implemented due to the impact of allows ref struct. Perhaps there's a way to do something hacky using InterceptsLocation?

https://andrewlock.net/exploring-the-dotnet-8-preview-changing-method-calls-with-interceptors/

Making it Readonly

Perhaps the processing speed would improve if the struct were made readonly by making current a reference type, as shown below?

https://github.com/sator-imaging/Unity-Fundamentals/blob/main/Runtime/Threading/WhenEachEnumerator.cs#L74-L102

That's all. Thank you for your time.

Discussion

junerjuner

集計方法って Aggregate だとダメなんでしたっけ……?

https://github.com/Cysharp/ZLinq/blob/6707324219d74e8582f03dfa5bc5212a4283cf9a/src/ZLinq/Linq/Aggregate.cs#L5

追伸: あー ConsumeRanking の引数が IEnumerable<T> だからか。それなら Aggregate の引数パターンで共通関数にすれば良さそう……?

サトー™ @sator_imagingサトー™ @sator_imaging

集計と書いてますが集計その物に興味はなくて、例えば UploadToCloudDatabase(IEnumerable<T>) みたいな API が内部で使う、外部由来の API に対して配列の詰め替えなしで渡したいって感じです。

ZLinq を最大限活用するために API 側を合わせられる環境なら、おっしゃる通りどうとでもなるかと!

Yoshifumi KawaiYoshifumi Kawai

色々検証ありがとうございます!
特にOrderDescendingの性能低下が引っかかったので調べてみたのですが、これは恐らくTakeが原因かな、と。
Takeが原因といってもTakeが遅いわけではなくて、SystemLinqの場合は
Orderの後にTakeが入るとSkipTakeOrderedIteratorというものが生成されて、呼び出し階層が一段階減ります。
やってるのはそれだけのことで凄い特殊な最適化が入っているわけではなさそうなのですが、
この呼び出し階層が一段回減る(親へのMoveNext呼び出しが一個減る)というのが、
些細に見えてマイクロベンチマークを取るとかなり差が現れてしまうのが諸々の検証で分かっているので、そのせいかな、と。
もう一つはTakeの個数で、Nに比べてMが少ないときに上記と相まって差が出てくるっぽいですね(最後のN, Mのグラフでも同数だと差がない)

まぁ、言い訳はともあれ、じゃあ同じようにSkipTakeOrderedIterator的なもの作ればいいじゃんというのも解なのですが
コピペでばーっと作る分にはそんな難しい話でもないのですが、
OrderByのコード量が多いことと相まって、ちょっと食い合わせの悪さがあって、やや躊躇うところがあります……。

追記:
と思ったんですが、全体的に別に↑で全く説明ついてないし、差が大きすぎるので、ちゃんと見直したら
SystemLinqのSkipTakeOrderedIteratorはtakeの量によってソート量を削減するロジックが入ってますね(PartialQuickSort)。
このロジックの有無で、NとMの差が開いているときにLinqとZLinqで、ソート省いている量の差がが広がって性能差になっているようでした……!
というわけでこれもちゃんとできるように出直してきます、ありがとうございます!!!

サトー™ @sator_imagingサトー™ @sator_imaging

お返事遅くなりました!

私もちょうど処理速度を調べていたら、標準 Linq は Take の接続なしだと ZLinq と同等に遅くなりました。サンプルで書くときにぱっと思いつく組み合わせなので、C# 歴史を感じさせる最適化ですねーー。

OrderBy は Take の接続の有る無しに関わらず、maxCount(デフォルト値 int.MaxValue)とか持たせるのもアリかもですね。遅くはなりますがコードのコピペ削減は出来そうです。
※ Take は接続時前のオペレーターの maxCount を設定するだけ。

Yoshifumi KawaiYoshifumi Kawai

v1.1でこちら対応しました、ありがとうございます……!

なお、ref structに関してですが、これはSpan対応のために必要、以外の理由はないです!
実際、Spanで動くということよりもAsEnumerableできるほうが利便性は圧倒的に上ですね。
とはいえ(マーケティング的な意味(?))でSpan対応のほうが必須度合いが高かったので
そこに関してはしょーがない、という判断を取りました。
ref struct的なものを抱えていたら全体がref structになり、そうでなければstructとなる。
みたいな言語機能がつけば最高なのですが、まぁさすがにそんなものが来ることはないでしょうねぇ。