iTranslated by AI
Implementing IEnumerable in ZLinq
Due to various circumstances, I will be migrating to Zenn starting this year.
I have written various articles on Qiita as well, so please check them out.
Introduction
ZLinq v1 was released recently, but as stated in the README, it does not implement IEnumerable<T>.
This post is about trying to do something about that!
- The following benchmark results are from v1. ZLinq v1.1 addresses the issue where the combination of Order + Take, which appears frequently in this post, was slower than standard LINQ.
Review of LINQ (IEnumerable)
Before discussing the implementation of IEnumerable in ZLinq, I'd like to review the benefits of LINQ (IEnumerable).
There are two major benefits:
- Deferred execution
- Memory efficiency (!)
Memory Efficiency!?
ZLinq is a set of LINQ methods aiming for zero allocation. In comparison, standard LINQ appears to have unnecessary allocations, but there is a reason for this.
Cases where standard LINQ is superior
In cases like the following, where you select 100 users who meet a specific condition, standard LINQ produces better results.
[Benchmark]
public int WhereTakeLinq()
{
var ranking = arr.Where(x => x >= 0).Take(100);
return UploadToDatabase(ranking);
}
[Benchmark]
public int WhereTakeZLinq()
{
var ranking = arr.AsValueEnumerable().Where(x => x >= 0).Take(100).ToArray(); // 👈 The crux of this post: ToArray()
return UploadToDatabase(ranking);
}
UploadToDatabase (IEnumerable<int>)
External service APIs often take IEnumerable<T> as an argument, so it is important that no "re-filling" (conversion) is required.
public int UploadToDatabase(IEnumerable<int> ranking)
{
// 👇 This is just some dummy code for testing.
int i = 0;
foreach (var item in ranking)
{
i++;
}
return i;
}
Benchmark Results
Standard LINQ wins in both allocation and processing speed.
| Method | N | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| WhereTakeLinq | 10 | 53.12 ns | 14.528 ns | 0.796 ns | 0.0124 | - | - | 104 B |
| WhereTakeZLinq | 10 | 50.66 ns | 6.689 ns | 0.367 ns | 0.0114 | - | - | 96 B |
| WhereTakeLinq | 100 | 308.07 ns | 13.692 ns | 0.751 ns | 0.0124 | - | - | 104 B |
| WhereTakeZLinq | 100 | 332.61 ns | 33.038 ns | 1.811 ns | 0.0544 | - | - | 456 B |
| WhereTakeLinq | 1000 | 311.63 ns | 97.130 ns | 5.324 ns | 0.0124 | - | - | 104 B |
| WhereTakeZLinq | 1000 | 334.84 ns | 33.794 ns | 1.852 ns | 0.0544 | - | - | 456 B |
| WhereTakeLinq | 10000 | 308.82 ns | 13.124 ns | 0.719 ns | 0.0124 | - | - | 104 B |
| WhereTakeZLinq | 10000 | 335.64 ns | 44.383 ns | 2.433 ns | 0.0544 | - | - | 456 B |
| WhereTakeLinq | 10000000 | 308.31 ns | 6.594 ns | 0.361 ns | 0.0124 | - | - | 104 B |
| WhereTakeZLinq | 10000000 | 332.69 ns | 38.366 ns | 2.103 ns | 0.0544 | - | - | 456 B |
Why?
This result is due to the deferred execution of IEnumerable<T>. Standard LINQ creates an object representing the "aggregation method" (= IEnumerable) and can pass it to methods that need the "aggregation result."
A method that has been passed the aggregation "method" can obtain as many "results" as needed from the aggregation method when they are actually required. (Deferred execution)
Furthermore, there is no need to hold the aggregation results. (Memory efficiency)
// Create an aggregation "method",
var ranking = arr.Where(x => x >= 0).Take(100);
// and pass it to a method that needs the aggregation "result."
UploadToDatabase(ranking);
In contrast, since ZLinq does not implement IEnumerable, it is necessary to pre-determine the aggregation "result" using methods like ToArray before passing it to a method. Of course, allocation is required to store the aggregation results.
var ranking = arr.AsValueEnumerable()
.Where(x => x >= 0)
.Take(100)
.ToArray(); // ZLinq must pass the "aggregation result" instead of the "aggregation method."
// The "aggregation result" is compatible with the "aggregation method," so it can be passed.
UploadToDatabase(ranking);
In addition to memory consumption, since the aggregation is performed all at once, there is also the issue of heavy CPU usage at that specific timing. (Spikes)
LINQ Always Consumes a Constant Amount of Memory
While the aggregation "method" is not affected by the source element count (N) or the number of aggregation results (M), the aggregation "result" is heavily impacted by M. Memory consumption increases as the number of results grows.
This issue, which nullifies ZLinq's zero-allocation benefits, is the primary reason why I want to implement IEnumerable<T>.
| Method | N | M | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|---|---|---|---|---|---|---|---|---|
| WhereTakeLinq | 1000 | 100 | 299.1 ns | 43.31 ns | 2.37 ns | 0.0124 | - | 104 B |
| WhereTakeZLinq | 1000 | 100 | 335.3 ns | 9.05 ns | 0.50 ns | 0.0544 | - | 456 B |
| WhereTakeLinq | 1000 | 1000 | 2,633.7 ns | 786.79 ns | 43.13 ns | 0.0114 | - | 104 B |
| WhereTakeZLinq | 1000 | 1000 | 3,118.8 ns | 181.92 ns | 9.97 ns | 0.4807 | 0.0038 | 4056 B |
| WhereTakeLinq | 10000 | 100 | 309.8 ns | 22.35 ns | 1.23 ns | 0.0124 | - | 104 B |
| WhereTakeZLinq | 10000 | 100 | 339.7 ns | 78.87 ns | 4.32 ns | 0.0544 | - | 456 B |
| WhereTakeLinq | 10000 | 1000 | 2,803.5 ns | 507.97 ns | 27.84 ns | 0.0114 | - | 104 B |
| WhereTakeZLinq | 10000 | 1000 | 3,098.9 ns | 195.46 ns | 10.71 ns | 0.4807 | 0.0038 | 4056 B |
Cases Where ZLinq is Superior
Not all LINQ operations are executed lazily. As shown below, Order is executed immediately and must hold the sorted results, so ZLinq comes out on top.
Ex) Extracting the top 100 players with the highest scores (illustrative)
[Benchmark]
public int SortTakeLinq()
{
var ranking = arr.OrderDescending().Take(100);
return UploadToDatabase(ranking);
}
[Benchmark]
public int SortTakeZLinq()
{
var ranking = arr.AsValueEnumerable().OrderDescending().Take(100).ToArray();
return UploadToDatabase(ranking);
}
Benchmark Results
ZLinq is more memory-efficient, but it is a mystery why the processing speed becomes significantly slower. (Sorting algorithm?)
| Method | N | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| SortTakeLinq | 10 | 147.85 ns | 6.008 ns | 0.329 ns | 0.0362 | - | - | 304 B |
| SortTakeZLinq | 10 | 95.27 ns | 12.460 ns | 0.683 ns | 0.0248 | - | - | 208 B |
| SortTakeLinq | 100 | 1,603.65 ns | 213.537 ns | 11.705 ns | 0.1221 | - | - | 1024 B |
| SortTakeZLinq | 100 | 844.46 ns | 205.519 ns | 11.265 ns | 0.0677 | - | - | 568 B |
| SortTakeLinq | 1000 | 5,903.77 ns | 1,111.586 ns | 60.930 ns | 0.9766 | 0.0076 | - | 8224 B |
| SortTakeZLinq | 1000 | 10,376.86 ns | 358.087 ns | 19.628 ns | 0.0610 | - | - | 600 B |
| SortTakeLinq | 10000 | 45,274.88 ns | 4,608.351 ns | 252.599 ns | 9.5215 | 1.1597 | - | 80224 B |
| SortTakeZLinq | 10000 | 147,498.03 ns | 13,780.986 ns | 755.382 ns | - | - | - | 600 B |
| SortTakeLinq | 10000000 | 61,013,314.81 ns | 6,759,436.720 ns | 370,507.537 ns | 444.4444 | 444.4444 | 444.4444 | 80000423 B |
| SortTakeZLinq | 10000000 | 280,530,650.00 ns | 12,023,278.588 ns | 659,036.473 ns | - | - | - | 968 B |
Implementation of IEnumerable<T>
Getting back to the topic, regarding the implementation of IEnumerable in ZLinq, the conclusion is: "It's impossible with .NET 9, but possible with .NET 8."
If you remove net9.0 from the target platforms of the Benchmarks and ZLinq projects downloaded from the official ZLinq repository and add the following code, you will be able to use AsEnumerable().
- The purpose of removing
net9.0was to quickly eliminateNET9_0_OR_GREATER(i.e., to removeallows ref struct). It's not that .NET 9 itself is bad, but rather that theallows ref structtype constraint is simply too restrictive to make this work!
public struct ZLinqEnumerable<T>
: IEnumerable<T>
, IEnumerator<T>
, IAsyncEnumerator<T> // It might be better not to implement async
, IAsyncEnumerable<T> // Same as above
{
// Remove readonly if using concrete types
readonly IValueEnumerator<T> enumerator;
T current;
public ZLinqEnumerable(IValueEnumerator<T> enumerator)
{
this.enumerator = enumerator;
this.current = (((default)))!;
}
IEnumerator<T> IEnumerable<T>.GetEnumerator() => this;
IEnumerator IEnumerable.GetEnumerator() => this;
IAsyncEnumerator<T> IAsyncEnumerable<T>.GetAsyncEnumerator(CancellationToken cancellationToken = default) => this;
public T Current => this.current;
object IEnumerator.Current => this.current;
public bool MoveNext() => enumerator.TryGetNext(out current);
public void Dispose()
{
current = (((default)))!;
enumerator.Dispose();
}
void IEnumerator.Reset() => throw new NotSupportedException();
public ValueTask<bool> MoveNextAsync()
{
var ret = enumerator.TryGetNext(out current);
return new ValueTask<bool>(ret);
}
public ValueTask DisposeAsync()
{
Dispose();
return new(Task.CompletedTask);
}
}
Extension method.
public static class ZLinqEnumerable
{
public static ZLinqEnumerable<T> AsEnumerable<T>(this IValueEnumerator<T> enumerator)
{
return new(enumerator);
}
}
Test Code and Benchmark Results
Test code using the above ZLinqEnumerable<T>.
[Benchmark]
public int Linq()
{
var ranking = arr.OrderDescending().Take(M);
return UploadToDatabase(ranking);
}
[Benchmark]
public int ZLinq()
{
var ranking = arr.AsValueEnumerable().OrderDescending().Take(M).ToArray();
return UploadToDatabase(ranking);
}
[Benchmark]
public int ZLinqEnum()
{
var ranking = arr.AsValueEnumerable().OrderDescending().Take(M).Enumerator.AsEnumerable();
return UploadToDatabase(ranking);
}
public int UploadToDatabase(IEnumerable<int> ranking)
{
int i = 0;
foreach (var item in ranking)
{
i++;
}
return i;
}
The execution results are as follows.
As intended, it is no longer affected by the number of "aggregation results." (It's a mystery why the processing speed is slower)
| Method | N | M | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| Linq | 1000 | 100 | 6.945 us | 1.7554 us | 0.0962 us | 0.9766 | 0.0076 | - | 8224 B |
| ZLinq | 1000 | 100 | 10.863 us | 0.1562 us | 0.0086 us | 0.0610 | - | - | 600 B |
| ZLinqEnum | 1000 | 100 | 10.962 us | 1.5831 us | 0.0868 us | 0.0305 | - | - | 296 B |
| Linq | 1000 | 1000 | 19.769 us | 1.8344 us | 0.1006 us | 0.9766 | - | - | 8224 B |
| ZLinq | 1000 | 1000 | 12.332 us | 0.7399 us | 0.0406 us | 0.4883 | - | - | 4168 B |
| ZLinqEnum | 1000 | 1000 | 12.794 us | 0.8273 us | 0.0453 us | 0.0305 | - | - | 296 B |
| Linq | 10000 | 100 | 45.892 us | 3.2693 us | 0.1792 us | 9.5215 | 1.1597 | - | 80224 B |
| ZLinq | 10000 | 100 | 148.827 us | 14.4081 us | 0.7898 us | - | - | - | 600 B |
| ZLinqEnum | 10000 | 100 | 150.096 us | 2.0773 us | 0.1139 us | - | - | - | 296 B |
| Linq | 10000 | 1000 | 60.922 us | 7.0835 us | 0.3883 us | 9.5215 | 1.1597 | - | 80224 B |
| ZLinq | 10000 | 1000 | 149.630 us | 2.2578 us | 0.1238 us | 0.4883 | - | - | 4200 B |
| ZLinqEnum | 10000 | 1000 | 148.851 us | 6.3202 us | 0.3464 us | - | - | - | 296 B |
| Linq | 10000000 | 100 | 52,678.207 us | 5,928.5015 us | 324.9612 us | 400.0000 | 400.0000 | 400.0000 | 80001637 B |
| ZLinq | 10000000 | 100 | 266,093.017 us | 15,235.8069 us | 835.1260 us | - | - | - | 968 B |
| ZLinqEnum | 10000000 | 100 | 271,718.600 us | 100,678.2548 us | 5,518.5149 us | - | - | - | 664 B |
| Linq | 10000000 | 1000 | 60,983.478 us | 4,386.5359 us | 240.4408 us | 444.4444 | 444.4444 | 444.4444 | 80000423 B |
| ZLinq | 10000000 | 1000 | 278,373.350 us | 14,932.8319 us | 818.5189 us | - | - | - | 4568 B |
| ZLinqEnum | 10000000 | 1000 | 271,367.583 us | 15,597.0636 us | 854.9277 us | - | - | - | 664 B |
* Since the processing speed was very slow anyway, here are the results of adding .NET 9 back to the target platforms, restarting, rebuilding, and executing the following.
Update: This has been addressed in the latest version of ZLinq. See the comments section of this post for details.
// This method will only work with ZLinq, but it's unavoidable.
[Benchmark]
public int ZLinqDedicated()
{
var ranking = arr.AsValueEnumerable().OrderDescending().Take(M);
int i = 0;
foreach (var item in ranking)
{
i++;
}
return i;
}
I feel like I might have made a mistake somewhere, but the execution speed results showed that standard LINQ was faster.
(Is it the Intel CPU? However, since ReadMeBenchmark is faster with ZLinq, it's possible that the combination of LINQ methods in the test code is simply slow.)
In terms of memory efficiency, it exceeds ZLinqEnumerable. Regarding processing speed, they are almost the same, so it seems that method calls via interfaces—which are often said to be slow—had almost no impact.
| Method | N | M | Mean | Error | StdDev | Gen0 | Allocated |
|---|---|---|---|---|---|---|---|
| ZLinqDedicated | 1000 | 100 | 10.89 us | 2.980 us | 0.163 us | 0.0153 | 144 B |
| ZLinqDedicated | 1000 | 1000 | 12.53 us | 5.929 us | 0.325 us | 0.0153 | 144 B |
| ZLinqDedicated | 10000 | 100 | 148.66 us | 40.993 us | 2.247 us | - | 144 B |
| ZLinqDedicated | 10000 | 1000 | 152.31 us | 34.979 us | 1.917 us | - | 144 B |
| ZLinqDedicated | 10000000 | 100 | 268,417.80 us | 29,667.028 us | 1,626.150 us | - | 512 B |
| ZLinqDedicated | 10000000 | 1000 | 271,838.93 us | 18,617.717 us | 1,020.500 us | - | 512 B |
Postscript: Classification Table for Standard LINQ
- It seems only a Visual Basic version exists.
Conclusion
Without an implementation of IEnumerable<T>, ZLinq is essentially just a set of array-generating methods. Furthermore (assuming there were no errors in the ZLinqDedicated() benchmark), it implies that while ref struct contributes to memory reduction, it provides no benefit in terms of processing speed. Therefore, intentionally sticking with .NET 8 might be a viable option.
(Ideally, I'd like a ZLINQ_DISABLE_ALLOWS_REF_STRUCT preprocessor symbol.)
The Fundamental Problem
The root cause is that GetEnumerator() for IEnumerable<T> (which requires casting to an interface) cannot be implemented due to the impact of allows ref struct. Perhaps there's a way to do something hacky using InterceptsLocation?
Making it Readonly
Perhaps the processing speed would improve if the struct were made readonly by making current a reference type, as shown below?
That's all. Thank you for your time.
Discussion
集計方法って Aggregate だとダメなんでしたっけ……?
追伸: あー
ConsumeRankingの引数がIEnumerable<T>だからか。それなら Aggregate の引数パターンで共通関数にすれば良さそう……?集計と書いてますが集計その物に興味はなくて、例えば
UploadToCloudDatabase(IEnumerable<T>)みたいな API が内部で使う、外部由来の API に対して配列の詰め替えなしで渡したいって感じです。ZLinq を最大限活用するために API 側を合わせられる環境なら、おっしゃる通りどうとでもなるかと!
色々検証ありがとうございます!
特にOrderDescendingの性能低下が引っかかったので調べてみたのですが、これは恐らくTakeが原因かな、と。
Takeが原因といってもTakeが遅いわけではなくて、SystemLinqの場合は
Orderの後にTakeが入るとSkipTakeOrderedIteratorというものが生成されて、呼び出し階層が一段階減ります。
やってるのはそれだけのことで凄い特殊な最適化が入っているわけではなさそうなのですが、
この呼び出し階層が一段回減る(親へのMoveNext呼び出しが一個減る)というのが、
些細に見えてマイクロベンチマークを取るとかなり差が現れてしまうのが諸々の検証で分かっているので、そのせいかな、と。
もう一つはTakeの個数で、Nに比べてMが少ないときに上記と相まって差が出てくるっぽいですね(最後のN, Mのグラフでも同数だと差がない)
まぁ、言い訳はともあれ、じゃあ同じようにSkipTakeOrderedIterator的なもの作ればいいじゃんというのも解なのですが
コピペでばーっと作る分にはそんな難しい話でもないのですが、
OrderByのコード量が多いことと相まって、ちょっと食い合わせの悪さがあって、やや躊躇うところがあります……。
追記:
と思ったんですが、全体的に別に↑で全く説明ついてないし、差が大きすぎるので、ちゃんと見直したら
SystemLinqのSkipTakeOrderedIteratorはtakeの量によってソート量を削減するロジックが入ってますね(PartialQuickSort)。
このロジックの有無で、NとMの差が開いているときにLinqとZLinqで、ソート省いている量の差がが広がって性能差になっているようでした……!
というわけでこれもちゃんとできるように出直してきます、ありがとうございます!!!
お返事遅くなりました!
私もちょうど処理速度を調べていたら、標準 Linq は Take の接続なしだと ZLinq と同等に遅くなりました。サンプルで書くときにぱっと思いつく組み合わせなので、C# 歴史を感じさせる最適化ですねーー。
OrderBy は Take の接続の有る無しに関わらず、maxCount(デフォルト値 int.MaxValue)とか持たせるのもアリかもですね。
遅くはなりますがコードのコピペ削減は出来そうです。※ Take は接続時前のオペレーターの maxCount を設定するだけ。
v1.1でこちら対応しました、ありがとうございます……!
なお、ref structに関してですが、これはSpan対応のために必要、以外の理由はないです!
実際、Spanで動くということよりもAsEnumerableできるほうが利便性は圧倒的に上ですね。
とはいえ(マーケティング的な意味(?))でSpan対応のほうが必須度合いが高かったので
そこに関してはしょーがない、という判断を取りました。
ref struct的なものを抱えていたら全体がref structになり、そうでなければstructとなる。
みたいな言語機能がつけば最高なのですが、まぁさすがにそんなものが来ることはないでしょうねぇ。
実装速い!
(釈迦に説法ですが、Shuffle とかも同様の最適化できそうな?)