InlineIfLambdaを用いて効率の良いコンピュテーション式ビルダを作成する
はじめに
F#で,task
コンピュテーション式が実装されました.このtask{}
を実装するために,効率的な状態機械を生成するためのFS-1087が実装されました.それ以外にも,InlineIfLambda属性(FS-1098)が新しく登場し,task{}
のような状態機械が必要なコンピュテーション式だけでなく,状態機械が不要なコンピュテーション式も効率よく実装できるようになりました.
今回は,Option
コンピュテーション式のビルダをInlineIfLambda
を用いた方法とそうでない方法の2通りで実装し,パフォーマンスを測ってみます.
InlineIfLambda
を用いないOption
コンピュテーション式
ぜくるさんの記事「Applicative Computation Expressions」のようにコンピュテーション式で実行する処理を書いてみます.
type OldOptionBuilder () =
member inline __.Bind<'T, 'T2>(x : 'T voption, f : 'T -> 'T2 voption) : 'T2 voption =
ValueOption.bind f x // let!
member inline __.BindReturn<'T, 'T2>(x : 'T voption, f : 'T -> 'T2) : 'T2 voption =
ValueOption.map f x // let! .. return
member inline __.Return<'T>(x : 'T) : 'T voption =
ValueSome x // return
member inline __.MergeSources<'T, 'T2>(x1 : 'T voption, x2 : 'T2 voption) : struct('T * 'T2) voption =
ValueOption.map2 (fun x y -> struct(x, y)) x1 x2 // and!
これを用いて,
let! v1 = ValueSome 8
let! v2 = ValueSome 4
and! v3 = ValueSome 2
return v1 + v2 + v3
という式をコンパイルしてみると
ValueOption.bind (fun v1 -> ValueOption.map (fun v2 v3 -> v1 + v2 + v3) (ValueOption.map2 (fun v2 v3 -> struct(v2, v3)) (ValueSome 4) (ValueSome 2))) (ValueSome 8)
のようなコードが生成されます.
コードを整形しているので,アロケーションやメソッド呼び出しが無さそう(実行時にインライン展開されそう)ですが,v1
についてクロージャが生成されてしまうように,アロケーションやメソッド呼び出しがけっこう起こります.
InlineIfLambda
を用いたOption
コンピュテーション式
type OptionCode<'T> = unit -> 'T voption
type NewOptionBuilder () =
member inline __.Run<'T>([<InlineIfLambda>]code : 'T OptionCode) : 'T voption = code ()
member inline __.Bind<'T, 'T2>(x : 'T voption, [<InlineIfLambda>] code : 'T -> 'T2 OptionCode) : 'T2 OptionCode =
(fun () -> match x with
| ValueNone -> ValueNone
| ValueSome v -> (code v) ()) // let!
member inline __.BindReturn<'T, 'T2>(x : 'T voption, [<InlineIfLambda>] f : 'T -> 'T2) : 'T2 OptionCode =
(fun () -> match x with
| ValueNone -> ValueNone
| ValueSome v -> ValueSome (f v)) // let! .. return
member inline __.Return<'T>(x : 'T) : 'T OptionCode =
(fun () -> ValueSome x) // return
member inline __.MergeSources<'T, 'T2>(x1 : 'T voption, x2 : 'T2 voption) : struct('T * 'T2) voption =
if x1.IsSome && x2.IsSome then ValueSome struct(x1.Value, x2.Value) else ValueNone // and!
OptionCode<'T>
は関数であり,前のバージョンでvoption
としていたところをOptionCode
に置き換えました.(MergeSources
以外)voption
でも効率の良いコードを生成しますが,F#コンパイラのテストを見ると,関数を返す形になっているので,そうしてみます.(MergeSources
はエラーになりました.)
そして,Bind
の引数code
にはInlineIfLambda
属性を付与し,関数適用のところを(code v) ()
としています.ここで注意したいのが,(code v ())
ではインライン展開されません.F#コンパイラは後者の書き方だと真面目にcode.Invoke(v, new unit())
のようなプログラムを生成しようとして,最適化しないようです.
ValueOption.map
などのFSharp.Core
にある関数を使いたくなるところですが,残念ながらこれら関数の引数にInlineIfLambda
属性がついていません.そのため,使ってしまうとインライン展開されず,効率の良いコード生成ができなくなります.残念.
OptionCode
を返すようにしたので.Run
メソッドを定義しています.このメソッドが無いと,コンピュテーション式は単にOptionCode<'T>
を返却してしまいます.アロケーションが発生してしまいますし,何より意図した動作ではありません.
この新しい方法で書くと,前の例は,次のようなILのプログラムを出力します.(意訳)
match ValueSome 8 with
| ValueSome v1 ->
match struct(ValueSome 4, ValueSome 2) with
| struct(ValueSome v2, ValueSome v3) ->
match ValueSome struct(v2, v3) with
| ValueSome struct(v2, v3) -> ValueSome (v1 + v2 + v3)
| ValueNone -> ValueNone
| _ -> ValueNone
| ValueNone -> ValueNone
ValueTuple
, ValueOption
関連以外の関数呼び出しが無くなり,ヒープへのアロケーションの無いプログラムが生成されます.
ベンチマーク
ソースコード: https://gist.github.com/GnicoJP/a53479b7dbadf739b8ef98df6bfb6371
環境
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
AMD Ryzen 7 PRO 5850U with Radeon Graphics, 1 CPU, 16 logical and 8 physical cores
.NET SDK=6.0.401
[Host] : .NET 6.0.9 (6.0.922.41905), X64 RyuJIT DEBUG
DefaultJob : .NET 6.0.9 (6.0.922.41905), X64 RyuJIT
実行するコンピュテーション式
let mutable res = 0
for i in 1..10000000 do
let a = someOption {
let! a = if i % 1 = 0 then ValueSome 1 else ValueNone
let! b = if i % 2 = 0 then ValueSome 2 else ValueNone
let! c = if i % 3 = 0 then ValueSome 3 else ValueNone
let! d = if i % 4 = 0 then ValueSome 4 else ValueNone
let! e = if i % 5 = 0 then ValueSome 5 else ValueNone
res <- a + b + c + d + e + res
return a + b + c + d + e
}
a |> ignore
res
つまり,i
を1から10^7まで回して,1~5の約数(60の約数)であれば,res
に15が加算されます.
F#5.0で登場したand!
を用いるバージョンも用意しました.
let mutable res = 0
for i in 1..10000000 do
let a = someOption {
let! a = if i % 1 = 0 then ValueSome 1 else ValueNone
and! b = if i % 2 = 0 then ValueSome 2 else ValueNone
and! c = if i % 3 = 0 then ValueSome 3 else ValueNone
and! d = if i % 4 = 0 then ValueSome 4 else ValueNone
and! e = if i % 5 = 0 then ValueSome 5 else ValueNone
res <- a + b + c + d + e + res
return a + b + c + d + e
}
a |> ignore
res
作成したビルダ
まず,従来のビルダは次の通りです.
-
old
:let!
を用いた. -
oldAnd
:and!
を用いた.MergeSources
のみを実装した. -
oldAnd5
:and!
を用いた.MergeSources5
,Bind5Return
まで実装した.
そして,効率的なビルダは次の通りです.
-
efficient
:let!
を用いた.OptionCode<'T>
を返す実装. -
efficient2
:let!
を用いた.voption<'T>
を返す実装. -
efficientAnd
:and!
を用いた.MergeSources
,BindReturn
のみを実装した. -
efficientAnd5
:and!
を用いた.MergeSources5
,Bind5Return
まで実装した.
ベンチマーク結果
BenchmarkDotNetで計測した結果です.
Method | Mean | Error | StdDev | Gen 0 | Allocated |
---|---|---|---|---|---|
old | 312.01 ms | 6.084 ms | 10.165 ms | 112000.0000 | 940,000,592 B |
oldAnd | 440.20 ms | 8.707 ms | 12.763 ms | 28000.0000 | 240,000,504 B |
oldAnd5 | 99.64 ms | 1.719 ms | 1.688 ms | - | 111 B |
efficient | 15.67 ms | 0.203 ms | 0.170 ms | - | 8 B |
efficient2 | 14.20 ms | 0.165 ms | 0.146 ms | - | 8 B |
efficientAnd | 409.58 ms | 4.002 ms | 3.744 ms | - | 480 B |
efficientAnd5 | 28.05 ms | 0.558 ms | 0.705 ms | - | 15 B |
まず,すこしだけInlineIfLambda
を使わない場合のベンチマーク結果を見比べてみます.
old
とoldAnd
を比べると,MergeSources5
,Bind5Return
を実装していないにもかかわらず,アロケーションを3分の1近くにまで削減しています.しかし,実行時間は伸びてしまいました.これはValueTuple
の生成をするようになって,クロージャの生成が減った代わりに,ValueTuple
のコンストラクタの実行で時間を消費してしまったことが理由かもしれません.(プロファイラ見たけどよくわからなかった)
しかし,oldAnd5
では,Bind5Return
の1回で済むので,非常に少ないアロケーションかつ短い時間で処理を終えることができました.(というかほとんどインライン展開されていた)
次に,InlineIfLambda
を使ったビルダでは,old
に比べて,アロケーション・実行時間ともに非常に少なくなりました.old
では1GB近くアロケーションが起こっていたのに対し,efficient
ではほとんどありません.実行時間も20分の1と非常に少なくなりました.
そして,efficient
とefficient2
では実行時間・アロケーションともに差がありませんでした.デコンパイルしてみるとほぼ同じILのプログラムが出力されていることが確認できました.
efficientAnd
ではefficient
よりも遅くなってしまいました.なぜかというと,ValueTuple
の生成で時間を消費してしまったからです.デコンパイル結果がすごいことになっていました.
(int, (int, (int, (int, int)))) item = fSharpValueOption10.Item;
(int, (int, (int, (int, int)))) tuple = item;
(int, (int, (int, (int, int)))) tuple2 = tuple;
(int, (int, (int, int))) item2 = tuple2.Item2;
(int, (int, int)) item3 = item2.Item2;
(int, int) item4 = item3.Item2;
int e = item4.Item2;
(int, (int, (int, int))) item5 = tuple2.Item2;
(int, (int, int)) item6 = item5.Item2;
(int, int) item7 = item6.Item2;
int d = item7.Item1;
(int, (int, (int, int))) item8 = tuple2.Item2;
(int, (int, int)) item9 = item8.Item2;
int c = item9.Item1;
(int, (int, (int, int))) item10 = tuple2.Item2;
int b = item10.Item1;
int a2 = tuple2.Item1;
res = a2 + b + c + d + e + res;
fSharpValueOption11 = FSharpValueOption<int>.NewValueSome(a2 + b + c + d + e);
しかし,efficientAnd5
では,非常にスリムな形で出力されており,実行時間が10分の1以上となりました.ただ,and!
を使わない実装よりはやっぱり遅くなってしまっています.ValueTuple
の生成に時間がかかってしまい,遅くなったのだと考えられます.(プロファイラ見てもわからない)
インライン展開されなかった時とは違い,なんでもかんでもand!
で実装すればよいのではなく,bind
が時間がかかる場合にのみ実装するという使い分けが重要なのかもしれません.(TODO: あるいはValueTuple
ではなくてTuple
を使った方が良い?)
結び
InlineIfLambda
を用いて非常に効率良くコンピュテーション式を使えるようになりました.ただ,効率の良いプログラムを生成するにはビルダを慎重に作る必要がありそうです.
English Version
Title: Create an efficient computation expression builder using InlineIfLambda
Sorry if I misuse English.
Introduction
task
computation expression was implemented in F# 6.0. To implement task{}
, FS-1087 was impelemented for efficient state machines. Other than that, InlineIfLambda Attribute(FS-1098) was also introduced so we can implement efficient computation expressions even if state machines are not needed.
In this article, I implement the builders for the Option
computation expression with and without InlineIfLambda
and compare their performance.
Option
computation expression without InlineIfLambda
Let's begin to write the Option
computation expression without InlineIfLambda
like zecl's article "Applicative Computation Expressions".
type OldOptionBuilder () =
member inline __.Bind<'T, 'T2>(x : 'T voption, f : 'T -> 'T2 voption) : 'T2 voption =
ValueOption.bind f x // let!
member inline __.BindReturn<'T, 'T2>(x : 'T voption, f : 'T -> 'T2) : 'T2 voption =
ValueOption.map f x // let! .. return
member inline __.Return<'T>(x : 'T) : 'T voption =
ValueSome x // return
member inline __.MergeSources<'T, 'T2>(x1 : 'T voption, x2 : 'T2 voption) : struct('T * 'T2) voption =
ValueOption.map2 (fun x y -> struct(x, y)) x1 x2 // and!
Using this, I compiled an expression below.
let! v1 = ValueSome 8
let! v2 = ValueSome 4
and! v3 = ValueSome 2
return v1 + v2 + v3
Then the F# compiler generates an IL program like this.
ValueOption.bind (fun v1 -> ValueOption.map (fun v2 v3 -> v1 + v2 + v3) (ValueOption.map2 (fun v2 v3 -> struct(v2, v3)) (ValueSome 4) (ValueSome 2))) (ValueSome 8)
Now that I have refactored this code, so it looks that there are no allocations or method callings(It seems to be applied inline expansion), but actually there are many allocations or method callings such as a closure for v1
.
Option
computation expression with InlineIfLambda
type OptionCode<'T> = unit -> 'T voption
type NewOptionBuilder () =
member inline __.Run<'T>([<InlineIfLambda>]code : 'T OptionCode) : 'T voption = code ()
member inline __.Bind<'T, 'T2>(x : 'T voption, [<InlineIfLambda>] code : 'T -> 'T2 OptionCode) : 'T2 OptionCode =
(fun () -> match x with
| ValueNone -> ValueNone
| ValueSome v -> (code v) ()) // let!
member inline __.BindReturn<'T, 'T2>(x : 'T voption, [<InlineIfLambda>] f : 'T -> 'T2) : 'T2 OptionCode =
(fun () -> match x with
| ValueNone -> ValueNone
| ValueSome v -> ValueSome (f v)) // let! .. return
member inline __.Return<'T>(x : 'T) : 'T OptionCode =
(fun () -> ValueSome x) // return
member inline __.MergeSources<'T, 'T2>(x1 : 'T voption, x2 : 'T2 voption) : struct('T * 'T2) voption =
if x1.IsSome && x2.IsSome then ValueSome struct(x1.Value, x2.Value) else ValueNone // and!
OptionCode<'T>
is a function type, I replaced voption
with OptionCode
from the previous version. (Except for MergeSorces
) Even if voption
can produce efficient programs, but, in test of F# compiler returns functions, so I follow. (MergeSources
throws a compiler error.)
Then I added InlineIfLambda
attribute to the argument code
in Bind
function, and I wrote (code v) ()
in the application. Note that (code v ())
cannot be inline-expanded. The F# compiler does not optimize it because the F# compiler produces a honest program like code.Invoke(v, new unit())
.
We want to use the functions in FSharp.Core
such as ValueOption.map
, but unfortunately these functions do not have InlineIfLambda
attribute. So that, if you use, the inline expansion does not perform and you cannot produce effecient programs...
I make it return OptionCode
, so I defined the Run
method. Without this method, the computation expression solely returns OptionCode<'T>
, it makes allocations and it is not an expected behaviour.
If you use this new way, the previous example will be below IL program. (free translation)
match ValueSome 8 with
| ValueSome v1 ->
match struct(ValueSome 4, ValueSome 2) with
| struct(ValueSome v2, ValueSome v3) ->
match ValueSome struct(v2, v3) with
| ValueSome struct(v2, v3) -> ValueSome (v1 + v2 + v3)
| ValueNone -> ValueNone
| _ -> ValueNone
| ValueNone -> ValueNone
There are not function callings except ValueTuple
or ValueOption
related, and (or few) heap allocations.
Benchmark
Source Code : https://gist.github.com/GnicoJP/a53479b7dbadf739b8ef98df6bfb6371
Environment
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
AMD Ryzen 7 PRO 5850U with Radeon Graphics, 1 CPU, 16 logical and 8 physical cores
.NET SDK=6.0.401
[Host] : .NET 6.0.9 (6.0.922.41905), X64 RyuJIT DEBUG
DefaultJob : .NET 6.0.9 (6.0.922.41905), X64 RyuJIT
A computation expression to be executed
let mutable res = 0
for i in 1..10000000 do
let a = someOption {
let! a = if i % 1 = 0 then ValueSome 1 else ValueNone
let! b = if i % 2 = 0 then ValueSome 2 else ValueNone
let! c = if i % 3 = 0 then ValueSome 3 else ValueNone
let! d = if i % 4 = 0 then ValueSome 4 else ValueNone
let! e = if i % 5 = 0 then ValueSome 5 else ValueNone
res <- a + b + c + d + e + res
return a + b + c + d + e
}
a |> ignore
res
Repeat i
from 1 to 10^7, then if i
can be dividable by 1 ~ 5(or 60), then res
will be added by 15.
I prepared an and!
version, and!
appeared in F# 5.0.
let mutable res = 0
for i in 1..10000000 do
let a = someOption {
let! a = if i % 1 = 0 then ValueSome 1 else ValueNone
and! b = if i % 2 = 0 then ValueSome 2 else ValueNone
and! c = if i % 3 = 0 then ValueSome 3 else ValueNone
and! d = if i % 4 = 0 then ValueSome 4 else ValueNone
and! e = if i % 5 = 0 then ValueSome 5 else ValueNone
res <- a + b + c + d + e + res
return a + b + c + d + e
}
a |> ignore
res
Builders
First, previous builders are:
-
old
: I usedlet!
. -
oldAnd
: I usedand!
and implemented onlyMergeSources
. -
oldAnd5
: I usedand!
and implementedMergeSources5
,Bind5Return
.
Second, efficient builders are:
-
efficient
: I usedlet!
. It returnsOptionCode<'T>
. -
efficient2
: I usedlet!
. It returnsvoption<'T>
. -
efficientAnd
: I usedand!
and implemented onlyMergeSources
. -
efficientAnd5
: I usedand!
and implementedMergeSources5
,Bind5Return
.
The result
I measured using BenchmarkDotNet.
Method | Mean | Error | StdDev | Gen 0 | Allocated |
---|---|---|---|---|---|
old | 312.01 ms | 6.084 ms | 10.165 ms | 112000.0000 | 940,000,592 B |
oldAnd | 440.20 ms | 8.707 ms | 12.763 ms | 28000.0000 | 240,000,504 B |
oldAnd5 | 99.64 ms | 1.719 ms | 1.688 ms | - | 111 B |
efficient | 15.67 ms | 0.203 ms | 0.170 ms | - | 8 B |
efficient2 | 14.20 ms | 0.165 ms | 0.146 ms | - | 8 B |
efficientAnd | 409.58 ms | 4.002 ms | 3.744 ms | - | 480 B |
efficientAnd5 | 28.05 ms | 0.558 ms | 0.705 ms | - | 15 B |
Let's compare the benchmark results with and without InlineIfLambda
.
Comparing old
and oldAnd
, oldAnd
's allocation is reduced, it is one third of old
's even if it is not implemented MergeSources5
,Bind5Return
, but its wallclock time gets longer. I think, it creates ValueTuple
instead of closures, so it may consume time by ValueTuple
's constructors. (I did not catch the meaning from the profiler...)
However oldAnd5
calls only Bind5Return
, so only very low allocations and short execution time are needed. (Almost inline-expanded.)
Next, the builders with InlineIfLambda
(efficient
, efficient2
) have lower allocation and shorter execution time than old
. In old
, it requires almost 1GB allocation, but efficient
does not. Its wallclock time is also a twentieth as long as old
's. It found that efficient
and efficient2
have no difference on execution time and allocation, because they almost the same IL program.
efficientAnd
is slower than efficient
because ValueTuple
consumes a lot of time. Its decompiled code is very complex:
(int, (int, (int, (int, int)))) item = fSharpValueOption10.Item;
(int, (int, (int, (int, int)))) tuple = item;
(int, (int, (int, (int, int)))) tuple2 = tuple;
(int, (int, (int, int))) item2 = tuple2.Item2;
(int, (int, int)) item3 = item2.Item2;
(int, int) item4 = item3.Item2;
int e = item4.Item2;
(int, (int, (int, int))) item5 = tuple2.Item2;
(int, (int, int)) item6 = item5.Item2;
(int, int) item7 = item6.Item2;
int d = item7.Item1;
(int, (int, (int, int))) item8 = tuple2.Item2;
(int, (int, int)) item9 = item8.Item2;
int c = item9.Item1;
(int, (int, (int, int))) item10 = tuple2.Item2;
int b = item10.Item1;
int a2 = tuple2.Item1;
res = a2 + b + c + d + e + res;
fSharpValueOption11 = FSharpValueOption<int>.NewValueSome(a2 + b + c + d + e);
However efficientAnd5
generates a very smart code and a tenth as long as old
. But, it is slower than efficient
, it may takes time on constructions of ValueTuple
.(I did not catch the meaning from the profiler...)
It may important to think about whether implement and!
only if bind
takes for a long time, unlike no-inline-expanded computation expression. (TODO: Or, prefer Tuple
to ValueTuple
?)
Conclusion
We can use computation expressions efficiently with InlineIfLambda
. However we must design carefully to generate efficient programs.
Discussion