🐞

複数のストラテジーの出現頻度について

2022/07/10に公開

Test

Property-based testing

hypothesis

tech

TL;DR

HypothesisではPropErの frequency/1 相当の複数のストラテジーから任意の確率で値を生成するような方法は提供する計画はなく、one_of を使って一様に選択することを推奨している。

詳細

たとえば次のようなストラテジーがあったとします。

@composite
def my_text(draw: DrawFn, max_size: int = 50) -> str:
    alpha = integers(min_value=ord("a"), max_value=ord("z"))
    num = integers(min_value=ord("0"), max_value=ord("9"))
    l = draw(lists(one_of(alpha, num), max_size=max_size))
    return "".join([chr(c) for c in l])

これは小文字のアルファベットと数字の最大50文字の文字列を生成するストラテジーですが、このとき alpha と num は one_of でくくられているので、アルファベットと数字は一様に選択されます。しかし、時にはアルファベットと数字を3:1という具合に確率を選択したいことがあります。

たとえばこれがQuickCheckやPropErだとストラテジー（ジェネレーター）の選択確率を調整するような関数があります。上のストラテジーは、次のPropErのジェネレーターの例をポートしてみようとしたもの。

text_like() ->
    list(frequency([{80, range($a, $z)},              % letters
                    {10, $\s},                        % whitespace
                    {1,  $\n},                        % linebreak
                    {1, oneof([$., $-, $!, $?, $,])}, % punctuation
                    {1, range($0, $9)}                % numbers
                   ])).

frequency/1相当のものをHypothesis本体で探してみたけれども、見当たらないのでStack Overflowで検索してみると、どうもそもそもの思想や方針として導入しない様子。

実際に上の質問のなかでメインメンテナーのZac氏は「Hypothesisはユニットテスト規模でのPBTを提供することを考えていて、数日かけて行うようなファジングを実行する場合にはそのような機能があったほうがいいと思うが、いまの意図では one_of を使って生成空間を広く取るほうが良いと思っている」とのこと。回答が数年前のものなので、いまもこの方針が変わっていないか確認してみた。

同じくZac氏がこのあたりはコアストラテジーなので、非常に安定していて、方針が変わることはないということでした。このあたりの思想というのは、そもそものツールの設計に関わってくる部分なので知れてよかったと思います。

This is a very stable core API, so no changes - st.one_of() is still the recommended way to take the union of strategies, and there are (still) no options to control the probabilities. There are a couple of reasons for that second point:

Probability options make the API larger and harder to learn, which is a worse user experience.

They tend to focus users on constructing particular kinds of data, rather than thinking about a whole domain of inputs.

Bugs tend to hide precisely in the places you didn't think about testing, so we think that Hypothesis (perhaps augmented with HypoFuzz and Crosshair) can actually outperform almost all users at finding error-inducing inputs.

TL;DR

詳細

Discussion