🎃

[PHP] Converting a string that contains extended grapheme clusters into an array

Published 2023/07/07

Here is how to do it with the intl functions and with regular expressions.

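// U+E0100 is a variation selector attached to 禰,
// so the string has 6 code points but 5 grapheme clusters.
$str = "竈門禰\u{E0100}豆子";

// Each approach should return 5 elements.
var_dump(
    5 === count(grapheme_to_array($str)),
    5 === count(grapheme_to_array2($str)),
    5 === count(grapheme_to_array3($str))
);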

// intl: pick out one grapheme cluster at a time by its index.
function grapheme_to_array(string $str): array
{
    $length = grapheme_strlen($str);
    $ret = [];

    for ($i = 0; $i < $length; ++$i) {
        $ret[] = grapheme_substr($str, $i, 1);
    }

    return $ret;
}

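// PCRE: \X matches one extended grapheme cluster and \K drops it from the
// match, so the split happens at the zero-width position after each cluster.
function grapheme_to_array2(string $str): array
{
    // https://stackoverflow.com/a/55783469/531320
    return preg_split('/\X\K/u', $str, 0, PREG_SPLIT_NO_EMPTY);
}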

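// mbstring: mb_split has no PREG_SPLIT_NO_EMPTY equivalent, so (?!$) keeps
// it from splitting at the end of the string and adding an empty element.
function grapheme_to_array3(string $str): array
{
    // https://stackoverflow.com/a/55783469/531320
    return mb_split('\X\K(?!$)', $str);
}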
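To see why the expected count is 5 rather than 6, it may help to compare splitting by code point with splitting by grapheme cluster. The following is a minimal sketch using the same sample string. It assumes the mbstring extension and PCRE with Unicode support are available. `mb_str_split` is not one of the approaches above and appears here only for contrast.

```php
<?php
// Same sample string as above: U+E0100 is a variation selector
// that belongs to the preceding character 禰.
$str = "竈門禰\u{E0100}豆子";

// Splitting by code point separates the variation selector from its base.
$codePoints = mb_str_split($str, 1, 'UTF-8');

// Splitting by extended grapheme cluster keeps the two together.
$clusters = preg_split('/\X\K/u', $str, -1, PREG_SPLIT_NO_EMPTY);

var_dump(count($codePoints));               // int(6)
var_dump(count($clusters));                 // int(5)
var_dump($clusters[2] === "禰\u{E0100}");   // bool(true)
```

With code-point splitting the variation selector becomes an element of its own, which is usually not what you want when handling text character by character.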

The speed was measured as follows.

$timer = timer([
    'grapheme_substr' => function() use($str) { grapheme_to_array($str); },
    'preg_split' => function() use($str) { grapheme_to_array2($str); },
    'mb_split' => function() use($str) { grapheme_to_array3($str); },
]);

foreach ($timer as $message => $time) {
    echo $message, PHP_EOL, $time, PHP_EOL;
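}

// Runs each callable $repeat times and returns the total elapsed
// time per key in nanoseconds.
function timer(array $callables, int $repeat = 100000): array
{
    $ret = [];

    foreach ($callables as $key => $callable) {
        // hrtime(true) returns a monotonic timestamp in nanoseconds.
        $start = hrtime(true);

        // A for loop also terminates when $repeat is 0 or negative.
        for ($i = 0; $i < $repeat; ++$i) {
            $callable();
        }

        $stop = hrtime(true);
        $ret[$key] = $stop - $start;
    }

    return $ret;
}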

The results are as follows. Each value is the total elapsed time in nanoseconds for 100,000 calls.

> php test.php
grapheme_substr
580686100
preg_split
76916300
mb_split
271907400
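
The raw totals are easier to compare after converting them to a per-call time and a ratio against the fastest method. A small sketch, with the numbers copied from the output above. The variable names are only illustrative, and the figures come from a single run, so they will differ between machines and PHP versions.

```php
<?php
// Totals copied from the output above: nanoseconds for 100000 calls each.
$totals = [
    'grapheme_substr' => 580686100,
    'preg_split'      => 76916300,
    'mb_split'        => 271907400,
];
$repeat = 100000; // default $repeat of timer()
$fastest = min($totals);

$summary = [];
foreach ($totals as $name => $ns) {
    $summary[$name] = [
        // nanoseconds per call -> microseconds per call
        'us_per_call' => round($ns / $repeat / 1000, 3),
        // how many times slower than the fastest method
        'ratio'       => round($ns / $fastest, 2),
    ];
}

foreach ($summary as $name => $row) {
    printf("%-16s %.3f us/call  x%.2f\n", $name, $row['us_per_call'], $row['ratio']);
}
```

On this run `preg_split` was the fastest at under 1 microsecond per call, roughly 7.5 times faster than the `grapheme_substr` loop and about 3.5 times faster than `mb_split`.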
