[PHP] Converting a string containing extended grapheme clusters into an array
The following shows how to do this with intl functions and with regular expressions.
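$str = "竈門禰\u{E0100}豆子";

// 禰 followed by U+E0100 (a variation selector) forms one grapheme cluster,
// so the string has 5 clusters and all three checks print bool(true).
var_dump(
    5 === count(grapheme_to_array($str)),
    5 === count(grapheme_to_array2($str)),
    5 === count(grapheme_to_array3($str))
);

// intl: take one grapheme cluster at a time by index.
function grapheme_to_array(string $str): array
{
    $length = grapheme_strlen($str);
    $ret = [];

    for ($i = 0; $i < $length; ++$i) {
        $ret[] = grapheme_substr($str, $i, 1);
    }

    return $ret;
}

// PCRE: \X matches one extended grapheme cluster and \K resets the match start,
// so the split happens at the zero-width position after each cluster.
function grapheme_to_array2(string $str): array
{
    // https://stackoverflow.com/a/55783469/531320
    return preg_split('/\X\K/u', $str, 0, PREG_SPLIT_NO_EMPTY);
}

// mbstring: same idea; (?!$) prevents a split at the end of the string,
// which would otherwise leave a trailing empty element.
function grapheme_to_array3(string $str): array
{
    // https://stackoverflow.com/a/55783469/531320
    return mb_split('\X\K(?!$)', $str);
}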
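To illustrate why splitting by grapheme cluster matters for this sample string, here is a minimal sketch that compares it with splitting by code point. The use of `mb_str_split` (PHP 7.4 or later) for the code point split is an addition for comparison only; it is not one of the approaches above.

```php
<?php
// Same sample string as above: U+E0100 is a variation selector attached to 禰.
$str = "竈門禰\u{E0100}豆子";

// Splitting by code point separates the variation selector from its base character.
$codePoints = mb_str_split($str, 1, 'UTF-8');

// Splitting by extended grapheme cluster (\X) keeps the two together.
$clusters = preg_split('/\X\K/u', $str, 0, PREG_SPLIT_NO_EMPTY);

var_dump(count($codePoints));             // 6 code points
var_dump(count($clusters));               // 5 grapheme clusters
var_dump($clusters[2] === "禰\u{E0100}"); // the base character and selector stay in one element
```

The third element of the cluster-based result still carries the variation selector, whereas the code point split leaves it as a separate element.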
The speed measurement is as follows.
$timer = timer([
    'grapheme_substr' => function() use($str) { grapheme_to_array($str); },
    'preg_split' => function() use($str) { grapheme_to_array2($str); },
    'mb_split' => function() use($str) { grapheme_to_array3($str); },
]);

foreach ($timer as $message => $time) {
    echo $message, PHP_EOL, $time, PHP_EOL;
}

// Runs each callable $repeat times and returns the total elapsed time
// in nanoseconds, keyed by the label given in $callables.
function timer(array $callables, int $repeat = 100000): array
{
    $ret = [];

    foreach ($callables as $key => $callable) {
        $start = hrtime(true);

        // A counted loop also terminates cleanly when $repeat is 0.
        for ($i = 0; $i < $repeat; ++$i) {
            $callable();
        }

        $stop = hrtime(true);
        $ret[$key] = $stop - $start;
    }

    return $ret;
}
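The results are as follows. Each value is the total elapsed time in nanoseconds, as returned by hrtime(true), for 100,000 calls.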
> php test.php
grapheme_substr
580686100
preg_split
76916300
mb_split
271907400
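The raw totals are easier to compare once converted to a per-call time and a ratio against the fastest method. The following is a small sketch of that arithmetic; the figures are copied from the output above, and the variable names are only illustrative.

```php
<?php
// Totals copied from the output above: nanoseconds for 100000 calls each.
$totals = [
    'grapheme_substr' => 580686100,
    'preg_split'      => 76916300,
    'mb_split'        => 271907400,
];
$repeat  = 100000;       // default $repeat of timer()
$fastest = min($totals); // smallest total, used as the baseline

foreach ($totals as $name => $ns) {
    $usPerCall = $ns / $repeat / 1000; // nanoseconds -> microseconds per call
    $ratio     = $ns / $fastest;       // relative to the fastest method
    printf("%-16s %6.2f us/call  %5.2fx\n", $name, $usPerCall, $ratio);
}
```

For this run, preg_split comes out at roughly 0.77 µs per call, mb_split at about 2.7 µs (around 3.5 times slower), and the grapheme_substr loop at about 5.8 µs (around 7.5 times slower). These figures describe only this one measurement on this short string.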