📝

Handling Unicode strings in Haxe

Published on 2021/08/17

haxe

tech

Use UnicodeString, which was added in Haxe 4.0.

Test environment

Haxe 4.2.3 (using Try Haxe)

Counting characters

Code

function main() {
  trace("𩸽".length);
  trace(new UnicodeString("𩸽").length);
}

Output

2
1

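Note, however, that on the JS target the compiled code looks like the following: every access to length scans the whole string, so if you read length repeatedly it is better to store it in a variable first. A JS string's own length counts UTF-16 code units, and there is no constant-time way to count code points, so this cost cannot be avoided.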

class UnicodeString {
	static get_length(this1) {
		let l = 0;
		let _g_offset = 0;
		while(_g_offset < this1.length) {
			let index = _g_offset++;
			let c = this1.charCodeAt(index);
			if(c >= 55296 && c <= 56319) {
				c = c - 55232 << 10 | this1.charCodeAt(index + 1) & 1023;
			}
			if(c >= 65536) {
				++_g_offset;
			}
			++l;
		}
		return l;
	}
}
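As a minimal sketch of that advice, the loop below reads length once into a local variable instead of evaluating it in the loop condition. The sample string and the variable names are only illustrative.

```haxe
function main() {
  final s:UnicodeString = "abc𩸽⚡🐐";
  // Read length once: on the JS target each access scans the whole string.
  final len = s.length;
  var i = 0;
  // Compare against the cached value, not `s.length`,
  // which would rescan the string on every pass.
  while (i < len) {
    trace('${i + 1}/$len');
    i++;
  }
}
```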

Looping one character at a time

You can iterate over a UnicodeString directly, but the iterator yields character codes, so each one has to be converted with String.fromCharCode().

Code

function main() {
  for (code in new UnicodeString("abc𩸽⚡🐐")) {
    trace(String.fromCharCode(code));
  }
}

Output

a
b
c
𩸽
⚡
🐐
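If you need the characters as a list rather than inside a loop, the same iteration can be wrapped in an array comprehension. This is only a sketch: toChars is a hypothetical helper name, not part of the Haxe standard library.

```haxe
// Hypothetical helper: splits a string into one String per code point.
function toChars(s:UnicodeString):Array<String> {
  // The iterator yields code points, so convert each one back to a String.
  return [for (code in s) String.fromCharCode(code)];
}

function main() {
  final chars = toChars("abc𩸽⚡🐐");
  trace(chars.length); // number of code points, not UTF-16 code units
  trace(chars.join(","));
}
```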

Other operations

Basic string operations such as indexOf, charAt, and substr are also available.

https://api.haxe.org/UnicodeString.html
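A short sketch of these operations follows, using the same sample string as above. The point is that positions and lengths are counted in code points, so the character outside the BMP occupies a single index; on a UTF-16 target the same calls on a plain String would count it as two.

```haxe
function main() {
  final s:UnicodeString = "abc𩸽⚡🐐";
  // Index 3 is the fourth code point, the whole character 𩸽.
  trace(s.charAt(3));
  // The position is reported in code points.
  trace(s.indexOf("⚡"));
  // Two code points starting at index 3.
  trace(s.substr(3, 2));
}
```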

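What UnicodeString actually is

It is defined as abstract UnicodeString(String) from String to String, so the underlying type is String. Converting between String and UnicodeString has no overhead, and the conversion also happens implicitly.

The code above uses the new UnicodeString() form for convenience, but writing it as a type conversion makes no difference.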

final str1:UnicodeString = "foo";
final str2 = ("bar" : UnicodeString);
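The implicit conversion also applies at function boundaries. In the sketch below, countCodePoints is a hypothetical function introduced only for illustration: it accepts a UnicodeString, and a plain String can be passed to it without any explicit conversion.

```haxe
// Hypothetical function: the parameter type selects code point counting.
function countCodePoints(s:UnicodeString):Int {
  return s.length;
}

function main() {
  final plain:String = "𩸽";
  // A String is accepted where a UnicodeString is expected.
  trace(countCodePoints(plain));
  // Converting back to String is implicit as well.
  final back:String = (plain : UnicodeString);
  // This is String.length again, so the count depends on the target's encoding.
  trace(back.length);
}
```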

Try Haxe

With Try Haxe you can check the behavior in the browser.

https://try.haxe.org/#E80e42FB