
Type inference for regular expressions with ArkRegex


ArkType, one of the TypeScript validators, has spun out something called ArkRegex.

https://x.com/arktypeio/status/1983210635266498649

https://arktype.io/docs/blog/arkregex

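Roughly speaking:

  • The regex(...) wrapper stays compatible with new RegExp() while inferring types for the matched string, the flags, and the named captures
  • It aims to replace existing RegExp() usage and make regular expressions type-safe
  • Zero runtime overhead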
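
For comparison, here is a minimal sketch of the plain `RegExp` baseline that a typed wrapper is meant to improve on. It uses only built-in APIs; `versionPattern` and the group names are made up for illustration and do not come from the ArkRegex docs.

```typescript
// Plain RegExp baseline: the built-in typings know nothing about the pattern.
const versionPattern = new RegExp("^(?<major>\\d+)\\.(?<minor>\\d+)$");

const versionMatch = versionPattern.exec("1.42");

// `groups` is typed as `{ [key: string]: string } | undefined`,
// so any key is accepted and a misspelled group name is not a compile error.
const major = versionMatch?.groups?.["major"];
const typo = versionMatch?.groups?.["mjaor"]; // compiles, but is undefined at runtime

console.log(major); // prints: 1
console.log(typo); // prints: undefined
```

With inferred named captures, a typo like `mjaor` is the kind of mistake that can be reported at compile time instead.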

Installation

bun install arktype arkregex

Let's play with it.

Email address

import { regex } from "arkregex";
import { type } from "arktype";

// Typed email regex example used both at type level and runtime.
const emailRegex = regex(
  "^(?<local>[\\w.+-]+)@(?<domain>[\\w-]+)\\.(?<tld>[a-z]{2,})$",
  "i",
);

// By the time we read .infer, the type has already been inferred as `${string}@${string}.${string}`
type EmailAddress = typeof emailRegex.infer;

const ContactForm = type({
  email: emailRegex,
  message: "string",
});

const submission = ContactForm({
  email: "support@example.com",
  message: "ArkRegex keeps my regexes honest!",
});

if (submission instanceof type.errors) {
  console.error(submission.summary);
} else {
  console.log(`Valid email received for ${submission.email as EmailAddress}`);

  const match = emailRegex.exec(submission.email);

  if (match?.groups) {
    const { local, domain, tld } = match.groups;
    console.log(`Parsed: local=${local}, domain=${domain}, tld=${tld}`);
  }
}
// Output:
// Valid email received for support@example.com
// Parsed: local=support, domain=example, tld=com

This is impressive, and fun.
Let's play some more.

FeatureFlag

// Assumes identifiers such as "prod.checkout.enable-new-ui"
const featureFlagId = regex(
  "^(?<environment>prod|staging|dev)\\.(?<service>[a-z][\\w-]{2,})\\.(?<flag>[a-z][\\w-]{2,})$",
);

// type FeatureFlagId = `staging.${string}.${string}` | `dev.${string}.${string}` | `prod.${string}.${string}`
type FeatureFlagId = typeof featureFlagId.infer;
export const FeatureFlag = type({
  id: featureFlagId,
  cohorts: "string[]",
});

export function parseFeatureFlag(id: string) {
  if (!featureFlagId.test(id)) {
    return null;
  }

  // After the early return above, id is narrowed to FeatureFlagId.
  const match = featureFlagId.exec(id);

  if (!match?.groups) {
    return null;
  }

  const { environment, service, flag } = match.groups;

  return { environment, service, flag } as const;
}
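
To make the narrowing idea concrete without relying on ArkRegex, here is a minimal hand-written sketch of the same shape: a template literal union type plus a user-defined type guard. The names `FlagId`, `isFlagId`, and `describeFlag` are hypothetical, and the type is written by hand here rather than inferred from the pattern.

```typescript
// Hand-written counterpart of the inferred type (hypothetical names).
type Environment = "prod" | "staging" | "dev";
type FlagId = `${Environment}.${string}.${string}`;

const flagIdPattern =
  /^(prod|staging|dev)\.([a-z][\w-]{2,})\.([a-z][\w-]{2,})$/;

// User-defined type guard: narrows `string` to `FlagId` when the pattern matches.
function isFlagId(value: string): value is FlagId {
  return flagIdPattern.test(value);
}

function describeFlag(value: string): string {
  if (!isFlagId(value)) {
    return "not a flag id";
  }
  // `value` has type FlagId from here on.
  const [environment] = value.split(".");
  return `flag in ${environment}`;
}

console.log(describeFlag("prod.checkout.enable-new-ui")); // prints: flag in prod
console.log(describeFlag("qa.checkout.enable-new-ui")); // prints: not a flag id
```

The drawback of this manual version is that the type and the pattern can drift apart, which is the gap that inferring the type from the pattern closes.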

ISO Date

// Demonstrates using regex.as when inference would otherwise blow up.
const isoDate = regex.as<
  `${string}-${string}-${string}`,
  {
    captures: [string, string, string];
    names: { year: string; month: string; day: string };
  }
>("^(?<year>\\d{4})-(?<month>0[1-9]|1[0-2])-(?<day>0[1-9]|[12]\\d|3[01])$");

type IsoDate = typeof isoDate.infer;

export function ensureIsoDate(input: string): IsoDate | null {
  const match = isoDate.exec(input);

  if (!match) {
    return null;
  }

  const [, year, month, day] = match;

  return `${year}-${month}-${day}`;
}
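
One caveat worth keeping in mind: the pattern above only constrains the shape of the string, so a value like 2023-02-31 still matches. If real calendar validity matters, a follow-up check is needed. The sketch below is one possible approach using only the built-in `Date`; `isRealCalendarDate` is a hypothetical helper, not part of ArkRegex.

```typescript
// Hypothetical helper: checks that a YYYY-MM-DD string names a real calendar date.
function isRealCalendarDate(isoDateString: string): boolean {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(isoDateString);
  if (!match) {
    return false;
  }

  const year = Number(match[1]);
  const month = Number(match[2]);
  const day = Number(match[3]);

  // setUTCFullYear avoids the 1900 offset that Date.UTC applies to years 0-99.
  const date = new Date(0);
  date.setUTCFullYear(year, month - 1, day);

  // Out-of-range days or months roll over, so a round-trip mismatch means invalid.
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
}

console.log(isRealCalendarDate("2024-02-29")); // prints: true (leap year)
console.log(isRealCalendarDate("2023-02-31")); // prints: false
```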

Trying type inference on a Japanese phone number regex

https://akinov.hatenablog.com/entry/2017/05/31/194421

I used the article above as a reference.

import { regex } from "arkregex";
import { type } from "arktype";

const japanesePhonePattern = [
  String.raw`^(?:`,
  String.raw`(?<tollFree>0120-?\d{3}-?\d{3})`,
  String.raw`|(?<mobile>0(?:50|70|80|90)-?\d{4}-?\d{4})`,
  String.raw`|(?<geographic>0(?:\d-?\d{4}|\d{2}-?\d{3}|\d{3}-?\d{2}|\d{4}-?\d)-?\d{4})`,
  String.raw`|(?<subscriber>[1-9]\d{0,3}-?\d{4})`,
  String.raw`)$`,
].join("");

// Brand the type with a unique symbol
declare const japanesePhoneBrand: unique symbol;

export type JapanesePhone = string & { readonly [japanesePhoneBrand]: true };

const japanesePhone = regex.as<
  JapanesePhone,
  {
    names: {
      tollFree: string | undefined;
      mobile: string | undefined;
      geographic: string | undefined;
      subscriber: string | undefined;
    };
  }
>(japanesePhonePattern);

type JapanesePhoneGroups = typeof japanesePhone.inferNamedCaptures;
type JapanesePhoneKind = keyof JapanesePhoneGroups;

type ClassifiedPhone = {
  kind: JapanesePhoneKind;
  normalized: string;
  raw: JapanesePhone;
};

export const JapaneseContact = type({
  name: "string",
  phone: japanesePhone,
});

export function classifyJapanesePhone(phone: JapanesePhone): ClassifiedPhone {
  const match = japanesePhone.exec(phone);

  if (!match?.groups) {
    throw new Error(
      "Input already satisfied regex, exec() should never be null.",
    );
  }

  const { tollFree, mobile, geographic, subscriber } = match.groups;
  const normalized = phone.replace(/\D/g, "");

  if (tollFree) {
    return { kind: "tollFree", normalized, raw: phone };
  }

  if (mobile) {
    return { kind: "mobile", normalized, raw: phone };
  }

  if (geographic) {
    return { kind: "geographic", normalized, raw: phone };
  }

  return { kind: "subscriber", normalized, raw: phone };
}

const sampleNumbers = [
  "03-1234-5678",
  "075-123-4567",
  "045-123-4567",
  "090-1234-5678",
  "050-1234-5678",
  "020-1234-5678",
  "0120-123-456",
  "01564-2-3456",
  "1234-5678",
  "03-5678",
  "000-0000-0000",
] as const;

for (const phone of sampleNumbers) {
  if (!japanesePhone.test(phone)) {
    console.log(`❌ Invalid: ${phone}`);
    continue;
  }

  // Calling classifyJapanesePhone("03-1234-5678") directly is a compile error because of the branded type.
  // The value must have passed the regex test() first.
  const classification = classifyJapanesePhone(phone);
  console.log(
    `✅ ${classification.raw} → ${classification.kind} (normalized: ${classification.normalized})`,
  );
}

Output

bun run japanese-phone.ts 
✅ 03-1234-5678 → geographic (normalized: 0312345678)
✅ 075-123-4567 → geographic (normalized: 0751234567)
✅ 045-123-4567 → geographic (normalized: 0451234567)
✅ 090-1234-5678 → mobile (normalized: 09012345678)
✅ 050-1234-5678 → mobile (normalized: 05012345678)
❌ Invalid: 020-1234-5678
✅ 0120-123-456 → tollFree (normalized: 0120123456)
✅ 01564-2-3456 → geographic (normalized: 0156423456)
✅ 1234-5678 → subscriber (normalized: 12345678)
❌ Invalid: 03-5678
❌ Invalid: 000-0000-0000
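
The branding trick used above works independently of ArkRegex, so here is a minimal sketch in plain TypeScript. The postal-code pattern and the names `PostalCode`, `isPostalCode`, and `normalizePostalCode` are hypothetical and only illustrate the technique.

```typescript
// The brand exists only at the type level; nothing is emitted at runtime.
declare const postalCodeBrand: unique symbol;
type PostalCode = string & { readonly [postalCodeBrand]: true };

const postalCodePattern = /^\d{3}-?\d{4}$/;

// The type guard is the only way to obtain a PostalCode from a plain string.
function isPostalCode(value: string): value is PostalCode {
  return postalCodePattern.test(value);
}

// Accepts only values that were already validated.
function normalizePostalCode(code: PostalCode): string {
  return code.replace(/\D/g, "");
}

// normalizePostalCode("100-0001"); // compile error: string is not a PostalCode

const candidate: string = "100-0001";
if (isPostalCode(candidate)) {
  console.log(normalizePostalCode(candidate)); // prints: 1000001
}
```

Since a plain string is not assignable to the branded type, every call site is forced through the validation step, which is the same guarantee the loop above relies on.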

I wrote this after only skimming the docs, so corrections are welcome.
