
Trying out WebGPU


WebGPU spec

WebGPU is a JavaScript API, shipped in browsers, for using the GPU (and the HTML canvas element).
WGSL (WebGPU Shading Language) is the standard shader language spec whose code can be passed directly to the WebGPU API.

The spec is still under active revision, so note that shipped implementations often lag behind the latest spec.

WebGPU Shading Language

WebGPU's standard shader language, with a Rust-flavored syntax. The spec itself never mentions Rust by name.

Only the syntax is made to resemble Rust; Rust's semantics were not adopted.
For example, at the time of writing, if is a statement and cannot be used as a ternary-style expression on the right-hand side the way Rust's if can (the select(f, t, cond) builtin function covers that purpose instead). Conversely, WGSL has i++ and i--, which Rust lacks.
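A minimal WGSL sketch of the difference (the function clamp01 is just an illustration, not from the spec):

fn clamp01(x: f32) -> f32 {
  // if is a statement, so branching assigns through a var
  var y: f32 = x;
  if (x < 0.0) { y = 0.0; }
  // select(f, t, cond) picks t when cond is true, f otherwise
  return select(y, 1.0, x > 1.0);
}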

WebMetal

What Apple originally implemented in WebKit and proposed as "WebGPU" was a Metal-based API; it was later renamed WebMetal.

A Vulkan-based API proposed afterwards by Mozilla became the origin of the current WebGPU API. The screen coordinate system of today's WebGPU, however, still follows Metal (bottom-left xy is (-1,-1); z is 0 at the nearest point and 1 at the farthest).


WebGPU implementation status


Reference: Chrome's Origin Trials

Chrome's Origin Trial is a mechanism for enabling an experimental feature for specific Origins only, for a limited time.

On the Origin Trials site, a developer agrees to the terms for the experimental feature and registers an Origin Trial token for each Web Origin.
(A Web Origin is the protocol, domain name, and port portion of a URL, i.e. the part up to http://localhost:80.)

Anyone can register a token for any Origin; for example, localhost, example.com, or gist.githack.com can each be registered and used by anyone.

Chrome enables the feature when the HTML is given an Origin Trial token, either via an HTTP response header or via an HTML meta tag.

When using the meta tag, add

<meta http-equiv="origin-trial" content="a token of about 200 characters..." />

to the head section.
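When serving the token via the response header instead, it goes into an Origin-Trial header (the token value here is a placeholder):

Origin-Trial: a token of about 200 characters...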

Embedding multiple Origin-Trial tokens in HTML meta tags or response headers also works.

Since the feature is enabled if any one of them matches, embedding meta tags for both localhost and gist.githack.com, for example, lets you experiment on localhost and publish on gist without changing the code.

Renewing Origin-Trial tokens

After the tokens expired, picking one of the expired domains on the "My Registrations" tab and answering the survey behind the "FEEDBACK" button at the bottom of the page extended the expiry of all the domains.


Links to WebGPU API code examples

When searching for WebGPU code examples, be aware that many hits are Rust code using wgpu, or examples for JavaScript libraries that use WebGPU as a backend.


WebGPU spec changes since Chrome-98 and implementation catch-up

WGSL: block attribute removed

The block attribute that used to be put on struct types for buffer data used in bind groups is gone.

  • chrome-98: does not work without the block attribute
  • chrome-99-101: the block attribute is ignored
  • firefox-nightly-99/deno-1.19: the block attribute is an error

WGSL: attribute syntax changed

The original [[attr1, attr2(param)]] form became @attr1 @attr2(param).

  • chrome-98/firefox-nightly-99/deno-1.19: [[attr]] form only
  • chrome-99-101: @attr form, but the [[attr]] form is still accepted with a warning
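For example, a binding declaration from the compute example below changes spelling like this (same meaning):

// [[attr]] form
[[binding(0), group(0)]] var<storage, read> input: IO;

// @attr form
@binding(0) @group(0) var<storage, read> input: IO;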

Pass encoder end method

The endPass() method was renamed to end().

  • chrome-98,99/firefox-nightly-99/deno-1.19: endPass() only
  • chrome-100,101: accepts end(), and still accepts endPass() with a warning
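The examples near the end of these scraps keep working on both by calling whichever method exists:

(passEncoder.end ?? passEncoder.endPass).call(passEncoder);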

RenderPass clear-color parameter

The original loadValue: {r, g, b, a} became loadOp: "clear", clearValue: {r, g, b, a} (clearValue defaults to all-zero rgba).

  • chrome-98,99/firefox-nightly-99/deno-1.19: loadValue only
  • chrome-100,101: accepts loadOp/clearValue, and still accepts loadValue with a warning
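In a color attachment, the change looks like this (taken from the render examples below):

// old
{view, loadValue: {r: 0, g: 0, b: 0, a: 0}, storeOp: "store"}
// new
{view, loadOp: "clear", clearValue: {r: 0, g: 0, b: 0, a: 0}, storeOp: "store"}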

When the stencilLoadOp/stencilStoreOp attributes in the depthStencilAttachment parameter of beginRenderPass() are required

For formats with no stencil aspect (such as "depth32float"), the condition under which the presence or absence of stencilLoadOp/stencilStoreOp is an error changed:

  • chrome-100 and earlier: error unless both stencilLoadOp and stencilStoreOp are present
  • chrome-101(.0.4947.0) and later: error if either stencilLoadOp or stencilStoreOp is present

Even without using stencil, you can support both by using the "depth24plus-stencil8" format and always attaching stencilLoadOp/stencilStoreOp.
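The 3D animation example below uses exactly that both-compatible form together with format: "depth24plus-stencil8":

const depthStencilAttachment = {view: depthView, depthLoadOp: "clear", depthClearValue: 1, depthStoreOp: "store", stencilLoadOp: "clear", stencilStoreOp: "store"};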

ComputePassEncoder dispatch method

  • chrome-102 and earlier: dispatch(x, y, z)
  • chrome-103 and later: dispatchWorkgroups(x, y, z)
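A sketch for covering both names, in the same spirit as the end()/endPass() fallback above (assumes one of the two methods exists):

(passEncoder.dispatchWorkgroups ?? passEncoder.dispatch).call(passEncoder, x, y, z);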

WGSL: struct member separator

  • chrome-101 and earlier: ;
  • chrome-102 and later: ,
struct Out {
  @builtin(position) pos: vec4<f32>,
  @location(0) uv: vec2<f32>,
};

WebGPU canvas: default value of compositingAlphaMode

  • chrome-101 and earlier: "premultiplied"
  • chrome-102 and later: "opaque"

The difference presumably shows up when the clear color's alpha is transparent?

const gpu = canvas.getContext("webgpu");
const format = gpu.getPreferredFormat(adapter);
const size = [canvas.width, canvas.height]
gpu.configure({device, format, size, compositingAlphaMode: "premultiplied"});

WGSL: stage attribute syntax changed

  • old syntax (chrome-103 and earlier): @stage(compute) @stage(vertex) @stage(fragment)
  • firefox-nightly-101: @compute @vertex @fragment

WGSL: override constants

A module-scope override constant is a constant whose value can be overridden at pipeline creation time.

  • not yet implemented anywhere
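A sketch of what the spec describes, written with the @stage syntax of this era (untested here, since no implementation supported it at the time; block_size is a made-up name):

// WGSL
override block_size: u32 = 64u;
@stage(compute) @workgroup_size(block_size) fn main() { }

// JavaScript: override the value at pipeline creation
const pipeline = device.createComputePipeline({
  compute: {module, entryPoint: "main", constants: {block_size: 128}},
});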

WebGPU implementation in Rust: wgpu and naga

The WebGPU API implementation library is wgpu.
The shader-language library (including WGSL) that wgpu uses is naga.

firefox and deno use wgpu as their backend.
To check their WGSL implementation status, look at naga (src/front/wgsl).

WebGPU implementation in C++: dawn and tint

chrome implements WebGPU with dawn and WGSL with tint. There also appears to be a Node.js binding for dawn.


WebGPU program example: Compute

Squares each element of an array. Runs on Chrome-98/Firefox-nightly-99/deno-1.19 (paste the whole code into the Web Console on a WebGPU-capable page and run).

compute-example.js
// Compute example for WebGPU API: https://www.w3.org/TR/webgpu/
// [Usage] Paste the whole code into the Web Console of a WebGPU-capable demo page; outputs a 1024-element Int32Array of squares
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter.requestDevice();

// WGSL shaders: https://www.w3.org/TR/WGSL/
// NOTE: attribute syntax is changed from [[attr]] to @attr at 2022/01/19; not yet supported on chrome98
// NOTE: from chrome99, [[block]] attribute should be removed
const blockAttr = navigator.userAgent?.match(/Chrome\/98/) ? "[[block]]" : "";
const workgroupSize = 64;
const computeWgsl = `
${blockAttr} struct IO {
  values: array<i32>;
};
[[binding(0), group(0)]] var<storage, read> input: IO;
[[binding(1), group(0)]] var<storage, write> output: IO;
[[stage(compute), workgroup_size(${workgroupSize})]] fn square([[builtin(global_invocation_id)]] giid: vec3<u32>) {
  output.values[giid.x] = input.values[giid.x] * input.values[giid.x];
}
`;
const computeShader = device.createShaderModule({code: computeWgsl});

// pipeline
const pipeline = device.createComputePipeline({
  compute: {module: computeShader, entryPoint: "square"},
});


// data
const count = 1024;
const input = new Int32Array([...Array(count).keys()]);

// buffers
const inputBuffer = device.createBuffer({size: input.byteLength, usage: GPUBufferUsage.STORAGE, mappedAtCreation: true});
new Int32Array(inputBuffer.getMappedRange()).set(input);
inputBuffer.unmap();
const outputBuffer = device.createBuffer({size: input.byteLength, usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC});
const readBuffer = device.createBuffer({size: input.byteLength, usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST});

// bind group
const bindGroupLayout = pipeline.getBindGroupLayout(0);
const bindGroup = device.createBindGroup({
  layout: bindGroupLayout,
  entries: [
    {binding: 0, resource: {buffer: inputBuffer}},
    {binding: 1, resource: {buffer: outputBuffer}},
  ]
});

// command encoder
const commandEncoder = device.createCommandEncoder();
const passEncoder = commandEncoder.beginComputePass();
passEncoder.setPipeline(pipeline);
passEncoder.setBindGroup(0, bindGroup);
passEncoder.dispatch(count / workgroupSize);
passEncoder.endPass(); //[chrome100] endPass() => end()

commandEncoder.copyBufferToBuffer(outputBuffer, 0, readBuffer, 0, input.byteLength);
device.queue.submit([commandEncoder.finish()]);

// read and copy
await readBuffer.mapAsync(GPUMapMode.READ);
const output = new Int32Array(readBuffer.getMappedRange().slice());
readBuffer.unmap();
console.log(output); // 1024-result of square 0-1046529

Correspondences in the code

  • values: array<i32> in the WGSL and the Int32Array used to read/write the data
  • the 0 of group(0) in the WGSL, the 0 of pipeline.getBindGroupLayout(0), and the 0 of passEncoder.setBindGroup(0, bindGroup)
  • stage(compute) in the WGSL and compute: {module: computeShader, entryPoint: "square"}
  • the 0 of binding(0) in the WGSL and the 0 of binding: 0 in the device.createBindGroup() parameters
  • the number of workgroup_size() arguments in the WGSL, the number of dimensions of giid in the WGSL, and the number of passEncoder.dispatch() arguments (max 3)
  • the products of the passEncoder.dispatch() argument values with the workgroup_size() argument values, and the element count of outputBuffer
  • var<storage> in the WGSL and usage: GPUBufferUsage.STORAGE in device.createBuffer()
  • the 1st argument of commandEncoder.copyBufferToBuffer() and usage: GPUBufferUsage.COPY_SRC in device.createBuffer()
  • the 3rd argument of commandEncoder.copyBufferToBuffer() and usage: GPUBufferUsage.COPY_DST in device.createBuffer()
  • readBuffer.mapAsync(GPUMapMode.READ) and usage: GPUBufferUsage.MAP_READ in device.createBuffer()

NOTE:

  • computation runs in units of workgroups
  • the Bind Group Layout can be extracted from the pipeline's code content
  • the input/output buffers are set as a Bind Group
  • results are read out by copying into a MAP_READ buffer (unmapping while still reading yields zeros)
  • the maximum workgroup_size is limited by adapter.limits.maxComputeInvocationsPerWorkgroup (a sketch of picking a size from the limits follows this list)
    • in the multi-dimensional case, the total product (not each element) must not exceed it
    • per-dimension maxima are maxComputeWorkgroupSizeX / maxComputeWorkgroupSizeY / maxComputeWorkgroupSizeZ
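A sketch of choosing a square 2D workgroup size within that limit, along the lines of what the Game of Life example below does (falls back to 16 when the limit is not exposed):

const limit = adapter.limits.maxComputeInvocationsPerWorkgroup;
// largest power of two whose square still fits within the limit
const workgroupSize = limit ? 2 ** Math.trunc(Math.log2(Math.sqrt(limit))) : 16;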

WebGPU program example: Render

Draws a square (requires <canvas id="canvas"></canvas>). Runs on Chrome-98/Firefox-nightly-99.

main.js
// Simple example for WebGPU API: https://www.w3.org/TR/webgpu/
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter.requestDevice();

// Buffer of triangle-strip square: 4-(x,y)
const square = new Float32Array([-1/2, -1/2, -1/2, +1/2, +1/2, -1/2, +1/2, +1/2,]);
const vertexBuffer = device.createBuffer({size: square.byteLength, usage: GPUBufferUsage.VERTEX, mappedAtCreation: true});
new Float32Array(vertexBuffer.getMappedRange()).set(square);
vertexBuffer.unmap();
const stride = {arrayStride: 2 * square.BYTES_PER_ELEMENT, attributes: [{shaderLocation: 0, offset: 0, format: "float32x2"}]};

// WGSL shaders: https://www.w3.org/TR/WGSL/
// NOTE: attribute syntax is changed from [[attr]] to @attr at 2022/01/19; not yet supported on chrome98
const vertexWgsl = `
struct Out {
  [[builtin(position)]] pos: vec4<f32>;
  [[location(0)]] vert: vec2<f32>;
};
[[stage(vertex)]] fn main([[location(0)]] xy: vec2<f32>) -> Out {
  return Out(vec4<f32>(xy, 0.0, 1.0), xy + 0.5);
}
`;
const vertexShader = device.createShaderModule({code: vertexWgsl});
const fragmentWgsl = `
[[stage(fragment)]] fn main([[builtin(position)]] pos: vec4<f32>, [[location(0)]] vert: vec2<f32>) -> [[location(0)]] vec4<f32> {
  return vec4<f32>(vert, 0.0, 1.0);
}
`;
const fragmentShader = device.createShaderModule({code: fragmentWgsl});

// gpu config for canvas
const canvas = document.getElementById("canvas");
const gpu = canvas.getContext("webgpu");
const format = gpu.getPreferredFormat(adapter);
gpu.configure({device, format, size: [canvas.width, canvas.height]});

// pipeline
const pipeline = device.createRenderPipeline({
  primitive: {topology: "triangle-strip"},
  vertex: {module: vertexShader, entryPoint: "main", buffers: [stride]},
  fragment: {module: fragmentShader, entryPoint: "main", targets: [{format}]},
});

// render
const render = () => {
  const view = gpu.getCurrentTexture().createView();
  //[chrome100] loadValue: {r,g,b,a} => loadOp: "clear", clearValue: {r,g,b,a} (clearValue default is RGBA=0000)
  const renderPass = {colorAttachments: [{view, loadValue: {r: 0, g: 0, b: 0, a: 0}, storeOp: "store"}]};
  const commandEncoder = device.createCommandEncoder();
  const passEncoder = commandEncoder.beginRenderPass(renderPass);
  passEncoder.setPipeline(pipeline);
  passEncoder.setVertexBuffer(0, vertexBuffer);
  passEncoder.draw(4,  1); // 4-vertex 1-instance
  passEncoder.endPass(); //[chrome100] endPass() => end()
  device.queue.submit([commandEncoder.finish()]);
};
(function loop() {
  render();
  requestAnimationFrame(loop);
})();

Correspondences in the code

  • the data in const square = new Float32Array([...]); is 4 two-dimensional vertices laid out flat
    • arrayStride: 2 * square.BYTES_PER_ELEMENT in stride is the byte size of one 2D vertex
    • format: "float32x2" in stride is the data type of one 2D vertex
    • [[location(0)]] xy: vec2<f32> in vertexWgsl is one 2D vertex
    • passEncoder.draw(4, 1) is the draw command for 4 vertices (1 instance)
  • passEncoder.setVertexBuffer(0, vertexBuffer) and the device.createBuffer() parameter usage: GPUBufferUsage.VERTEX
  • stage(vertex) in vertexWgsl and vertex: {module: vertexShader, entryPoint: "main", buffers: [stride]}
  • the 0 of [[location(0)]] xy: vec2<f32> in vertexWgsl and the 0 of passEncoder.setVertexBuffer(0, vertexBuffer)
  • [[location(0)]] vert: vec2<f32>; in vertexWgsl and [[location(0)]] vert: vec2<f32> in fragmentWgsl
  • stage(fragment) in fragmentWgsl and fragment: {module: fragmentShader, entryPoint: "main", targets: [{format}]}

NOTE:

  • WebGPU is left-handed: bottom-left (x,y)=(-1,-1), top-right (x,y)=(1,1), near z=0, far z=1
  • WebGPU supports triangle-list and triangle-strip, but not triangle-fan
  • the pos argument in fragmentWgsl is unnecessary and may be removed (it is kept as a builtin example; returning exactly one builtin(position) member from vertexWgsl, on the other hand, is mandatory)

WebGPU program example: Compute + Render

Game of Life (requires <canvas id="canvas"></canvas>). Runs on Chrome-98/Firefox-nightly-99.

// Game of Life as a simple Compute+Render example for WebGPU API: https://www.w3.org/TR/webgpu/
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter.requestDevice();

//[compute setup]
// buffers of cell tables
const width = 256, height = 256;
const cells = new Uint32Array(width * height);
for (let i = 0; i < 10000;) {
  const n = Math.trunc(Math.random() * width * height);
  if (cells[n] === 1) continue;
  cells[n] = 1;
  i++;
}
const buffer0 = device.createBuffer({size: cells.byteLength, usage: GPUBufferUsage.STORAGE | GPUBufferUsage.VERTEX, mappedAtCreation: true});
new Uint32Array(buffer0.getMappedRange()).set(cells);
buffer0.unmap();
const buffer1 = device.createBuffer({size: cells.byteLength, usage: GPUBufferUsage.STORAGE | GPUBufferUsage.VERTEX});

// compute shader of step
const blockAttr = navigator.userAgent?.match(/Chrome\/98/) ? "[[block]]" : "";
const workgroupSize = adapter.limits.maxComputeInvocationsPerWorkgroup ?
      2 ** Math.trunc(Math.log2(Math.sqrt(adapter.limits.maxComputeInvocationsPerWorkgroup))) : 16;
const computeWgsl = `
${blockAttr} struct Cells {
  data: array<u32>;
};
[[binding(0), group(0)]] var<storage, read> current: Cells;
[[binding(1), group(0)]] var<storage, write> next: Cells;
fn index(x: i32, y: i32) -> u32 {
  let w = ${width};
  let h = ${height};
  return u32(((y + h) % h) * w + ((x + w) % w));
}
fn cell(x: i32, y: i32) -> u32 {
  return current.data[index(x, y)];
}
fn neighbors(x: i32, y: i32) -> u32 {
  return cell(x - 1, y - 1) + cell(x, y - 1) + cell(x + 1, y - 1) +
         cell(x - 1, y) +                      cell(x + 1, y) +
         cell(x - 1, y + 1) + cell(x, y + 1) + cell(x + 1, y + 1);
}
[[stage(compute), workgroup_size(${workgroupSize}, ${workgroupSize})]] fn step_next([[builtin(global_invocation_id)]] giid: vec3<u32>) {
  let x = i32(giid.x);
  let y = i32(giid.y);
  let n = neighbors(x, y);
  next.data[index(x, y)] = select(u32(n == 3u), u32(n == 2u || n == 3u), cell(x, y) == 1u);
}
`;
const computeShader = device.createShaderModule({code: computeWgsl});

// compute pipeline
const computePipeline = device.createComputePipeline({
  compute: {module: computeShader, entryPoint: "step_next"},
});
const bindGroupLayout = computePipeline.getBindGroupLayout(0);
const bindGroup0 = device.createBindGroup({
  layout: bindGroupLayout,
  entries: [
    {binding: 0, resource: {buffer: buffer0}},
    {binding: 1, resource: {buffer: buffer1}},
  ]
});
const bindGroup1 = device.createBindGroup({
  layout: bindGroupLayout,
  entries: [
    {binding: 0, resource: {buffer: buffer1}},
    {binding: 1, resource: {buffer: buffer0}},
  ]
});


//[render setup]
// cell vertex buffer
const corners = new Uint32Array([0, 0, 0, 1, 1, 0, 1, 1]); // 4-offset of (x, y)
const cornersBuffer = device.createBuffer({size: corners.byteLength, usage: GPUBufferUsage.VERTEX, mappedAtCreation: true});
new Uint32Array(cornersBuffer.getMappedRange()).set(corners);
cornersBuffer.unmap();
const cornersStride = {arrayStride: 2 * corners.BYTES_PER_ELEMENT, stepMode: "vertex", attributes: [{shaderLocation: 1, offset: 0, format: "uint32x2"}]};

// stride for buffer0/buffer1 as instances
const cellsStride = {arrayStride: cells.BYTES_PER_ELEMENT, stepMode: "instance", attributes: [{shaderLocation: 0, offset: 0, format: "uint32"}]};

// cell shader
const vertexWgsl = `
struct Out {
[[builtin(position)]] pos: vec4<f32>;
[[location(0)]] cell: f32;
};
[[stage(vertex)]] fn main([[builtin(instance_index)]] i: u32, [[location(0)]] cell: u32, [[location(1)]] v: vec2<u32>) -> Out {
  let w = ${width}u;
  let h = ${height}u;
  let x = (f32(i % w + v.x) / f32(w) - 0.5) * 2.0;
  let y = (f32((i - (i % w)) / w + v.y) / f32(h) - 0.5) * 2.0;
  return Out(vec4<f32>(x, y, 0.0, 1.0), f32(cell));
}
`;
const vertexShader = device.createShaderModule({code: vertexWgsl});
const fragmentWgsl = `
[[stage(fragment)]] fn main([[location(0)]] cell: f32) -> [[location(0)]] vec4<f32> {
  return vec4<f32>(f32(cell), f32(cell), f32(cell), 1.0);
}
`;
const fragmentShader = device.createShaderModule({code: fragmentWgsl});

// gpu config for canvas
const canvas = document.getElementById("canvas");
const gpu = canvas.getContext("webgpu");
const format = gpu.getPreferredFormat(adapter);
gpu.configure({device, format, size: [canvas.width, canvas.height]});

// render pipeline
const renderPipeline = device.createRenderPipeline({
  primitive: {topology: "triangle-strip"},
  vertex: {module: vertexShader, entryPoint: "main", buffers: [cellsStride, cornersStride]},
  fragment: {module: fragmentShader, entryPoint: "main", targets: [{format}]},
});


//[command part]
// render
const render = (t, compute = true, render = true) => {
  const view = gpu.getCurrentTexture().createView();
  const renderPass = {colorAttachments: [{view, loadValue: {r: 0, g: 0, b: 0, a: 0}, storeOp: "store"}]};
  const commandEncoder = device.createCommandEncoder();
  if (compute) {
    const passEncoder = commandEncoder.beginComputePass();
    passEncoder.setPipeline(computePipeline);
    passEncoder.setBindGroup(0, t % 2 === 0 ? bindGroup0 : bindGroup1);
    passEncoder.dispatch(width / workgroupSize, height / workgroupSize);
    passEncoder.endPass();
  }
  if (render) {
    const passEncoder = commandEncoder.beginRenderPass(renderPass);
    passEncoder.setPipeline(renderPipeline);
    passEncoder.setVertexBuffer(0, t % 2 === 0 ? buffer1 : buffer0);
    passEncoder.setVertexBuffer(1, cornersBuffer);
    passEncoder.draw(4,  width * height);
    passEncoder.endPass();
  }
  device.queue.submit([commandEncoder.finish()]);
};
(function loop(t) {
  render(t);
  requestAnimationFrame(() => loop(t ^ 1));
})(0);

Overview

  1. Prepare two buffers for the cell array (0 = empty, 1 = alive) plus two Bind Groups pairing them as input/output, switch between them alternately, and run each Game of Life step in compute.
  2. For rendering, prepare a separate vertex buffer holding the 4 corner offsets (from the bottom-left) of a square, and use the 0/1 cell array produced by compute as an instance vertex buffer.
  3. The vertex shader computes the base xy position from the builtin instance_index integer to draw each square, and passes the cell value straight through to the fragment shader for coloring (live cells come out white).

WebGPU program example: 3D animation

Animation of a camera circling a regular tetrahedron (requires <canvas id="canvas"></canvas>). For Chrome-100 and later.

main.js
// Simple example for WebGPU API for Chrome-100: https://www.w3.org/TR/webgpu/
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter.requestDevice();

// utility
const createBuffer = (device, desc, data) => {
  const buffer = device.createBuffer(Object.assign({size: data.byteLength, mappedAtCreation: true}, desc));
  new Uint8Array(buffer.getMappedRange()).set(new Uint8Array(data.buffer, data.byteOffset, data.byteLength));
  buffer.unmap();
  return buffer;
};

// vertex stride
const stride = {arrayStride: 6 * Float32Array.BYTES_PER_ELEMENT, attributes: [
  {shaderLocation: 0, offset: 0, format: "float32x3"},
  {shaderLocation: 1, offset: 3 * Float32Array.BYTES_PER_ELEMENT, format: "float32x3"},
]};
// WGSL shaders: https://www.w3.org/TR/WGSL/
const vertexWgsl = `
struct Perspective {
  fov: f32;
  aspect: f32;
  near: f32;
  far: f32;
};
@group(0) @binding(0) var<uniform> perspective: Perspective;
fn perspectiveMatrix(perspective: Perspective) -> mat4x4<f32> {
  let zoom = 1.0 / tan(perspective.fov / 2.0);
  return mat4x4<f32>(
    zoom / perspective.aspect, 0.0, 0.0, 0.0,
    0.0, zoom, 0.0, 0.0,
    0.0, 0.0, perspective.far / (perspective.near - perspective.far), -1.0,
    0.0, 0.0, perspective.near * perspective.far / (perspective.near - perspective.far), 0.0
  );
};

struct LookAt {
  eye: vec3<f32>;
  target: vec3<f32>;
  up: vec3<f32>;
}
@group(0) @binding(1) var<uniform> lookAt: LookAt;
fn lookAtMatrix(lookAt: LookAt) -> mat4x4<f32> {
  let lz = normalize(lookAt.eye - lookAt.target);
  let lx = normalize(cross(lookAt.up, lz));
  let ly = cross(lz, lx);
  return mat4x4<f32>(
    lx.x, ly.x, lz.x, 0.0,
    lx.y, ly.y, lz.y, 0.0,
    lx.z, ly.z, lz.z, 0.0,
    -dot(lookAt.eye, lx), -dot(lookAt.eye, ly), -dot(lookAt.eye, lz), 1.0
  );
};

struct Light {
  color: vec3<f32>;
  dir: vec3<f32>;
};
@group(0) @binding(2) var<uniform> light: Light;

struct ModelView {
  m: mat4x4<f32>;
};
@group(1) @binding(0) var<uniform> modelView: ModelView;
struct Material {
  color: vec3<f32>;
  ambient: f32;
  diffuse: f32;
  specular: f32;
};
@group(1) @binding(1) var<uniform> material: Material;

fn lighting(light: Light, material: Material, pos: vec3<f32>, norm: vec3<f32>) -> vec3<f32> {
  let refl = normalize(reflect(-light.dir, norm));
  let dir = normalize(-pos);
  let spec = material.specular * max(dot(dir, refl), 0.0);
  let diff = material.diffuse * max(dot(norm, light.dir), 0.0);
  let d = material.ambient + diff;
  return d * material.color + spec * light.color;
}

struct Out {
  @builtin(position) pos: vec4<f32>;
  @location(0) color: vec3<f32>;
};
@stage(vertex) fn main(@location(0) pos: vec3<f32>, @location(1) norm: vec3<f32>) -> Out {
  return Out(
    perspectiveMatrix(perspective) * lookAtMatrix(lookAt) * modelView.m * vec4<f32>(pos, 1.0),
    lighting(light, material, pos, normalize(norm)));
}
`;
const vertexShader = device.createShaderModule({code: vertexWgsl});
const fragmentWgsl = `
@stage(fragment) fn main(@location(0) color: vec3<f32>) -> @location(0) vec4<f32> {
  return vec4<f32>(color, 1.0);
}
`;
const fragmentShader = device.createShaderModule({code: fragmentWgsl});

// gpu config for canvas
const canvas = document.getElementById("canvas");
const size = [canvas.width, canvas.height];
const gpu = canvas.getContext("webgpu");
const format = gpu.getPreferredFormat(adapter);
gpu.configure({device, format, size});

// Texture for multisampling anti-alias (MSAA)
const sampleCount = 4; // NOTE: values except 4 are not available
const msaaTexture = device.createTexture({usage: GPUTextureUsage.RENDER_ATTACHMENT, format, size, sampleCount});
const msaaView = msaaTexture.createView();

// depth buffer preparation
const depthFormat = "depth24plus-stencil8";
const depthTexture = device.createTexture({usage: GPUTextureUsage.RENDER_ATTACHMENT, format: depthFormat, size, sampleCount});
const depthView = depthTexture.createView();
const depthStencilAttachment = {view: depthView, depthLoadOp: "clear", depthClearValue: 1, depthStoreOp: "store", stencilLoadOp: "clear", stencilStoreOp: "store"};

// blend
const blend = {
  color: {srcFactor: "src-alpha", dstFactor: "one-minus-src-alpha", operation: "add"},
  alpha: {srcFactor: "one", dstFactor: "one", operation: "add"},
};

// pipeline
const pipeline = device.createRenderPipeline({
  primitive: {topology: "triangle-list", cullMode: "back"},
  vertex: {module: vertexShader, entryPoint: "main", buffers: [stride]},
  fragment: {module: fragmentShader, entryPoint: "main", targets: [{format, blend}]},
  depthStencil: {depthWriteEnabled: true, depthCompare: "less", format: depthFormat},
  multisample: {count: sampleCount},
});

// [scene]
// camera
const perspective = new Float32Array([Math.PI / 2, 1.0, 0.1, 12.0]); // fov, aspect, near, far,
const perspectiveBuffer = createBuffer(device, {usage: GPUBufferUsage.UNIFORM}, perspective);
// NOTE: uniform buffer has alignments: f32 = 4, vec2<f32> = 8, vec3<f32>,vec4<f32> = 16
const lookAt = new Float32Array([
  0, 0, -6, // eye at
  0, // padding for next vec3's alignment 16
  0, 0, 0,    // target at
  0, // padding for next vec3's alignment 16
  0, 1, 0,     // upside direction
]);
const lookAtBuffer = createBuffer(device, {usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST}, lookAt);

// light setting
const light = new Float32Array([
  0.5, 0.5, 0.5, // RGB
  0, // padding for next vec3's alignment 16
  5, 15, 0, // direction
]);
const lightBuffer = createBuffer(device, {usage: GPUBufferUsage.UNIFORM}, light);

// bind group
const bindGroup = device.createBindGroup({
  layout: pipeline.getBindGroupLayout(0),
  entries: [
    {binding: 0, resource: {buffer: perspectiveBuffer}},
    {binding: 1, resource: {buffer: lookAtBuffer}},
    {binding: 2, resource: {buffer: lightBuffer}},
  ]
});


// Tetrahedron data
const Tetrahedron = (color, ambient, diffuse, specular) => {
  // [cube index: vertex]
  const cube = [
    [-1, +1, -1],
    [-1, -1, -1],
    [+1, -1, -1],
    [+1, +1, -1],
    [+1, +1, +1],
    [+1, -1, +1],
    [-1, -1, +1],
    [-1, +1, +1],
  ];
  // [layout of cube index]
  // rear =) front face = rear face (= front
  // - 7 =) 0 - 3 = 4 - 7 (= 0 -
  //   |  ) |   |   |   | (  |
  // - 6 =) 1 - 2 = 5 - 6 (= 1 - 
  //
  // [tetrahedron faces]
  // normal cube-index: (ccw) triangle cube-index list 
  // 0: 1-3-7
  // 2: 3-1-5
  // 4: 5-7-3
  // 6: 7-5-1
  const vertex = new Float32Array([
    [1, 3, 7], [3, 1, 5], [5, 7, 3], [7, 5, 1]
  ].flatMap((face, fid) => face.flatMap(v => [v, fid * 2])).flatMap(v => cube[v]));
  
  // uniform
  const modelView = new Float32Array([
    1, 0, 0, 0,
    0, 1, 0, 0,
    0, 0, 1, 0,
    0, 0, 0, 1,
  ]);
  const material = new Float32Array([
    color[0], color[1], color[2], // RGB
    ambient, diffuse, specular, // ambient, diffuse, specular  
  ]);
  return {vertex, modelView, material, count: 12};
};
const prepareBindGroup = obj => {
  obj.vertexBuffer = createBuffer(device, {usage: GPUBufferUsage.VERTEX}, obj.vertex);
  obj.modelViewBuffer = createBuffer(device, {usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST}, obj.modelView);
  obj.materialBuffer = createBuffer(device, {usage: GPUBufferUsage.UNIFORM}, obj.material);
  obj.bindGroup = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(1),
    entries: [
      {binding: 0, resource: {buffer: obj.modelViewBuffer}},
      {binding: 1, resource: {buffer: obj.materialBuffer}},
    ]
  });
  return obj;
};

const tetrahedron0 = prepareBindGroup(Tetrahedron([1, 0, 0], 0.1, 0.3, 0.5));
const tetrahedron1 = prepareBindGroup(Tetrahedron([0, 1, 0], 0.3, 0.5, 0.1));
const tetrahedron2 = prepareBindGroup(Tetrahedron([0, 0, 1], 0.5, 0.1, 0.3));

// render
const render = (t) => {
  // update uniform values with queue.writeBuffer
  lookAt[0] = lookAt[1]  = 6 * Math.sin(Math.PI * t / 720) / (2 ** 0.5);
  lookAt[2] = -6 * Math.cos(Math.PI * t / 720);
  tetrahedron0.modelView[12] = tetrahedron1.modelView[13] = tetrahedron2.modelView[14] = 3 * Math.sin(Math.PI * t / 180);
  device.queue.writeBuffer(lookAtBuffer, 0, lookAt.buffer);
  device.queue.writeBuffer(tetrahedron0.modelViewBuffer, 0, tetrahedron0.modelView.buffer);
  device.queue.writeBuffer(tetrahedron1.modelViewBuffer, 0, tetrahedron1.modelView.buffer);
  device.queue.writeBuffer(tetrahedron2.modelViewBuffer, 0, tetrahedron2.modelView.buffer);

  // NOTE: must getCurrentTexture().createView() everytime for animation updating
  const resolveTarget = gpu.getCurrentTexture().createView();
  const colorAttachment = {view: msaaView, resolveTarget, loadOp: "clear", clearValue: {r: 0, g: 0, b:0, a: 0.5}, storeOp: "discard"};
  const renderPass = {colorAttachments: [colorAttachment], depthStencilAttachment};
  
  const commandEncoder = device.createCommandEncoder();
  const passEncoder = commandEncoder.beginRenderPass(renderPass);
  passEncoder.setPipeline(pipeline);
  passEncoder.setBindGroup(0, bindGroup);
  
  passEncoder.setBindGroup(1, tetrahedron0.bindGroup);
  passEncoder.setVertexBuffer(0, tetrahedron0.vertexBuffer);
  passEncoder.draw(tetrahedron0.count, 1);

  passEncoder.setBindGroup(1, tetrahedron1.bindGroup);
  passEncoder.setVertexBuffer(0, tetrahedron1.vertexBuffer);
  passEncoder.draw(tetrahedron1.count, 1);

  passEncoder.setBindGroup(1, tetrahedron2.bindGroup);
  passEncoder.setVertexBuffer(0, tetrahedron2.vertexBuffer);
  passEncoder.draw(tetrahedron2.count, 1);

  passEncoder.end();
  device.queue.submit([commandEncoder.finish()]);
};
(function loop(t) {
  render(t);
  requestAnimationFrame(() => loop((t + 1) % 1440));
})(0);

Overview

  • build the regular tetrahedron and its normal data (counter-clockwise faces) from the vertex coordinates of a cube centered at (0,0,0)
  • the vertex buffer is a list of (vertex position, unnormalized normal vector)
  • all 3D matrix math is done inside the vertex shader, and every parameter it needs is passed via uniform BindGroups
  • the animation rotates the camera position in the xz plane around (0,0,0); uniform values are updated with queue.writeBuffer()
  • a depth texture is prepared for depth testing, with settings added to both the pipeline and the render pass to enable it
  • multisampling anti-alias (MSAA) is applied to remove jaggies

NOTE:

  • WebGPU's clip volume has z in (0,1), so the perspective matrix differs from the one for WebGL, whose clip volume has z in (-1,1)
  • to keep the animation updating, getCurrentTexture() must be called every frame
  • uniform buffers have per-type alignments for struct members, so the start offsets of values in the buffer must be adjusted (see the sketch after this list)
    • f32 has alignment 4 (bytes) and vec3<f32> has alignment 16 (bytes), so when a uniform struct has a vec3<f32> member right after a leading f32 member, 12 bytes of padding go after the f32 value, followed by the three f32 values
  • the number of Bind Groups usable at once in one pipeline is limited by adapter.limits.maxBindGroups (default 4)
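A sketch of that f32-followed-by-vec3 case (Params is a made-up struct; the lookAt/light buffers above use the same padding idea):

// WGSL: v starts at byte offset 16, not 4, because vec3<f32> aligns to 16
// struct Params {
//   scale: f32;
//   v: vec3<f32>;
// };
const params = new Float32Array([
  2.0,     // scale: f32 at byte offset 0
  0, 0, 0, // 12 bytes of padding up to v's 16-byte alignment
  1, 2, 3, // v: vec3<f32> at byte offset 16
  0,       // tail padding: the struct size rounds up to 32 bytes
]);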

WebGPU program example: image texture

Displays an SVG image (requires <canvas id="canvas"></canvas>). For Chrome-100 and later.

main.js
// Image texture example for WebGPU API for Chrome-100: https://www.w3.org/TR/webgpu/
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter.requestDevice();

// prepare image
const img = new Image();
img.src = "data:image/svg+xml," + encodeURIComponent(`
<svg xmlns="http://www.w3.org/2000/svg" width="900" height="600">
  <rect fill="#ffffff" width="900" height="600" />
  <circle fill="#bc002d" cx="450" cy="300" r="180" />
</svg>
`);
await img.decode();
const bitmap = await createImageBitmap(img);
const max = Math.max(bitmap.width, bitmap.height);
const [w, h] = [bitmap.width / max, bitmap.height / max];

// triangle-strip square: 4-(x,y, u, v); top-left: (u,v)=(0,0)
const square = new Float32Array([
  -w, -h, 0, 1,
  -w, +h, 0, 0,
  +w, -h, 1, 1,
  +w, +h, 1, 0,
]);
const vertexBuffer = device.createBuffer({size: square.byteLength, usage: GPUBufferUsage.VERTEX, mappedAtCreation: true});
new Float32Array(vertexBuffer.getMappedRange()).set(square);
vertexBuffer.unmap();
const stride = {arrayStride: 4 * square.BYTES_PER_ELEMENT, attributes: [
  {shaderLocation: 0, offset: 0, format: "float32x2"},
  {shaderLocation: 1, offset: 2 * square.BYTES_PER_ELEMENT, format: "float32x2"},
]};

// WGSL shaders: https://www.w3.org/TR/WGSL/
const vertexWgsl = `
struct Out {
@builtin(position) pos: vec4<f32>;
@location(0) uv: vec2<f32>;
};
@stage(vertex) fn main(@location(0) xy: vec2<f32>, @location(1) uv: vec2<f32>) -> Out {
  return Out(vec4<f32>(xy, 0.0, 1.0), uv);
}
`;
const vertexShader = device.createShaderModule({code: vertexWgsl});
const fragmentWgsl = `
@group(0) @binding(0) var samp: sampler;
@group(0) @binding(1) var tex: texture_2d<f32>;
@stage(fragment) fn main(@location(0) uv: vec2<f32>) -> @location(0) vec4<f32> {
  return textureSample(tex, samp, uv);
}
`;
const fragmentShader = device.createShaderModule({code: fragmentWgsl});

// gpu config for canvas
const canvas = document.getElementById("canvas");
const gpu = canvas.getContext("webgpu");
const format = gpu.getPreferredFormat(adapter);
gpu.configure({device, format, size: [canvas.width, canvas.height]});

// texture and sampler
const samp = device.createSampler({minFilter: "linear", magFilter: "linear"});
const tex = device.createTexture({
  format: "rgba8unorm", size: [bitmap.width, bitmap.height],
  usage: GPUTextureUsage.TEXTURE_BINDING | GPUTextureUsage.COPY_DST | GPUTextureUsage.RENDER_ATTACHMENT,
});
device.queue.copyExternalImageToTexture({source: bitmap}, {texture: tex}, [bitmap.width, bitmap.height]);

// pipeline
const pipeline = device.createRenderPipeline({
  primitive: {topology: "triangle-strip"},
  vertex: {module: vertexShader, entryPoint: "main", buffers: [stride]},
  fragment: {module: fragmentShader, entryPoint: "main", targets: [{format}]},
});

// bind group
const bindGroupLayout = pipeline.getBindGroupLayout(0);
const bindGroup = device.createBindGroup({
  layout: bindGroupLayout,
  entries: [
    {binding: 0, resource: samp},
    {binding: 1, resource: tex.createView()},
  ]
});

// render
const render = () => {
  const view = gpu.getCurrentTexture().createView();
  const renderPass = {colorAttachments: [{view, loadOp: "clear", clearValue: {r: 0, g: 0, b: 0, a: 1}, storeOp: "store"}]};
  const commandEncoder = device.createCommandEncoder();
  const passEncoder = commandEncoder.beginRenderPass(renderPass);
  passEncoder.setPipeline(pipeline);
  passEncoder.setBindGroup(0, bindGroup);
  passEncoder.setVertexBuffer(0, vertexBuffer);
  passEncoder.draw(4,  1);
  passEncoder.end();
  device.queue.submit([commandEncoder.finish()]);
};
(function loop() {
  render();
  requestAnimationFrame(loop);
})();

NOTE:

  • explicitly create a sampler for picking colors out of the texture, pass it in a bind group, and use it in the shader
  • texture data is loaded from an ImageBitmap with device.queue.copyExternalImageToTexture()
  • texture coordinates put (0, 0) at the top left
  • what the bind group passes to the shader is the texture's createView()

WebGPU program example: drawing shapes without a vertex buffer

Regular-polygon animation (requires <canvas id="canvas"></canvas>). Not supported on firefox-nightly-99.

main.js
// Simple example for WebGPU API: https://www.w3.org/TR/webgpu/
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter.requestDevice();

// WGSL shaders: https://www.w3.org/TR/WGSL/
const wgsl = `
struct Uniforms {
  t: u32;
  corners: u32;
};
@group(0) @binding(0) var<uniform> uniforms: Uniforms;

struct IO {
  @builtin(position) pos: vec4<f32>;
  @location(0) hsv: vec2<f32>;
};

@stage(vertex) fn vmain(@builtin(vertex_index) i: u32) -> IO {
  let vid = i % 3u;
  let tid = (i - vid) / 3u;
  let cid = (uniforms.t + tid) % uniforms.corners; 
  let a = 360.0 / f32(uniforms.corners);
  if (vid == 0u) {return IO(vec4<f32>(0.0, 0.0, 0.0, 1.0), vec2<f32>(f32(cid) * a, 0.0));}
  let t = radians(f32(tid + vid - 1u) * a);
  return IO(vec4<f32>(-sin(t), cos(t), 0.0, 1.0), vec2<f32>(f32(cid + vid - 1u) * a, 1.0));
}

fn hsv2rgb(h: f32, s: f32, v: f32, a: f32) -> vec4<f32> {
  let h_ = (h % 360.0) / 60.0;
  let f = modf(h_).fract;
  let m = v * (1.0 - s);
  let n = v * (1.0 - s * f);
  let k = v * (1.0 - s * (1.0 - f));
  if (h_ < 1.0) {return vec4<f32>(v, k, m, a);}
  if (h_ < 2.0) {return vec4<f32>(n, v, m, a);}
  if (h_ < 3.0) {return vec4<f32>(m, v, k, a);}
  if (h_ < 4.0) {return vec4<f32>(m, n, v, a);}
  if (h_ < 5.0) {return vec4<f32>(k, m, v, a);}
  if (h_ < 6.0) {return vec4<f32>(v, m, n, a);}
  return vec4<f32>(0.0, 0.0, 0.0, a);
}

@stage(fragment) fn fmain(io: IO) -> @location(0) vec4<f32> {
  return hsv2rgb(io.hsv.x, io.hsv.y, 0.75, 1.0);
}
`;
const shader = device.createShaderModule({code: wgsl});

// gpu config for canvas
const canvas = document.getElementById("canvas");
const gpu = canvas.getContext("webgpu");
const format = gpu.getPreferredFormat(adapter);
gpu.configure({device, format, size: [canvas.width, canvas.height]});

// pipeline
const pipeline = device.createRenderPipeline({
  primitive: {topology: "triangle-list", cullMode: "back"},
  vertex: {module: shader, entryPoint: "vmain", buffers: []},
  fragment: {module: shader, entryPoint: "fmain", targets: [{format}]},
});

// bind group
const uniforms = new Uint32Array([0, 3]);
const uniformsBuffer = device.createBuffer({usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST, size: uniforms.byteLength});
const bindGroup = device.createBindGroup({
  layout: pipeline.getBindGroupLayout(0),
  entries: [{binding: 0, resource: {buffer: uniformsBuffer}}],
});

// render with no vertex buffer
const corners = 64;
const render = (t) => {
  const c = 3 + Math.round((corners - 3) / 2 * (-Math.cos(t / 80) + 1));
  [uniforms[0], uniforms[1]] = [t / 10, c];
  device.queue.writeBuffer(uniformsBuffer, 0, uniforms.buffer);
  
  const view = gpu.getCurrentTexture().createView();
  const clearValue = {r: 0, g: 0, b: 0, a: 1};
  const renderPass = {colorAttachments: [{view, loadOp: "clear", clearValue, loadValue: clearValue, storeOp: "store"}]}; //[chrome-99] loadValue
  const commandEncoder = device.createCommandEncoder();
  const passEncoder = commandEncoder.beginRenderPass(renderPass);
  passEncoder.setPipeline(pipeline);
  passEncoder.setBindGroup(0, bindGroup);
  passEncoder.draw(3 * c,  1);
  (passEncoder.end ?? passEncoder.endPass).call(passEncoder); //[chrome-99] endPass
  device.queue.submit([commandEncoder.finish()]);
};
(function loop(t) {
  render(t);
  requestAnimationFrame(() => loop(t + 1));
})(0);

NOTE:

  • vertex coordinates are computed from the vertex ID obtained via builtin(vertex_index)
  • parameters such as the timestamp are received as uniform variables via a Bind Group

WebGPU program example: generating particle positions from a bitmap in a compute shader

A reorganization of what the compute shaders in https://github.com/austinEng/webgpu-samples/tree/main/src/sample/particles do. For Chrome-100.

main.js
// Image texture example for WebGPU API for Chrome-100: https://www.w3.org/TR/webgpu/
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter.requestDevice();

// arranged from: https://github.com/austinEng/webgpu-samples/tree/main/src/sample/particles

// prepare image
const bitmapSize = 512;
const emoji = "🎎";
//const emoji = "🌄";
console.log(emoji);
const img = new Image();
img.src = "data:image/svg+xml," + encodeURIComponent(`
<svg xmlns="http://www.w3.org/2000/svg" width="${bitmapSize}" height="${bitmapSize}">
  <text x="0" y="${bitmapSize - bitmapSize / 8}" font-size="${bitmapSize}">${emoji}</text>
</svg>
`);
await img.decode();
const bitmap = await createImageBitmap(img);


//[A. generate mipmap] copy bitmap alpha into prob mipmap
// - bitmap alpha value as probability
// - mipmap levels from original texture [width, height] to [1, 1]; step by [1/2, 1/2] e.g. mip level count of 8x8 texture = 4
// - each smaller mipmap (r, g, b, a) as
//   4x larger mipmap (top-left, top-left + top-right, top-left + top-right + bottom-left, top-left + top-right + bottom-left + bottom-right) probabilities
//   (0 <= prob <= 1) 

// 1. copy alpha value in bitmap into the initial buffer
const ws = 256;
const copyAlphaWgsl = `
struct Buf {
  alphas: array<f32>;
};
@group(0) @binding(0) var<storage, write> alpha_out: Buf;
@group(0) @binding(1) var bitmap: texture_2d<f32>;
let ws = ${ws};
let width = ${bitmapSize}u;
@stage(compute) @workgroup_size(ws) fn copy_alpha(@builtin(global_invocation_id) giid: vec3<u32>) {
  alpha_out.alphas[giid.y * width + giid.x] = textureLoad(bitmap, vec2<i32>(giid.xy), 0).a;
} 
`;
const copyAlphaShader = device.createShaderModule({code: copyAlphaWgsl});

// 2. shrink half&half from bufferA to bufferB, then write shrinked value into texture mipmap
const shrinkMipmapWgsl = `
struct Buf {
  alphas: array<f32>;
};
@group(0) @binding(0) var<storage, read> alpha_in: Buf;
@group(0) @binding(1) var<storage, write> alpha_out: Buf;
@group(0) @binding(2) var mipmap: texture_storage_2d<rgba8unorm, write>;
let ws = ${ws};
let width = ${bitmapSize}u;
@stage(compute) @workgroup_size(ws) fn shrink_mipmap(@builtin(global_invocation_id) giid: vec3<u32>) {
  if (!all(giid.xy < vec2<u32>(textureDimensions(mipmap)))) {return;}
  let dst = giid.y * width + giid.x;
  let offs = 2u * giid.y * width + 2u * giid.x;
  let tl = alpha_in.alphas[offs];
  let tr = alpha_in.alphas[offs + 1u];
  let bl = alpha_in.alphas[offs + width];
  let br = alpha_in.alphas[offs + width + 1u];
  let total = tl + tr + bl + br;
  alpha_out.alphas[dst] = total / 4.0;
  if (total == 0.0) {
    textureStore(mipmap, vec2<i32>(giid.xy), vec4<f32>(0.0, 0.0, 0.0, 0.0));
  } else {
    textureStore(mipmap, vec2<i32>(giid.xy), vec4<f32>(tl, tl + tr, tl + tr + bl, tl + tr + bl + br) / total);
  } 
}
`;
const shrinkMipmapShader = device.createShaderModule({code: shrinkMipmapWgsl});

// texture and buffers
const mipLevelCount = Math.log2(bitmapSize) + 1;
const texture = device.createTexture({
  size: [bitmapSize, bitmapSize], mipLevelCount, format: "rgba8unorm", 
  usage: GPUTextureUsage.TEXTURE_BINDING | GPUTextureUsage.STORAGE_BINDING | GPUTextureUsage.COPY_DST | GPUTextureUsage.RENDER_ATTACHMENT,
});
device.queue.copyExternalImageToTexture({source: bitmap}, {texture: texture}, [bitmapSize, bitmapSize]);
                                         
const alphasBuffer0 = device.createBuffer({size: bitmapSize * bitmapSize * Float32Array.BYTES_PER_ELEMENT, usage: GPUBufferUsage.STORAGE});
const alphasBuffer1 = device.createBuffer({size: bitmapSize * bitmapSize * Float32Array.BYTES_PER_ELEMENT, usage: GPUBufferUsage.STORAGE});

// pipelines
const copyAlphaPipeline = device.createComputePipeline({
  compute: {module: copyAlphaShader, entryPoint: "copy_alpha"}
});
const shrinkMipmapPipeline = device.createComputePipeline({
  compute: {module: shrinkMipmapShader, entryPoint: "shrink_mipmap"}
});

// command encoder
const initEncoder = device.createCommandEncoder();
{//copy alpha
  const bindGroup = device.createBindGroup({
    layout: copyAlphaPipeline.getBindGroupLayout(0),
    entries: [
      {binding: 0, resource: {buffer: alphasBuffer0}},
      {binding: 1, resource: texture.createView({format: "rgba8unorm", dimension: "2d", baseMipLevel: 0, mipLevelCount: 1})},
    ]
  });
  
  const passEncoder = initEncoder.beginComputePass();
  passEncoder.setPipeline(copyAlphaPipeline);
  passEncoder.setBindGroup(0, bindGroup);
  passEncoder.dispatch(bitmapSize / ws, bitmapSize);
  passEncoder.end();
}
for (let level = 1; level < mipLevelCount; level++) {//shrink mipmap
  const destSize = bitmapSize >> level;
  const [alphaIn, alphaOut] = (level % 2 === 1) ? [alphasBuffer0, alphasBuffer1] : [alphasBuffer1, alphasBuffer0];
  const bindGroup = device.createBindGroup({
    layout: shrinkMipmapPipeline.getBindGroupLayout(0),
    entries: [
      {binding: 0, resource: {buffer: alphaIn}},
      {binding: 1, resource: {buffer: alphaOut}},
      {binding: 2, resource: texture.createView({format: "rgba8unorm", dimension: "2d", baseMipLevel: level, mipLevelCount: 1})},
    ]
  });
  
  const passEncoder = initEncoder.beginComputePass();
  passEncoder.setPipeline(shrinkMipmapPipeline);
  passEncoder.setBindGroup(0, bindGroup);
  passEncoder.dispatch(Math.ceil(destSize / ws), destSize);
  passEncoder.end();
}
device.queue.submit([initEncoder.finish()]);


//[B. sampling particle from prob mipmap] particle generator
const generateParticlesWgsl = `
struct Particle {
  rgba: vec4<f32>;
  uv: vec2<f32>;
  // pad_for_align16: array<f32, 2>;
};
struct Particles {
  list: array<Particle>;
};
@group(0) @binding(0) var<storage, read_write> particles: Particles;
@group(0) @binding(1) var texture: texture_2d<f32>;
var<private> rand_seed : vec2<f32>;
fn rand() -> f32 {
  rand_seed.x = fract(cos(dot(rand_seed, vec2<f32>(23.14077926, 232.61690225))) * 136.8168);
  rand_seed.y = fract(cos(dot(rand_seed, vec2<f32>(54.47856553, 345.84153136))) * 534.7645);
  return rand_seed.y;
}
fn born() -> Particle {
  var pos = vec2<i32>(0, 0);
  for (var level = textureNumLevels(texture) - 1; level > 0; level = level - 1) {
    let r = rand();
    let probs = textureLoad(texture, pos, level);
    if (r < probs.r) {
      pos = vec2<i32>(pos.x * 2, pos.y * 2);
    } else if (r < probs.g) {
      pos = vec2<i32>(pos.x * 2 + 1, pos.y * 2);
    } else if (r < probs.b) {
      pos = vec2<i32>(pos.x * 2, pos.y * 2 + 1);
    } else  {
      pos = vec2<i32>(pos.x * 2 + 1, pos.y * 2 + 1);
    }
  }
  let uv = vec2<f32>(pos) / vec2<f32>(textureDimensions(texture));
  let rgba = textureLoad(texture, pos, 0);
  return Particle(rgba, uv);
}
let ws = ${ws};
@stage(compute) @workgroup_size(ws) fn generate_particles(@builtin(global_invocation_id) giid: vec3<u32>) {
  rand_seed = vec2<f32>(giid.xy);
  particles.list[giid.x] = born();
}
`;
const generateParticlesShader = device.createShaderModule({code: generateParticlesWgsl});

const particleSize = (4 + 2 + 2/*as padding for vec4<f32>'s align 16*/) * Float32Array.BYTES_PER_ELEMENT;
const wsCount = 64;
const particleCount = ws * wsCount;
const particlesBuffer = device.createBuffer({size: particleSize * particleCount, usage: GPUBufferUsage.STORAGE | GPUBufferUsage.VERTEX});

const generateParticlesPipeline = device.createComputePipeline({
  compute: {module: generateParticlesShader, entryPoint: "generate_particles"}
});
const particlesEncoder = device.createCommandEncoder();
{
  const bindGroup = device.createBindGroup({
    layout: generateParticlesPipeline.getBindGroupLayout(0),
    entries: [
      {binding: 0, resource: {buffer: particlesBuffer}},
      {binding: 1, resource: texture.createView()},
    ]});
  const passEncoder = particlesEncoder.beginComputePass();
  passEncoder.setPipeline(generateParticlesPipeline);
  passEncoder.setBindGroup(0, bindGroup);
  passEncoder.dispatch(wsCount);
  passEncoder.end();
}
device.queue.submit([particlesEncoder.finish()]);


//[C. render particle] as regular polygons
const shape = 4;
const perR = 64;
const renderWgsl = `
struct Particle {
  @location(0) rgba: vec4<f32>;
  @location(1) uv: vec2<f32>;
};
struct InOut {
  @builtin(position) pos: vec4<f32>;
  @location(0) color: vec4<f32>;
};
@stage(vertex) fn vmain(@builtin(vertex_index) i: u32, particle: Particle) -> InOut {
  let center = vec2<f32>(2.0 * (particle.uv.x - 0.5), -2.0 * (particle.uv.y - 0.5));
  let vid = i % 3u;
  let tid = (i - vid) / 3u;
  if (vid == 0u) {
    return InOut(vec4<f32>(center, 0.0, 1.0), particle.rgba);
  }
  let t = radians(f32(tid + vid - 1u) / f32(${shape}) * 360.0);
  let r = 1.0 / f32(${perR});
  return InOut(vec4<f32>(center + vec2<f32>(-sin(t), cos(t)) * r, 0.0, 1.0), particle.rgba);
}
@stage(fragment) fn fmain(io: InOut) -> @location(0) vec4<f32> {
  return io.color;
};
`;
const renderShader = device.createShaderModule({code: renderWgsl});
const stride = {
  stepMode: "instance",
  arrayStride: particleSize,
  attributes: [
    {shaderLocation: 0, offset: 0, format: "float32x4"},
    {shaderLocation: 1, offset: 4 * Float32Array.BYTES_PER_ELEMENT, format: "float32x2"},
  ]
};

// gpu config for canvas
const canvas = document.getElementById("canvas");
const gpu = canvas.getContext("webgpu");
const format = gpu.getPreferredFormat(adapter);
gpu.configure({device, format, size: [canvas.width, canvas.height]});

const renderPipeline = device.createRenderPipeline({
  primitive: {topology: "triangle-list", cullMode: "back"},
  vertex: {module: renderShader, entryPoint: "vmain", buffers: [stride]},
  fragment: {module: renderShader, entryPoint: "fmain", targets: [{format}]},
});

{
  const view = gpu.getCurrentTexture().createView();
  const clearValue = {r: 0, g: 0, b: 0, a: 1};
  const renderPass = {colorAttachments: [{view, loadOp: "clear", clearValue, loadValue: clearValue, storeOp: "store"}]};
  const commandEncoder = device.createCommandEncoder();
  const passEncoder = commandEncoder.beginRenderPass(renderPass);
  passEncoder.setPipeline(renderPipeline);
  passEncoder.setVertexBuffer(0, particlesBuffer);
  passEncoder.draw(3 * shape,  particleCount);
  passEncoder.end();
  device.queue.submit([commandEncoder.finish()]);
}

NOTE:

  • the bitmap texture is a color emoji (🎎) rendered into an SVG
  • the bitmap's alpha values (0-1) are used as probabilities
  • prepare mipmaps, halving the original bitmap repeatedly down to 1x1
    • note: an 8x8 bitmap needs 4 levels: 8x8, 4x4, 2x2, 1x1
  • each RGBA texel of a mip level packs the probabilities of the corresponding 4 texels of the 2x-larger level
    • for easy consumption, R = top-left, G = top-left + top-right, B = top-left + top-right + bottom-left, A = the sum of all 4 (probabilities)
  • prepare two STORAGE buffers of the same size and run the computation in as many stages as there are mip levels
    • the first-stage compute shader fills the STORAGE buffer with only the alpha values
    • from the second stage on, the compute shader writes the average of 4 cells from the read buffer into the write buffer, then computes the RGBA above and writes it into the mipmap
  • random numbers are pseudo-random values computed in the shader, seeded from global_invocation_id
  • particle generation computes, per particle, a texture coordinate by walking with the pseudo-random numbers from (0,0) of the 1x1 mip level down to a bitmap coordinate
    • the bitmap color is then fetched at the resolved bitmap coordinate
  • rendering uses the computed particle info (bitmap coordinate and color) as an instance vertex buffer and fills a small regular polygon centered at the bitmap coordinate with that color
    • texture coordinates run from (0,0) top-left to (1,1) bottom-right, so they are converted so that bottom-left is (-1,-1) and top-right is (1,1) for drawing