🖼️

AppleのMetalフレームワークを使用したホモグラフィ変換の実装

2024/05/16に公開

iOS

ホモグラフィ変換の基本

ホモグラフィ(Homography)変換:
- 平面上の点を別の平面上の点に変換する射影変換です。
- 3x3のホモグラフィ行列を使用して座標を変換します。
- 座標は正規化が必要です。

Apple Metalフレームワークの基本概念

Metalとは:
- Metalは、Appleが提供する低レベルのグラフィックスおよびコンピュートAPIで、iOS、macOS、tvOSで使用できます。
- 高効率なGPUアクセスを提供し、グラフィックスレンダリングやデータ並列計算のパフォーマンスを向上させます。
- DirectXやOpenGLのような他のグラフィックスAPIに比べて、より直接的かつ効率的なハードウェア制御を可能にします。
テクスチャ:
- 画像データをGPUメモリ上に格納し、シェーダーからアクセスするためのオブジェクト。
- テクスチャ座標（UV座標）を使用して特定の位置を参照します。
コマンドバッファ:
- GPUに送信する一連のコマンドを記録するオブジェクト。
- 複数のコマンドを一括して送信し、効率的な処理を実現します。
- 同期と依存関係の管理、非同期実行をサポートします。
コマンドエンコーダー:
- コマンドバッファにコマンドを記録するためのオブジェクト。
- レンダーコマンドエンコーダー、コンピュートコマンドエンコーダー、ブラスターコマンドエンコーダーがあります。
パイプラインステート:
- GPUにおける描画や計算の設定をカプセル化したオブジェクト。
- レンダーパイプラインステートとコンピュートパイプラインステートがあります。
- シェーダーの設定、レンダリングターゲットのフォーマット、ブレンドモード、深度・ステンシルテストの設定などが含まれます。

ホモグラフィ変換の実装

シェーダーコード

入力テクスチャを読み込み、ホモグラフィ行列を使用して座標を変換し、出力テクスチャに書き込みます。
バックワードマッピングを使用して、画素抜けを防ぐために逆ホモグラフィ行列を使用します。

#include <metal_stdlib>
using namespace metal;

kernel void homographyTransform(
    texture2d<float, access::read> inputTexture [[texture(0)]],   // 入力テクスチャ
    texture2d<float, access::write> outputTexture [[texture(1)]], // 出力テクスチャ
    constant float4x4 &inverseHomographyMatrix [[buffer(0)]],     // 逆ホモグラフィ行列
    uint2 gid [[thread_position_in_grid]])                        // スレッドの位置
{
    if (gid.x >= outputTexture.get_width() || gid.y >= outputTexture.get_height()) {
        return;
    }

    float2 outputCoord = float2(gid) / float2(outputTexture.get_width(), outputTexture.get_height());

    float3 inputCoord = float3(outputCoord, 1.0) * inverseHomographyMatrix;
    inputCoord /= inputCoord.z;

    float2 textureCoord = float2(inputCoord.x, inputCoord.y);

    if (textureCoord.x >= 0.0 && textureCoord.x <= 1.0 && textureCoord.y >= 0.0 && textureCoord.y <= 1.0) {
        outputTexture.write(inputTexture.sample(sampler(mip_filter::linear), textureCoord), gid);
    } else {
        outputTexture.write(float4(0.0), gid);
    }
}

ホストコード

逆ホモグラフィ行列を計算し、シェーダーに渡します。
テクスチャを作成し、コマンドバッファとコマンドエンコーダーを使用して処理を実行します。

import Metal
import MetalKit

class HomographyRenderer {
    var device: MTLDevice!
    var commandQueue: MTLCommandQueue!
    var pipelineState: MTLComputePipelineState!

    init() {
        device = MTLCreateSystemDefaultDevice()
        commandQueue = device.makeCommandQueue()
        
        let library = device.makeDefaultLibrary()
        let kernelFunction = library?.makeFunction(name: "homographyTransform")
        pipelineState = try! device.makeComputePipelineState(function: kernelFunction!)
    }

    func applyHomography(inputImage: UIImage, homographyMatrix: float4x4) async -> UIImage? {
        guard let cgImage = inputImage.cgImage else { return nil }
        
        let inverseMatrix = homographyMatrix.inverse
        
        let textureLoader = MTKTextureLoader(device: device)
        let inputTexture = try! textureLoader.newTexture(cgImage: cgImage, options: nil)
        
        let inputWidth = inputTexture.width
        let inputHeight = inputTexture.height
        
        let boundingBox = calculateBoundingBox(homographyMatrix: inverseMatrix, width: inputWidth, height: inputHeight)
        
        let outputWidth = Int(boundingBox.max.x - boundingBox.min.x)
        let outputHeight = Int(boundingBox.max.y - boundingBox.min.y)
        
        let descriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .rgba8Unorm, width: outputWidth, height: outputHeight, mipmapped: false)
        descriptor.usage = [.shaderRead, .shaderWrite]
        let outputTexture = device.makeTexture(descriptor: descriptor)!
        
        let commandBuffer = commandQueue.makeCommandBuffer()!
        let commandEncoder = commandBuffer.makeComputeCommandEncoder()!
        commandEncoder.setComputePipelineState(pipelineState)
        commandEncoder.setTexture(inputTexture, index: 0)
        commandEncoder.setTexture(outputTexture, index: 1)
        
        var inverseMatrixCopy = inverseMatrix
        let matrixBuffer = device.makeBuffer(bytes: &inverseMatrixCopy, length: MemoryLayout<float4x4>.size, options: .storageModeShared)
        commandEncoder.setBuffer(matrixBuffer, offset: 0, index: 0)
        
        let threadGroupCount = MTLSize(width: 16, height: 16, depth: 1)
        let threadGroups = MTLSize(width: (outputTexture.width + 15) / 16, height: (outputTexture.height + 15) / 16, depth: 1)
        commandEncoder.dispatchThreadgroups(threadGroups, threadsPerThreadgroup: threadGroupCount)
        commandEncoder.endEncoding()
        
        commandBuffer.commit()
        await withCheckedContinuation { continuation in
            commandBuffer.addCompletedHandler { _ in
                continuation.resume()
            }
        }
        
        let outputCGImage = outputTexture.toImage()
        return UIImage(cgImage: outputCGImage)
    }
}

extension MTLTexture {
    func toImage() -> CGImage {
        let width = self.width
        let height = self.height
        let rowBytes = width * 4
        
        let colorSpace = CGColorSpaceCreateDeviceRGB()
        let bitmapInfo = CGBitmapInfo(rawValue: CGImageAlphaInfo.premultipliedLast.rawValue)
        let context = CGContext(data: nil, width: width, height: height, bitsPerComponent: 8, bytesPerRow: rowBytes, space: colorSpace, bitmapInfo: bitmapInfo.rawValue)!
        
        let pixels = UnsafeMutableRawPointer.allocate(byteCount: rowBytes * height, alignment: MemoryLayout<UInt8>.alignment)
        self.getBytes(pixels, bytesPerRow: rowBytes, from: MTLRegionMake2D(0, 0, width, height), mipmapLevel: 0)
        
        context.data!.copyMemory(from: pixels, byteCount: rowBytes * height)
        let cgImage = context.makeImage()!
        pixels.deallocate()
        
        return cgImage
    }
}

func transformedPoint(point: SIMD3<Float>, matrix: float4x4) -> SIMD2<Float> {
    let transformedPoint = matrix * point
    return SIMD2<Float>(transformedPoint.x / transformedPoint.z, transformedPoint.y / transformedPoint.z)
}

func calculateBoundingBox(homographyMatrix: float4x4, width: Int, height: Int) -> (min: SIMD2<Float>, max: SIMD2<Float>) {
    let corners: [SIMD3<Float>] = [
        SIMD3<Float>(0, 0, 1),
        SIMD3<Float>(Float(width), 0, 1),
        SIMD3<Float>(0, Float(height), 1),
        SIMD3<Float>(Float(width), Float(height), 1)
    ]

    var minX = Float.greatestFiniteMagnitude
    var minY = Float.greatestFiniteMagnitude
    var maxX = -Float.greatestFiniteMagnitude
    var maxY = -Float.greatestFiniteMagnitude

    for corner in corners {
        let transformed = transformedPoint(point: corner, matrix: homographyMatrix)
        minX = min(minX, transformed.x)
        minY = min(minY, transformed.y)
        maxX = max(maxX, transformed.x)
        maxY = max(maxY, transformed.y)
    }

    return (min: SIMD2<Float>(minX, minY), max: SIMD2<Float>(maxX, maxY))
}

画素抜けの対策

バックワードマッピング（逆変換）:
- 変換後の画像から元の画像に対応するピクセルを見つける方法。
- 逆ホモグラフィ行列を使用して元の座標を計算し、補間によるサンプリングを行います。
- これにより、変換後の画像の画素抜けを防ぎます。

まとめ

本記事では、AppleのMetalフレームワークを使用したホモグラフィ変換の基本概念、実装方法、および画素抜け対策について詳細に説明しました。ホモグラフィ変換は、画像の射影変換を行う強力な手法であり、正しい実装と補間技術を用いることで、高品質な画像変換を実現できます。

GitHubで編集を提案

ホモグラフィ変換の基本

Apple Metalフレームワークの基本概念

ホモグラフィ変換の実装

シェーダーコード

ホストコード

画素抜けの対策

まとめ

Discussion