
[BEV Projection] Forward and Backward Projection


BEV Projection

Forward Projection vs Backward Projection

Comparison of Projection Methods in BEV Perception

| Aspect | Forward Projection (e.g., LSS, BEVDepth) | Backward / Query-Based Projection (e.g., BEVFormer) |
| --- | --- | --- |
| Memory Cost | Smaller (depth estimation module) | Larger (deformable cross-attention + temporal self-attention + FFN) |
| Data Scalability | Easy: can be pre-trained with any camera data | Hard: requires large datasets with 3D information (3D labels, 3D ego-motion) |
| Explainability | Strong | Weak (the sampling points can perhaps be inspected) |
| BEV Density | Sparse | Dense |
| Calibration Error | Sensitive | Robust |
| Depth Scale Ambiguity | Sensitive | Robust |
| Image Discretization | Discrete: forward-projects a discrete feature map | Continuous: backward-projects onto the continuous image via bilinear sampling |
| Iterative Refinement | No | Yes |
| Parallelization | Harder: random write (atomic) | Easy: random read |
| Occlusion | Struggles: hard to tell whether a voxel is occluded or empty | Better: can learn whether a voxel is occluded or empty |
| Temporal Modeling | Limited | Sophisticated: temporal self-attention |
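The "BEV Density", "Image Discretization", and "Parallelization" rows can be made concrete with a toy NumPy sketch. It is only an illustration under simplifying assumptions, not the LSS or BEVFormer implementation: the forward path uses a single depth per pixel instead of a depth distribution, and the backward path samples one fixed height per BEV cell instead of learned attention offsets. All names (`forward_project`, `backward_project`, `bilinear_sample`), the intrinsics, and the grid sizes are hypothetical.

```python
import numpy as np

# Hypothetical toy setup: one pinhole camera (x right, y down, z forward).
H, W, C = 8, 16, 4                    # image feature map: height, width, channels
FX, FY, CX, CY = 8.0, 8.0, 8.0, 4.0   # made-up intrinsics
NX, NZ = 16, 16                       # BEV grid size along x and z
X_MIN, Z_MIN, CELL = -8.0, 0.0, 1.0   # grid origin and cell size


def forward_project(feat, depth):
    """Forward: lift each pixel with its depth, then scatter into BEV (random write)."""
    u = np.arange(W)[None, :]
    x = (u - CX) / FX * depth          # lateral position of each lifted pixel
    z = depth                          # forward distance; height is collapsed
    ix = np.floor((x - X_MIN) / CELL).astype(int).ravel()
    iz = np.floor((z - Z_MIN) / CELL).astype(int).ravel()
    ok = (ix >= 0) & (ix < NX) & (iz >= 0) & (iz < NZ)
    bev = np.zeros((NZ, NX, C))
    # Several pixels can land in the same cell, so the writes must accumulate
    # (np.add.at here; the GPU analogue is an atomic add).
    np.add.at(bev, (iz[ok], ix[ok]), feat.reshape(-1, C)[ok])
    return bev


def bilinear_sample(feat, u, v):
    """Read feat (H, W, C) at continuous pixel coords; reads outside the image give 0."""
    h, w, _ = feat.shape
    inside = (u >= 0) & (u <= w - 1) & (v >= 0) & (v <= h - 1)
    u = np.clip(u, 0, w - 1)
    v = np.clip(v, 0, h - 1)
    u0 = np.minimum(np.floor(u).astype(int), w - 2)
    v0 = np.minimum(np.floor(v).astype(int), h - 2)
    du = (u - u0)[..., None]
    dv = (v - v0)[..., None]
    top = feat[v0, u0] * (1 - du) + feat[v0, u0 + 1] * du
    bot = feat[v0 + 1, u0] * (1 - du) + feat[v0 + 1, u0 + 1] * du
    return (top * (1 - dv) + bot * dv) * inside[..., None]


def backward_project(feat, height=0.0):
    """Backward: project each BEV cell centre into the image and gather (random read)."""
    xs = X_MIN + (np.arange(NX) + 0.5) * CELL
    zs = Z_MIN + (np.arange(NZ) + 0.5) * CELL
    z, x = np.meshgrid(zs, xs, indexing="ij")
    u = FX * x / z + CX                # continuous image coordinates
    v = FY * height / z + CY
    return bilinear_sample(feat, u, v)


feat = np.ones((H, W, C))
depth = np.full((H, W), 4.0)           # every pixel predicted at 4 units ahead
bev_fwd = forward_project(feat, depth)
bev_bwd = backward_project(feat)
print("occupied cells, forward :", np.count_nonzero(bev_fwd[..., 0]))
print("occupied cells, backward:", np.count_nonzero(bev_bwd[..., 0]))
```

In this toy case the forward path fills only the cells that the predicted depth points to, while the backward path fills every cell whose centre projects inside the image. The forward result therefore depends directly on the depth estimate, whereas the backward path reads the same image feature for all cells along a ray, which is one way to see why it cannot separate occupied from empty space without further learning.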

References

BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection
Simple-BEV: What Really Matters for Multi-Sensor BEV Perception?
