Estimating 2D Camera Motion with Hybrid Motion Basis

Haipeng Li1*, Tianhao Zhou1*, Zhanglei Yang1, Yi Wu2, Yan Chen2, Zijing Mao2, Shen Cheng3, Bing Zeng1, Shuaicheng Liu1†
1University of Electronic Science and Technology of China, 2Xiaomi Corporation, 3Dexmal
*Equal contribution, †Corresponding author

Abstract

Estimating 2D camera motion is a fundamental computer vision task that models the projection of 3D camera movements onto the 2D image plane. Current methods rely on either homography-based approaches, limited to planar scenes, or meshflow techniques that use grid-based local homographies but struggle with complex non-linear transformations. A key insight of our work is that combining flow fields from different homographies creates motion patterns that cannot be represented by any single homography. We introduce CamFlow, a novel framework that represents camera motion using hybrid motion bases: physical bases derived from camera geometry and stochastic bases for complex scenarios. Our approach includes a hybrid probabilistic loss function based on the Laplace distribution that enhances training robustness. For evaluation, we create a new benchmark by masking dynamic objects in existing optical flow datasets to isolate pure camera motion. Experiments show CamFlow outperforms state-of-the-art methods across diverse scenarios, demonstrating superior robustness and generalization in zero-shot settings. Code and datasets are available at our project page: https://lhaippp.github.io/CamFlow/.

Motivation

Non-linearity of Flow Addition

Key Insight: Two homography matrices generate flow1 and flow2. Adding these flows (flow3) differs from the flow of the composed homography (homo3 = homo2 · homo1). When we sample points from flow3 and solve for a homography, the solutions are inconsistent, showing that the combined flow field cannot be represented by any single homography.

This fundamental limitation of traditional homography-based methods motivates our hybrid motion basis approach, which combines physical bases derived from camera geometry with stochastic bases to handle complex non-linear transformations that cannot be captured by any single homography.

💡 More bases → Better camera motion representation
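This non-additivity is easy to check numerically. The sketch below (the homography values are illustrative, not taken from the paper) builds two flows, adds them, and fits a single homography to the sum via a plain DLT least-squares solve; the fit residual stays clearly non-zero:

```python
import numpy as np

def warp(H, pts):
    """Apply homography H to Nx2 points (with projective division)."""
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]

def fit_homography(src, dst):
    """Direct Linear Transform: least-squares homography mapping src -> dst."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 3)

# Two distinct homographies with small perspective terms (illustrative values)
H1 = np.array([[1.02, 0.01, 5.0], [0.00, 0.98, -3.0], [1e-4, 0.0, 1.0]])
H2 = np.array([[0.97, -0.02, -4.0], [0.015, 1.03, 6.0], [0.0, 2e-4, 1.0]])

xs, ys = np.meshgrid(np.linspace(0, 640, 8), np.linspace(0, 480, 8))
pts = np.stack([xs.ravel(), ys.ravel()], axis=1)

flow1 = warp(H1, pts) - pts
flow2 = warp(H2, pts) - pts
flow3 = flow1 + flow2                    # naive flow addition
flow_comp = warp(H2 @ H1, pts) - pts     # flow of the composed homography

print("max |flow3 - composed flow|:", np.abs(flow3 - flow_comp).max())

# Try to explain flow3 with one homography: the residual never vanishes
H_fit = fit_homography(pts, pts + flow3)
resid = np.abs(warp(H_fit, pts) - (pts + flow3)).max()
print("best single-homography residual (px):", resid)
```

The sum of two projective maps (each a ratio of linear terms with a different denominator) minus the identity is no longer projective, which is exactly why a richer basis set is needed.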

Method Overview

Based on the insight that combined flow fields cannot be represented by a single homography, our CamFlow framework introduces hybrid motion bases for robust 2D camera motion estimation:

CamFlow Method Framework
Figure: Our proposed motion estimation framework. Given image pair (Ia, Ib), features are extracted through a multi-scale pyramid and processed by the motion estimation transformer (MET) to compute weights for physical (blue) and noisy (red) motion bases. These weights linearly combine predefined motion bases to generate flow maps for warping. A mask generator predicts uncertainty masks dab and dba to reject unreliable regions, enhancing estimation robustness.
  • Physical Motion Bases: Derived from camera geometry principles for fundamental motion patterns
  • Stochastic Motion Bases: Learned representations for complex non-linear transformations that cannot be captured by single homographies
  • Motion Estimation Transformer (MET): Computes adaptive weights for combining different motion bases
  • Uncertainty Prediction: Mask generator rejects unreliable regions for enhanced robustness
  • Hybrid Probabilistic Loss: Laplace distribution-based loss function for enhanced training robustness
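The core of the framework, per the figure above, is a linear combination of predefined motion bases with predicted weights. The following is a minimal sketch of that structure; the specific basis fields, grid size, and weight values are illustrative assumptions (in CamFlow the weights come from the motion estimation transformer, and the stochastic bases are learned rather than random):

```python
import numpy as np

Hg, Wg = 32, 32  # low-resolution flow grid (upsampled in practice)
ys, xs = np.meshgrid(np.linspace(-1, 1, Hg), np.linspace(-1, 1, Wg), indexing="ij")

# Physical bases: flow fields of elementary camera motions (illustrative choices)
physical = np.stack([
    np.stack([np.ones_like(xs), np.zeros_like(xs)], -1),  # translate x
    np.stack([np.zeros_like(xs), np.ones_like(xs)], -1),  # translate y
    np.stack([-ys, xs], -1),                              # in-plane rotation
    np.stack([xs, ys], -1),                               # zoom
    np.stack([xs * xs, xs * ys], -1),                     # perspective-like term
])

# Stochastic bases: extra fields for motions no single homography captures
# (random here purely for illustration; learned in the actual method)
rng = np.random.default_rng(0)
stochastic = rng.standard_normal((3, Hg, Wg, 2)) * 0.1

bases = np.concatenate([physical, stochastic], axis=0)    # (K, Hg, Wg, 2)

# Hypothetical weight vector standing in for the MET output
weights = np.array([0.5, -0.2, 0.1, 0.05, 0.0, 0.02, 0.0, 0.01])
flow = np.tensordot(weights, bases, axes=1)               # (Hg, Wg, 2)
print(flow.shape)
```

Because the output is a weighted sum over a fixed dictionary, the network only has to regress K scalars per image pair instead of a dense flow field, which is what makes the estimation compact and robust.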

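On the loss side, the abstract describes a hybrid probabilistic loss based on the Laplace distribution. A generic Laplace negative log-likelihood with a predicted per-pixel scale (the paper's exact hybrid formulation may differ) looks like this; large predicted scales down-weight residuals, acting as a soft outlier mask:

```python
import numpy as np

def laplace_nll(pred, target, log_b):
    """Per-pixel Laplace negative log-likelihood, averaged over the field.

    pred, target: flow values; log_b: predicted log-scale (uncertainty).
    NLL = |pred - target| / b + log b + log 2, with b = exp(log_b).
    """
    b = np.exp(log_b)
    return np.mean(np.abs(pred - target) / b + log_b + np.log(2.0))

# Toy check: a confident (small-b) wrong prediction costs more than an
# uncertain (large-b) one with the same residual.
err = np.ones((4, 4))
print(laplace_nll(err, 0.0, np.full((4, 4), -1.0)))  # small scale
print(laplace_nll(err, 0.0, np.full((4, 4), 1.0)))   # large scale
```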
Results

Representative Qualitative Results

Our method demonstrates superior performance across challenging scenarios including low-light and adverse weather conditions. The following comparisons show CamFlow (ours) vs. state-of-the-art baselines.

Dark Environment

Dark environment comparison

CamFlow maintains robust camera motion estimation even in challenging low-light conditions where traditional methods struggle.

Rainy Weather

Rainy weather comparison

Our hybrid motion basis approach effectively handles complex motion patterns in adverse weather conditions with rain and reduced visibility.

Method Comparison: Ours vs GT Homo vs GT Flow

Comprehensive comparison across challenging scenarios. Each video shows: Original Images | Ours | GT Homo | GT Flow

Dark Environment

Dark environment method comparison

Foggy Weather

Foggy weather method comparison

Rainy Weather

Rainy weather method comparison

Snowy Environment

Snowy environment method comparison

Quantitative Results

CAHomo Benchmark (PME)

| Methods | AVG |
| --- | --- |
| SIFT + MAGSAC | 1.34 |
| SPSG + MAGSAC | 0.63 |
| RealSH | 0.34 |
| DMHomo | 0.31 |
| BasesHomo | 0.50 |
| HomoGAN | 0.39 |
| Ours | 0.32 |

GHOF-Cam (EPE)

| Methods | AVG |
| --- | --- |
| SIFT | 2.82 |
| SPSG | 3.07 |
| CAHomo | 2.81 |
| BasesHomo | 1.74 |
| Meshflow | 2.15 |
| RANSAC-F | 3.26 |
| Ours | 1.10 |

GHOF Test (PME)

| Methods | AVG |
| --- | --- |
| SIFT | 4.80 |
| SPSG | 4.47 |
| RealSH | 1.72 |
| DMHomo | 1.75 |
| BasesHomo | 2.28 |
| HomoGAN | 1.95 |
| Ours | 1.23 |
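The tables above report PME (point matching error) and EPE (end-point error). As a reference, here are the standard definitions in code; the benchmarks' exact evaluation protocols (point selection, normalization) may differ:

```python
import numpy as np

def epe(flow_pred, flow_gt):
    """End-point error: mean L2 distance between predicted and GT flow vectors."""
    return np.linalg.norm(flow_pred - flow_gt, axis=-1).mean()

def pme(H, src_pts, dst_pts):
    """Point matching error: mean L2 distance between points warped by the
    estimated homography H and their ground-truth correspondences."""
    ph = np.hstack([src_pts, np.ones((len(src_pts), 1))]) @ H.T
    warped = ph[:, :2] / ph[:, 2:3]
    return np.linalg.norm(warped - dst_pts, axis=-1).mean()

# Sanity check: the identity homography on unmoved points gives zero error
pts = np.array([[10.0, 20.0], [30.0, 40.0]])
print(pme(np.eye(3), pts, pts))  # 0.0
```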

Comprehensive Results on GHOF-Cam: PSNR, SSIM, LPIPS

| Method | AVG PSNR↑ | AVG SSIM↑ | AVG LPIPS↓ | RE PSNR↑ | RE SSIM↑ | RE LPIPS↓ | FOG PSNR↑ | FOG SSIM↑ | FOG LPIPS↓ | DARK PSNR↑ | DARK SSIM↑ | DARK LPIPS↓ | RAIN PSNR↑ | RAIN SSIM↑ | RAIN LPIPS↓ | SNOW PSNR↑ | SNOW SSIM↑ | SNOW LPIPS↓ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| I₃ₓ₃ | 24.05 | 0.7403 | 0.0836 | 21.06 | 0.6900 | 0.0750 | 26.57 | 0.7711 | 0.0821 | 25.70 | 0.8506 | 0.0785 | 21.53 | 0.5335 | 0.1411 | 25.37 | 0.8562 | 0.0412 |
| GT-Homo | 32.78 | 0.9187 | 0.0570 | 28.39 | 0.8697 | 0.0549 | 35.23 | 0.9508 | 0.0492 | 31.88 | 0.9405 | 0.0575 | 30.11 | 0.8511 | 0.1033 | 38.31 | 0.9814 | 0.0199 |
| SIFT | 28.44 | 0.9074 | 0.0781 | 29.23 | 0.9148 | 0.0545 | 29.42 | 0.9016 | 0.0768 | 27.37 | 0.9074 | 0.0982 | 30.00 | 0.8632 | 0.1055 | 26.16 | 0.9497 | 0.0558 |
| SPSG | 28.01 | 0.8697 | 0.0796 | 21.83 | 0.7593 | 0.0886 | 30.88 | 0.9049 | 0.0645 | 27.60 | 0.9019 | 0.0966 | 28.86 | 0.8270 | 0.1103 | 30.88 | 0.9556 | 0.0379 |
| CAHomo | 25.29 | 0.7837 | 0.0841 | 22.67 | 0.7341 | 0.0805 | 27.51 | 0.8048 | 0.0751 | 26.12 | 0.8743 | 0.0846 | 22.95 | 0.6130 | 0.1420 | 27.20 | 0.8924 | 0.0384 |
| BasesHomo | 29.61 | 0.9026 | 0.0672 | 25.08 | 0.8522 | 0.0666 | 31.06 | 0.9170 | 0.0627 | 30.05 | 0.9303 | 0.0702 | 29.58 | 0.8512 | 0.1071 | 32.30 | 0.9622 | 0.0292 |
| MeshFlow | 29.91 | 0.9239 | 0.0688 | 28.57 | 0.9216 | 0.0576 | 28.68 | 0.9280 | 0.0742 | 29.41 | 0.9254 | 0.0774 | 30.68 | 0.8747 | 0.1049 | 32.23 | 0.9700 | 0.0298 |
| HM_Mix | 25.77 | 0.8896 | 0.0882 | 26.09 | 0.8721 | 0.0596 | 26.56 | 0.8753 | 0.0882 | 26.43 | 0.9037 | 0.1002 | 28.20 | 0.8672 | 0.1107 | 21.58 | 0.9296 | 0.0820 |
| RANSAC-F | 26.04 | 0.8348 | 0.0890 | 26.09 | 0.8812 | 0.0665 | 29.22 | 0.8944 | 0.0801 | 27.29 | 0.9031 | 0.0923 | 21.68 | 0.5585 | 0.1495 | 25.90 | 0.9371 | 0.0566 |
| Ours | 32.09 | 0.9142 | 0.0575 | 27.08 | 0.8615 | 0.0558 | 34.17 | 0.9371 | 0.0512 | 32.36 | 0.9421 | 0.0565 | 30.52 | 0.8608 | 0.1021 | 36.35 | 0.9692 | 0.0218 |

Comprehensive quantitative comparison across environmental conditions using PSNR (higher is better), SSIM (higher is better), and LPIPS (lower is better).

Citation

@inproceedings{li2025estimating,
  title={Estimating 2D Camera Motion with Hybrid Motion Basis},
  author={Li, Haipeng and Zhou, Tianhao and Yang, Zhanglei and Wu, Yi and Chen, Yan and Mao, Zijing and Cheng, Shen and Zeng, Bing and Liu, Shuaicheng},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={xxxx--xxxx},
  year={2025}
}