Abstract
Estimating 2D camera motion is a fundamental computer vision task that models the projection of 3D camera movements onto the 2D image plane. Current methods rely on either homography-based approaches, limited to planar scenes, or meshflow techniques that use grid-based local homographies but struggle with complex non-linear transformations. A key insight of our work is that combining flow fields from different homographies creates motion patterns that cannot be represented by any single homography. We introduce CamFlow, a novel framework that represents camera motion using hybrid motion bases: physical bases derived from camera geometry and stochastic bases for complex scenarios. Our approach includes a hybrid probabilistic loss function based on the Laplace distribution that enhances training robustness. For evaluation, we create a new benchmark by masking dynamic objects in existing optical flow datasets to isolate pure camera motion. Experiments show CamFlow outperforms state-of-the-art methods across diverse scenarios, demonstrating superior robustness and generalization in zero-shot settings. Code and datasets are available at our project page: https://lhaippp.github.io/CamFlow/.
Motivation

Non-linearity of Flow Addition
Key Insight: Two homography matrices generate flow1 and flow2. The sum of these flows (flow3) differs from the flow induced by the product of the two homography matrices (homo3). Solving for a homography from different point samples of flow3 yields inconsistent solutions, which shows that the combined flow field cannot be represented by a single homography.
This fundamental limitation of traditional homography-based methods motivates our hybrid motion basis approach, which combines physical bases derived from camera geometry with stochastic bases to handle complex non-linear transformations that cannot be captured by any single homography.
💡 More bases → Better camera motion representation
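To make the Key Insight concrete, here is a minimal numerical sketch in plain Python. It is not the authors' code. It assumes 3x3 homographies with h33 = 1 acting on normalized coordinates in [-1, 1]. The example matrices, the two 4-point samples, and the helpers `summed_flow_targets`, `fit_h` (an exact 4-point DLT solve), and `subset_disagreement` are all illustrative choices. The sketch adds the flows of two homographies and fits a homography to the summed flow on two different point samples. A nonzero gap between the two fits means no single homography explains the summed flow.

```python
# Minimal sketch (not the authors' code): the sum of two homography flows is,
# in general, not the flow of any single homography.


def apply_h(H, x, y):
    """Map (x, y) through a 3x3 homography stored as nested lists."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    u = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    v = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return u, v


def summed_flow_targets(H1, H2, pts):
    """Where each point lands under flow3 = flow1 + flow2."""
    out = []
    for x, y in pts:
        u1, v1 = apply_h(H1, x, y)
        u2, v2 = apply_h(H2, x, y)
        # x + (u1 - x) + (u2 - x) = u1 + u2 - x, and likewise for y
        out.append((u1 + u2 - x, v1 + v2 - y))
    return out


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z


def fit_h(src, dst):
    """Exact homography (h33 fixed to 1) from 4 correspondences via DLT."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def max_gap(Ha, Hb):
    """Largest absolute difference between corresponding matrix entries."""
    return max(abs(Ha[i][j] - Hb[i][j]) for i in range(3) for j in range(3))


def subset_disagreement(H1, H2):
    """Fit a homography to flow3 on two different 4-point samples."""
    set_a = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]
    set_b = [(-0.6, -0.4), (0.6, -0.8), (0.8, 0.4), (-0.8, 0.8)]
    Ha = fit_h(set_a, summed_flow_targets(H1, H2, set_a))
    Hb = fit_h(set_b, summed_flow_targets(H1, H2, set_b))
    return max_gap(Ha, Hb)


# Example homographies in normalized coordinates; the last rows add perspective.
H1 = [[1.0, 0.02, 0.05], [0.01, 1.0, -0.03], [0.2, 0.0, 1.0]]
H2 = [[0.98, 0.0, -0.02], [0.0, 1.02, 0.04], [0.0, 0.2, 1.0]]
print(subset_disagreement(H1, H2))  # nonzero: the two fits disagree
```

If both matrices are affine (last row `[0, 0, 1]`), the summed flow is (A1 + A2 - I)x + (t1 + t2), which is itself affine, so the two fits agree up to rounding. The disagreement comes from the perspective terms.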
Method Overview
Based on the insight that combined flow fields cannot be represented by a single homography, our CamFlow framework introduces hybrid motion bases for robust 2D camera motion estimation:

- Physical Motion Bases: Derived from camera geometry principles for fundamental motion patterns
- Stochastic Motion Bases: Learned representations for complex non-linear transformations that cannot be captured by single homographies
- Motion Estimation Transformer (MET): Computes adaptive weights for combining different motion bases
- Uncertainty Prediction: A mask generator identifies and rejects unreliable regions to improve robustness
- Hybrid Probabilistic Loss: Laplace distribution-based loss function for enhanced training robustness
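This page does not spell out how the bases are built or weighted, so the following is only a hypothetical sketch, in plain Python, of the structure listed above. It assumes several things. Physical bases are simple parametric fields (x/y translation, in-plane rotation, zoom). Stochastic bases are flows of randomly perturbed homographies. A fixed weight list stands in for the MET output, and a binary mask stands in for the mask generator. The loss is a per-pixel Laplace negative log-likelihood, |r|/b + log(2b) per flow component with a predicted scale b. How the loss is made "hybrid" is not specified here, so only the Laplace term is shown. All function names are illustrative.

```python
import math
import random

# Hypothetical sketch: a flow field is a list of (dx, dy) tuples, one per
# pixel of a small h x w grid in normalized coordinates [-1, 1].


def grid(h, w):
    """Pixel centers of an h x w grid, row-major, in [-1, 1]."""
    return [(-1 + 2 * j / (w - 1), -1 + 2 * i / (h - 1))
            for i in range(h) for j in range(w)]


def homography_flow(H, pts):
    """Flow (u - x, v - y) induced by a 3x3 homography H at each point."""
    out = []
    for x, y in pts:
        wz = H[2][0] * x + H[2][1] * y + H[2][2]
        u = (H[0][0] * x + H[0][1] * y + H[0][2]) / wz
        v = (H[1][0] * x + H[1][1] * y + H[1][2]) / wz
        out.append((u - x, v - y))
    return out


def physical_bases(pts):
    """Assumed physical bases: x/y translation, in-plane rotation, zoom."""
    return [
        [(1.0, 0.0) for _ in pts],     # translation in x
        [(0.0, 1.0) for _ in pts],     # translation in y
        [(-y, x) for x, y in pts],     # infinitesimal rotation
        [(x, y) for x, y in pts],      # infinitesimal zoom
    ]


def stochastic_bases(pts, k, seed=0):
    """Assumed stochastic bases: flows of k randomly perturbed homographies."""
    rng = random.Random(seed)
    bases = []
    for _ in range(k):
        H = [[1 + rng.gauss(0, 0.02), rng.gauss(0, 0.02), rng.gauss(0, 0.02)],
             [rng.gauss(0, 0.02), 1 + rng.gauss(0, 0.02), rng.gauss(0, 0.02)],
             [rng.gauss(0, 0.05), rng.gauss(0, 0.05), 1.0]]
        bases.append(homography_flow(H, pts))
    return bases


def combine(bases, weights):
    """Camera flow = sum_k w_k * B_k (weights stand in for the MET output)."""
    n = len(bases[0])
    return [(sum(w * b[i][0] for w, b in zip(weights, bases)),
             sum(w * b[i][1] for w, b in zip(weights, bases)))
            for i in range(n)]


def masked_laplace_nll(pred, target, scale, mask):
    """Mean Laplace NLL over pixels with mask == 1.

    Each flow component is treated as Laplace(target, b), so one pixel
    contributes (|du| + |dv|) / b + 2 * log(2b).
    """
    total, count = 0.0, 0
    for (pu, pv), (tu, tv), b, m in zip(pred, target, scale, mask):
        if m:
            r = abs(pu - tu) + abs(pv - tv)
            total += r / b + 2 * math.log(2 * b)
            count += 1
    return total / max(count, 1)
```

Because the output is a weighted sum of fixed fields, it can express combinations such as sums of homography flows, which no single homography produces. This matches the motivation given above.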
Results
Representative Qualitative Results
Our method demonstrates superior performance across challenging scenarios including low-light and adverse weather conditions. The following comparisons show CamFlow (ours) vs. state-of-the-art baselines.
Dark Environment

CamFlow maintains robust camera motion estimation even in challenging low-light conditions where traditional methods struggle.
Rainy Weather

Our hybrid motion basis approach effectively handles complex motion patterns in adverse weather conditions with rain and reduced visibility.
Method Comparison: Ours vs GT Homo vs GT Flow
Comprehensive comparison across challenging scenarios. Each video shows: Original Images | Ours | GT Homo | GT Flow
Dark Environment

Foggy Weather

Rainy Weather

Snowy Environment

Quantitative Results
CAHomo Benchmark (PME)
Methods | AVG PME↓ |
---|---|
SIFT + MAGSAC | 1.34 |
SPSG + MAGSAC | 0.63 |
RealSH | 0.34 |
DMHomo | 0.31 |
BasesHomo | 0.50 |
HomoGAN | 0.39 |
Ours | 0.32 |
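PME is not defined on this page. In the homography literature it is commonly taken to be the average Euclidean distance between manually labeled correspondences after warping the source points with the estimated homography, and the sketch below assumes that definition. The function names and the nested-list matrix layout are illustrative.

```python
import math


def apply_h(H, x, y):
    """Map (x, y) through a 3x3 homography stored as nested lists."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def pme(H, src_pts, dst_pts):
    """Mean distance between H-warped source points and labeled targets (assumed PME)."""
    if len(src_pts) != len(dst_pts) or not src_pts:
        raise ValueError("need the same non-zero number of source and target points")
    dists = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        px, py = apply_h(H, x, y)
        dists.append(math.hypot(px - u, py - v))
    return sum(dists) / len(dists)
```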
GHOF-Cam (EPE)
Methods | AVG EPE↓ |
---|---|
SIFT | 2.82 |
SPSG | 3.07 |
CAHomo | 2.81 |
BasesHomo | 1.74 |
Meshflow | 2.15 |
RANSAC-F | 3.26 |
Ours | 1.10 |
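The abstract describes building the benchmark by masking dynamic objects in existing optical flow datasets so that only camera motion remains. A natural score on such data is end-point error (EPE) restricted to the unmasked pixels. This sketch assumes that flows are lists of (dx, dy) tuples and that the mask marks static pixels with 1. Whether GHOF-Cam averages per pixel, per image, or per category is not stated on this page.

```python
import math


def masked_epe(pred, gt, static_mask):
    """Average end-point error over pixels marked static (mask == 1)."""
    errs = [math.hypot(pu - gu, pv - gv)
            for (pu, pv), (gu, gv), m in zip(pred, gt, static_mask) if m]
    if not errs:
        raise ValueError("mask selects no pixels")
    return sum(errs) / len(errs)
```

Excluding dynamic pixels keeps independently moving objects from being counted as camera-motion error.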
GHOF Test (PME)
Methods | AVG PME↓ |
---|---|
SIFT | 4.80 |
SPSG | 4.47 |
RealSH | 1.72 |
DMHomo | 1.75 |
BasesHomo | 2.28 |
HomoGAN | 1.95 |
Ours | 1.23 |
Comprehensive Results on GHOF-Cam: PSNR, SSIM, LPIPS
Method | AVG PSNR↑ | AVG SSIM↑ | AVG LPIPS↓ | RE PSNR↑ | RE SSIM↑ | RE LPIPS↓ | FOG PSNR↑ | FOG SSIM↑ | FOG LPIPS↓ | DARK PSNR↑ | DARK SSIM↑ | DARK LPIPS↓ | RAIN PSNR↑ | RAIN SSIM↑ | RAIN LPIPS↓ | SNOW PSNR↑ | SNOW SSIM↑ | SNOW LPIPS↓ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
I₃ₓ₃ | 24.05 | 0.7403 | 0.0836 | 21.06 | 0.6900 | 0.0750 | 26.57 | 0.7711 | 0.0821 | 25.70 | 0.8506 | 0.0785 | 21.53 | 0.5335 | 0.1411 | 25.37 | 0.8562 | 0.0412 |
GT-Homo | 32.78 | 0.9187 | 0.0570 | 28.39 | 0.8697 | 0.0549 | 35.23 | 0.9508 | 0.0492 | 31.88 | 0.9405 | 0.0575 | 30.11 | 0.8511 | 0.1033 | 38.31 | 0.9814 | 0.0199 |
SIFT | 28.44 | 0.9074 | 0.0781 | 29.23 | 0.9148 | 0.0545 | 29.42 | 0.9016 | 0.0768 | 27.37 | 0.9074 | 0.0982 | 30.00 | 0.8632 | 0.1055 | 26.16 | 0.9497 | 0.0558 |
SPSG | 28.01 | 0.8697 | 0.0796 | 21.83 | 0.7593 | 0.0886 | 30.88 | 0.9049 | 0.0645 | 27.60 | 0.9019 | 0.0966 | 28.86 | 0.8270 | 0.1103 | 30.88 | 0.9556 | 0.0379 |
CAHomo | 25.29 | 0.7837 | 0.0841 | 22.67 | 0.7341 | 0.0805 | 27.51 | 0.8048 | 0.0751 | 26.12 | 0.8743 | 0.0846 | 22.95 | 0.6130 | 0.1420 | 27.20 | 0.8924 | 0.0384 |
BasesHomo | 29.61 | 0.9026 | 0.0672 | 25.08 | 0.8522 | 0.0666 | 31.06 | 0.9170 | 0.0627 | 30.05 | 0.9303 | 0.0702 | 29.58 | 0.8512 | 0.1071 | 32.30 | 0.9622 | 0.0292 |
MeshFlow | 29.91 | 0.9239 | 0.0688 | 28.57 | 0.9216 | 0.0576 | 28.68 | 0.9280 | 0.0742 | 29.41 | 0.9254 | 0.0774 | 30.68 | 0.8747 | 0.1049 | 32.23 | 0.9700 | 0.0298 |
HM_Mix | 25.77 | 0.8896 | 0.0882 | 26.09 | 0.8721 | 0.0596 | 26.56 | 0.8753 | 0.0882 | 26.43 | 0.9037 | 0.1002 | 28.20 | 0.8672 | 0.1107 | 21.58 | 0.9296 | 0.0820 |
RANSAC-F | 26.04 | 0.8348 | 0.0890 | 26.09 | 0.8812 | 0.0665 | 29.22 | 0.8944 | 0.0801 | 27.29 | 0.9031 | 0.0923 | 21.68 | 0.5585 | 0.1495 | 25.90 | 0.9371 | 0.0566 |
Ours | 32.09 | 0.9142 | 0.0575 | 27.08 | 0.8615 | 0.0558 | 34.17 | 0.9371 | 0.0512 | 32.36 | 0.9421 | 0.0565 | 30.52 | 0.8608 | 0.1021 | 36.35 | 0.9692 | 0.0218 |
Comprehensive quantitative comparison across environmental conditions using PSNR (higher is better), SSIM (higher is better), and LPIPS (lower is better).
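For reference, PSNR is the standard 10·log10(MAX²/MSE) in dB. The page does not say which image pairs are compared. A plausible reading, suggested by the identity (I₃ₓ₃) baseline row, is the target frame against the source frame warped by each method's estimated motion, but that pairing is an assumption here, as are the 8-bit range (MAX = 255) and the flat pixel-list representation. SSIM and LPIPS need windowed statistics and a pretrained network respectively, so they are not sketched.

```python
import math


def psnr(img_a, img_b, max_val=255.0):
    """PSNR in dB between two same-size images given as flat pixel lists."""
    if len(img_a) != len(img_b) or not img_a:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_val ** 2 / mse)
```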