Estimating 2D Camera Motion with Hybrid Motion Basis

Haipeng Li1*, Tianhao Zhou1*, Zhanglei Yang1, Yi Wu2, Yan Chen2, Zijing Mao2, Shen Cheng3, Bing Zeng1, Shuaicheng Liu1†
1University of Electronic Science and Technology of China, 2Xiaomi Corporation, 3Dexmal
*Equal contribution, †Corresponding author

Abstract

Estimating 2D camera motion is a fundamental computer vision task that models the projection of 3D camera movements onto the 2D image plane. Current methods rely on either homography-based approaches, limited to planar scenes, or meshflow techniques that use grid-based local homographies but struggle with complex non-linear transformations. A key insight of our work is that combining flow fields from different homographies creates motion patterns that cannot be represented by any single homography. We introduce CamFlow, a novel framework that represents camera motion using hybrid motion bases: physical bases derived from camera geometry and stochastic bases for complex scenarios. Our approach includes a hybrid probabilistic loss function based on the Laplace distribution that enhances training robustness. For evaluation, we create a new benchmark by masking dynamic objects in existing optical flow datasets to isolate pure camera motion. Experiments show CamFlow outperforms state-of-the-art methods across diverse scenarios, demonstrating superior robustness and generalization in zero-shot settings. Code and datasets are available at our project page: https://lhaippp.github.io/CamFlow/.

Motivation

Non-linearity of Flow Addition

Key Insight: Two homography matrices generate flow1 and flow2. Adding these flows (flow3) differs from the flow of the composed homography (homo3 = homo2 · homo1). When we sample points from flow3 and solve for a homography, the solutions are inconsistent, showing that the combined flow field cannot be represented by any single homography.

This fundamental limitation of traditional homography-based methods motivates our hybrid motion basis approach, which combines physical bases derived from camera geometry with stochastic bases to handle complex non-linear transformations that cannot be captured by any single homography.

💡 More bases → Better camera motion representation
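This non-additivity is easy to check numerically. The sketch below (the homography values are illustrative, not taken from the paper) builds two flows, adds them, and fits a single homography to the sum via a plain DLT least-squares solve; the fit residual stays clearly non-zero:

```python
import numpy as np

def warp(H, pts):
    """Apply homography H to Nx2 points (with projective division)."""
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]

def fit_homography(src, dst):
    """Direct Linear Transform: least-squares homography mapping src -> dst."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 3)

# Two distinct homographies with small perspective terms (illustrative values)
H1 = np.array([[1.02, 0.01, 5.0], [0.00, 0.98, -3.0], [1e-4, 0.0, 1.0]])
H2 = np.array([[0.97, -0.02, -4.0], [0.015, 1.03, 6.0], [0.0, 2e-4, 1.0]])

xs, ys = np.meshgrid(np.linspace(0, 640, 8), np.linspace(0, 480, 8))
pts = np.stack([xs.ravel(), ys.ravel()], axis=1)

flow1 = warp(H1, pts) - pts
flow2 = warp(H2, pts) - pts
flow3 = flow1 + flow2                    # naive flow addition
flow_comp = warp(H2 @ H1, pts) - pts     # flow of the composed homography

print("max |flow3 - composed flow|:", np.abs(flow3 - flow_comp).max())

# Try to explain flow3 with one homography: the residual never vanishes
H_fit = fit_homography(pts, pts + flow3)
resid = np.abs(warp(H_fit, pts) - (pts + flow3)).max()
print("best single-homography residual (px):", resid)
```

The sum of two projective maps (each a ratio of linear terms with a different denominator) minus the identity is no longer projective, which is exactly why a richer basis set is needed.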

Method Overview

Based on the insight that combined flow fields cannot be represented by a single homography, our CamFlow framework introduces hybrid motion bases for robust 2D camera motion estimation:

CamFlow Method Framework
Figure: Our proposed motion estimation framework. Given image pair (Ia, Ib), features are extracted through a multi-scale pyramid and processed by the motion estimation transformer (MET) to compute weights for physical (blue) and noisy (red) motion bases. These weights linearly combine predefined motion bases to generate flow maps for warping. A mask generator predicts uncertainty masks dab and dba to reject unreliable regions, enhancing estimation robustness.
  • Physical Motion Bases: Derived from camera geometry principles for fundamental motion patterns
  • Stochastic Motion Bases: Learned representations for complex non-linear transformations that cannot be captured by single homographies
  • Motion Estimation Transformer (MET): Computes adaptive weights for combining different motion bases
  • Uncertainty Prediction: Mask generator rejects unreliable regions for enhanced robustness
  • Hybrid Probabilistic Loss: Laplace distribution-based loss function for enhanced training robustness
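The core of the framework, per the figure above, is a linear combination of predefined motion bases with predicted weights. The following is a minimal sketch of that structure; the specific basis fields, grid size, and weight values are illustrative assumptions (in CamFlow the weights come from the motion estimation transformer, and the stochastic bases are learned rather than random):

```python
import numpy as np

Hg, Wg = 32, 32  # low-resolution flow grid (upsampled in practice)
ys, xs = np.meshgrid(np.linspace(-1, 1, Hg), np.linspace(-1, 1, Wg), indexing="ij")

# Physical bases: flow fields of elementary camera motions (illustrative choices)
physical = np.stack([
    np.stack([np.ones_like(xs), np.zeros_like(xs)], -1),  # translate x
    np.stack([np.zeros_like(xs), np.ones_like(xs)], -1),  # translate y
    np.stack([-ys, xs], -1),                              # in-plane rotation
    np.stack([xs, ys], -1),                               # zoom
    np.stack([xs * xs, xs * ys], -1),                     # perspective-like term
])

# Stochastic bases: extra fields for motions no single homography captures
# (random here purely for illustration; learned in the actual method)
rng = np.random.default_rng(0)
stochastic = rng.standard_normal((3, Hg, Wg, 2)) * 0.1

bases = np.concatenate([physical, stochastic], axis=0)    # (K, Hg, Wg, 2)

# Hypothetical weight vector standing in for the MET output
weights = np.array([0.5, -0.2, 0.1, 0.05, 0.0, 0.02, 0.0, 0.01])
flow = np.tensordot(weights, bases, axes=1)               # (Hg, Wg, 2)
print(flow.shape)
```

Because the output is a weighted sum over a fixed dictionary, the network only has to regress K scalars per image pair instead of a dense flow field, which is what makes the estimation compact and robust.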

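On the loss side, the abstract describes a hybrid probabilistic loss based on the Laplace distribution. A generic Laplace negative log-likelihood with a predicted per-pixel scale (the paper's exact hybrid formulation may differ) looks like this; large predicted scales down-weight residuals, acting as a soft outlier mask:

```python
import numpy as np

def laplace_nll(pred, target, log_b):
    """Per-pixel Laplace negative log-likelihood, averaged over the field.

    pred, target: flow values; log_b: predicted log-scale (uncertainty).
    NLL = |pred - target| / b + log b + log 2, with b = exp(log_b).
    """
    b = np.exp(log_b)
    return np.mean(np.abs(pred - target) / b + log_b + np.log(2.0))

# Toy check: a confident (small-b) wrong prediction costs more than an
# uncertain (large-b) one with the same residual.
err = np.ones((4, 4))
print(laplace_nll(err, 0.0, np.full((4, 4), -1.0)))  # small scale
print(laplace_nll(err, 0.0, np.full((4, 4), 1.0)))   # large scale
```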
Results

Representative Qualitative Results

Our method demonstrates superior performance across challenging scenarios including low-light and adverse weather conditions. The following comparisons show CamFlow (ours) vs. state-of-the-art baselines.

Dark Environment

Dark environment comparison

CamFlow maintains robust camera motion estimation even in challenging low-light conditions where traditional methods struggle.

Rainy Weather

Rainy weather comparison

Our hybrid motion basis approach effectively handles complex motion patterns in adverse weather conditions with rain and reduced visibility.

Method Comparison: Ours vs GT Homo vs GT Flow

Comprehensive comparison across challenging scenarios. Each video shows: Original Images | Ours | GT Homo | GT Flow

Dark Environment

Dark environment method comparison

Foggy Weather

Foggy weather method comparison

Rainy Weather

Rainy weather method comparison

Snowy Environment

Snowy environment method comparison

Quantitative Results

CAHomo Benchmark (PME)

| Methods | AVG |
| --- | --- |
| SIFT + MAGSAC | 1.34 |
| SPSG + MAGSAC | 0.63 |
| RealSH | 0.34 |
| DMHomo | 0.31 |
| BasesHomo | 0.50 |
| HomoGAN | 0.39 |
| Ours | 0.32 |

GHOF-Cam (EPE)

| Methods | AVG |
| --- | --- |
| SIFT | 2.82 |
| SPSG | 3.07 |
| CAHomo | 2.81 |
| BasesHomo | 1.74 |
| Meshflow | 2.15 |
| RANSAC-F | 3.26 |
| Ours | 1.10 |

GHOF Test (PME)

| Methods | AVG |
| --- | --- |
| SIFT | 4.80 |
| SPSG | 4.47 |
| RealSH | 1.72 |
| DMHomo | 1.75 |
| BasesHomo | 2.28 |
| HomoGAN | 1.95 |
| Ours | 1.23 |
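The tables above report PME (point matching error) and EPE (end-point error). As a reference, here are the standard definitions in code; the benchmarks' exact evaluation protocols (point selection, normalization) may differ:

```python
import numpy as np

def epe(flow_pred, flow_gt):
    """End-point error: mean L2 distance between predicted and GT flow vectors."""
    return np.linalg.norm(flow_pred - flow_gt, axis=-1).mean()

def pme(H, src_pts, dst_pts):
    """Point matching error: mean L2 distance between points warped by the
    estimated homography H and their ground-truth correspondences."""
    ph = np.hstack([src_pts, np.ones((len(src_pts), 1))]) @ H.T
    warped = ph[:, :2] / ph[:, 2:3]
    return np.linalg.norm(warped - dst_pts, axis=-1).mean()

# Sanity check: the identity homography on unmoved points gives zero error
pts = np.array([[10.0, 20.0], [30.0, 40.0]])
print(pme(np.eye(3), pts, pts))  # 0.0
```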

Comprehensive Results on GHOF-Cam: PSNR, SSIM, LPIPS

| Method | AVG PSNR↑ | AVG SSIM↑ | AVG LPIPS↓ | RE PSNR↑ | RE SSIM↑ | RE LPIPS↓ | FOG PSNR↑ | FOG SSIM↑ | FOG LPIPS↓ | DARK PSNR↑ | DARK SSIM↑ | DARK LPIPS↓ | RAIN PSNR↑ | RAIN SSIM↑ | RAIN LPIPS↓ | SNOW PSNR↑ | SNOW SSIM↑ | SNOW LPIPS↓ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| I₃ₓ₃ | 24.05 | 0.7403 | 0.0836 | 21.06 | 0.6900 | 0.0750 | 26.57 | 0.7711 | 0.0821 | 25.70 | 0.8506 | 0.0785 | 21.53 | 0.5335 | 0.1411 | 25.37 | 0.8562 | 0.0412 |
| GT-Homo | 32.78 | 0.9187 | 0.0570 | 28.39 | 0.8697 | 0.0549 | 35.23 | 0.9508 | 0.0492 | 31.88 | 0.9405 | 0.0575 | 30.11 | 0.8511 | 0.1033 | 38.31 | 0.9814 | 0.0199 |
| SIFT | 28.44 | 0.9074 | 0.0781 | 29.23 | 0.9148 | 0.0545 | 29.42 | 0.9016 | 0.0768 | 27.37 | 0.9074 | 0.0982 | 30.00 | 0.8632 | 0.1055 | 26.16 | 0.9497 | 0.0558 |
| SPSG | 28.01 | 0.8697 | 0.0796 | 21.83 | 0.7593 | 0.0886 | 30.88 | 0.9049 | 0.0645 | 27.60 | 0.9019 | 0.0966 | 28.86 | 0.8270 | 0.1103 | 30.88 | 0.9556 | 0.0379 |
| CAHomo | 25.29 | 0.7837 | 0.0841 | 22.67 | 0.7341 | 0.0805 | 27.51 | 0.8048 | 0.0751 | 26.12 | 0.8743 | 0.0846 | 22.95 | 0.6130 | 0.1420 | 27.20 | 0.8924 | 0.0384 |
| BasesHomo | 29.61 | 0.9026 | 0.0672 | 25.08 | 0.8522 | 0.0666 | 31.06 | 0.9170 | 0.0627 | 30.05 | 0.9303 | 0.0702 | 29.58 | 0.8512 | 0.1071 | 32.30 | 0.9622 | 0.0292 |
| MeshFlow | 29.91 | 0.9239 | 0.0688 | 28.57 | 0.9216 | 0.0576 | 28.68 | 0.9280 | 0.0742 | 29.41 | 0.9254 | 0.0774 | 30.68 | 0.8747 | 0.1049 | 32.23 | 0.9700 | 0.0298 |
| HM_Mix | 25.77 | 0.8896 | 0.0882 | 26.09 | 0.8721 | 0.0596 | 26.56 | 0.8753 | 0.0882 | 26.43 | 0.9037 | 0.1002 | 28.20 | 0.8672 | 0.1107 | 21.58 | 0.9296 | 0.0820 |
| RANSAC-F | 26.04 | 0.8348 | 0.0890 | 26.09 | 0.8812 | 0.0665 | 29.22 | 0.8944 | 0.0801 | 27.29 | 0.9031 | 0.0923 | 21.68 | 0.5585 | 0.1495 | 25.90 | 0.9371 | 0.0566 |
| Ours | 32.09 | 0.9142 | 0.0575 | 27.08 | 0.8615 | 0.0558 | 34.17 | 0.9371 | 0.0512 | 32.36 | 0.9421 | 0.0565 | 30.52 | 0.8608 | 0.1021 | 36.35 | 0.9692 | 0.0218 |

Comprehensive quantitative comparison across environmental conditions using PSNR (higher is better), SSIM (higher is better), and LPIPS (lower is better).

Citation

@inproceedings{li2025estimating,
  title={Estimating 2D Camera Motion with Hybrid Motion Basis},
  author={Li, Haipeng and Zhou, Tianhao and Yang, Zhanglei and Wu, Yi and Chen, Yan and Mao, Zijing and Cheng, Shen and Zeng, Bing and Liu, Shuaicheng},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={xxxx--xxxx},
  year={2025}
}