Reflection Separation from a Single Image via Joint Latent Diffusion

Zheng-Hui Huang¹,², Zhixiang Wang¹*, Yu-Lun Liu³, Yung-Yu Chuang²

¹Shanda AI Research Tokyo ²National Taiwan University ³National Yang Ming Chiao Tung University * Corresponding author

CVPR 2026


[Teaser figure: input images paired with the recovered transmission layers]

Motivation: Photos shot through glass mix the intended scene with an unwanted reflection — a severely ill-posed single-image problem. We harness diffusion priors to jointly recover both layers; the four examples above were captured in the wild.

Abstract

Single-image reflection separation remains challenging due to its ill-posed nature, especially under extreme conditions with strong or subtle reflections. Existing methods often struggle to recover both layers in glare or weak-reflection scenarios because of insufficient information.

This paper presents the first diffusion model explicitly fine-tuned for this task, leveraging generative diffusion priors for robust separation. Our method simultaneously generates transmission and reflection layers through a unified diffusion model, incorporating a novel cross-layer self-attention mechanism for better feature disentanglement. We further introduce a disjoint sampling strategy to iteratively reduce interference between the layers during diffusion and a latent optimization step with a learned composition function for improved results in complex real-world scenarios.

Extensive experiments show our approach achieves superior separation performance on multiple real-world benchmarks and surpasses state-of-the-art methods in both quantitative metrics and perceptual quality.

Method


Algorithm 1: Sampling with Latent Optimization (applied every 5 steps)
for t = N, ... , 1 do
  ε^T ← ε_θ(z_I, z^T, t, c^T)
  ε^R ← ε_θ(z_I, z^R, t, c^R)

  if t mod 5 == 0 then              ◄ every 5 steps
    for k = 1, ... , 4 do          ◄ 4 inner steps
      ẑ₀^T ← Tweedie(z^T, ε^T, α̅_t)
      ẑ₀^R ← Tweedie(z^R, ε^R, α̅_t)
      ℒ ← ‖z_I − C(ẑ₀^T, ẑ₀^R)‖²   ◄ composition loss
      z^T ← z^T − γ · ∇_{z^T} ℒ
      z^R ← z^R − γ · ∇_{z^R} ℒ    ◄ gradient descent
    end for
  end if

  ε̂^T ← ε^T + w(ε^T − ε^R)
  ε̂^R ← ε^R + w(ε^R − ε^T)
  z^T, z^R ← DDIM_step(...)
end for
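
The inner loop and the final guidance lines of Algorithm 1 can be illustrated with a small toy sketch. It is not the paper's implementation: the learned composition function C is replaced by plain addition (an assumption made so the gradient has a closed form), the noise predictions are held fixed during the inner steps, and latents are short Python lists rather than tensors. The Tweedie estimate uses the standard ε-parameterization, ẑ₀ = (z_t − √(1 − α̅_t) · ε) / √α̅_t. All function names are hypothetical.

```python
import math

def tweedie(z_t, eps, alpha_bar):
    """Denoised estimate z0_hat = (z_t - sqrt(1 - a) * eps) / sqrt(a)."""
    s, n = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(z - n * e) / s for z, e in zip(z_t, eps)]

def compose(z0_T, z0_R):
    """Stand-in for the learned composition C: plain addition (assumption)."""
    return [a + b for a, b in zip(z0_T, z0_R)]

def composition_loss(z_I, z_T, z_R, eps_T, eps_R, alpha_bar):
    """Squared error between the input latent and the recomposed estimate."""
    rec = compose(tweedie(z_T, eps_T, alpha_bar),
                  tweedie(z_R, eps_R, alpha_bar))
    return sum((i - r) ** 2 for i, r in zip(z_I, rec))

def latent_step(z_I, z_T, z_R, eps_T, eps_R, alpha_bar, gamma):
    """One gradient-descent step on both latents.

    With additive C and fixed eps, dL/dz = -2 * (z_I - rec) / sqrt(alpha_bar),
    and the gradient is the same for z_T and z_R.
    """
    rec = compose(tweedie(z_T, eps_T, alpha_bar),
                  tweedie(z_R, eps_R, alpha_bar))
    s = math.sqrt(alpha_bar)
    grad = [-2.0 * (i - r) / s for i, r in zip(z_I, rec)]
    new_T = [z - gamma * g for z, g in zip(z_T, grad)]
    new_R = [z - gamma * g for z, g in zip(z_R, grad)]
    return new_T, new_R

def disjoint_guidance(eps_T, eps_R, w):
    """Push each noise prediction away from the other layer's prediction."""
    hat_T = [a + w * (a - b) for a, b in zip(eps_T, eps_R)]
    hat_R = [b + w * (b - a) for a, b in zip(eps_T, eps_R)]
    return hat_T, hat_R

# Toy usage: four inner steps, mirroring the inner loop of Algorithm 1.
z_I, z_T, z_R = [1.0, -0.5], [0.2, 0.1], [0.0, 0.3]
eps_T, eps_R = [0.1, -0.2], [0.05, 0.0]
loss_before = composition_loss(z_I, z_T, z_R, eps_T, eps_R, 0.5)
for _ in range(4):
    z_T, z_R = latent_step(z_I, z_T, z_R, eps_T, eps_R, 0.5, 0.05)
loss_after = composition_loss(z_I, z_T, z_R, eps_T, eps_R, 0.5)
```

In this toy setting each step shrinks the residual by a factor of 1 − 4γ/α̅_t, so the step size γ has to stay below α̅_t/2 for the loss to decrease. With a learned, non-additive C the gradient would instead come from automatic differentiation.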

Overview of our framework: a unified diffusion model jointly generates the transmission and reflection latents via cross-layer self-attention, disjoint sampling, fidelity-guided feature modulation, and a learned latent composition function.
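
The page does not spell out how the cross-layer self-attention is wired. One common way to let two streams share a self-attention layer is to have the queries of each layer attend over the concatenated keys and values of both layers. The sketch below illustrates that idea only; it is an assumption, not the paper's design. It omits the learned query, key, and value projections (tokens serve directly as all three), and `cross_layer_self_attention` is a hypothetical name.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    """Scaled dot-product attention over lists of token vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def cross_layer_self_attention(tokens_T, tokens_R):
    """Each layer's tokens attend over the tokens of BOTH layers (assumed wiring)."""
    joint = tokens_T + tokens_R  # shared keys/values for both streams
    return attend(tokens_T, joint, joint), attend(tokens_R, joint, joint)

# Toy usage: one 2-D token per layer.
out_T, out_R = cross_layer_self_attention([[1.0, 0.0]], [[0.0, 1.0]])
```

Because every output token is a convex combination of tokens from both layers, each stream can see what the other one already explains, which is one plausible route to the feature disentanglement the overview mentions.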

Visual Comparison

[Interactive comparison: our result versus the input mixture, ground truth, or state-of-the-art baselines, shown for either the transmission or the reflection layer.]

Quantitative Comparison

Reflection-layer comparison on Real20.

We compare against state-of-the-art baselines on three real-world benchmarks. The best and second-best results are highlighted; lower (↓) is better for LPIPS / DISTS, higher (↑) is better for PSNR / SSIM.
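
For readers unfamiliar with the metrics, PSNR is the simplest of the four to compute: PSNR = 10 · log10(MAX² / MSE), where MAX is the largest possible pixel value. The sketch below is a minimal illustration on flat lists of pixel values in [0, 1]; it is not the evaluation code behind the reported numbers, and the function name `psnr` is hypothetical. SSIM, LPIPS, and DISTS require structural statistics or pretrained networks and are not shown.

```python
import math

def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(pred) != len(target):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Toy usage: every pixel is off by 0.1, so the MSE is 0.01.
score = psnr([0.5, 0.5], [0.4, 0.6])
```

A uniform error of 0.1 on a [0, 1] scale gives an MSE of 0.01 and therefore about 20 dB; halving the error adds roughly 6 dB.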
