Reflection Separation from a Single Image
via Joint Latent Diffusion

Zheng-Hui Huang 1,2, Zhixiang Wang 1*, Yu-Lun Liu 3, Yung-Yu Chuang 2
1Shanda AI Research Tokyo    2National Taiwan University    3National Yang Ming Chiao Tung University   * Corresponding author
CVPR 2026
[Teaser figure: four examples with panels labeled Input, Transmission, and Reflection.]

Motivation: Photos shot through glass mix the intended scene with an unwanted reflection, and separating the two from a single image is a severely ill-posed problem. We harness diffusion priors to jointly recover both layers; the four examples above were captured in the wild.

Abstract

Single-image reflection separation remains challenging due to its ill-posed nature, especially under extreme conditions with strong or subtle reflections. Existing methods often struggle to recover both layers in glare or weak-reflection scenarios because of insufficient information.

This paper presents the first diffusion model explicitly fine-tuned for this task, leveraging generative diffusion priors for robust separation. Our method simultaneously generates transmission and reflection layers through a unified diffusion model, incorporating a novel cross-layer self-attention mechanism for better feature disentanglement. We further introduce a disjoint sampling strategy to iteratively reduce interference between the layers during diffusion and a latent optimization step with a learned composition function for improved results in complex real-world scenarios.

Extensive experiments show our approach achieves superior separation performance on multiple real-world benchmarks and surpasses state-of-the-art methods in both quantitative metrics and perceptual quality.

Method


Framework overview
Algorithm 1: Latent Optimization (per t mod 5)
for t = N, ... , 1 do
  εT ← εθ(zI, zT, t, cT)
  εR ← εθ(zI, zR, t, cR)

  if t mod 5 == 0 then              ◄ every 5 steps
    for k = 1, ... , 4 do          ◄ 4 inner steps
      ẑ₀T ← Tweedie(zT, εT, ᾱt)
      ẑ₀R ← Tweedie(zR, εR, ᾱt)
      ℒ ← ‖zI − C(ẑ₀T, ẑ₀R)‖²   ◄ composition loss
      zT ← zT − γ · ∇zT ℒ
      zR ← zR − γ · ∇zR ℒ      ◄ gradient descent
    end for
  end if

  ε̂T ← εT + w(εT − εR)
  ε̂R ← εR + w(εR − εT)
  zT, zR ← DDIM_step(...)
end for
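
The inner loop of Algorithm 1 can be illustrated with a small NumPy sketch. It is not the authors' implementation: the learned composition function C is replaced by a hypothetical additive stand-in (`compose`), the noise predictions are random arrays rather than network outputs, and the step size `gamma` and noise level `alpha_bar` are arbitrary illustrative values. The Tweedie estimate uses the standard DDPM relation between a noisy latent, the predicted noise, and the clean latent.

```python
import numpy as np


def tweedie(z_t, eps, alpha_bar):
    """Estimate the clean latent from a noisy latent and its predicted noise."""
    return (z_t - np.sqrt(1.0 - alpha_bar) * eps) / np.sqrt(alpha_bar)


def compose(z0_t, z0_r):
    """Hypothetical stand-in for the learned composition function C."""
    return z0_t + z0_r


def composition_loss(z_i, z_t, z_r, eps_t, eps_r, alpha_bar):
    """Squared error between the input latent and the composed layer estimates."""
    residual = z_i - compose(tweedie(z_t, eps_t, alpha_bar),
                             tweedie(z_r, eps_r, alpha_bar))
    return float(np.sum(residual ** 2))


def latent_optimization(z_i, z_t, z_r, eps_t, eps_r, alpha_bar,
                        gamma=0.05, inner_steps=4):
    """Gradient descent on both latents with the noise predictions held fixed."""
    for _ in range(inner_steps):
        residual = z_i - compose(tweedie(z_t, eps_t, alpha_bar),
                                 tweedie(z_r, eps_r, alpha_bar))
        # With the additive stand-in, both latents share the same gradient:
        # dL/dz = -2 * residual / sqrt(alpha_bar).
        grad = -2.0 * residual / np.sqrt(alpha_bar)
        z_t = z_t - gamma * grad
        z_r = z_r - gamma * grad
    return z_t, z_r


# Toy latents; in practice these would come from the VAE encoder and the sampler.
rng = np.random.default_rng(0)
shape = (4, 8, 8)
z_i = rng.standard_normal(shape)
z_t, z_r = rng.standard_normal(shape), rng.standard_normal(shape)
eps_t, eps_r = rng.standard_normal(shape), rng.standard_normal(shape)
alpha_bar = 0.5

loss_before = composition_loss(z_i, z_t, z_r, eps_t, eps_r, alpha_bar)
z_t_new, z_r_new = latent_optimization(z_i, z_t, z_r, eps_t, eps_r, alpha_bar)
loss_after = composition_loss(z_i, z_t_new, z_r_new, eps_t, eps_r, alpha_bar)
print(loss_before, loss_after)
```

For this additive stand-in, each inner step scales the residual by 1 − 4·gamma/alpha_bar, so the loss only shrinks when gamma is below alpha_bar/2. A learned composition function would need automatic differentiation instead of the closed-form gradient used here.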

Overview of our framework: a unified diffusion model jointly generates the transmission and reflection latents via cross-layer self-attention, disjoint sampling, fidelity-guided feature modulation, and a learned latent composition function.
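
The disjoint-sampling update in the last lines of Algorithm 1 adjusts each layer's noise prediction using the other layer's prediction. The sketch below isolates that arithmetic in NumPy; the function name `disjoint_noise` and the weight `w = 0.5` are illustrative assumptions, and the subsequent DDIM step is left out because the source does not spell out its arguments.

```python
import numpy as np


def disjoint_noise(eps_t, eps_r, w):
    """Extrapolate each layer's noise prediction away from the other layer's."""
    eps_t_hat = eps_t + w * (eps_t - eps_r)
    eps_r_hat = eps_r + w * (eps_r - eps_t)
    return eps_t_hat, eps_r_hat


# Toy predictions that differ only in the first component.
eps_t = np.array([1.0, 0.0])
eps_r = np.array([0.0, 0.0])

eps_t_hat, eps_r_hat = disjoint_noise(eps_t, eps_r, w=0.5)
print(eps_t_hat)  # [1.5, 0.0]
print(eps_r_hat)  # [-0.5, 0.0]
```

Two properties follow directly from the formula: the gap between the two predictions grows by a factor of 1 + 2w, while their sum is unchanged. Components on which the two predictions already agree are left as they are.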

Visual Comparison

[Interactive comparison viewer: panels labeled Input and Ours, with Compare with and Layer controls.]

Drag the slider to compare Ours (right) against the input mixture, ground truth, or state-of-the-art baselines (left). Use the Compare with row to switch the comparison target and Layer to view the transmission or reflection.

Quantitative Comparison

Transmission quantitative results
Reflection quantitative results

Reflection-layer comparison on Real20.

We compare against state-of-the-art baselines on three real-world benchmarks. The best and second-best results are highlighted; lower (↓) is better for LPIPS / DISTS, higher (↑) is better for PSNR / SSIM.
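
Of the four metrics, PSNR is the simplest to reproduce by hand. The snippet below is a minimal sketch of the standard definition, assuming images stored as floating-point arrays in [0, 1]; the evaluation protocol used for the reported numbers (color space, value range, cropping) is not stated here, so this is not necessarily how they were computed.

```python
import numpy as np


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the target."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


# A uniform error of 0.1 gives an MSE of 0.01, i.e. about 20 dB.
target = np.zeros((4, 4))
pred = np.full((4, 4), 0.1)
score = psnr(pred, target)
print(round(score, 2))
```

SSIM, LPIPS, and DISTS depend on windowed statistics or pretrained networks and are usually computed with existing library implementations rather than by hand.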

BibTeX

@inproceedings{huang2026reflection,
  title={Reflection Separation from a Single Image via Joint Latent Diffusion},
  author={Huang, Zheng-Hui and Wang, Zhixiang and Liu, Yu-Lun and Chuang, Yung-Yu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and
             Pattern Recognition (CVPR)},
  year={2026}
}