BIRD: Exploring Image Restoration with Pre-trained Diffusion Models

Overview

This paper approaches Image Restoration (IR) in much the same way that GAN inversion does. Rather than training a restoration network, it optimizes the initial noise fed to the reverse (sampling) process of a diffusion model so that the generated image, once degraded, matches the observation. Notably, the diffusion model is neither trained nor fine-tuned; it serves purely as an image prior.

Modeling the Problem

The paper explains that the IR problem can be modeled as a Maximum a Posteriori (MAP) optimization problem:

$$
\hat{x} = \arg \max_x \left\{ \log p(y \mid x) + \log p(x) \right\}
$$

Here, $\hat{x}$ is the restored image and $y$ is the degraded image; $p(y|x)$ is the likelihood and $p(x)$ is the prior distribution. The degradation that produces $y$ from the clean image $x$ is modeled as:

$$
y = H_\eta (x) + n
$$

where $H_\eta$ is the degradation model with parameters $\eta$, and $n$ is the noise. Assuming Gaussian noise in the MAP formulation, the first equation can be rewritten as:

$$
\hat{x}, \hat{\eta} = \arg \min_{x\in \mathbb{R}^{HR}, \eta} \lVert y - H_\eta(x) \rVert^2 + \lambda \left( -\log p(x) \right)
$$

Here, the second term is the regularization term, and $\lambda > 0$ controls the balance between the likelihood and the prior.
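
The step from the MAP objective to the squared-error form can be spelled out. Writing the noise as $n \sim \mathcal{N}(0, \sigma^2 I)$, where the symbol $\sigma$ is introduced here only for illustration, the negative log-likelihood becomes:

$$
-\log p(y \mid x) = \frac{1}{2\sigma^2} \lVert y - H_\eta(x) \rVert^2 + \text{const}
$$

Multiplying the negated MAP objective by $2\sigma^2$ leaves the minimizer unchanged and gives $\lVert y - H_\eta(x) \rVert^2 + 2\sigma^2 \left( -\log p(x) \right)$. Under this reading, $\lambda$ plays the role of $2\sigma^2$: the noisier the observation, the more weight the prior receives.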

Under suitable assumptions, the penalized form can be replaced by a constrained optimization problem, in which a threshold $\rho$ takes over the role of $\lambda$:

$$
\hat{x}, \hat{\eta} = \arg \min_{x\in \mathbb{R}^{HR}, \eta} \lVert y - H_\eta(x) \rVert^2 \quad \text{s.t.} \quad -\log p(x) \le \rho
$$

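Suppose we have a mapping $g: \mathbb{R}^Z \rightarrow \mathbb{R}^{HR}$ that induces $p(x)$ from the normal distribution $q(z) = \mathcal{N}(z; 0, I)$, that is, $x = g(z)$ with $z \sim q$ is distributed according to $p(x)$. The problem can then be rewritten over $z$: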

$$
\hat{z}, \hat{\eta} = \arg \min_{z\in \mathbb{R}^Z, \eta} \lVert y - H_\eta(g(z)) \rVert^2 \quad \text{s.t.} \quad -\log p(g(z)) \le \rho
$$

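Assume that $g$ guarantees $p(g(z)) \ge e^{-\rho}$ whenever $\lVert z \rVert^2 = d_1 \times d_2$, where $d_1 \times d_2$ is the dimension of $z$. This choice is motivated by concentration of measure: for $z \sim \mathcal{N}(0, I)$, $\frac{1}{d_1 \times d_2} \lVert z \rVert^2 \rightarrow 1$ as $d_1 \times d_2 \rightarrow \infty$, so in high dimensions almost all samples lie close to the sphere of radius $\sqrt{d_1 \times d_2}$. Thus, we can approximate the problem as: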

$$
\hat{z}, \hat{\eta} = \arg \min_{z: \lVert z \rVert^2 = d_1 \times d_2, \eta} \lVert y - H_\eta(g(z)) \rVert^2
$$
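
The concentration argument behind this constraint can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `squared_norm` and `project_to_sphere` are hypothetical and not taken from the paper. It samples standard Gaussian vectors of growing dimension $d$, prints the squared norm divided by $d$, and shows one simple way to enforce the constraint by rescaling.

```python
import math
import random

def squared_norm(z):
    """Return the squared Euclidean norm of a vector stored as a list."""
    return sum(v * v for v in z)

def project_to_sphere(z):
    """Rescale z so that its squared norm equals its dimension d."""
    d = len(z)
    scale = math.sqrt(d / squared_norm(z))
    return [v * scale for v in z]

rng = random.Random(0)  # fixed seed so the run is repeatable
ratios = {}
for d in (10, 1_000, 100_000):
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    ratios[d] = squared_norm(z) / d  # gets closer to 1 as d grows
    print(d, round(ratios[d], 3))

# Any nonzero vector can be mapped onto the constraint set by rescaling.
z_proj = project_to_sphere([3.0, 4.0])  # squared norm 25 is rescaled to d = 2
```

The spread of the ratio around 1 shrinks roughly like $\sqrt{2/d}$, which is why fixing the norm loses very little at image-sized dimensions.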

Finally, $\hat{x} = g(\hat{z})$. To find this mapping, the paper proposes using a pre-trained Denoising Diffusion Implicit Model (DDIM).
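
As a rough illustration of how a DDIM can play the role of $g$, the sketch below runs the deterministic DDIM update from an initial noise vector down to an image estimate. It assumes a noise-prediction function `eps_model(x, t)` and a cumulative schedule `alpha_bars`; these names, the linear beta schedule, and the four-step timestep list are illustrative choices rather than details taken from the paper. The toy `oracle_eps` stands in for a real pre-trained network so that the example runs without one.

```python
import math

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def ddim_generate(z, eps_model, alpha_bars, timesteps):
    """Deterministic DDIM reverse pass g(z): initial noise -> image estimate.

    timesteps is a decreasing list of step indices; the final update jumps
    to the clean image, where the cumulative alpha is treated as 1.
    """
    x = list(z)
    for i, t in enumerate(timesteps):
        a_t = alpha_bars[t]
        a_prev = alpha_bars[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        eps = eps_model(x, t)
        # Predict the clean image from the current sample and predicted noise.
        x0 = [(xi - math.sqrt(1.0 - a_t) * ei) / math.sqrt(a_t)
              for xi, ei in zip(x, eps)]
        # Move to the earlier timestep without injecting fresh noise.
        x = [math.sqrt(a_prev) * x0i + math.sqrt(1.0 - a_prev) * ei
             for x0i, ei in zip(x0, eps)]
    return x

# Toy stand-in for a pre-trained network: it "knows" a single target image
# and returns exactly the noise that separates x from it at step t.
alpha_bars = make_alpha_bars()
target = [0.2, -0.5, 0.9]

def oracle_eps(x, t):
    a_t = alpha_bars[t]
    return [(xi - math.sqrt(a_t) * ti) / math.sqrt(1.0 - a_t)
            for xi, ti in zip(x, target)]

x_hat = ddim_generate([1.0, -1.0, 0.5], oracle_eps, alpha_bars,
                      [999, 749, 499, 249])
```

Because no fresh noise is injected, the output is a deterministic function of $z$, which is what makes it possible to optimize over $z$. With the oracle, every starting noise maps to the same target; a real network would map different $z$ to different images.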

Efficient Diffusion Inversion

This method utilizes a pre-trained DDIM for efficient and realistic image restoration without retraining. The approach iteratively optimizes both the degradation model parameters and the restored image at test time, making it adaptable to various IR tasks.
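
The joint optimization can be illustrated on a toy problem. In the sketch below, every specific is an assumption made for illustration: the generator is a fixed linear map `A` rather than a DDIM, the degradation is an unknown scalar gain `eta`, and plain projected gradient descent with a hand-picked step size stands in for whatever optimizer the paper uses. Only the overall pattern follows the text: update $z$, rescale it back onto the sphere, and update $\eta$ against the same data-fidelity loss.

```python
import math
import random

def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def project_to_sphere(z):
    """Rescale z so that its squared norm equals its dimension."""
    scale = math.sqrt(len(z) / sum(v * v for v in z))
    return [v * scale for v in z]

def restore(y, A, steps=2000, lr=0.01, seed=1):
    """Toy blind restoration: minimize ||y - eta * A z||^2 over z and eta."""
    rng = random.Random(seed)
    d = len(A[0])
    At = [list(col) for col in zip(*A)]  # transpose of A
    z = project_to_sphere([rng.gauss(0.0, 1.0) for _ in range(d)])
    eta = 1.0
    losses = []
    for _ in range(steps):
        gz = matvec(A, z)                              # g(z), toy generator
        r = [eta * gi - yi for gi, yi in zip(gz, y)]   # residual H_eta(g(z)) - y
        losses.append(sum(v * v for v in r))
        grad_z = [2.0 * eta * v for v in matvec(At, r)]
        grad_eta = 2.0 * sum(ri * gi for ri, gi in zip(r, gz))
        # Gradient step on z followed by projection onto the sphere.
        z = project_to_sphere([zi - lr * gi for zi, gi in zip(z, grad_z)])
        # Gradient step on the degradation parameter.
        eta -= lr * grad_eta
    return z, eta, losses

# Synthetic, noise-free observation built from a known z and gain.
rng = random.Random(0)
d = 8
A = [[(1.0 if i == j else 0.0) + 0.05 * rng.gauss(0.0, 1.0) for j in range(d)]
     for i in range(d)]
z_true = project_to_sphere([rng.gauss(0.0, 1.0) for _ in range(d)])
y = [0.5 * v for v in matvec(A, z_true)]

z_hat, eta_hat, losses = restore(y, A)
x_hat = matvec(A, z_hat)  # restored signal, x_hat = g(z_hat)
print(losses[0], losses[-1])
```

In this toy setting the norm constraint also removes the scale ambiguity between `eta` and `z`: only their product enters the loss, so fixing the norm of `z` pins down `eta` up to sign.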

The Final Algorithm

The paper summarizes the method in two algorithm listings:

[Figure: Algorithm 1]

[Figure: Algorithm 2]

In short, the method shows that an existing pre-trained diffusion model can serve as a strong prior for IR. Because every candidate image is produced by the model's own sampling process, the restored output stays close to the image data manifold, and no model retraining is needed.

References

Chihaoui, H., Lemkhenter, A., & Favaro, P. (2024). Blind Image Restoration via Fast Diffusion Inversion. Retrieved from arXiv:2405.19572 [cs.CV].