Things You Need To Know To Get Started With Diffusion Models

Author: Calibraint

October 29, 2023

Last updated: August 13, 2024

Guide To Diffusion Models

What is a Diffusion Model?

Diffusion Models are a class of generative models that can produce realistic and diverse data, such as images, text, audio, and video. They are based on the idea of transforming the data distribution into a simple noise distribution through a series of random diffusion steps. By reversing this process, we can sample new data from the noise distribution using a learned score function that guides the diffusion towards the data distribution.

AI generated image using diffusion models

What are the advantages of Diffusion Models?

Diffusion Models have several advantages over other generative models, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs). Some of these advantages are:

  • They are far less prone to mode collapse, where a model generates only a few modes of the data distribution and ignores the rest.
  • They do not require adversarial training, which can be unstable and hard to tune.
  • They can model continuous data directly, and discrete data with suitable adaptations of the noise process.
  • They can generate high-resolution and high-fidelity data, although sampling usually needs many network evaluations and can therefore be slower than with a GAN or VAE.

What are forward and reverse diffusion processes?

The forward and reverse diffusion processes are the core components of the Diffusion Model. They define how the data is transformed into noise and how the noise is transformed back into data.

Forward diffusion process

The forward diffusion process is a Markov chain that starts from the original data x_0 = x and ends at a noise sample x_T = ε. At each step t, the data is corrupted by scaling it down slightly and adding Gaussian noise:

x_t = √(1 – β_t) * x_(t-1) + √(β_t) * η_t

where β_t is the noise level at step t, and η_t is a standard Gaussian random variable. The noise level β_t increases as t increases until it reaches 1 at the final step T. At this point, x_T is completely random and independent of x.
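The forward step can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names, the toy data vector, and the linearly increasing schedule that ends at 1 are all assumptions made for the example.

```python
import numpy as np

def forward_step(x_prev, beta_t, rng):
    """One forward step: x_t = sqrt(1 - beta_t) * x_(t-1) + sqrt(beta_t) * eta_t."""
    eta_t = rng.standard_normal(x_prev.shape)  # standard Gaussian noise
    x_t = np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * eta_t
    return x_t, eta_t

def forward_diffusion(x0, betas, rng):
    """Apply the forward step once per noise level and return the whole trajectory."""
    trajectory = [x0]
    for beta_t in betas:
        x_t, _ = forward_step(trajectory[-1], beta_t, rng)
        trajectory.append(x_t)
    return trajectory

rng = np.random.default_rng(0)
x0 = np.ones(4)                    # toy "data" vector (assumption)
betas = np.linspace(0.1, 1.0, 10)  # illustrative increasing schedule ending at 1 (assumption)
trajectory = forward_diffusion(x0, betas, rng)
print(trajectory[-1])              # x_T: pure noise, because the last beta is 1
```

Because the last noise level in this toy schedule is exactly 1, the final step multiplies the previous sample by 0, so x_T no longer depends on x_0.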

Reverse diffusion process

The reverse diffusion process is the inverse of the forward diffusion process. It starts from a noise sample x_T = ε and ends at a data sample x_0. At each step t, part of the Gaussian noise is removed, so the remaining noise decreases as t decreases until none is left at t = 0. If the noise η_t that was added at forward step t were known, the forward step could be inverted exactly by solving it for x_(t-1):

x_(t-1) = (x_t – √(β_t) * η_t) / √(1 – β_t)

where β_t is the same noise level as in the forward diffusion process (with β_t < 1), and η_t is the standard Gaussian random variable drawn at forward step t. After the last reverse step, x_0 is equal to x.
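To see why the reverse process needs a learned model, it can help to check what happens when the forward noise is known. The sketch below (NumPy; the function names, the toy vector, and the schedule are illustrative assumptions) stores every η_t, runs the forward step, and then undoes each step by solving the forward equation for x_(t-1). Without the stored η_t this exact inversion is not available, which is the gap the learned score function fills.

```python
import numpy as np

def forward_step(x_prev, beta_t, eta_t):
    """Forward step with an explicitly supplied noise sample eta_t."""
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * eta_t

def invert_step(x_t, beta_t, eta_t):
    """Solve the forward step for x_(t-1); needs beta_t < 1 and the exact eta_t."""
    return (x_t - np.sqrt(beta_t) * eta_t) / np.sqrt(1.0 - beta_t)

rng = np.random.default_rng(0)
x0 = np.array([0.5, -1.0, 2.0])    # toy data (assumption)
betas = np.linspace(0.05, 0.5, 8)  # illustrative schedule with beta_t < 1 (assumption)
etas = [rng.standard_normal(x0.shape) for _ in betas]

# Forward pass: corrupt the data step by step.
x = x0
for beta_t, eta_t in zip(betas, etas):
    x = forward_step(x, beta_t, eta_t)
x_T = x

# Reverse pass: undo the steps in the opposite order, reusing the stored noise.
for beta_t, eta_t in zip(betas[::-1], etas[::-1]):
    x = invert_step(x, beta_t, eta_t)
x_recovered = x

print(np.allclose(x_recovered, x0))
```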

Denoising process

How to set up the forward and reverse diffusion processes?

In practice, we do not know the exact value of η_t at each step. Therefore, we need a score function s_t(x_t), usually a neural network, that estimates the gradient of the log-density of the noisy data at step t. The score function s_t(x_t) points from x_t towards regions where the data is more likely, so it tells us how to adjust x_t to make it closer to x_(t-1). We can use the score function s_t(x_t) to sample from the reverse diffusion process using Langevin dynamics:

x_(t-1) = x_t + α_t * s_t(x_t) + √(2 * α_t) * ζ

where α_t is the step size at step t, and ζ is a standard Gaussian random variable. By repeating this update from t = T down to t = 1, we can generate a data sample x = x_0 from a noise sample ε = x_T.
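A single Langevin update is easy to try on a distribution whose score is known in closed form. The sketch below is only an illustration under stated assumptions: instead of a learned network it uses the analytic score of a one-dimensional Gaussian N(3, 0.5²), and the step size and iteration count are arbitrary choices.

```python
import numpy as np

def langevin_step(x_t, score, alpha_t, rng):
    """x_(t-1) = x_t + alpha_t * s(x_t) + sqrt(2 * alpha_t) * zeta."""
    zeta = rng.standard_normal(x_t.shape)
    return x_t + alpha_t * score(x_t) + np.sqrt(2.0 * alpha_t) * zeta

def gaussian_score(x, mean=3.0, std=0.5):
    """Score of N(mean, std^2): the gradient of its log-density, -(x - mean) / std^2."""
    return -(x - mean) / std**2

rng = np.random.default_rng(0)
x = rng.standard_normal(5000)  # 5000 independent chains, all started from N(0, 1) noise
for _ in range(500):
    x = langevin_step(x, gaussian_score, alpha_t=0.01, rng=rng)

# The chains should now be spread roughly like the target N(3, 0.5^2).
print(x.mean(), x.std())
```

With a finite step size the match is only approximate; smaller steps reduce the discretization error at the cost of more iterations.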

How to choose the noise schedule and the number of steps?

The noise schedule and the number of steps are two important hyperparameters that affect the performance of the Diffusion Model. They determine how fast and how smoothly the data is transformed into noise and vice versa.

The noise schedule is a sequence of noise levels β_t that control the amount of Gaussian noise added or subtracted at each step t. A common choice for the noise schedule is to use a geometric progression:

β_t = β * (1 – β)^(T – 1 – t)

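where β is a constant between 0 and 1, and T is the total number of steps, with t assumed here to run from 0 to T – 1 so that β_t grows geometrically up to β at the last step. Because the forward step scales the previous sample by √(1 – β_t) before adding noise of variance β_t, the variance of x_t stays constant for all t when the data has unit variance, which simplifies the score function estimation.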
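The schedule formula can be checked numerically. The sketch below assumes that t runs from 0 to T – 1 (the formula does not state the index range) and that the data has unit variance; the function name is hypothetical.

```python
import numpy as np

def geometric_schedule(beta, T):
    """beta_t = beta * (1 - beta)^(T - 1 - t) for t = 0, ..., T - 1 (assumed index range)."""
    t = np.arange(T)
    return beta * (1.0 - beta) ** (T - 1 - t)

betas = geometric_schedule(beta=0.5, T=10)
print(betas)  # increases geometrically and ends at beta

# Variance recursion of the forward step: Var(x_t) = (1 - beta_t) * Var(x_(t-1)) + beta_t.
variance = 1.0  # assumes unit-variance data
variances = []
for beta_t in betas:
    variance = (1.0 - beta_t) * variance + beta_t
    variances.append(variance)
print(variances)
```

Starting from unit variance, each step removes a fraction β_t of the variance and adds β_t back as noise, which is why the total does not change.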

The number of steps T is the length of the forward and reverse diffusion processes. It affects the quality and diversity of the generated data, as well as the sampling cost, since every reverse step requires one evaluation of the score function. A larger T means that the data is more corrupted by noise, which makes it harder to recover from the noise, but also allows for more variation in the data. A smaller T means that the data is less corrupted by noise, which makes it easier to recover from the noise, but also limits the variation in the data.

There is a trade-off between the noise schedule and the number of steps. A more aggressive noise schedule (larger β) requires more steps to achieve better quality, while a less aggressive noise schedule (smaller β) requires fewer steps to achieve good diversity. The optimal choice of these hyperparameters depends on the data domain, the score function architecture, and the computational budget.
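One rough way to explore this trade-off is to measure how much of the original signal survives the forward process. Unrolling the forward step gives x_T = √(∏(1 – β_t)) * x_0 plus Gaussian noise, so the factor in front of x_0 shows how strongly the data is corrupted. The sketch below assumes the geometric schedule with t running from 0 to T – 1; the function name and the chosen values of β and T are illustrative.

```python
import numpy as np

def signal_fraction(beta, T):
    """Factor multiplying x_0 in x_T: sqrt(prod(1 - beta_t)) under the geometric schedule."""
    t = np.arange(T)
    betas = beta * (1.0 - beta) ** (T - 1 - t)
    return float(np.sqrt(np.prod(1.0 - betas)))

for beta in (0.1, 0.5, 0.9):
    for T in (5, 50):
        print(f"beta={beta}, T={T}: signal fraction = {signal_fraction(beta, T):.4f}")
```

A value close to 1 means x_T still resembles the data, while a value close to 0 means x_T is almost pure noise.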

Note:

β is a constant between 0 and 1 that controls the noise level in the Diffusion Model. A larger β means that more noise is added or removed at each step, so the data is corrupted more strongly. A smaller β means that less noise is added or removed at each step, so the data is corrupted less.

β = 0.5 is the midpoint of the range and counts as neither small nor large. It balances the trade-off between quality and diversity in the Diffusion Model: the noise level is 50% at the final step of the forward diffusion process and 50% at the initial step of the reverse diffusion process, which preserves some information and some variation in the data.

However, it may not be the optimal choice for every data domain or score function architecture. You may need to experiment with different values of β to find the best one for your task. 

How to sample from a trained Diffusion Model

To sample from the trained Diffusion Model, we need to follow the reverse diffusion process using the score function and Langevin dynamics. 

Here are the steps to do that:

  1. Start from a random noise sample ε ~ N(0, I), where I is the identity matrix, and set x_T = ε.
  2. Set t = T, where T is the total number of steps in the forward and reverse diffusion processes.
  3. While t > 0, do the following:
     a. Compute the score function output s_t(x_t) by feeding x_t to the neural network.
     b. Update x_(t-1) by using the Langevin dynamics formula:

        x_(t-1) = x_t + α_t * s_t(x_t) + √(2 * α_t) * ζ

        where α_t is the step size at step t, and ζ is a standard Gaussian random variable.
     c. Decrease t by 1.
  4. Return x_0 as the sampled data.
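The steps above can be sketched as a short loop. This is an illustration under stated assumptions rather than a full sampler: `toy_score` is a hypothetical stand-in for the trained neural network (it is the analytic score of N(2, 1) and ignores t), and the constant step sizes are an arbitrary choice.

```python
import numpy as np

def sample(score_fn, T, alphas, shape, rng):
    """Follow the reverse process from t = T down to t = 1 and return x_0."""
    x = rng.standard_normal(shape)       # start from x_T = epsilon ~ N(0, I)
    t = T
    while t > 0:
        s = score_fn(x, t)               # score function output s_t(x_t)
        zeta = rng.standard_normal(shape)
        alpha_t = alphas[t - 1]          # step size for step t
        x = x + alpha_t * s + np.sqrt(2.0 * alpha_t) * zeta  # Langevin update
        t -= 1                           # move on to the previous step
    return x                             # x_0, the sampled data

def toy_score(x, t, mean=2.0, std=1.0):
    """Hypothetical stand-in for a trained network: score of N(mean, std^2), independent of t."""
    return -(x - mean) / std**2

rng = np.random.default_rng(0)
T = 1000
alphas = np.full(T, 0.01)                # constant step sizes (assumption)
x0 = sample(toy_score, T, alphas, shape=(2000,), rng=rng)
print(x0.mean(), x0.std())               # should be close to the toy target N(2, 1)
```

In a real model the score network would be conditioned on t and trained on noisy data, and the step sizes would usually be tied to the noise schedule.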

Final note

Diffusion Models are a promising research direction in generative AI. They have shown impressive results in various data domains, such as images, text, audio, and video. Applications of diffusion models can be found in areas such as data augmentation, super-resolution, inpainting, style transfer, and more.

However, there are still some challenges and limitations that need to be addressed in the future. Researchers are working on solutions to overcome these challenges and improve the results. Until then, happy diffusing, readers!
