Approach-2 Using AutoPipelineForText2Image

To use a task-oriented pipeline, Diffusers also provides AutoPipeline, which offers more flexibility when running inference, such as enabling use_safetensors to load weights directly. By automatically identifying the appropriate pipeline class, AutoPipeline eliminates the need to know the exact class name, simplifying the process of loading a checkpoint for a given task.

Import required Libraries

Python3

import torch
from diffusers import AutoPipelineForText2Image


Create Auto Pipeline for Text to Image

The syntax is similar to approach-1, but here we also set use_safetensors to True and variant to "fp16" to run at 16-bit floating-point precision. Notice one change: here we are using the Stable Diffusion XL pre-trained model, the most advanced Stable Diffusion model available at the time of writing.
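As a rough, back-of-the-envelope illustration of why the fp16 variant matters (the ~2.6 billion parameter count used for the SDXL base UNet below is an approximation, for illustration only):

```python
# Approximate memory needed just for the UNet weights of SDXL
# (~2.6 B parameters -- an illustrative approximation, not an exact figure).
params = 2_600_000_000

bytes_fp32 = params * 4  # float32 stores 4 bytes per parameter
bytes_fp16 = params * 2  # float16 stores 2 bytes per parameter

print(f"fp32: {bytes_fp32 / 1e9:.1f} GB")  # fp32: 10.4 GB
print(f"fp16: {bytes_fp16 / 1e9:.1f} GB")  # fp16: 5.2 GB
```

Halving the per-parameter size roughly halves the weight memory, which is what makes a model of this size fit more comfortably on consumer GPUs.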

Python3

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe = pipe.to("cuda")
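The snippet above assumes a CUDA GPU; on a CPU-only machine, pipe.to("cuda") will raise an error, and fp16 is intended for GPU inference. A minimal fallback sketch (assuming only that torch is installed) is to choose the device and dtype first:

```python
import torch

# Pick the device, and match the precision to it: use float16 on GPU,
# but fall back to float32 on CPU, where half precision is slow or unsupported.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

print(device, dtype)
```

These values can then be passed as torch_dtype=dtype to from_pretrained and used in pipe.to(device) in place of the hard-coded "cuda" and torch.float16 above.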


Define prompt and run Pipeline

Use the same prompt and compare the output quality between the base model (v1.5) and the more advanced model (XL).

Python3

prompt = "a horse racing near beach, 8k, realistic photography"
image = pipe(prompt=prompt).images[0]
image
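The pipeline call also accepts standard generation arguments such as num_inference_steps, guidance_scale, and a seeded generator for reproducible images, e.g. pipe(prompt=prompt, num_inference_steps=30, guidance_scale=7.5, generator=torch.Generator("cuda").manual_seed(42)); the step count and scale here are illustrative choices, not tuned values. The reproducibility idea behind the generator argument can be demonstrated without downloading the model:

```python
import torch

# Two generators seeded identically produce identical random draws --
# this is what lets a seeded `generator` argument reproduce the same image.
g1 = torch.Generator().manual_seed(42)
g2 = torch.Generator().manual_seed(42)

a = torch.randn(4, generator=g1)
b = torch.randn(4, generator=g2)

print(torch.equal(a, b))  # True
```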


Output:

[Image: output from Stable Diffusion XL]

Stable Diffusion XL gives a more accurate result than Stable Diffusion v1.5: the prompt mentions a beach, but the v1.5 image does not include one. With this we conclude.
