Text-to-Image using Stable Diffusion HuggingFace Model

Models available through HuggingFace apply advanced machine-learning techniques to a wide range of applications, from natural language processing to computer vision. More recently, they have expanded to generating images directly from text descriptions, prominently featuring models like Stable Diffusion. In this article, we will explore how to use the Stable Diffusion XL base model to transform textual descriptions into vivid images.

Pre-requisites

  • diffusers: A library from HuggingFace for diffusion models, commonly used for generative tasks such as text-to-image generation.
  • invisible_watermark: This library is typically used to embed and detect invisible watermarks in digital images, useful for copyright protection.

Install the prerequisites:

pip install diffusers 
pip install invisible-watermark transformers accelerate safetensors
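
To verify the installation, you can run a quick sanity check. This is a minimal sketch that simply imports the libraries and reports whether a CUDA-capable GPU is visible:

Python
import torch
import diffusers

# Confirm the libraries imported correctly and print their versions
print("diffusers version:", diffusers.__version__)
print("torch version:", torch.__version__)

# Check whether a CUDA-capable GPU is available for faster inference
print("CUDA available:", torch.cuda.is_available())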

Stable Diffusion XL Base Model for Text-to-Image

The Stable Diffusion XL base model is an advanced version of the popular Stable Diffusion model, designed for generating high-quality images from textual descriptions. This model is part of the broader category of diffusion models, which have gained significant attention for their ability to produce detailed and coherent images.

Implementing Stable Diffusion XL Base Model To Generate Images From Text

1. Using Diffusers Library

We will implement the code in Google Colab, which provides GPU access for faster inference.

The steps are as follows:

  • Imports:
    • DiffusionPipeline from diffusers for handling diffusion model components.
    • torch for tensor operations and device management.
  • Model Initialization:
    • Loads the “stabilityai/stable-diffusion-xl-base-1.0” model using DiffusionPipeline.from_pretrained().
    • Sets tensor data type to torch.float16 for reduced memory usage.
    • Enables safetensors for secure tensor serialization.
    • Specifies a model variant optimized for float16 operations.
  • Device Configuration:
    • Transfers the model pipeline to GPU with pipe.to("cuda") for faster processing.
  • Prompt Setting:
    • Defines the text prompt “An astronaut riding a green horse”.
  • Image Generation:
    • Generates an image from the prompt, extracting the first image from the output batch with .images[0].
Python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0",
                                         torch_dtype=torch.float16, 
                                         use_safetensors=True, 
                                         variant="fp16")
pipe.to("cuda")

prompt = "An astronaut riding a green horse"

image = pipe(prompt=prompt).images[0]
image

Output:
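
If the pipeline exhausts GPU memory on Colab, diffusers provides pipe.enable_model_cpu_offload() (it relies on the accelerate package installed above), which keeps submodules on the CPU and moves each one to the GPU only while it is needed. Below is a minimal sketch using the same model as above; the output filename is an arbitrary placeholder:

Python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0",
                                         torch_dtype=torch.float16,
                                         use_safetensors=True,
                                         variant="fp16")

# Offload submodules to the CPU and load each onto the GPU only when needed;
# call this instead of pipe.to("cuda") when VRAM is limited
pipe.enable_model_cpu_offload()

image = pipe(prompt="An astronaut riding a green horse").images[0]

# Save the generated image to disk (the filename is arbitrary)
image.save("astronaut.png")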

2. Implementation using HuggingFace Inference API

We can also use the HuggingFace Inference API by following these steps:

  • Navigate to the model page on the official website: Stable Diffusion XL Base Model.
  • Click on the “Deploy” button on the model page.
  • Select “Inference API” from the options provided.
  • Copy the generated code snippet in your desired language.

Here’s how to incorporate the code into the implementation:

Python
import requests
import io
from PIL import Image
from IPython.display import display

API_URL = "https://api-inference.huggingface.co/models/stabilityai/stable-diffusion-xl-base-1.0"
headers = {"Authorization": "Bearer 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"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.content

image_bytes = query({
    "inputs": "Dog playing",
})

# Convert the image bytes to a PIL image
image = Image.open(io.BytesIO(image_bytes))

# Display the image
display(image)

Output:
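
The Inference API returns raw image bytes on success, but it may also return a JSON error body, for example while the hosted model is still loading (HTTP 503). The retry loop below is an illustrative sketch, not part of any official client, and query_with_retry is a hypothetical helper name:

Python
import time
import requests

API_URL = "https://api-inference.huggingface.co/models/stabilityai/stable-diffusion-xl-base-1.0"
headers = {"Authorization": "Bearer hf_YOUR_API_TOKEN"}  # replace with your own token

def query_with_retry(payload, max_retries=3):
    for attempt in range(max_retries):
        response = requests.post(API_URL, headers=headers, json=payload)
        if response.status_code == 200:
            return response.content  # raw image bytes
        if response.status_code == 503:
            # The hosted model is still loading; wait and retry
            time.sleep(10)
            continue
        # Any other status is a real error; raise it with details
        response.raise_for_status()
    raise RuntimeError("Model did not become available after retries")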


Thus, this article provides a clear and detailed guide to generating images from text using HuggingFace models, catering to both beginners and experienced users. We can apply these models to a variety of tasks, keeping copyright considerations in mind.


