A look at Stable Diffusion - An open-source text-to-image alternative to MidJourney and DALL-E 2
Implementing Stable Diffusion using TensorFlow and Keras
Earlier, DALL-E 2 and MidJourney were the only text-to-image AI generators available. They share a significant artificial limitation: the inability to produce images of well-known individuals, including politicians and celebrities. Additionally, using these services has a price tag attached to it.
Artificial Intelligence (AI) art is a current trend, but most AI image generators run in the cloud. To make a text-to-image AI tool widely accessible, a startup called Stability AI developed the “Stable Diffusion” AI model, which can be used to create text-to-image AI art within seconds on your own PC at zero cost!
Jumping right into the Stable Diffusion model: it is a complex algorithm trained on images from the internet, built on ideas from OpenAI. Having been trained on billions of images, it can produce results comparable to the ones you’d get from DALL-E 2 and MidJourney.
Stable Diffusion currently runs in a command-line interface (CLI). At runtime, it frames image generation as a “diffusion” process: starting from pure noise, the model keeps removing noise step by step until none is left, producing an output that closely matches the text prompt provided.
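The iterative refinement idea can be sketched with a toy NumPy example. This is purely an illustration: the real model uses a learned neural network (a U-Net) to predict and subtract noise in a latent space, whereas here a hypothetical "denoiser" simply nudges pure noise toward a stand-in target a little at each step.

```python
import numpy as np

# Toy illustration of iterative denoising (NOT the real algorithm).
rng = np.random.default_rng(0)
target = rng.random((8, 8))       # stands in for "the image the text describes"
x = rng.normal(size=(8, 8))       # start from pure noise

for step in range(50):
    # each step removes a fraction of the remaining difference ("noise")
    x = x + 0.2 * (target - x)

residual = np.abs(x - target).mean()  # tiny after 50 refinement steps
```

After 50 steps the remaining difference has shrunk by a factor of 0.8^50, which is why the loop converges on the target so closely.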
Now, you can also implement Stable Diffusion using TensorFlow and Keras.
Perks of KerasCV
With several implementations of Stable Diffusion publicly available, why should you use keras_cv.models.StableDiffusion?
Advantages of KerasCV's Stable Diffusion model are:
Easy-to-use API
Graph mode execution
XLA compilation through jit_compile=True
Support for mixed precision computation
Runs orders of magnitude faster than naive implementations
Code:
Setup:
!pip install --upgrade keras-cv
import keras_cv
from tensorflow import keras
import matplotlib.pyplot as plt
Construct a model:
model = keras_cv.models.StableDiffusion(img_width=512, img_height=512)
Prompt:
images = model.text_to_image("photograph of an astronaut riding a horse", batch_size=3)
def plot_images(images):
    plt.figure(figsize=(20, 20))
    for i in range(len(images)):
        ax = plt.subplot(1, len(images), i + 1)
        plt.imshow(images[i])
        plt.axis("off")

plot_images(images)
images = model.text_to_image(
"cute magical flying dog, fantasy art, "
"golden color, high quality, highly detailed, elegant, sharp focus, "
"concept art, character concepts, digital painting, mystery, adventure",
batch_size=3,
)
plot_images(images)
images = model.text_to_image("An avocado armchair", batch_size=3)
plot_images(images)
images = model.text_to_image(
"Teddy bears conducting machine learning research",
batch_size=3,
)
plot_images(images)
images = model.text_to_image(
"A mysterious dark stranger visits the great pyramids of egypt, "
"high quality, highly detailed, elegant, sharp focus, "
"concept art, character concepts, digital painting",
batch_size=3,
)
plot_images(images)
Stable Diffusion consists of three parts:
A text encoder, which turns your prompt into a latent vector.
A diffusion model, which repeatedly "denoises" a 64x64 latent image patch.
A decoder, which turns the final 64x64 latent patch into a higher resolution 512x512 image.
Advantages:
Outputs are achieved faster than other tools.
Stable Diffusion is very precise and can even blend the faces of different people.
It is good at portraits and symmetrical facial expressions.
Disadvantages:
It appears to be more permissive than its competitors.
Stability AI doesn’t have a clear policy prohibiting pictures of famous people.
It can reproduce societal biases, produce unsafe content, and allow some users to generate offensive or lewd images.
Our next article will be on Simplifying Similarity Problem: Introduction to Siamese Neural Networks
Vevesta: Your Machine Learning Team’s Feature and Technique Repository - Accelerate your Machine learning project by using features, techniques and projects used by your peers
The first 100 early birds who log in to Vevesta will get a free lifetime subscription.