A look at Stable Diffusion - An open-source text-to-image alternative to MidJourney and DALL-E 2
Implementing Stable Diffusion using TensorFlow and Keras
Earlier, DALL-E 2 and MidJourney were the only text-to-image AI generators available. They share a significant artificial limitation: the inability to produce images of well-known individuals, including politicians and celebrities. Additionally, using these services has a price tag attached to it.
Artificial Intelligence (AI) art is a current trend, but most AI image generators run in the cloud. To make a text-to-image AI tool widely accessible, a startup called Stability AI developed the “Stable Diffusion” AI model, which can be used to create text-to-image AI art within seconds on your own PC at zero cost!
Jumping right into the Stable Diffusion model: it is a complex algorithm trained on images from the internet, built on ideas from OpenAI. Having been trained on billions of images, it can produce results comparable to the ones you’d get from DALL-E 2 and MidJourney.
Stable Diffusion currently runs in a command-line interface (CLI). At runtime, it frames image generation as a “diffusion” process: starting from pure noise, the model keeps removing noise step by step until none is left, producing an output that closely matches the text prompt provided.
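The iterative refinement idea can be sketched with a toy NumPy example. This is purely an illustration: the real model uses a learned neural network (a U-Net) to predict and subtract noise in a latent space, whereas here a hypothetical "denoiser" simply nudges pure noise toward a stand-in target a little at each step.

```python
import numpy as np

# Toy illustration of iterative denoising (NOT the real algorithm).
rng = np.random.default_rng(0)
target = rng.random((8, 8))       # stands in for "the image the text describes"
x = rng.normal(size=(8, 8))       # start from pure noise

for step in range(50):
    # each step removes a fraction of the remaining difference ("noise")
    x = x + 0.2 * (target - x)

residual = np.abs(x - target).mean()  # tiny after 50 refinement steps
```

After 50 steps the remaining difference has shrunk by a factor of 0.8^50, which is why the loop converges on the target so closely.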
Now, you can also implement Stable Diffusion using TensorFlow and Keras.
Perks of KerasCV
With several implementations of Stable Diffusion publicly available, why should you use keras_cv.models.StableDiffusion?
Advantages of KerasCV's Stable Diffusion model are:
Easy-to-use API
Graph mode execution
XLA compilation through jit_compile=True
Support for mixed precision computation
Runs orders of magnitude faster than naive implementations
Code:
Setup:
!pip install --upgrade keras-cv
import keras_cv
from tensorflow import keras
import matplotlib.pyplot as plt
Construct a model:
model = keras_cv.models.StableDiffusion(img_width=512, img_height=512)
Prompt:
images = model.text_to_image("photograph of an astronaut riding a horse", batch_size=3)
def plot_images(images):
    plt.figure(figsize=(20, 20))
    for i in range(len(images)):
        ax = plt.subplot(1, len(images), i + 1)
        plt.imshow(images[i])
        plt.axis("off")

plot_images(images)
images = model.text_to_image(
"cute magical flying dog, fantasy art, "
"golden color, high quality, highly detailed, elegant, sharp focus, "
"concept art, character concepts, digital painting, mystery, adventure",
batch_size=3,
)
plot_images(images)
images = model.text_to_image("An avocado armchair", batch_size=3)
plot_images(images)
images = model.text_to_image(
"Teddy bears conducting machine learning research",
batch_size=3,
)
plot_images(images)
images = model.text_to_image(
"A mysterious dark stranger visits the great pyramids of egypt, "
"high quality, highly detailed, elegant, sharp focus, "
"concept art, character concepts, digital painting",
batch_size=3,
)
plot_images(images)
Stable Diffusion consists of three parts:
A text encoder, which turns your prompt into a latent vector.
A diffusion model, which repeatedly "denoises" a 64x64 latent image patch.
A decoder, which turns the final 64x64 latent patch into a higher resolution 512x512 image.
Advantages:
Outputs are achieved faster than other tools.
Stable Diffusion is very precise and can even blend the faces of different people.
It is good at portraits and symmetrical facial expressions.
Disadvantages:
It appears to be more permissive than its competitors.
Stability AI doesn’t have a clear policy prohibiting pictures of famous people.
It can reproduce societal biases, produce unsafe content, and allow some users to generate offensive or lewd images.
Our next article will be on Simplifying Similarity Problem: Introduction to Siamese Neural Networks
Vevesta: Your Machine Learning Team’s Feature and Technique Repository - Accelerate your Machine learning project by using features, techniques and projects used by your peers
The first 100 early birds who log in to Vevesta will get a free lifetime subscription.