Use case overview
In the first part of our series, “How to Create an AI Image Generator Application Using Stable Diffusion – Part 1/2,” we introduced Stable Diffusion, an advanced latent text-to-image diffusion model with the remarkable ability to generate realistic images from textual input. We also showcased how to optimize the performance of Stable Diffusion 2.1 base using AWS Deep Learning Containers (DLCs) and AWS Inferentia for efficient text-to-image predictions.
In this blog post, we will take the text-to-image capability further by expanding it to include image-to-image functionality. We will also explore the concept of reusing seeds and incorporate an image search engine that utilizes a vector database. Lastly, we will combine all of these powerful features to construct an AI image generator application using Streamlit. This application will enable users to generate images based on either text descriptions or initial images, provide a gallery of previously generated images, and even perform image searches using textual queries.
Architectural Diagram
Below are the three main features of the AI image generator, each color coded in the diagram as follows:
- Image generation via text prompt or with initial image (Blue)
- Image search / prompt recommendation (Red)
- Image history search by user session (Yellow)
- The user provides a session ID, which creates a new folder in the project's S3 bucket where all generated images are stored.
- The user inputs a prompt to generate one or more images, and can optionally supply an initial image or a seed value. Each request invokes a SageMaker real-time endpoint that hosts Stable Diffusion 2.1 base (see the invocation sketch after this list). Every generated image is stored in the S3 bucket. In addition, the prompt is converted to an embedding (text2vector) and stored in the vector database (Pinecone), with the image's S3 location as metadata.
- The generated image and its seed value are returned to the user.
- The user searches for images by providing a prompt.
- A similarity search is performed between the search prompt and the image prompts in the vector database.
- The matching images and their prompts are returned to the user, sorted in descending order of similarity score.
- The user can also view the images generated in their own session.
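Below is a minimal sketch of how a client might call the real-time endpoint with boto3. The endpoint name, payload fields, and response shape are illustrative assumptions for this sketch, not the exact contract of our deployed model.

```python
import json

import boto3

# assumed endpoint name and payload schema -- adjust to match your deployment
ENDPOINT_NAME = "sd-2-1-base-endpoint"

runtime = boto3.client("sagemaker-runtime")

payload = {
    "prompt": "A fantasy landscape, trending on artstation",
    "negative_prompt": "",
    "seed": 42,                    # optional: reuse a seed for reproducibility
    "session_id": "demo-session",  # groups the generated images in the S3 bucket
}

response = runtime.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType="application/json",
    Body=json.dumps(payload),
)
result = json.loads(response["Body"].read())
print(result)  # e.g. the S3 locations of the generated images and the seed used
```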
Image2Image text-guided generation
The StableDiffusionImg2ImgPipeline lets you pass a text prompt and an initial image to condition the generation of new images.
Initial image (Left)
“A fantasy landscape, trending on artstation” (Right, new image)
Below is a code sample to implement image2image:
```python
import torch
import requests
from PIL import Image
from io import BytesIO

from diffusers import StableDiffusionImg2ImgPipeline

# load the pipeline
device = "cuda"
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to(device)

# let's download an initial image
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((768, 512))

prompt = "A fantasy landscape, trending on artstation"

# strength controls how much the initial image is altered (0 = keep, 1 = ignore)
images = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images
images[0].save("fantasy_landscape.png")
```
Source (https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb)
Reusing seeds and latents
To reproduce a result you like, or to fine-tune a specific outcome, you can generate your own latents or adjust your prompt accordingly. If we wish to reuse seeds for reproducibility, we must generate the latents ourselves; otherwise, the pipeline generates them internally and we have no way to replicate them.
Latents are the initial random Gaussian noise that the diffusion process transforms into an actual image. To generate them, we assign a unique random seed to each latent and store the seeds for future reuse. This allows us to reproduce desired results reliably.
Prompt:
“Puppy in grass,flowers,poppies” (Left, original seed)
Negative prompt: “labrador” (Right, generated from seed)
Sample code to generate a seed and use it for inference is below.

```python
generator = torch.Generator(device=device)
prompt = "Labrador in the style of Vermeer"

# create the latents from a recorded seed so the result can be reproduced later
seed = generator.seed()
generator = generator.manual_seed(seed)
latents = torch.randn(
    (1, pipe.unet.in_channels, 64, 64),  # 64 = 512 // 8, the latent resolution
    generator=generator,
    device=device,
)

with torch.autocast("cuda"):
    image = pipe([prompt], guidance_scale=7.5, latents=latents).images[0]
```
Source (https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb)
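To reproduce the image later, we simply re-seed the generator with the stored seed and recreate the same latents before calling the pipeline again, following the pattern above:

```python
# re-create identical latents from the stored seed to reproduce the image
generator = generator.manual_seed(seed)
latents = torch.randn((1, pipe.unet.in_channels, 64, 64), generator=generator, device=device)

with torch.autocast("cuda"):
    reproduced = pipe([prompt], guidance_scale=7.5, latents=latents).images[0]
```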
To enable our model to perform both image2image generation and seed reuse, we modified the inference code in our model artifact. We have shared the modified inference code here.
Vector database
A “vector database” is a powerful storage and retrieval service designed to efficiently search through vast quantities of vectors, ranging from millions to billions in number. To make use of this service, we begin by generating prompt vectors, which are then inserted into the vector database. By querying the database with a new prompt vector, we can retrieve the most similar stored vectors.
Source (www.pinecone.io)
Let’s consider an example to better understand the process. Suppose our initial prompt is “Sheep grazing.” We construct a prompt vector based on this prompt and proceed to search for similar items within the vector database. As a result of this search, we discover multiple images that already align with the given description.
To facilitate efficient storage and retrieval, we store the generated images in Amazon S3. When inserting a prompt vector into the vector database, we include the image's location (S3 URI) as metadata, along with the plaintext prompt itself. This metadata enables a second feature: prompt recommendations.
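Below is a minimal sketch of this insert-and-search flow using the pinecone-client v2 API. The index name, embedding model, vector IDs, and metadata fields here are illustrative assumptions, not necessarily what the application uses.

```python
import pinecone
from sentence_transformers import SentenceTransformer

# illustrative names -- not necessarily what the application uses
pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")
index = pinecone.Index("image-prompts")
encoder = SentenceTransformer("all-MiniLM-L6-v2")

# insert: store the prompt embedding with the S3 URI and plaintext prompt as metadata
prompt = "Sheep grazing"
index.upsert(vectors=[(
    "img-0001",                       # unique vector id
    encoder.encode(prompt).tolist(),  # prompt embedding
    {"s3_uri": "s3://my-bucket/demo-session/img-0001.png", "prompt": prompt},
)])

# search: find images whose prompts are most similar to a new search prompt
results = index.query(
    vector=encoder.encode("sheep in a meadow").tolist(),
    top_k=5,
    include_metadata=True,
)
for match in results.matches:
    print(match.score, match.metadata["prompt"], match.metadata["s3_uri"])
```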
Prompt recommendations are especially helpful for users struggling to come up with prompt ideas; they offer inspiration and can lead to more satisfying results.
Prompt/image recommendations based on prompt
Stable Diffusion Playground
To create a user-friendly interface for Stable Diffusion 2.1 base, we used Streamlit. Our frontend UI comprises three tabs: “Search,” “Generate,” and “History,” each serving a distinct function. We have shared the code for the entire application here.
Within the “Search” tab, users gain the ability to search for images by entering a prompt. This intuitive feature allows for efficient retrieval of relevant images based on user-specified prompts.
“Search” tab Stable Diffusion Playground
Moving to the “Generate” tab, users have a versatile set of options for generating images. By providing a prompt, along with optional parameters such as a negative prompt, seed value, and desired output count, users can perform text-to-image generation. Alternatively, users can initiate image-to-image generation by supplying an initial image along with a prompt.
“Generate” tab Stable Diffusion Playground
In the final tab, “History,” users can browse their session's image history. This functionality lets users conveniently search and retrieve previously generated images to review their creative journey.
“History” tab Stable Diffusion Playground
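Below is a minimal sketch of this three-tab layout in Streamlit. The widget set and the helpers `search_images`, `generate_images`, and `load_history` are hypothetical placeholders for the application's backend calls, not the application's actual code.

```python
import streamlit as st

st.title("Stable Diffusion Playground")
session_id = st.sidebar.text_input("Session ID", value="demo-session")

search_tab, generate_tab, history_tab = st.tabs(["Search", "Generate", "History"])

with search_tab:
    query = st.text_input("Search prompt")
    if st.button("Search") and query:
        for image_url, prompt in search_images(query):  # hypothetical backend helper
            st.image(image_url, caption=prompt)

with generate_tab:
    prompt = st.text_input("Prompt")
    negative_prompt = st.text_input("Negative prompt")
    seed = st.number_input("Seed (optional)", value=0)
    count = st.slider("Number of images", 1, 4, 1)
    init_image = st.file_uploader("Initial image (optional)")
    if st.button("Generate") and prompt:
        # hypothetical backend helper wrapping the SageMaker endpoint call
        for image in generate_images(prompt, negative_prompt, seed, count, init_image, session_id):
            st.image(image)

with history_tab:
    for image_url, prompt in load_history(session_id):  # hypothetical backend helper
        st.image(image_url, caption=prompt)
```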
Conclusion
In this second part of the blog series, we have expanded the capabilities of our text2image Stable Diffusion 2.1 base model from the first part. In addition to text-to-image generation, we have introduced image2image functionality, allowing users to explore a wider range of creative possibilities.
To further enhance the user experience, we have implemented the reuse of seeds, enabling users to reproduce specific image outputs consistently. Moreover, we have integrated a vector database, which serves as the backbone for our image and prompt search as well as recommendation features. This vector database empowers users to search for images or prompts that align with their requirements, while also providing valuable recommendations to inspire their creative process.
Check out our open source Git repository for more open source materials. Also, contact us to learn how to productionize your generative AI model economically at scale.