Deep Floyd IF

Overview

Download prerequisites

  1. Miniconda
  2. Git
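
With both installed, a quick sanity check from a fresh terminal (just version prints, nothing IF-specific) confirms they're on your PATH:

conda --version
git --version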

Setup Environment

Clone the git repo

git clone https://github.com/deep-floyd/IF.git

cd to the repo folder

In my case:

cd C:\Users\trima\Documents\GitHub\IF

Create the conda environment

conda create --name IF python=3.10.10

Activate the environment

conda activate IF

Install requirements

pip install -r requirements.txt --upgrade
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118
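
Before moving on, it's worth checking that the nightly build actually sees the GPU. Something like the one-liner below (just a sanity check, the exact version string will differ) should print a cu118 build and True:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"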

Setup Program

Download the model weights from Hugging Face

WARNING

IF-I-XL-v1.0 is ~262 GB

git clone https://huggingface.co/DeepFloyd/IF-I-XL-v1.0.git

WARNING

IF-II-L-v1.0 is ~182 GB

git clone https://huggingface.co/DeepFloyd/IF-II-L-v1.0.git

WARNING

stable-diffusion-x4-upscaler is ~26.1 GB

git clone https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler.git
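
These Hugging Face repos store the checkpoints with Git LFS, so if the clones finish suspiciously fast and the weight files are only a few bytes, LFS probably isn't enabled. Depending on how Git was installed, you may need to turn it on once before cloning:

git lfs install

Note too that the DeepFloyd models are gated on Hugging Face, so you may need to accept the license on the model pages and authenticate with your Hugging Face account before the clones will work.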

Run Deep Floyd IF

Put the code below in a file called run.py, then run it from the Anaconda Prompt with python run.py (with the IF environment still activated).

import gc
import time

import torch
from diffusers import DiffusionPipeline
from diffusers.utils import pt_to_pil

# Cap this process's caching allocator at half of the card's VRAM
torch.cuda.set_per_process_memory_fraction(0.5)

def flush():
    # Drop Python references and return cached VRAM to the driver
    gc.collect()
    torch.cuda.empty_cache()

# stage 1: 64x64 base model (also hosts the T5 text encoder used for prompt embeddings)
stage_1 = DiffusionPipeline.from_pretrained("./IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16, safety_checker=None)

# stage 2: 64x64 -> 256x256 super-resolution; text_encoder=None because it reuses stage 1's embeddings
stage_2 = DiffusionPipeline.from_pretrained("./IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16, safety_checker=None)

# stage 3: 256x256 -> 1024x1024 Stable Diffusion x4 upscaler
stage_3 = DiffusionPipeline.from_pretrained("./stable-diffusion-x4-upscaler", torch_dtype=torch.float16, safety_checker=None)

# Memory management: keep weights in CPU RAM and move them to the GPU only when needed
stage_1.enable_sequential_cpu_offload()
stage_2.enable_model_cpu_offload()
stage_3.enable_model_cpu_offload()

# prompt
prompt = 'an anime girl wearing a shirt that says "hello world"'

# text embeds, computed once by stage 1's T5 encoder and reused by stage 2
prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)

# seed settings: seed from the clock so every run produces a different image
time_seed = int(time.time())
generator = torch.manual_seed(time_seed)

# stage 1: generate the 64x64 image
image = stage_1(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt").images
pt_to_pil(image)[0].save("./if_stage_I.png")

del stage_1
flush()

# stage 2: upscale to 256x256
image = stage_2(
    image=image, prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt"
).images
pt_to_pil(image)[0].save("./if_stage_II.png")

del stage_2
flush()

# stage 3: upscale to 1024x1024
image = stage_3(prompt=prompt, image=image, generator=generator, noise_level=100).images
image[0].save("./if_stage_III.png")

Conclusion

My takeaways from Deep Floyd IF:

  • The 16GB of VRAM in my RTX 4080 isn't enough to run the third stage, so the largest output this implementation can make is 256x256
  • Deep Floyd IF has extremely slow inference times, upwards of two minutes per 256x256 image. I've played around a bit with memory management (see the sketch after this list) but don't know enough about PyTorch to get VRAM usage under 16GB; I only got stage 3 working in CPU mode, which sent inference times soaring past 40 minutes per 1024x1024 image.
  • Community adoption has been slow, probably because of slow inference times
  • Not really seeing an advantage of this over Stable Diffusion + ControlNet
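
For anyone who wants to keep poking at the stage 3 memory problem, here is a rough sketch of the memory-saving switches diffusers exposes. I haven't verified that this gets the upscaler under 16GB on my setup, so treat it as a starting point rather than a fix:

# Untested sketch: more aggressive memory management for the x4 upscaler (stage 3)
import torch
from diffusers import DiffusionPipeline

stage_3 = DiffusionPipeline.from_pretrained("./stable-diffusion-x4-upscaler", torch_dtype=torch.float16)

# Stream weights to the GPU one submodule at a time instead of one whole model at a time
# (slower than enable_model_cpu_offload, but with a lower peak VRAM footprint)
stage_3.enable_sequential_cpu_offload()

# Compute attention in slices to cut peak memory during the 1024x1024 upscale
stage_3.enable_attention_slicing()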