min(DALL·E)
This is a fast, minimal port of DALL·E Mega. It has been stripped down for inference and converted to PyTorch. The only third-party dependencies are numpy, requests, pillow and torch.
Generating a 4x4 grid of DALL·E Mega images takes:
- 89 sec with a T4 in Colab
- 48 sec with a P100 in Colab
- 13 sec with an A100 on Replicate
The flax model and code for converting it to torch can be found here.
Install
$ pip install min-dalle
Usage
Load the model parameters once and reuse the model to generate multiple images.
import torch
from min_dalle import MinDalle

model = MinDalle(
    models_root='./pretrained',
    dtype=torch.float32,
    is_mega=True,
    is_reusable=True
)
The required models will be downloaded to models_root if they are not already there. Set the dtype to torch.float16 to save GPU memory. If you have an Ampere architecture GPU you can use torch.bfloat16. Once everything has finished initializing, call generate_image with some text as many times as you want. Use a positive seed for reproducible results. Higher values for log2_supercondition_factor result in better agreement with the text but a narrower variety of generated images. Every image token is sampled from the top-k most probable tokens.
image = model.generate_image(
    text='Nuclear explosion broccoli',
    seed=-1,
    grid_size=4,
    log2_k=6,
    log2_supercondition_factor=5,
    is_verbose=False
)
display(image)
credit: https://twitter.com/hardmaru/status/1544354119527596034
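The top-k sampling mentioned above can be illustrated with a small standalone sketch. It is not the code min(DALL·E) runs internally: it uses only the Python standard library, the helper name sample_top_k is hypothetical, and it assumes that log2_k is the base-2 logarithm of k (so log2_k=6 would mean k = 64), which the parameter name suggests but this README does not state.

```python
import math
import random


def sample_top_k(logits, log2_k, rng):
    """Sample one token index from the 2 ** log2_k highest-scoring logits.

    Hypothetical helper for illustration; assumes k = 2 ** log2_k.
    """
    k = 2 ** log2_k
    # Indices of the k largest logits, best first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax weights restricted to the top-k entries
    # (subtracting the largest logit keeps exp() numerically stable).
    peak = logits[top[0]]
    weights = [math.exp(logits[i] - peak) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(0)
logits = [0.1, 2.0, -1.0, 3.5, 0.7, 1.2, -0.3, 2.9]
# With log2_k=2 only the four largest logits (indices 3, 7, 1, 5) can be drawn.
token = sample_top_k(logits, log2_k=2, rng=rng)
print(token)
```

A smaller k restricts sampling to the most likely tokens, while a larger k admits less likely tokens and so more varied output.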
Saving Individual Images
The images can also be generated as a FloatTensor in case you want to process them manually.
images = model.generate_images(
    text='Nuclear explosion broccoli',
    seed=-1,
    image_count=7,
    log2_k=6,
    log2_supercondition_factor=5,
    is_verbose=False
)
To get an image into PIL format you will have to first move the images to the CPU and convert the tensor to a numpy array.
images = images.to('cpu').numpy()
Then image i can be converted to a PIL.Image and saved.
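from PIL import Image
# Image.fromarray rejects float RGB arrays; the cast assumes pixel values already lie in 0-255.
image = Image.fromarray(images[i].astype('uint8'))
image.save('image_{}.png'.format(i))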
Interactive
If the model is being used interactively (e.g. in a notebook), generate_image_stream can be used to generate a stream of images as the model is decoding. The detokenizer adds a slight delay for each image. Setting log2_mid_count to 3 results in a total of 2 ** 3 = 8 generated images. The only valid values for log2_mid_count are 0, 1, 2, 3, and 4. This is implemented in the Colab notebook.
image_stream = model.generate_image_stream(
    text='Dali painting of WALL·E',
    seed=-1,
    grid_size=3,
    log2_mid_count=3,
    log2_k=6,
    log2_supercondition_factor=3,
    is_verbose=False
)
for image in image_stream:
    display(image)
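The arithmetic behind log2_mid_count can be sketched as follows. This is only an illustration: the helper mid_image_steps is hypothetical, and both the total of 256 image tokens and the even spacing of the intermediate images are assumptions that this README does not state.

```python
def mid_image_steps(token_count, log2_mid_count):
    """Return the 1-based decoding steps at which an image would be emitted.

    Hypothetical helper: token_count is an assumed number of image tokens,
    and the images are assumed to be evenly spaced over the decoding run.
    """
    if log2_mid_count not in range(5):
        raise ValueError('log2_mid_count must be 0, 1, 2, 3, or 4')
    image_count = 2 ** log2_mid_count
    interval = token_count // image_count
    return [interval * (n + 1) for n in range(image_count)]


# log2_mid_count=3 gives 2 ** 3 = 8 images in total, the last one being final.
steps = mid_image_steps(token_count=256, log2_mid_count=3)
print(len(steps), steps)
```

With log2_mid_count=0 the list holds a single step, the final image, so no intermediate images are produced.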
Command Line
Use image_from_text.py to generate images from the command line.
$ python image_from_text.py --text='artificial intelligence' --no-mega
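For readers wondering how a switch such as --no-mega is typically wired up, here is a minimal argparse sketch. It is not the contents of image_from_text.py: only --text and --no-mega appear in the command above, while the paired --mega flag, the build_parser helper, and the default value are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build an illustrative parser; not the real image_from_text.py parser."""
    parser = argparse.ArgumentParser(description='Illustrative sketch only')
    parser.add_argument('--text', type=str, default='')
    # Two flags writing to one destination. --mega is an assumed counterpart
    # of --no-mega, and the default below is chosen arbitrarily.
    parser.add_argument('--mega', dest='mega', action='store_true')
    parser.add_argument('--no-mega', dest='mega', action='store_false')
    parser.set_defaults(mega=False)
    return parser


# Parse the same arguments as the shell command shown above.
args = build_parser().parse_args(['--text=artificial intelligence', '--no-mega'])
print(args.text, args.mega)
```

The parsed mega value could then be passed on as is_mega when constructing MinDalle.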