
Towards Controllable Diffusion Models with GLIGEN

What's this blog post about?

Researchers have developed a new method called GLIGEN (Grounded Language to Image Generation) that gives users more control over AI-generated imagery. Unlike existing text-to-image diffusion models, which accept only a text prompt, GLIGEN can additionally be conditioned on inputs from other modalities, such as bounding boxes and keypoints. The approach adds new learnable parameters that adapt the intermediate features of an existing model while leaving the pretrained weights frozen. This lets users generate images under specific constraints, such as fixing the position of objects or the pose of generated subjects. GLIGEN is trained on a combination of grounding data, detection data, and caption data, enabling grounded generation across a range of conditioning inputs.
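The core idea of adapting intermediate features without touching the frozen weights can be sketched as a gated residual injection: new attention from the model's visual tokens to grounding tokens (e.g. box embeddings), scaled by a learnable gate that starts at zero, so the pretrained model's behavior is preserved at initialization. The sketch below is a simplified toy (function names, toy sizes, and the absence of projection layers are all assumptions for illustration, not the paper's exact architecture):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention (single head, no learned projections).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def gated_injection(visual_tokens, grounding_tokens, gate):
    # Hypothetical new layer: attend from visual tokens to grounding
    # tokens and add the result residually, scaled by tanh(gate).
    # With gate == 0, the frozen model's features pass through unchanged.
    delta = attention(visual_tokens, grounding_tokens, grounding_tokens)
    return visual_tokens + np.tanh(gate) * delta

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # toy visual feature tokens
g = rng.normal(size=(2, 8))  # toy grounding tokens (e.g. box embeddings)

# At initialization (gate = 0) the pretrained features are untouched.
assert np.allclose(gated_injection(x, g, gate=0.0), x)
```

Training only the new layer and its gate, rather than fine-tuning the whole model, is what makes this kind of adapter cheap to add on top of a large pretrained diffusion model.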

Company
Voxel51

Date published
April 11, 2023

Author(s)
Yuheng Li

Word count
1619

Language
English

Hacker News points
None found.


By Matt Makai. 2021-2024.