
Towards Controllable Diffusion Models with GLIGEN

What's this blog post about?

Researchers have developed a new method called GLIGEN (Grounded Language to Image Generation) that gives users more control over AI-generated imagery. Unlike existing text-to-image diffusion models, which accept only a text prompt, GLIGEN can additionally be conditioned on inputs from other modalities, such as bounding boxes and keypoints. The approach adds new learnable parameters that adapt the intermediate features of an existing model while leaving the pretrained weights frozen. This lets users generate images under specific constraints, such as fixing the position of objects or the pose of generated subjects. GLIGEN is trained on a combination of grounding data, detection data, and caption data, enabling grounded generation across a range of conditioning inputs.
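The core idea of adapting intermediate features without touching the frozen weights can be sketched as a gated residual injection: new attention from the model's visual tokens to grounding tokens (e.g. box embeddings), scaled by a learnable gate that starts at zero, so the pretrained model's behavior is preserved at initialization. The sketch below is a simplified toy (function names, toy sizes, and the absence of projection layers are all assumptions for illustration, not the paper's exact architecture):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention (single head, no learned projections).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def gated_injection(visual_tokens, grounding_tokens, gate):
    # Hypothetical new layer: attend from visual tokens to grounding
    # tokens and add the result residually, scaled by tanh(gate).
    # With gate == 0, the frozen model's features pass through unchanged.
    delta = attention(visual_tokens, grounding_tokens, grounding_tokens)
    return visual_tokens + np.tanh(gate) * delta

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # toy visual feature tokens
g = rng.normal(size=(2, 8))  # toy grounding tokens (e.g. box embeddings)

# At initialization (gate = 0) the pretrained features are untouched.
assert np.allclose(gated_injection(x, g, gate=0.0), x)
```

Training only the new layer and its gate, rather than fine-tuning the whole model, is what makes this kind of adapter cheap to add on top of a large pretrained diffusion model.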

Company
Voxel51

Date published
April 11, 2023

Author(s)
Yuheng Li

Word count
1619

Language
English

Hacker News points
None found.


By Matt Makai. 2021-2024.