Towards Controllable Diffusion Models with GLIGEN
Researchers have developed GLIGEN (Grounded-Language-to-Image Generation), a method that gives users finer control over AI-generated imagery. Unlike standard text-to-image diffusion models, GLIGEN can be conditioned on additional modalities such as bounding boxes and keypoints. The approach adds new learnable parameters that adapt intermediate features of an existing model while keeping its original weights frozen. This lets users specify conditions such as the position of objects or the pose of generated subjects. GLIGEN is trained on a combination of grounding data, detection data, and caption data, allowing it to ground generation on a wide range of concepts, including ones beyond its training vocabulary.
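The core idea, adapting a frozen model's intermediate features through new learnable layers, can be sketched as a gated attention adapter. The sketch below is a hypothetical, simplified illustration (class and variable names are mine, not GLIGEN's): a zero-initialized gate means the frozen model's behavior is exactly preserved at the start of training, and grounding information is blended in gradually as the gate is learned.

```python
import torch
import torch.nn as nn

class GatedAttentionAdapter(nn.Module):
    """Illustrative sketch: new trainable layers modulate frozen
    intermediate features via a gate initialized to zero."""
    def __init__(self, dim: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Zero-init gate: at initialization the adapter is an identity,
        # so the pretrained model's outputs are unchanged.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, visual_tokens, grounding_tokens):
        # Attend over visual + grounding tokens jointly, then keep only
        # the outputs at the visual positions.
        n_vis = visual_tokens.shape[1]
        x = self.norm(torch.cat([visual_tokens, grounding_tokens], dim=1))
        out, _ = self.attn(x, x, x)
        return visual_tokens + torch.tanh(self.gamma) * out[:, :n_vis]

vis = torch.randn(2, 16, 64)   # 16 intermediate visual tokens, dim 64
grd = torch.randn(2, 4, 64)    # 4 grounding tokens (e.g. box embeddings)
layer = GatedAttentionAdapter(64)
out = layer(vis, grd)
```

Because `tanh(0) == 0`, the untrained adapter returns `visual_tokens` unchanged, which is what lets new conditioning be bolted onto a pretrained diffusion model without disturbing its weights.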
Company: Voxel51
Date published: April 11, 2023
Author(s): Yuheng Li
Word count: 1619
Language: English