Why 2023 was the most exciting year in computer vision history (so far)
In 2023, computer vision made significant progress across many modalities. Notable developments included YOLO-NAS for object detection, the Segment Anything Model (SAM) for segmentation, DINOv2 for self-supervised learning, Gaussian Splatting as an alternative to NeRFs, and advances in text-to-image models such as Midjourney and Stable Diffusion. LoRA (Low-Rank Adaptation) made fine-tuning diffusion models more efficient, and the Ego-Exo4D dataset emerged as a foundation for video perception research. Text-to-video (T2V) models made strides toward high-quality video generation from text prompts, and multimodal LLMs such as GPT-4 Vision and LLaVA combined language understanding with visual capabilities. Finally, LLM-aided visual reasoning paired the general reasoning abilities of LLMs with expert vision models for tasks such as visual question answering.
Company
Voxel51
Date published
Dec. 20, 2023
Author(s)
Jacob Marks
Word count
2594
Language
English
Hacker News points
None found.