Why 2023 was the most exciting year in computer vision history (so far)

Company

Voxel51

Date Published

Dec. 20, 2023

Author

Jacob Marks

Word count

2594

Language

English

Hacker News points

None

URL

voxel51.com/blog/why-2023-was-the-most-exciting-year-in-computer-vision-history-so-far

Summary

In 2023, computer vision advanced across image, video, 3D, and multimodal tasks. Notable developments include YOLO-NAS for object detection, the Segment Anything Model (SAM) for segmentation, DINOv2 for self-supervised learning, and Gaussian Splatting as an alternative to NeRFs. Text-to-image models such as Midjourney and Stable Diffusion continued to improve, and LoRA made fine-tuning diffusion models more efficient. The Ego-Exo4D dataset emerged as a foundation for video perception research, while text-to-video (T2V) models moved closer to generating high-quality video from text prompts. Multimodal LLMs such as GPT-4 Vision and LLaVA combined language understanding with visual capabilities, and LLM-aided visual reasoning paired the general reasoning abilities of LLMs with expert vision models for tasks such as visual question answering.