Why 2023 was the most exciting year in computer vision history (so far)

What's this blog post about?

In 2023, computer vision made significant progress across several modalities. Notable developments include YOLO-NAS for object detection, the Segment Anything Model (SAM) for segmentation, DINOv2 for self-supervised learning, Gaussian Splatting as an alternative to neural radiance fields (NeRFs), and advances in text-to-image models such as Midjourney and Stable Diffusion. Low-rank adaptation (LoRA) made fine-tuning diffusion models more efficient, and the Ego-Exo4D dataset emerged as a foundation for video perception research. Text-to-video (T2V) models moved closer to high-quality video generation from text prompts, and multimodal LLMs such as GPT-4 Vision and LLaVA combined language understanding with visual capabilities. Finally, LLM-aided visual reasoning paired the general reasoning abilities of LLMs with expert vision models for tasks such as visual question answering.

Company
Voxel51

Date published
Dec. 20, 2023

Author(s)
Jacob Marks

Word count
2594

Language
English

Hacker News points
None found.


By Matt Makai. 2021-2024.