In 2023, computer vision advanced across images, video, 3D, and multimodal vision-language tasks. Notable developments include YOLO-NAS for object detection, the Segment Anything Model (SAM) for segmentation, DINOv2 for self-supervised learning, Gaussian Splatting as an alternative to NeRFs, and improved text-to-image models such as Midjourney and Stable Diffusion. LoRA enabled efficient fine-tuning of diffusion models, and the Ego-Exo4D dataset emerged as a foundation for video perception research. Text-to-video (T2V) models moved closer to high-quality video generation from text prompts, and multimodal LLMs such as GPT-4 Vision and LLaVA combined language understanding with visual capabilities. Finally, LLM-aided visual reasoning paired the general reasoning abilities of LLMs with expert vision models for tasks such as visual question answering.