This May 2024 Computer Vision Monthly Wrap provides an overview of recent developments and resources in vision-language models (VLMs). Researchers at Meta AI published a comprehensive paper on VLMs, covering what they are, how they are trained, and how they are evaluated. Google open-sourced PaliGemma-3B, a state-of-the-art VLM that processes images and text together to generate text outputs for tasks such as captioning and visual question answering. This month's wrap also reviews the capabilities of GPT-4o, Gemini 1.5 Pro, and Claude 3 Opus, comparing their performance across benchmarks and real-world applications. Finally, it highlights developer resources such as TTI-Eval, an open-source library for evaluating the performance of fine-tuned CLIP models, and a fine-tuning notebook for PaliGemma. These resources are available on GitHub and Hugging Face Spaces, giving developers tools to build and deploy VLM applications.
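
Since the wrap points developers to PaliGemma and its fine-tuning notebook, here is a minimal sketch of how the 3B checkpoint might be loaded for a quick captioning test with Hugging Face transformers. The checkpoint name, image path, and prompt below are illustrative assumptions, not details from the article.

```python
# Minimal sketch (assumptions noted below) of running PaliGemma-3B for
# image captioning with Hugging Face transformers.
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-mix-224"   # assumed checkpoint name
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).to(device)

image = Image.open("example.jpg")          # placeholder local image
prompt = "caption en"                      # PaliGemma-style task-prefix prompt
inputs = processor(text=prompt, images=image, return_tensors="pt").to(device)

with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=30)

# The decoded sequence includes the prompt tokens; strip special tokens.
print(processor.decode(generated[0], skip_special_tokens=True))
```

For fine-tuning on a custom dataset, the notebook linked in the developer resources section walks through the same processor/model pair with a training loop.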