Bug Detection Showdown: OpenAI o3-mini vs Claude 3.5 Sonnet

Company

Greptile

Date Published

April 3, 2025

Author

Everett Butler

Word count

810

Language

English

Hacker News points

None

URL

www.greptile.com/blog/o3-mini-vs-sonnet-3.5

Summary

Bug detection is a challenging task that requires understanding, reasoning, and inference. A recent comparison of two models, OpenAI's o3-mini and Anthropic's Sonnet 3.5, was conducted across five programming languages on a dataset of real-world bugs. The evaluation revealed that o3-mini performed better overall, especially in Python and Rust, where it benefited from strong language coverage and a hybrid of reasoning and memorization. However, Sonnet 3.5 excelled in Go and Ruby, where its reasoning pipeline shone. The results highlight the importance of considering language, bug type, and model strengths when choosing an AI tool for bug detection. A notable example of reasoning's effectiveness was seen in detecting a race condition in a smart home system's API Server, where Sonnet 3.5 correctly identified the issue despite o3-mini missing it.