Company
Date Published
Author
Chuan Li
Word count
2037
Language
English
Hacker News points
None

Summary

The task of object detection is to identify "what" objects are inside an image and "where" they are. Object detection has long been a central problem in computer vision and pattern recognition. It inherits challenges from image classification, such as robustness to noise, transformations, and occlusions, and introduces new ones, such as detecting multiple instances and identifying their precise locations. The Single Shot Detector (SSD) is a multi-scale sliding window detector that uses deep CNNs for both classification and localization. Deep features make detection more robust, and the classification and localization tasks share those features. The network outputs a prediction map with class confidence and bounding box information. During training, priorbox matches these predictions to ground truth objects so that the loss can be computed. Priorbox uses a simple distance-based heuristic to create ground truth predictions, labeling as background any prediction for which no matched object can be found. Because background samples far outnumber foreground samples, SSD uses hard negative mining to address the imbalance. The network also employs data augmentation: "zoom in" improves performance on large objects, and "zoom out" improves performance on small objects. SSD uses a pre-trained feature extractor, a modified VGG_16 model, together with L2 normalization. Post-processing filters out weak detections with a confidence threshold and applies non-maximum suppression to remove overlapping duplicate detections.
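The post-processing step described above can be illustrated with a minimal sketch in plain Python. It is not the article's implementation: the function names `iou` and `postprocess`, the corner-format boxes, and the thresholds 0.5 and 0.45 are all assumptions chosen for illustration, and the sketch handles a single class, whereas a full detector would typically run the suppression once per class.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def postprocess(
    boxes: List[Box],
    scores: List[float],
    conf_threshold: float = 0.5,   # hypothetical value
    iou_threshold: float = 0.45,   # hypothetical value
) -> List[int]:
    """Return indices of the detections that survive, highest score first."""
    # Step 1: filter out weak detections by confidence.
    candidates = [i for i, s in enumerate(scores) if s >= conf_threshold]
    # Step 2: greedy non-maximum suppression. Visit candidates from the
    # highest score down and keep one only if it does not overlap too
    # much with any detection already kept.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in candidates:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.3]
print(postprocess(boxes, scores))  # prints [0, 2]
```

In this example the fourth box is dropped by the confidence threshold, and the second box is suppressed because its overlap with the higher-scoring first box (IoU of 81/119, about 0.68) exceeds the overlap threshold.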