| Abstract: |
Real-time object detection is a foundational challenge in computer vision: a detector must achieve high accuracy while keeping inference latency low. This study compares efficient deep learning-based object detectors from YOLOv3 to YOLOv7, evaluated systematically on the MS-COCO 2017 benchmark. The primary objectives are to benchmark detection accuracy and inference speed across these architectures and to identify the algorithmic contributions that most improve the speed-accuracy trade-off. A quantitative, comparative experimental methodology was adopted, measuring mAP@0.5, mAP@0.5:0.95, frames per second (FPS), parameter count, and GFLOPs on a standardized NVIDIA Tesla V100 GPU platform. The hypothesis is that anchor-free architectures combined with advanced label assignment strategies and re-parameterization techniques yield measurably better speed-accuracy trade-offs than traditional anchor-based detectors. Results confirm that YOLOv7, which uses the Extended Efficient Layer Aggregation Network (E-ELAN) and a trainable bag-of-freebies, achieves 51.2% mAP@0.5:0.95 at 161 FPS with only 36.9M parameters. The findings demonstrate that targeted architectural innovation, rather than model scaling, is the decisive determinant of real-time detection efficiency. The study concludes with evidence-based recommendations for deployment-optimized model selection. |