【ML Paper】YOLO: Unified Real-Time Object Detection part3
This time, I'll walk through the YOLO object detection model, following the original paper.
This is part 3; part 4 will be published soon.
Original paper: https://arxiv.org/abs/1506.02640
4. Network design
The network architecture is inspired by the GoogLeNet model for image classification.
The network has 24 convolutional layers followed by 2 fully connected layers.
Instead of the inception modules used by GoogLeNet, YOLO simply uses 1 × 1 reduction layers followed by 3 × 3 convolutional layers.
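To get a feel for why a 1 × 1 reduction layer helps, here is a minimal sketch that counts convolution weights with and without the reduction. The channel counts (512 → 256 → 512) and the helper name conv_weights are my own assumptions for illustration, and biases are ignored.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights in a square convolution, ignoring biases."""
    return kernel_size * kernel_size * in_channels * out_channels

# Hypothetical channel counts, chosen only for illustration.
in_ch, reduced_ch, out_ch = 512, 256, 512

# A single 3 x 3 convolution applied directly to all input channels.
direct = conv_weights(3, in_ch, out_ch)

# A 1 x 1 reduction first, then the 3 x 3 convolution on fewer channels.
reduced = conv_weights(1, in_ch, reduced_ch) + conv_weights(3, reduced_ch, out_ch)

print(direct)   # 2359296
print(reduced)  # 1310720
```

Under these assumed sizes, the reduced pair needs roughly 56% of the weights of the direct 3 × 3 convolution.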
The final output of the network is a 7 × 7 × 30 tensor of predictions.
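The 30 channels follow from the paper's definition of the output as S × S × (B × 5 + C). Below is a small sketch of that arithmetic, assuming the PASCAL VOC settings used in the paper (S = 7 grid cells per side, B = 2 boxes per cell, C = 20 classes). The function name is hypothetical.

```python
def yolo_output_shape(S=7, B=2, C=20):
    """Shape of the prediction tensor: S x S x (B * 5 + C).

    Each of the B boxes carries 5 values (x, y, w, h, confidence),
    and each grid cell carries C class probabilities.
    """
    return (S, S, B * 5 + C)

print(yolo_output_shape())  # (7, 7, 30)
```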
The full network is shown below.
Our detection network has 24 convolutional layers followed by 2 fully connected layers. Alternating 1 × 1 convolutional layers reduce the feature space from preceding layers.
*The convolutional layers are pretrained on the ImageNet classification task at half the resolution (224 × 224 input images), and the resolution is then doubled to 448 × 448 for detection.
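As a rough check on how the doubled resolution relates to the 7 × 7 output grid, the sketch below halves the input size once per stride-2 stage. It assumes six such stages, counted from the paper's architecture figure (two stride-2 convolutions and four 2 × 2 max-pooling layers), and that every other layer preserves spatial size through padding. The names are my own.

```python
# Assumed count of stride-2 stages in the detection network:
# 2 stride-2 convolutions + 4 max-pooling layers.
STRIDE_2_STAGES = 6

def final_grid_size(input_size, stages=STRIDE_2_STAGES):
    """Spatial size after repeatedly halving the input, once per stage."""
    size = input_size
    for _ in range(stages):
        size //= 2
    return size

print(final_grid_size(448))  # 7
```

Under these assumptions, a 448 × 448 detection input is reduced by a factor of 64, which gives the 7 × 7 grid.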
Fast YOLO
The authors also trained a fast version of YOLO. Fast YOLO uses a neural network with fewer convolutional layers (9 instead of 24) and fewer filters in those layers. Other than the size of the network, all training and testing parameters are the same between YOLO and Fast YOLO.
Discussion