👀

【Object Detection】YOLO Simple Explanation Part 1

Published on 2024/05/10

Machine Learning

yolo

tech

In this article, I'll explain YOLO.
Original Paper: You Only Look Once: Unified, Real-Time Object Detection

1. What is YOLO?

YOLO is an object detection method known for being both fast and accurate. Versions up to 7 and YOLO-NAS have been published, but here I'll explain the basic principle of YOLO.

For example, it works like this:

Quote: Original Paper $^1$

From here, I'll explain YOLO step by step.

2. Image Classification

First, let's think about image classification.
For example, consider classifying an image as either a dog or a person. We can easily classify the image below as a dog.

For a machine to do this, a CNN should output 1 for this image. (If it detects a person, it should output 0.)
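As a minimal sketch of this labeling rule, the snippet below turns a raw CNN score into 1 (dog) or 0 (person). The sigmoid, the 0.5 threshold, and the function names are my own assumptions for illustration; the article only says the CNN should output 1 or 0.

```python
import math


def sigmoid(z: float) -> float:
    """Squash a raw network score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def to_label(logit: float, threshold: float = 0.5) -> int:
    """Return 1 (dog) if the probability reaches the threshold, else 0 (person).

    The threshold value is an assumption, not something fixed by YOLO.
    """
    return 1 if sigmoid(logit) >= threshold else 0


print(to_label(2.3))   # positive score, labeled as dog
print(to_label(-1.7))  # negative score, labeled as person
```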

3. Object Localization

Next, what do we do when we also want to indicate a bounding box?
Like this:

We can include the bounding box in the output by using a vector as the output, so the output shape will be something like array(7,1).
・Array representing an object with a bounding box

・Elements
$P_c$ : Probability that a detection target exists
$B_x$ : Bounding box position on the x axis
$B_y$ : Bounding box position on the y axis
$B_w$ : Bounding box width
$B_h$ : Bounding box height
$C_1$ : Dog class
$C_2$ : Person class

This array can represent an object in the image together with its bounding box. If a person is detected, $C_2$ will be 1; if nothing is detected, $P_c$ will be 0.
・Array when nothing is detected
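The two arrays can be sketched in Python as below. The helper name `make_target` and the sample box values are hypothetical. Filling the unused elements with zeros when nothing is detected is also my assumption, since those values are simply ignored once $P_c$ is 0.

```python
CLASSES = ["dog", "person"]  # C_1 and C_2, in the order used in this article


def make_target(box=None, class_name=None):
    """Build the 7-element vector [P_c, B_x, B_y, B_w, B_h, C_1, C_2].

    box is (x, y, w, h), or None when the image contains no target.
    """
    if box is None:
        # Nothing detected: P_c is 0 and the remaining slots are placeholders.
        return [0.0] * (5 + len(CLASSES))
    if class_name not in CLASSES:
        raise ValueError(f"unknown class: {class_name!r}")
    x, y, w, h = box
    one_hot = [1.0 if name == class_name else 0.0 for name in CLASSES]
    return [1.0, x, y, w, h] + one_hot


print(make_target((0.5, 0.6, 0.4, 0.7), "dog"))     # C_1 is set
print(make_target((0.5, 0.6, 0.4, 0.7), "person"))  # C_2 is set
print(make_target())                                # nothing detected
```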

4. Neural Network

4.1 CNN

From here, we use a neural network (CNN).
The image and vector above are given to the CNN, with the image as X (the features) and the vector as y (the teacher label).

At this point, we can already detect a single object with this CNN.
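To show how the output of such a CNN could be read back, here is a small sketch that decodes one 7-element prediction. The `decode` helper, the 0.5 threshold on $P_c$, and the sample numbers are assumptions for illustration, not part of the original paper.

```python
CLASSES = ["dog", "person"]  # C_1 and C_2


def decode(pred, threshold=0.5):
    """Turn [P_c, B_x, B_y, B_w, B_h, C_1, C_2] into (class_name, box).

    Returns None when P_c is below the (assumed) threshold.
    """
    p_c, b_x, b_y, b_w, b_h, *class_scores = pred
    if p_c < threshold:
        return None
    # Pick the class with the highest score.
    best = max(range(len(CLASSES)), key=lambda i: class_scores[i])
    return CLASSES[best], (b_x, b_y, b_w, b_h)


print(decode([0.9, 0.5, 0.6, 0.4, 0.7, 0.1, 0.8]))  # confident, person class
print(decode([0.2, 0.5, 0.6, 0.4, 0.7, 0.6, 0.4]))  # P_c too low
```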

4.2 Multi Detection

The method explained above works only for a single object. What do we do when there are multiple objects in an image?
・Multiple objects in an image

Quote: What is YOLO algorithm? $^2$

Here, we divide the image into parts (4x4 in this case) and create a vector for each part.

・Divided image

Quote: What is YOLO algorithm? $^2$

The prediction element ($P_c$) of the vector is set only for the part that contains the center point of an object, and the coordinates in the vector are defined so that the top left of each part is (0, 0) and the bottom right is (1, 1).
This makes it possible to recognize multiple objects and their bounding boxes.
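The center-point rule can be sketched as follows. I assume the object's center is given as fractions of the whole image (0 to 1); the function name `locate` and the sample values are hypothetical.

```python
GRID = 4  # the image is divided into 4x4 parts


def locate(cx, cy, grid=GRID):
    """Find the part that owns an object whose center is (cx, cy).

    cx and cy are fractions of the whole image. Returns
    (row, col, x_in_part, y_in_part), where the in-part coordinates run
    from (0, 0) at the top left to (1, 1) at the bottom right of the part.
    """
    # Clamp so that a center exactly on the right or bottom edge
    # still falls in the last part.
    col = min(int(cx * grid), grid - 1)
    row = min(int(cy * grid), grid - 1)
    return row, col, cx * grid - col, cy * grid - row


row, col, x_in, y_in = locate(0.30, 0.55)
print(row, col)                          # which part holds the center
print(round(x_in, 3), round(y_in, 3))    # position inside that part
```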

We then get a 4x4x7 array, because we get a vector (7 elements) for each of the 16 parts.

Quote: What is YOLO algorithm? $^2$
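Putting the steps together, the 4x4x7 array could be built roughly like this. It is only a sketch under my own assumptions: object centers and sizes are given as fractions of the whole image, width and height stay relative to the whole image (as in the original paper), and a part holds at most one object, so a later object in the same part overwrites the earlier one. The name `build_targets` and the sample objects are hypothetical.

```python
GRID = 4
CLASSES = ["dog", "person"]  # C_1 and C_2


def build_targets(objects, grid=GRID):
    """Build a grid x grid x 7 nested list of target vectors.

    objects: list of (class_name, cx, cy, w, h), all fractions of the image.
    """
    depth = 5 + len(CLASSES)
    # Start with "nothing detected" (P_c = 0) in every part.
    target = [[[0.0] * depth for _ in range(grid)] for _ in range(grid)]
    for name, cx, cy, w, h in objects:
        col = min(int(cx * grid), grid - 1)
        row = min(int(cy * grid), grid - 1)
        one_hot = [1.0 if c == name else 0.0 for c in CLASSES]
        # Only the part containing the center point gets P_c = 1.
        target[row][col] = [1.0, cx * grid - col, cy * grid - row, w, h] + one_hot
    return target


t = build_targets([
    ("dog", 0.30, 0.55, 0.20, 0.30),
    ("person", 0.80, 0.40, 0.15, 0.50),
])
print(len(t), len(t[0]), len(t[0][0]))  # dimensions of the array
print(t[2][1])                          # the part that holds the dog
```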

This array is also given to the CNN.
・CNN with multiple detection

Summary

This time, I explained YOLO up to the point where it detects multiple objects.
Next time, I'll continue from here.

Thank you for reading.

The next article has been published!

Reference

(1) You Only Look Once: Unified, Real-Time Object Detection
(2) What is YOLO algorithm? | Deep Learning Tutorial 31 (Tensorflow, Keras & Python)
