
【Object Detection】A Simple Explanation of YOLO, Part 1

Published on 2024/05/10

In this article, I'll explain YOLO.
Original Paper: You Only Look Once: Unified, Real-Time Object Detection

1. What is YOLO?

YOLO is an object detection method known for being both fast and accurate. Versions up to version 7, as well as a NAS version, have been published, but here I'll explain the basic principle of YOLO.

For example, it works like this:

Quote: Original Paper^1

From here, I'll explain YOLO step by step.

2. Image Classification

First, let's think about image classification.
For example, consider classifying images as either dog or person. We can easily classify the image below as a dog.

To do the same with a machine, a CNN should output 1 for this image. (If it detects a person, it should output 0.)
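The 0/1 output above can be sketched in a few lines. This is only an illustration: the names `LABELS` and `classify` are hypothetical, and using a sigmoid with a 0.5 threshold on the network's raw score is an assumption, not something stated in the original paper.

```python
import math

# Label mapping taken from the text: dog -> 1, person -> 0.
LABELS = {0: "person", 1: "dog"}


def sigmoid(z: float) -> float:
    """Squash a raw score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(raw_score: float, threshold: float = 0.5) -> int:
    """Turn the CNN's raw score into the 0/1 label (threshold is an assumption)."""
    return 1 if sigmoid(raw_score) >= threshold else 0


if __name__ == "__main__":
    # A positive raw score maps to "dog", a negative one to "person".
    print(LABELS[classify(2.0)])
    print(LABELS[classify(-2.0)])
```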

3. Object Localization

Next, what should we do when we also want to indicate the bounding box?
Like this:

We can include the bounding box in the output by making the output a vector, so the output will be an array of shape (7, 1).
・Array representing an object with its bounding box

・Elements
P_c: Probability that a detection target exists
B_x: Bounding box position of x axis
B_y: Bounding box position of y axis
B_w: Bounding box width
B_h: Bounding box height
C_1: Dog class
C_2: Person class

This array can represent an object in the image together with its bounding box. If a person is detected, C_2 will be 1. If nothing is detected, P_c will be 0.
・Array when nothing is detected
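As a rough sketch, the 7-element vector described above could be built like this. The helper name `make_target` is hypothetical. Treating B_x and B_y as the box center normalized to the image size, and filling the unused elements with 0 when nothing is detected, are both assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Class order follows C_1 (dog) and C_2 (person) in the text.
CLASSES = ["dog", "person"]

# (class name, B_x, B_y, B_w, B_h); coordinates assumed normalized to 0..1.
LabeledBox = Tuple[str, float, float, float, float]


def make_target(obj: Optional[LabeledBox]) -> List[float]:
    """Build [P_c, B_x, B_y, B_w, B_h, C_1, C_2] for one image."""
    if obj is None:
        # Nothing detected: P_c = 0. The other elements carry no meaning,
        # so they are simply set to 0 here (an assumption).
        return [0.0] * (5 + len(CLASSES))
    name, bx, by, bw, bh = obj
    one_hot = [1.0 if name == c else 0.0 for c in CLASSES]
    return [1.0, bx, by, bw, bh] + one_hot


if __name__ == "__main__":
    print(make_target(("person", 0.5, 0.4, 0.2, 0.6)))
    print(make_target(None))
```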

4. Neural Network

4.1 CNN

From here, we use a neural network (CNN).
The image and vector above are given to the CNN for training: the image as X (the features) and the vector as y (the label).

At this point, we can already detect a single object with this CNN.

4.2 Multi Detection

The method explained above works only for a single object. What should we do when there are multiple objects in an image?
・Multiple objects in an image

Quote: What is YOLO algorithm?^2

Here, we divide the image into cells (4x4 in this example) and create a vector for each cell.

・Divided image

Quote: What is YOLO algorithm?^2

The P_c element of a vector is set only for the cell that contains the center point of an object. The coordinates in each vector are relative to that cell, with its top-left corner as (0, 0) and its bottom-right corner as (1, 1).
In this way, multiple objects and their bounding boxes can be recognized.

As a result, we get a 4x4x7 array, because we get one 7-element vector for each of the 16 cells.

Quote: What is YOLO algorithm?^2
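The grid encoding described above can be sketched as follows. The function name `encode_grid` is hypothetical, and the input boxes are assumed to be given as (class, center x, center y, width, height) normalized to the whole image. Expressing width and height in cell units (so they may exceed 1) is also an assumption. This simplified version stores one object per cell, so a second object whose center falls in the same cell overwrites the first.

```python
from typing import List, Tuple

GRID = 4  # 4x4 cells, as in the text
CLASSES = ["dog", "person"]  # C_1, C_2

# (class name, center x, center y, width, height), normalized to the image.
Box = Tuple[str, float, float, float, float]


def encode_grid(objects: List[Box], grid: int = GRID) -> List[List[List[float]]]:
    """Build a grid x grid x 7 target; each cell holds
    [P_c, B_x, B_y, B_w, B_h, C_1, C_2]."""
    depth = 5 + len(CLASSES)
    target = [[[0.0] * depth for _ in range(grid)] for _ in range(grid)]
    for name, cx, cy, w, h in objects:
        # Find the cell that contains the object's center point.
        col = min(int(cx * grid), grid - 1)
        row = min(int(cy * grid), grid - 1)
        # Center relative to the cell: top-left is (0, 0), bottom-right is (1, 1).
        bx = cx * grid - col
        by = cy * grid - row
        one_hot = [1.0 if name == c else 0.0 for c in CLASSES]
        # Width and height in cell units (an assumption).
        target[row][col] = [1.0, bx, by, w * grid, h * grid] + one_hot
    return target


if __name__ == "__main__":
    objects = [
        ("dog", 0.375, 0.625, 0.25, 0.5),
        ("person", 0.875, 0.125, 0.125, 0.25),
    ]
    y = encode_grid(objects)
    print(len(y), len(y[0]), len(y[0][0]))  # grid rows, grid columns, vector size
    print(y[2][1])  # cell containing the dog's center
    print(y[0][3])  # cell containing the person's center
```

Only the two cells that contain an object's center have P_c = 1; every other cell keeps the all-zero vector.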

This array is also given to the CNN.
・CNN with multiple-object detection

Summary

This time, I explained YOLO up to the point where it detects multiple objects.
Next time, I'll continue from here.

Thank you for reading.

The next article has been published!

References

(1) You Only Look Once: Unified, Real-Time Object Detection
(2) What is YOLO algorithm? | Deep Learning Tutorial 31 (Tensorflow, Keras & Python)
