
【ML Paper】DeiT: Can ViT Work with Fewer Images? Part 2

Published on 2024/10/05

This is part 2 of the summary of this paper.
The authors propose an improved Vision Transformer, DeiT (Data-efficient image Transformer).

Original Paper: https://arxiv.org/abs/2012.12877v2

2. Terms

2.1 Knowledge Distillation

Knowledge distillation is a training method in which the teacher model's output is used as an auxiliary loss for the student model, so the student learns to match the teacher's predictions in addition to the ground-truth labels (a minimal sketch follows the diagram note below).
・Knowledge distillation overview diagram [1]
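
The following is a minimal PyTorch sketch of a soft distillation loss of this kind: a hard-label cross-entropy term plus a KL term toward the teacher's softened outputs. The function name and the `temperature` and `alpha` values are illustrative assumptions, not values taken from the paper:

```python
import torch
import torch.nn.functional as F

def soft_distillation_loss(student_logits, teacher_logits, labels,
                           temperature=3.0, alpha=0.5):
    """Hard-label cross-entropy combined with a KL term that pulls the
    student's softened distribution toward the teacher's.
    temperature and alpha are illustrative, not the paper's values."""
    # Standard supervised loss on the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    # KL divergence between temperature-softened distributions.
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return (1 - alpha) * ce + alpha * kl
```

In practice the teacher is frozen, so `teacher_logits` would be computed under `torch.no_grad()` while only the student receives gradients.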

2.2 Class Token

The class token is a learnable vector that aggregates information from all the other tokens in the sequence via self-attention with them.
It is designed specifically for this aggregation role: for tasks like classification, the final representation of the class token is used as the input to a classifier (for example, a fully connected layer followed by softmax) to make the prediction.
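
As a rough illustration, the PyTorch sketch below prepends a learnable class token to a sequence of patch tokens and classifies from its final state. The module name and the encoder configuration are assumptions for the example (the dimensions loosely follow DeiT-Tiny's 192-dim, 3-head setting), not the paper's actual implementation:

```python
import torch
import torch.nn as nn

class ClassTokenClassifier(nn.Module):
    """Prepend a learnable class token to the patch tokens and
    classify from its final representation."""
    def __init__(self, dim=192, num_classes=1000, depth=2, num_heads=3):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, patch_tokens):             # (B, N, dim)
        b = patch_tokens.size(0)
        cls = self.cls_token.expand(b, -1, -1)   # (B, 1, dim)
        x = torch.cat([cls, patch_tokens], dim=1)
        x = self.encoder(x)                      # class token attends to all patches
        return self.head(x[:, 0])                # classify from the class token only
```

In a full ViT, `patch_tokens` would come from a patch-embedding layer with positional embeddings added before the encoder.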

References

[1] PyTorch Tutorial
