【ML Paper】DeiT: There are only fewer images for ViT? Part9
This is part 9 of my summary of this paper.
The authors proposed an improved Vision Transformer, DeiT (Data-efficient image Transformers).
Original Paper: https://arxiv.org/abs/2012.12877v2
6 Conclusion
This concludes the introduction of DeiT: image transformers that
do not require a very large amount of data to train, thanks to an improved training recipe and, in particular, a novel distillation procedure.
DeiT builds on data-augmentation and regularization strategies pre-existing for convnets, without introducing any significant architectural novelty beyond the distillation token. It is therefore likely that research on data augmentation more adapted to, or learned for, transformers will bring further gains.
DeiT may rapidly become a method of choice, considering its lower memory footprint for a given accuracy.
・Implementation
https://github.com/facebookresearch/deit
7 Summary
This time, I read the DeiT paper.
DeiT is a new ViT-based method that addresses ViT's need for a large amount of training data.
It adds a distillation token and learns from the teacher model's predictions, so that the student imitates the teacher's ability.
The combination of strong data augmentation, regularization, and the distillation token allows DeiT to be trained on a smaller dataset than a standard ViT.
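The distillation objective described above can be sketched as follows. This is a minimal illustration of the paper's hard-label distillation: the class token's output is supervised by the true labels, while the distillation token's output is supervised by the teacher's hard (argmax) predictions, with the two cross-entropy terms averaged. The function name and the way logits are passed in are my own assumptions, not DeiT's actual API.

```python
import torch
import torch.nn.functional as F

def hard_distillation_loss(student_cls_logits: torch.Tensor,
                           student_dist_logits: torch.Tensor,
                           teacher_logits: torch.Tensor,
                           labels: torch.Tensor) -> torch.Tensor:
    """Hard-label distillation as described in the DeiT paper.

    student_cls_logits:  logits from the student's class token
    student_dist_logits: logits from the student's distillation token
    teacher_logits:      logits from the (frozen) teacher model
    labels:              ground-truth class indices
    """
    # The teacher's hard decision acts as the target for the distillation token.
    teacher_labels = teacher_logits.argmax(dim=1)
    loss_cls = F.cross_entropy(student_cls_logits, labels)
    loss_dist = F.cross_entropy(student_dist_logits, teacher_labels)
    # The paper weights the two cross-entropy terms equally.
    return 0.5 * loss_cls + 0.5 * loss_dist
```

At inference time, DeiT averages the softmax outputs of the class and distillation heads, so both tokens contribute to the final prediction.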
Discussion