【ML Paper】DeiT: There are only fewer images for ViT? Part9
This is part 9 of my summary of this paper.
The authors proposed an improved Vision Transformer, DeiT (Data-efficient image Transformers).
Original Paper: https://arxiv.org/abs/2012.12877v2
6 Conclusion
This concludes the introduction of DeiT: image transformers that
do not require a very large amount of data to train, thanks to an improved training recipe and, in particular, a novel distillation procedure.
DeiT builds on data-augmentation and regularization strategies pre-existing for convnets, without introducing any significant architectural novelty beyond the distillation token. It is therefore likely that research on data augmentation more adapted to, or learned for, transformers will bring further gains.
DeiT may rapidly become a method of choice, considering its lower memory footprint for a given accuracy.
・Implementation
https://github.com/facebookresearch/deit
7 Summary
This time, I read the DeiT paper.
DeiT is a new ViT-based method that addresses ViT's need for a large amount of training data.
It adds a distillation token and learns from the teacher model's predictions, so that the student imitates the teacher's ability.
The combination of strong data augmentation, regularization, and the distillation token allows DeiT to be trained on a smaller dataset than a standard ViT.
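The distillation objective described above can be sketched as follows. This is a minimal illustration of the paper's hard-label distillation: the class token's output is supervised by the true labels, while the distillation token's output is supervised by the teacher's hard (argmax) predictions, with the two cross-entropy terms averaged. The function name and the way logits are passed in are my own assumptions, not DeiT's actual API.

```python
import torch
import torch.nn.functional as F

def hard_distillation_loss(student_cls_logits: torch.Tensor,
                           student_dist_logits: torch.Tensor,
                           teacher_logits: torch.Tensor,
                           labels: torch.Tensor) -> torch.Tensor:
    """Hard-label distillation as described in the DeiT paper.

    student_cls_logits:  logits from the student's class token
    student_dist_logits: logits from the student's distillation token
    teacher_logits:      logits from the (frozen) teacher model
    labels:              ground-truth class indices
    """
    # The teacher's hard decision acts as the target for the distillation token.
    teacher_labels = teacher_logits.argmax(dim=1)
    loss_cls = F.cross_entropy(student_cls_logits, labels)
    loss_dist = F.cross_entropy(student_dist_logits, teacher_labels)
    # The paper weights the two cross-entropy terms equally.
    return 0.5 * loss_cls + 0.5 * loss_dist
```

At inference time, DeiT averages the softmax outputs of the class and distillation heads, so both tokens contribute to the final prediction.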
Discussion