
A Roundup of Zero-Shot Image Recognition Methods

Published 2023/08/26

Zero-shot image recognition

For an overview, see the following slide decks.

https://speakerdeck.com/sensetime_japan/zerosiyotutowu-ti-jian-chu-falseyan-jiu-dong-xiang

https://speakerdeck.com/onely7/large-vision-language-model-lvlm-niguan-suruzui-xin-zhi-jian-matome-part-1

These are rough, memo-style notes for now.

Image classification

CLIP

https://trail.t.u-tokyo.ac.jp/ja/blog/22-12-02-clip/

https://note.com/mori_tokuiten/n/n4e230bdda0da

https://note.com/npaka/n/n113761af8c5d
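CLIP's zero-shot classification works by embedding the image and a text prompt for each candidate label (e.g. "a photo of a {label}"), then picking the label whose text embedding has the highest cosine similarity to the image embedding. Below is a minimal pure-Python sketch of that scoring step only, assuming the embeddings have already been produced by CLIP's image and text encoders. The toy 3-dimensional vectors, the labels, and the fixed `logit_scale=100.0` are illustrative assumptions, not values taken from the linked articles.

```python
import math


def normalize(v):
    """L2-normalize a vector so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot_classify(image_emb, text_embs, labels, logit_scale=100.0):
    """Score one image embedding against per-label text embeddings.

    Assumption: image_emb and text_embs come from CLIP-style encoders.
    logit_scale plays the role of CLIP's learned temperature; 100.0 is
    an illustrative value. Returns (label, probability) pairs sorted
    from most to least likely.
    """
    img = normalize(image_emb)
    logits = []
    for t in text_embs:
        t = normalize(t)
        cos = sum(a * b for a, b in zip(img, t))  # cosine similarity
        logits.append(logit_scale * cos)
    # Numerically stable softmax over the labels.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sorted(zip(labels, probs), key=lambda p: p[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy embeddings standing in for real encoder outputs.
    labels = ["cat", "dog", "car"]
    text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    image_emb = [0.9, 0.1, 0.0]
    print(zero_shot_classify(image_emb, text_embs, labels)[0][0])  # cat
```

Because the label set is just a list of text prompts, adding a new class only means adding another text embedding; no retraining is involved, which is what makes the setup "zero-shot".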

BLIP-2

https://github.com/salesforce/LAVIS/tree/main/projects/blip2

Object detection

Detic

https://github.com/facebookresearch/Detic

https://zenn.dev/kaeru39/articles/ed00244dbb32cd

GRiT

https://github.com/JialianW/GRiT

GroundingDINO

https://github.com/IDEA-Research/GroundingDINO

General-purpose models

LLM-based models in general

https://speakerdeck.com/ryotatanaka/recent-trends-in-llm-based-visual-document-understanding

GPT-4V

https://arxiv.org/abs/2309.17421

https://www.docswell.com/s/DeepLearning2023/56YD7E-2023-12-08-124618

https://arxiv.org/abs/2310.16809

https://github.com/DLYuanGod/TinyGPT-V?tab=readme-ov-file#launching-demo-locally

Gemini

https://note.com/it_navi/n/nd0beb06f5360

https://note.com/npaka/n/n166bc3df3abc

https://github.com/GoogleCloudPlatform/generative-ai

https://note.com/masayuki_abe/n/na744c92a8182

https://blog.brainpad.co.jp/entry/2023/12/22/153000

LLaVA

https://zenn.dev/kazuhito/articles/da87c94004e1f4

https://nowokay.hatenablog.com/entry/2023/10/13/113343

https://nowokay.hatenablog.com/entry/2023/10/17/153657

https://huggingface.co/cyberagent/llava-calm2-siglip

Applied systems and apps

VideoChat

https://github.com/OpenGVLab/Ask-Anything

Video search systems

https://zenn.dev/turing_motors/articles/ai-movie-searcher

https://note.com/eurekachan/n/n9d4f62b80ad6

Language Segment-Anything

https://zenn.dev/turing_motors/articles/c3291423ab914c

Benchmark tools

https://qiita.com/toshi_456/items/050a4ba98d90b7ca7bac

References

https://zenn.dev/elith/articles/7bdde4f8650f9b

https://zenn.dev/turing_motors/articles/353a6e71a1444c

https://note.com/npaka/n/n1ebe218e95d0

https://twitter.com/ai_database/status/1737473689896030631

https://techblog.exawizards.com/entry/2023/05/10/055218

https://speakerdeck.com/ssii/ssii2024-os3-ijiri

https://github.com/gokayfem/awesome-vlm-architectures

https://qiita.com/ryosuke_ohori/items/34581692852b8b406139
