
A Roundup of Zero-Shot Image Recognition Methods

Published on 2023/08/26

 Zero-shot image recognition
An overview is available at the links below.
https://speakerdeck.com/sensetime_japan/zerosiyotutowu-ti-jian-chu-falseyan-jiu-dong-xiang
https://speakerdeck.com/onely7/large-vision-language-model-lvlm-niguan-suruzui-xin-zhi-jian-matome-part-1
For now, these are rough notes collected memo-style.

 Image classification
 CLIP
https://trail.t.u-tokyo.ac.jp/ja/blog/22-12-02-clip/
https://note.com/mori_tokuiten/n/n4e230bdda0da
https://note.com/npaka/n/n113761af8c5d

 BLIP-2
https://github.com/salesforce/LAVIS/tree/main/projects/blip2

 Object detection
 Detic
https://github.com/facebookresearch/Detic
https://zenn.dev/kaeru39/articles/ed00244dbb32cd

 GRiT
https://github.com/JialianW/GRiT

 GroundingDINO
https://github.com/IDEA-Research/GroundingDINO

 General-purpose
 LLMs overall
https://speakerdeck.com/ryotatanaka/recent-trends-in-llm-based-visual-document-understanding

 GPT-4V
https://arxiv.org/abs/2309.17421
https://www.docswell.com/s/DeepLearning2023/56YD7E-2023-12-08-124618
https://arxiv.org/abs/2310.16809
https://github.com/DLYuanGod/TinyGPT-V?tab=readme-ov-file#launching-demo-locally

 Gemini
https://note.com/it_navi/n/nd0beb06f5360
https://note.com/npaka/n/n166bc3df3abc
https://github.com/GoogleCloudPlatform/generative-ai
https://note.com/masayuki_abe/n/na744c92a8182
https://blog.brainpad.co.jp/entry/2023/12/22/153000

 LLaVA
https://zenn.dev/kazuhito/articles/da87c94004e1f4
https://nowokay.hatenablog.com/entry/2023/10/13/113343
https://nowokay.hatenablog.com/entry/2023/10/17/153657
https://huggingface.co/cyberagent/llava-calm2-siglip

 Applied systems and apps
 Video Chat
https://github.com/OpenGVLab/Ask-Anything

 Video search system
https://zenn.dev/turing_motors/articles/ai-movie-searcher
https://note.com/eurekachan/n/n9d4f62b80ad6

 Language Segment-Anything
https://zenn.dev/turing_motors/articles/c3291423ab914c

 Benchmark software
https://qiita.com/toshi_456/items/050a4ba98d90b7ca7bac

 References
https://zenn.dev/elith/articles/7bdde4f8650f9b
https://zenn.dev/turing_motors/articles/353a6e71a1444c
https://note.com/npaka/n/n1ebe218e95d0
https://twitter.com/ai_database/status/1737473689896030631
https://techblog.exawizards.com/entry/2023/05/10/055218
https://speakerdeck.com/ssii/ssii2024-os3-ijiri
https://github.com/gokayfem/awesome-vlm-architectures
https://qiita.com/ryosuke_ohori/items/34581692852b8b406139
