Summary of zero-shot image recognition methods
Zero-shot image recognition
For an overview, see the slides below.
https://speakerdeck.com/sensetime_japan/zerosiyotutowu-ti-jian-chu-falseyan-jiu-dong-xiang
For now, this is just a rough collection of notes.
Image classification
CLIP
https://trail.t.u-tokyo.ac.jp/ja/blog/22-12-02-clip/
https://note.com/mori_tokuiten/n/n4e230bdda0da
https://note.com/npaka/n/n113761af8c5d
BLIP-2
https://github.com/salesforce/LAVIS/tree/main/projects/blip2
Object detection
Detic
https://github.com/facebookresearch/Detic
https://zenn.dev/kaeru39/articles/ed00244dbb32cd
GRiT
https://github.com/JialianW/GRiT
GroundingDINO
https://github.com/IDEA-Research/GroundingDINO
General-purpose
LLMs in general
https://speakerdeck.com/ryotatanaka/recent-trends-in-llm-based-visual-document-understanding
GPT-4V
https://arxiv.org/abs/2309.17421
https://www.docswell.com/s/DeepLearning2023/56YD7E-2023-12-08-124618
https://arxiv.org/abs/2310.16809
https://github.com/DLYuanGod/TinyGPT-V?tab=readme-ov-file#launching-demo-locally
Gemini
https://note.com/it_navi/n/nd0beb06f5360
https://note.com/npaka/n/n166bc3df3abc
https://github.com/GoogleCloudPlatform/generative-ai
https://note.com/masayuki_abe/n/na744c92a8182
https://blog.brainpad.co.jp/entry/2023/12/22/153000
LLaVA
https://zenn.dev/kazuhito/articles/da87c94004e1f4
https://nowokay.hatenablog.com/entry/2023/10/13/113343
https://nowokay.hatenablog.com/entry/2023/10/17/153657
https://huggingface.co/cyberagent/llava-calm2-siglip
Applied systems and apps
Video Chat
https://github.com/OpenGVLab/Ask-Anything
Video search systems
https://zenn.dev/turing_motors/articles/ai-movie-searcher
https://note.com/eurekachan/n/n9d4f62b80ad6
Language Segment-Anything
https://zenn.dev/turing_motors/articles/c3291423ab914c
Benchmark software
https://qiita.com/toshi_456/items/050a4ba98d90b7ca7bac
References
https://zenn.dev/elith/articles/7bdde4f8650f9b
https://zenn.dev/turing_motors/articles/353a6e71a1444c
https://note.com/npaka/n/n1ebe218e95d0
https://twitter.com/ai_database/status/1737473689896030631
https://techblog.exawizards.com/entry/2023/05/10/055218
https://speakerdeck.com/ssii/ssii2024-os3-ijiri