🤗 A Python script that generates image captions with BLIP 🤗
Implementation
app.py
import sys

import requests
import torch  # needed for torch.cuda.is_available() below
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Require the image URL as the first command-line argument.
if len(sys.argv) == 1:
    print("URL of Image is missing!")
    print("python3 app.py [image_url]")
    sys.exit(1)

# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Download the image and convert it to RGB.
image_url = sys.argv[1]
raw_image = Image.open(requests.get(image_url, stream=True).raw).convert('RGB')

# Load the BLIP processor and the captioning model.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
model.to(device)

# Generate a caption of 20 to 50 tokens and print it.
inputs = processor(raw_image, return_tensors="pt").to(device)
outputs = model.generate(**inputs, min_length=20, max_length=50)
caption = processor.decode(outputs[0], skip_special_tokens=True)
print(caption)
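The script assumes the argument is always a reachable image URL. As a rough sketch (not part of the original script), the loading step could be pulled into a helper that also accepts a local file path and fails early on HTTP errors. The names `is_http_url` and `load_image` and the 10-second timeout are hypothetical choices made for illustration.

```python
from io import BytesIO
from urllib.parse import urlparse


def is_http_url(source: str) -> bool:
    """Return True if source looks like an http(s) URL."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def load_image(source: str, timeout: float = 10.0):
    """Load an RGB image from a URL or from a local file path (hypothetical helper)."""
    # Imported lazily so that is_http_url() works without these packages installed.
    import requests
    from PIL import Image

    if is_http_url(source):
        response = requests.get(source, timeout=timeout)
        response.raise_for_status()  # raise on 404 and other HTTP errors
        return Image.open(BytesIO(response.content)).convert("RGB")
    # Anything that is not an http(s) URL is treated as a local path.
    return Image.open(source).convert("RGB")
```

With this in place, `raw_image = load_image(sys.argv[1])` would replace the `Image.open(requests.get(...).raw)` line in app.py.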
Environment setup
pip3 install requests pillow transformers torch
transformers is being developed at a breakneck pace, so if your installed version is old, run the following:
pip install --upgrade transformers
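To see whether an upgrade is needed at all, it can help to print what is currently installed. This is a small sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is made up for this example, and the package list simply mirrors what app.py imports.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the packages that app.py depends on.
for name in ("transformers", "torch", "pillow", "requests"):
    print(name, installed_version(name) or "not installed")
```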
Run
python3 app.py https://i.gyazo.com/0301ae28236d7a50abedb0f2670bf170.jpg
Output
a martini cocktail with a view of the city skyline and a view of the cityscaing the city
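The caption above repeats itself toward the end. One possible cause is `min_length=20`, which keeps the model from ending the caption before 20 tokens even when it has nothing more to say. The sketch below is an assumption, not a verified fix: it builds an alternative set of `generate()` arguments with a lower minimum length, beam search (`num_beams`), and `no_repeat_ngram_size`, all of which are standard generation parameters in transformers. The helper name `build_generate_kwargs` and the default values are hypothetical.

```python
def build_generate_kwargs(min_length: int = 5, max_length: int = 50,
                          num_beams: int = 3, no_repeat_ngram_size: int = 2) -> dict:
    """Build keyword arguments for model.generate() (illustrative defaults)."""
    if min_length > max_length:
        raise ValueError("min_length must not exceed max_length")
    return {
        "min_length": min_length,      # a lower minimum lets the caption end naturally
        "max_length": max_length,
        "num_beams": num_beams,        # beam search instead of greedy decoding
        "no_repeat_ngram_size": no_repeat_ngram_size,  # block repeated n-grams of this size
    }


if __name__ == "__main__":
    # In app.py this would be used as:
    #   outputs = model.generate(**inputs, **build_generate_kwargs())
    print(build_generate_kwargs())
```

Whether this actually produces a better caption for a given image has to be checked by running it; the values are starting points to experiment with.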