
🤗 A Python script that generates image captions with BLIP 🤗

Published 2023/03/26

Implementation

app.py
import sys

import requests
import torch  # needed for the torch.cuda.is_available() check below
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# The image URL is the only required argument.
if len(sys.argv) == 1:
  print("URL of Image is missing!")
  print("python3 app.py [image_url]")
  sys.exit(1)

# Use the GPU when available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Download the image and normalize it to RGB.
image_url = sys.argv[1]
raw_image = Image.open(requests.get(image_url, stream=True).raw).convert('RGB')

# Load the BLIP captioning processor and model.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
model.to(device)

# Preprocess the image, generate a caption of 20 to 50 tokens, and decode it to text.
inputs = processor(raw_image, return_tensors="pt").to(device)
outputs = model.generate(**inputs, min_length=20, max_length=50)
caption = processor.decode(outputs[0], skip_special_tokens=True)
print(caption)

Environment setup

pip3 install requests pillow transformers torch

transformers is being developed at breakneck speed, so if your installed version is old, run the following:

pip install --upgrade transformers
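
To tell whether the installed copy is actually too old before upgrading, a small check along these lines may help. This is a minimal sketch using only the standard library; the helper names (`parse_version`, `installed_version`, `needs_upgrade`) are hypothetical, and the 4.26.0 minimum is an assumption about when the BLIP classes first shipped, so adjust it if your setup differs.

```python
from importlib import metadata

# Assumption: the BLIP classes first shipped in transformers 4.26.0.
MIN_TRANSFORMERS = (4, 26, 0)


def parse_version(text):
    """Turn '4.27.3' or '4.28.0.dev0' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def needs_upgrade(version_text, minimum=MIN_TRANSFORMERS):
    """True when the package is missing or older than the minimum."""
    if version_text is None:
        return True
    return parse_version(version_text) < minimum


if __name__ == "__main__":
    found = installed_version("transformers")
    if needs_upgrade(found):
        print(f"transformers {found}: run pip install --upgrade transformers")
    else:
        print(f"transformers {found} is recent enough")
```
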

Run

python3 app.py https://i.gyazo.com/0301ae28236d7a50abedb0f2670bf170.jpg

Output

a martini cocktail with a view of the city skyline and a view of the cityscaing the city
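
The caption above trails off into a repeated fragment. One plausible contributor is `min_length=20`, which keeps the model generating even after a shorter caption would have ended. The sketch below shows one way to experiment with other settings: it drops `min_length` and passes `num_beams` and `no_repeat_ngram_size`, both standard `generate` arguments. The helper name `generate_caption` and the specific values are hypothetical and untuned, and results will vary by image.

```python
# Hypothetical alternative settings; the values are illustrative, not tuned.
GENERATION_KWARGS = {
    "max_length": 50,           # same upper bound as app.py
    "num_beams": 3,             # beam search instead of greedy decoding
    "no_repeat_ngram_size": 2,  # forbid repeating any 2-token sequence
}


def generate_caption(model, processor, inputs, **overrides):
    """Generate one caption; keyword overrides replace individual defaults."""
    kwargs = {**GENERATION_KWARGS, **overrides}
    outputs = model.generate(**inputs, **kwargs)
    return processor.decode(outputs[0], skip_special_tokens=True)
```

In app.py this would replace the `model.generate(...)` and `processor.decode(...)` lines with `caption = generate_caption(model, processor, inputs)`.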
