Image Captioning¶
This notebook demonstrates how to perform image captioning and feature extraction using the BLIP model and spaCy NLP processing. The ImageCaptioner class provides a convenient interface for:
- Generating captions for images from local files or URLs
- Extracting meaningful features (nouns) from captions
- Filtering features using predefined aerial vocabulary or custom lists
Installation¶
Uncomment the following line to install the required packages if needed.
# %pip install "segment-geospatial[samgeo3]"
Import Libraries¶
from samgeo.caption import ImageCaptioner, blip_analyze_image, show_image
Initialize the ImageCaptioner¶
Create an ImageCaptioner instance. You can customize the models used:
- blip_model_name: The BLIP model for caption generation (default: "Salesforce/blip-image-captioning-base")
- spacy_model_name: The spaCy model for NLP processing (default: "en_core_web_sm")
- device: The device to run inference on ("cuda", "mps", or "cpu"). Auto-detected if not specified.
Available BLIP models:
- Salesforce/blip-image-captioning-base (default, ~990MB)
- Salesforce/blip-image-captioning-large (larger, more accurate, ~1.9GB)
captioner = ImageCaptioner(
blip_model_name="Salesforce/blip-image-captioning-base",
spacy_model_name="en_core_web_sm",
)
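If auto-detection does not pick the device you want, you can pass one explicitly. Below is a minimal sketch of a helper that mirrors the device options listed above; it assumes PyTorch is the inference backend (the helper name is our own, not part of samgeo):

```python
def pick_device():
    """Return the best available inference device, falling back to CPU."""
    try:
        import torch  # optional; only needed for GPU/MPS detection

        if torch.cuda.is_available():
            return "cuda"
        mps = getattr(torch.backends, "mps", None)
        if mps is not None and mps.is_available():
            return "mps"
    except ImportError:
        pass
    return "cpu"


# Pass the result explicitly instead of relying on auto-detection:
# captioner = ImageCaptioner(device=pick_device())
print(pick_device())
```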
Example 1: Building Image¶
Let's analyze an aerial image of a building.
url1 = "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/caption-building.webp"
show_image(url1)
Basic Analysis¶
Use the analyze() method to generate a caption and extract all noun features.
caption, features = captioner.analyze(url1)
print(f"Caption: {caption}")
print(f"Features: {features}")
# Filter features using the built-in aerial vocabulary
caption, features = captioner.analyze(url1, include_features="default")
print(f"Caption: {caption}")
print(f"Aerial Features: {features}")
Custom Feature Filtering¶
You can also provide a custom list of features to look for, or exclude specific features from the results.
# Look only for specific features
caption, features = captioner.analyze(
url1, include_features=["building", "parking_lot", "road", "car", "tree"]
)
print(f"Caption: {caption}")
print(f"Custom Features: {features}")
# Exclude certain features from results
caption, features = captioner.analyze(url1, exclude_features=["view", "image"])
print(f"Caption: {caption}")
print(f"Features (excluding 'view', 'image'): {features}")
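The include/exclude behavior shown above can be thought of as a simple set filter over the extracted nouns. The sketch below illustrates those semantics only; it is not the library's actual implementation, and the helper name and sample feature list are hypothetical:

```python
def filter_features(features, include_features=None, exclude_features=None):
    """Illustrative filter: keep only features in the include list (if given),
    then drop any listed in exclude_features."""
    result = list(features)
    if include_features is not None:
        allowed = set(include_features)
        result = [f for f in result if f in allowed]
    if exclude_features is not None:
        blocked = set(exclude_features)
        result = [f for f in result if f not in blocked]
    return result


# Hypothetical features extracted from a caption:
feats = ["building", "view", "parking_lot", "image", "tree"]
print(filter_features(feats, include_features=["building", "tree", "road"]))
print(filter_features(feats, exclude_features=["view", "image"]))
```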
Example 2: Traffic Sign Image¶
Let's analyze a different type of image: a traffic sign.
url2 = "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/caption-traffic-sign.webp"
show_image(url2)
caption, features = captioner.analyze(url2)
print(f"Caption: {caption}")
print(f"Features: {features}")
# Using aerial vocabulary
caption, features = captioner.analyze(url2, include_features="default")
print(f"Caption: {caption}")
print(f"Aerial Features: {features}")
Using Individual Methods¶
The ImageCaptioner class also provides individual methods for more granular control:
- generate_caption(): Generate only the caption
- extract_features(): Extract features from an existing caption
# Generate caption only
caption = captioner.generate_caption(url1)
print(f"Caption: {caption}")
# Extract features from an existing caption
features = captioner.extract_features(caption)
print(f"All Features: {features}")
aerial_features = captioner.extract_features(caption, include_features="default")
print(f"Aerial Features: {aerial_features}")
Using the Convenience Function¶
For quick one-off analyses, you can use the blip_analyze_image() function directly without creating an ImageCaptioner instance. You can also specify custom models.
# Quick analysis with default models
caption, features = blip_analyze_image(url1)
print(f"Caption: {caption}")
print(f"Features: {features}")
# Using a larger BLIP model for potentially better captions
caption, features = blip_analyze_image(
url1,
include_features="default",
blip_model_name="Salesforce/blip-image-captioning-large",
)
print(f"Caption (large model): {caption}")
print(f"Aerial Features: {features}")