External Mask Tracking with SAM 3¶
This notebook demonstrates how to use external segmentation masks with SAM 3's tracking capability. This is useful when:
- SAM 3's text prompts don't segment the exact objects you need
- You have masks from another segmentation model (YOLO, Detectron2, GroundingDINO, etc.)
- You want to use a specialized model for initial segmentation and SAM 3 for tracking
Workflow¶
- Initialize tracker - Call init_tracker() with any text prompt to initialize SAM 3's tracker
- Add external masks - Use add_mask_prompt() to add masks from your external model
- Propagate - Use SAM 3's tracking to propagate masks through the video
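The three steps above can be sketched as a single helper. This is a minimal sketch, not part of samgeo: `track_external_masks` and `start_id` are hypothetical names, and `sam` is assumed to be a `SamGeo3Video` instance with a video already set. The only calls made on `sam` are the `init_tracker()`, `add_mask_prompt()`, and `propagate()` methods used later in this notebook.

```python
def track_external_masks(sam, masks, start_id=100, frame_idx=0):
    """Hypothetical helper: initialize, add external masks, propagate.

    Returns the object IDs assigned to the masks, in input order.
    """
    # Step 1: initialize the tracker
    sam.init_tracker()

    # Step 2: add each external mask under its own object ID
    obj_ids = []
    for offset, mask in enumerate(masks):
        obj_id = start_id + offset
        sam.add_mask_prompt(mask, obj_id=obj_id, frame_idx=frame_idx)
        obj_ids.append(obj_id)

    # Step 3: propagate the masks through the video
    sam.propagate()
    return obj_ids
```

Returning the assigned IDs makes it easy to map tracked objects back to the external model's detections.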
Installation¶
SAM 3 requires a CUDA-capable GPU. Install with:
# %pip install "segment-geospatial[samgeo3]"
Import Libraries¶
import os
import numpy as np
from samgeo import SamGeo3Video, download_file
Download Sample Video¶
url = "https://github.com/opengeos/datasets/releases/download/videos/cars.mp4"
video_path = download_file(url)
Method 1: Using SAM 3's Text Prompt to Generate Initial Masks¶
In this example, we'll use SAM 3's own text prompt to generate masks, then demonstrate how to use those masks with the add_mask_prompt() method. In a real workflow, you would replace this with masks from your preferred external model.
# Initialize SAM 3 video predictor
sam = SamGeo3Video()
# Load the video
sam.set_video(video_path)
Step 1: Generate initial masks using text prompt¶
This simulates getting masks from an external model. In practice, you would use your own segmentation model here.
# Generate masks using text prompt (simulating external model output)
sam.generate_masks("car")
# Show the segmentation results
sam.show_frame(0, axis="on")
Step 2: Extract masks from frame 0¶
Extract the binary masks that we'll use to demonstrate the add_mask_prompt() method.
# Extract masks from the first frame
formatted_outputs = sam._format_outputs()
frame_0_masks = formatted_outputs.get(0, {})
# Store masks and their object IDs
external_masks = []
original_obj_ids = []
for obj_id, mask in frame_0_masks.items():
    if isinstance(obj_id, int):
        external_masks.append(np.array(mask))
        original_obj_ids.append(obj_id)
print(f"Extracted {len(external_masks)} masks from frame 0")
print(f"Object IDs: {original_obj_ids}")
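In a real workflow the list of masks would come from your external model rather than from SAM 3. As one illustration, assume the model returns a single integer label image in which 0 is background and every other value marks one instance (an assumption; many models return one mask per instance instead). The hypothetical helper `label_map_to_masks` below splits such an image into the per-object boolean masks that this notebook passes to add_mask_prompt().

```python
import numpy as np


def label_map_to_masks(label_map, background=0):
    """Split an integer instance-label image into one boolean mask per instance."""
    label_map = np.asarray(label_map)
    if label_map.ndim != 2:
        raise ValueError("label_map must have shape (H, W)")
    # Every distinct non-background value is treated as one object
    instance_values = [v for v in np.unique(label_map) if v != background]
    return [label_map == v for v in instance_values]


# Toy 4x4 label image with two instances (labels 1 and 2)
toy_labels = np.array(
    [
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [2, 0, 0, 0],
        [2, 2, 0, 0],
    ]
)
toy_masks = label_map_to_masks(toy_labels)
```

Each entry of `toy_masks` is a boolean array of shape (H, W), matching the mask requirements listed at the end of this notebook.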
Step 3: Add external masks for tracking¶
Initialize the tracker first, then add the extracted masks with add_mask_prompt(). Use new object IDs (starting from 100) to avoid conflicts with the objects detected by the text prompt.
sam.init_tracker()
# Add masks one by one using add_mask_prompt()
# Use obj_ids starting from 100 to avoid conflicts with detected objects
for i, mask in enumerate(external_masks):
    sam.add_mask_prompt(mask, obj_id=100 + i, frame_idx=0)
# Propagate masks through the video
sam.propagate()
# Show results
sam.show_frame(0, axis="on")
# Show multiple frames
sam.show_frames(frame_stride=20, ncols=3)
Method 2: Using add_masks_prompt() for Multiple Masks¶
For convenience, you can add multiple masks at once using add_masks_prompt().
Note: Object IDs are auto-assigned starting from 100 to avoid conflicts.
sam.init_tracker()
# Add all masks at once (IDs will be auto-assigned starting from 100)
sam.add_masks_prompt(external_masks)
# Propagate
sam.propagate()
# Show results
sam.show_frames(frame_stride=20, ncols=3)
Save Results¶
os.makedirs("output", exist_ok=True)
# Save mask images
sam.save_masks("output/external_masks")
# Save video with blended masks
sam.save_video("output/external_tracked.mp4", fps=25)
Clean Up¶
# Close session to free GPU resources
sam.close()
sam.shutdown()
Summary¶
The add_mask_prompt() and add_masks_prompt() methods allow you to:
- Use any segmentation model - Not limited to SAM 3's text prompts
- Get accurate initial segmentation - Use specialized models for your specific objects
- Leverage SAM 3's tracking - Powerful temporal tracking through the video
- Selective tracking - Choose which objects to track
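Selective tracking comes down to filtering the mask list before adding it. The sketch below keeps only masks whose foreground area reaches a minimum pixel count. `select_masks_by_area` and `min_area` are hypothetical names for illustration, and area is just one possible criterion (class label or confidence score would work the same way).

```python
import numpy as np


def select_masks_by_area(masks, min_area):
    """Return (index, mask) pairs whose foreground pixel count is >= min_area."""
    selected = []
    for index, mask in enumerate(masks):
        area = int(np.count_nonzero(mask))
        if area >= min_area:
            selected.append((index, np.asarray(mask).astype(bool)))
    return selected
```

Keeping the original index alongside each mask lets you derive a stable object ID (for example `100 + index`) for the masks you do track.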
Important Workflow Notes¶
- Initialize tracker first: Always call init_tracker() before add_mask_prompt()
- Don't reset: Do not call reset() before adding external masks
- Use unique IDs: Use obj_id >= 100 to avoid conflicts with text-detected objects
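If some IDs at or above 100 are already taken, for example after an earlier call that added masks, unused IDs can be picked explicitly. The helper below is a hypothetical sketch in plain Python and is not part of samgeo. `used_ids` is whatever set of IDs you already know to be in use.

```python
def next_free_obj_ids(count, used_ids=(), start=100):
    """Return `count` integer IDs >= start that do not appear in used_ids."""
    used = set(used_ids)
    free = []
    candidate = start
    while len(free) < count:
        if candidate not in used:
            free.append(candidate)
        candidate += 1
    return free
```

The resulting list has one ID per mask, so it can be passed as the `obj_ids` argument of add_masks_prompt() listed in the API Reference.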
API Reference¶
- add_mask_prompt(mask, obj_id, frame_idx=0, num_points=5) - Add a single mask
- add_masks_prompt(masks, obj_ids=None, frame_idx=0) - Add multiple masks at once
Mask Requirements¶
- Binary mask array of shape (H, W)
- Values: True/1 for object, False/0 for background
- Automatically resized to match video frame dimensions
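External models often return probability maps or 0/255 images rather than strict binary masks. The sketch below normalizes such output to a boolean (H, W) array before it is handed to add_mask_prompt(). `to_binary_mask` and its `threshold` default of 0.5 are assumptions for illustration, not part of samgeo, so choose a threshold that suits your model's output range.

```python
import numpy as np


def to_binary_mask(mask, threshold=0.5):
    """Convert a 2D array to a boolean (H, W) mask.

    Boolean input is returned unchanged; numeric input is thresholded,
    so probability maps and 0/255 images are both handled.
    """
    mask = np.asarray(mask)
    if mask.ndim != 2:
        raise ValueError(f"expected shape (H, W), got {mask.shape}")
    if mask.dtype == bool:
        return mask
    return mask > threshold
```

Rejecting arrays that are not 2D catches a common mistake early: passing a stacked (N, H, W) batch where a single mask is expected.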