---
title: SAM2 Video Background Remover
emoji: 🎥
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - computer-vision
  - video
  - segmentation
  - sam2
  - background-removal
  - object-tracking
---
# 🎥 SAM2 Video Background Remover

Remove backgrounds from videos by tracking objects using Meta's Segment Anything Model 2 (SAM2).
## Features

- ✨ **Background Removal**: Automatically remove backgrounds and keep only tracked objects
- 🎯 **Object Tracking**: Track multiple objects across video frames
- 🖥️ **Interactive UI**: Easy-to-use Gradio interface
- 🌐 **REST API**: Programmatic access via API endpoints
- ⚡ **GPU Accelerated**: Fast processing with CUDA support
## How It Works

SAM2 is a foundation model for video segmentation that can:

- Segment objects based on point or box annotations
- Track objects automatically across all video frames
- Handle occlusions and object reappearance
- Process multiple objects simultaneously
## Usage

### 🖱️ Simple Mode (Web UI)

1. Upload your video
2. Specify the X,Y coordinates of the object you want to track (measured on the first frame)
3. Click "Process Video"
4. Download the result with the background removed!

**Example**: For a 640x480 video with a person in the center, use X=320, Y=240.
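Rather than guessing pixel coordinates, you can compute them from the frame size. A minimal sketch of the arithmetic in the example above (the helper name is ours, not part of the Space):

```python
def point_from_fraction(width: int, height: int, fx: float = 0.5, fy: float = 0.5):
    """Convert a fractional position (0.0-1.0 on each axis) into pixel coordinates."""
    return int(width * fx), int(height * fy)

# Center of a 640x480 frame, as in the example above:
print(point_from_fraction(640, 480))  # (320, 240)
```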
### 🔧 Advanced Mode (JSON Annotations)

For more control, use JSON annotations:

```json
[
  {
    "frame_idx": 0,
    "object_id": 1,
    "points": [[320, 240]],
    "labels": [1]
  }
]
```
**Parameters:**

- `frame_idx`: Frame number to annotate (0 = first frame)
- `object_id`: Unique ID for each object (1, 2, 3, ...)
- `points`: List of [x, y] coordinates on the object
- `labels`: `1` for a foreground point, `0` for a background point
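Before submitting, it can help to sanity-check annotations against this schema. A small sketch (the `validate_annotations` helper is illustrative, not part of the Space's API):

```python
def validate_annotations(annotations):
    """Check a list of annotation dicts against the schema described above."""
    for i, ann in enumerate(annotations):
        if not {"frame_idx", "object_id", "points", "labels"} <= ann.keys():
            raise ValueError(f"annotation {i}: missing required keys")
        if len(ann["points"]) != len(ann["labels"]):
            raise ValueError(f"annotation {i}: points and labels must have equal length")
        if any(label not in (0, 1) for label in ann["labels"]):
            raise ValueError(f"annotation {i}: labels must be 0 or 1")
        if any(len(point) != 2 for point in ann["points"]):
            raise ValueError(f"annotation {i}: each point must be [x, y]")
    return annotations
```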
## 💡 API Usage

You can call this Space programmatically using the Gradio Client.

### Python Example

```python
from gradio_client import Client
import json

# Connect to the Space
client = Client("YOUR_USERNAME/sam2-video-bg-remover")

# Define what to track
annotations = [
    {
        "frame_idx": 0,
        "object_id": 1,
        "points": [[320, 240]],  # x, y coordinates
        "labels": [1]            # 1 = foreground
    }
]

# Process video
result = client.predict(
    video_file="./input_video.mp4",
    annotations_json=json.dumps(annotations),
    remove_background=True,
    max_frames=300,  # Limit frames for faster processing
    api_name="/segment_video_api"
)

print(f"Output video saved to: {result}")
```
### Track Multiple Objects

```python
annotations = [
    # First object (person)
    {
        "frame_idx": 0,
        "object_id": 1,
        "points": [[320, 240]],
        "labels": [1]
    },
    # Second object (ball)
    {
        "frame_idx": 0,
        "object_id": 2,
        "points": [[500, 300]],
        "labels": [1]
    }
]
```
### Refine Segmentation with Background Points

```python
annotations = [
    {
        "frame_idx": 0,
        "object_id": 1,
        "points": [
            [320, 240],  # Point ON the object
            [100, 100]   # Point on the background to exclude
        ],
        "labels": [1, 0]  # 1 = foreground, 0 = background
    }
]
```
## 🌐 HTTP API

You can also call the API directly via HTTP:

```bash
curl -X POST https://YOUR_USERNAME-sam2-video-bg-remover.hf.space/api/predict \
  -F "video_file=@input_video.mp4" \
  -F 'annotations_json=[{"frame_idx":0,"object_id":1,"points":[[320,240]],"labels":[1]}]' \
  -F "remove_background=true" \
  -F "max_frames=300"
```
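The same form fields can be assembled in Python and sent with any HTTP client. A sketch that mirrors the curl call above (the endpoint path and field names are taken from that example; adjust them if your Space exposes a different route, and note that `build_predict_fields` is our own helper):

```python
import json

def build_predict_fields(annotations, remove_background=True, max_frames=None):
    """Build the non-file form fields for the /api/predict call shown above."""
    fields = {
        "annotations_json": json.dumps(annotations),
        "remove_background": "true" if remove_background else "false",
    }
    if max_frames is not None:
        fields["max_frames"] = str(max_frames)
    return fields

# Usage with the `requests` package (video is sent as the multipart file field):
# import requests
# fields = build_predict_fields(
#     [{"frame_idx": 0, "object_id": 1, "points": [[320, 240]], "labels": [1]}],
#     max_frames=300,
# )
# with open("input_video.mp4", "rb") as f:
#     response = requests.post(
#         "https://YOUR_USERNAME-sam2-video-bg-remover.hf.space/api/predict",
#         files={"video_file": f},
#         data=fields,
#     )
```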
## Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `video_file` | File | - | Input video file (required) |
| `annotations_json` | String | - | JSON array of annotations (required) |
| `remove_background` | Boolean | `true` | Remove background or just highlight objects |
| `max_frames` | Integer | `null` | Limit frames for faster processing |
## Tips & Best Practices

### 🎯 Getting Good Results

- **Choose Clear Points**: Click on the center or most distinctive part of your object
- **Add Multiple Points**: For complex objects, add 2-3 points on different parts
- **Use Background Points**: Add points with `label: 0` on areas you DON'T want
- **Annotate Key Frames**: If the object changes significantly, add annotations on multiple frames
### ⚡ Performance Tips

- **Limit Frames**: Use the `max_frames` parameter for long videos
- **Use a Smaller Model**: The default is `sam2.1-hiera-tiny` for speed
- **Process Shorter Clips**: Split long videos into segments
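Splitting a long video into `max_frames`-sized chunks is simple index arithmetic. A sketch (the `frame_segments` helper is ours; pair it with your video tool of choice to cut the actual clips):

```python
def frame_segments(total_frames: int, max_frames: int):
    """Split a frame count into [start, end) ranges no longer than max_frames."""
    if max_frames <= 0:
        raise ValueError("max_frames must be positive")
    return [(start, min(start + max_frames, total_frames))
            for start in range(0, total_frames, max_frames)]

# A 750-frame video processed in 300-frame chunks:
print(frame_segments(750, 300))  # [(0, 300), (300, 600), (600, 750)]
```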
### 🐛 Troubleshooting

| Issue | Solution |
|---|---|
| Object not tracked | Add more points on different parts of the object |
| Background leakage | Add background points with `label: 0` |
| Slow processing | Reduce `max_frames` or use a shorter video |
| Wrong object tracked | Be more precise with point coordinates |
## Model Information

This Space uses `facebook/sam2.1-hiera-tiny` for efficient processing. Other available models:

- `facebook/sam2.1-hiera-tiny` - Fastest, good quality ⚡
- `facebook/sam2.1-hiera-small` - Balanced
- `facebook/sam2.1-hiera-base-plus` - Higher quality
- `facebook/sam2.1-hiera-large` - Best quality, slower 🎯
## Use Cases

- 🎬 **Video Production**: Remove backgrounds for green screen effects
- 📊 **Sports Analysis**: Isolate athletes for motion analysis
- 🎮 **Content Creation**: Extract game characters or objects
- 🔬 **Research**: Track objects in scientific videos
- 📱 **Social Media**: Create engaging content with background removal
## Limitations

- Video length affects processing time (longer = slower)
- A GPU is recommended for videos longer than 10 seconds
- Very fast-moving objects may require multiple annotations
- Extreme lighting changes can affect tracking quality
## Citation

If you use this Space, please cite the SAM2 paper:

```bibtex
@article{ravi2024sam2,
  title={SAM 2: Segment Anything in Images and Videos},
  author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and others},
  journal={arXiv preprint arXiv:2408.00714},
  year={2024}
}
```
## License

Apache 2.0

## Links

- 📖 SAM2 Documentation
- 🤗 Model on Hugging Face
- 📄 Research Paper
- 💻 Original Repository

---

Built with ❤️ using Transformers and Gradio