---
title: SAM2 Video Background Remover
emoji: 🎥
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - computer-vision
  - video
  - segmentation
  - sam2
  - background-removal
  - object-tracking
---

# 🎥 SAM2 Video Background Remover

Remove backgrounds from videos by tracking objects using Meta's **Segment Anything Model 2 (SAM2)**.

## Features

- ✨ **Background Removal**: Automatically remove backgrounds and keep only tracked objects
- 🎯 **Object Tracking**: Track multiple objects across video frames
- 🖥️ **Interactive UI**: Easy-to-use Gradio interface
- **REST API**: Programmatic access via API endpoints
- ⚡ **GPU Accelerated**: Fast processing with CUDA support

## How It Works

SAM2 is a foundation model for video segmentation that can:

1. **Segment objects** based on point or box annotations
2. **Track objects** automatically across all video frames
3. **Handle occlusions** and object reappearance
4. **Process multiple objects** simultaneously

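To make the background-removal step concrete, here is a minimal sketch of how per-object masks could be combined and applied to a single frame. It is not taken from this Space's `app.py`: the helper names `union_masks` and `apply_mask`, the nested-list frame representation, and the black fill color are all assumptions chosen for illustration.

```python
# Illustrative only: frames are nested lists of (R, G, B) tuples and masks are
# nested lists of booleans. A real pipeline would use NumPy or torch tensors.


def union_masks(masks):
    """Combine per-object masks so a pixel is kept if any object covers it."""
    height, width = len(masks[0]), len(masks[0][0])
    return [
        [any(mask[y][x] for mask in masks) for x in range(width)]
        for y in range(height)
    ]


def apply_mask(frame, mask, fill=(0, 0, 0)):
    """Keep pixels where the mask is True and replace the rest with `fill`."""
    return [
        [pixel if keep else fill for pixel, keep in zip(row, mask_row)]
        for row, mask_row in zip(frame, mask)
    ]


# A 2x2 frame with two tracked objects, each covering one pixel.
frame = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (9, 9, 9)]]
person = [[True, False], [False, False]]
ball = [[False, False], [False, True]]

combined = union_masks([person, ball])
result = apply_mask(frame, combined)
print(result)
```

Pixels covered by either object survive unchanged, and everything else is replaced by the fill color.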
## Usage

### 🖱️ Simple Mode (Web UI)

1. Upload your video
2. Enter the X,Y coordinates of the object you want to track, measured on the first frame
3. Click "Process Video"
4. Download the result with the background removed

**Example**: For a 640x480 video with a person in the center, use X=320, Y=240

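The example coordinates are simply the midpoint of the frame. As a small sketch (the helper name `center_point` is hypothetical and not part of this Space), the same point can be computed for any resolution:

```python
def center_point(width, height):
    """Return the (x, y) pixel at the middle of a frame, using integer division."""
    return width // 2, height // 2


x, y = center_point(640, 480)
print(f"X={x}, Y={y}")  # X=320, Y=240
```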
### 🔧 Advanced Mode (JSON Annotations)

For more control, use JSON annotations:

```json
[
  {
    "frame_idx": 0,
    "object_id": 1,
    "points": [[320, 240]],
    "labels": [1]
  }
]
```

**Parameters**:

- `frame_idx`: Frame number to annotate (0 = first frame)
- `object_id`: Unique ID for each object (1, 2, 3, ...)
- `points`: List of `[x, y]` coordinates, on the object for foreground points or on areas to exclude for background points
- `labels`: One entry per point: `1` for a foreground point, `0` for a background point

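Malformed annotations are easier to catch before uploading a video. The following is a sketch of a client-side check based only on the parameter descriptions in this README; `validate_annotations` is a hypothetical helper, and the Space itself may accept or reject inputs differently.

```python
import json


def validate_annotations(raw_json):
    """Parse an annotations JSON string and check each entry's shape.

    Returns the parsed list, or raises ValueError describing the first problem.
    """
    annotations = json.loads(raw_json)
    if not isinstance(annotations, list):
        raise ValueError("annotations must be a JSON array")
    for i, ann in enumerate(annotations):
        for key in ("frame_idx", "object_id", "points", "labels"):
            if key not in ann:
                raise ValueError(f"annotation {i}: missing '{key}'")
        if ann["frame_idx"] < 0:
            raise ValueError(f"annotation {i}: frame_idx must be >= 0")
        if len(ann["points"]) != len(ann["labels"]):
            raise ValueError(f"annotation {i}: points and labels differ in length")
        if any(len(point) != 2 for point in ann["points"]):
            raise ValueError(f"annotation {i}: each point must be [x, y]")
        if any(label not in (0, 1) for label in ann["labels"]):
            raise ValueError(f"annotation {i}: labels must be 0 or 1")
    return annotations


good = '[{"frame_idx": 0, "object_id": 1, "points": [[320, 240]], "labels": [1]}]'
print(validate_annotations(good))
```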
### API Usage

You can call this Space programmatically using the Gradio Client:

#### Python Example

```python
import json

from gradio_client import Client, handle_file

# Connect to the Space
client = Client("YOUR_USERNAME/sam2-video-bg-remover")

# Define what to track
annotations = [
    {
        "frame_idx": 0,
        "object_id": 1,
        "points": [[320, 240]],  # x, y coordinates
        "labels": [1]  # 1 = foreground
    }
]

# Process video
# handle_file() is how gradio_client 1.x expects local files to be passed;
# this assumes `video_file` is a File input.
result = client.predict(
    video_file=handle_file("./input_video.mp4"),
    annotations_json=json.dumps(annotations),
    remove_background=True,
    max_frames=300,  # Limit frames for faster processing
    api_name="/segment_video_api"
)

print(f"Output video saved to: {result}")
```

#### Track Multiple Objects

```python
annotations = [
    # First object (person)
    {
        "frame_idx": 0,
        "object_id": 1,
        "points": [[320, 240]],
        "labels": [1]
    },
    # Second object (ball)
    {
        "frame_idx": 0,
        "object_id": 2,
        "points": [[500, 300]],
        "labels": [1]
    }
]
```

#### Refine Segmentation with Background Points

```python
annotations = [
    {
        "frame_idx": 0,
        "object_id": 1,
        "points": [
            [320, 240],  # Point ON the object
            [100, 100]   # Point on background to exclude
        ],
        "labels": [1, 0]  # 1=foreground, 0=background
    }
]
```

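Writing `points` and `labels` by hand makes it easy to let the two lists drift out of step. One way to avoid that, sketched here with a hypothetical `make_annotation` helper that is not part of this Space, is to pass foreground and background points separately and derive the labels:

```python
import json


def make_annotation(object_id, foreground, background=(), frame_idx=0):
    """Build one annotation dict from foreground and background points.

    Foreground points get label 1 and background points get label 0, so the
    `points` and `labels` lists always line up.
    """
    points = [list(p) for p in foreground] + [list(p) for p in background]
    labels = [1] * len(foreground) + [0] * len(background)
    return {
        "frame_idx": frame_idx,
        "object_id": object_id,
        "points": points,
        "labels": labels,
    }


annotations = [
    make_annotation(1, foreground=[(320, 240)], background=[(100, 100)]),
    make_annotation(2, foreground=[(500, 300)]),
]
annotations_json = json.dumps(annotations)
print(annotations_json)
```

The resulting string has the same shape as the `annotations_json` value used in the examples in this README.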
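### HTTP API

You can also call the API directly via HTTP. The commands below assume the two-step `/call/<api_name>` route used by Gradio 4 and a publicly reachable video URL (`https://example.com/input_video.mp4` is a placeholder); the Space's "Use via API" page shows the exact endpoint and input format.

```bash
# Step 1: submit the job; the response contains an event ID
curl -X POST https://YOUR_USERNAME-sam2-video-bg-remover.hf.space/call/segment_video_api \
  -H "Content-Type: application/json" \
  -d '{"data": [{"path": "https://example.com/input_video.mp4"}, "[{\"frame_idx\":0,\"object_id\":1,\"points\":[[320,240]],\"labels\":[1]}]", true, 300]}'

# Step 2: stream the result, replacing EVENT_ID with the value returned by step 1
curl -N https://YOUR_USERNAME-sam2-video-bg-remover.hf.space/call/segment_video_api/EVENT_ID
```
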
## Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `video_file` | File | - | Input video file (required) |
| `annotations_json` | String | - | JSON array of annotations (required) |
| `remove_background` | Boolean | `true` | Remove the background (`true`) or only highlight the tracked objects (`false`) |
| `max_frames` | Integer | `null` | Maximum number of frames to process; set it to limit frames for faster processing |

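As a rough illustration of how the required fields and defaults in the table fit together, the sketch below assembles the four arguments into one dict. `build_request` is a hypothetical helper, and the check that `max_frames` is positive is an assumption rather than documented behavior.

```python
import json


def build_request(video_path, annotations, remove_background=True, max_frames=None):
    """Assemble keyword arguments matching the parameter table.

    `annotations` is a list of dicts; it is serialized here because the
    table lists `annotations_json` as a string.
    """
    if not video_path:
        raise ValueError("video_file is required")
    if not annotations:
        raise ValueError("annotations_json is required")
    if max_frames is not None and max_frames <= 0:
        raise ValueError("max_frames must be a positive integer or None")
    return {
        "video_file": video_path,
        "annotations_json": json.dumps(annotations),
        "remove_background": remove_background,
        "max_frames": max_frames,
    }


request = build_request(
    "./input_video.mp4",
    [{"frame_idx": 0, "object_id": 1, "points": [[320, 240]], "labels": [1]}],
    max_frames=300,
)
print(request)
```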
## Tips & Best Practices

### 🎯 Getting Good Results

1. **Choose Clear Points**: Click on the center or the most distinctive part of your object
2. **Add Multiple Points**: For complex objects, add 2-3 points on different parts
3. **Use Background Points**: Add points with label `0` on areas you don't want included
4. **Annotate Key Frames**: If the object changes significantly, add annotations on multiple frames

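Key-frame annotation reuses the same `object_id` with different `frame_idx` values. The snippet below is a sketch of that idea; the frame indices and coordinates are made-up examples, not values from a real video.

```python
import json

# Hypothetical key frames: the same object (ID 1) re-annotated where its
# appearance changes. Frame indices and coordinates are made-up examples.
key_frames = {0: (320, 240), 60: (350, 260), 120: (410, 230)}

annotations = [
    {"frame_idx": frame_idx, "object_id": 1, "points": [[x, y]], "labels": [1]}
    for frame_idx, (x, y) in sorted(key_frames.items())
]

print(json.dumps(annotations, indent=2))
```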
### ⚡ Performance Tips

1. **Limit Frames**: Use the `max_frames` parameter for long videos
2. **Use a Smaller Model**: The default, `sam2.1-hiera-tiny`, is the fastest option
3. **Process Shorter Clips**: Split long videos into segments

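When splitting a long video, the segment boundaries can be worked out from the frame count alone. This is a minimal sketch with a hypothetical `chunk_ranges` helper; the 300-frame segment size is only an example, and cutting the video itself is left to whatever tool you prefer.

```python
def chunk_ranges(total_frames, chunk_size):
    """Split `total_frames` into consecutive (start, end) ranges, end exclusive."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, total_frames))
        for start in range(0, total_frames, chunk_size)
    ]


# A 30 fps, 25-second clip has 750 frames; split it into 300-frame segments.
print(chunk_ranges(750, 300))  # [(0, 300), (300, 600), (600, 750)]
```

Each segment would presumably be processed as its own video, so each one needs an annotation on its own first frame.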
### Troubleshooting

| Issue | Solution |
|-------|----------|
| Object not tracked | Add more points on different parts of the object |
| Background leakage | Add background points with label `0` |
| Slow processing | Reduce `max_frames` or use a shorter video |
| Wrong object tracked | Be more precise with point coordinates |

## Model Information

This Space uses **facebook/sam2.1-hiera-tiny** for efficient processing. The available SAM 2.1 models are:

- `facebook/sam2.1-hiera-tiny` - Fastest, good quality ⚡
- `facebook/sam2.1-hiera-small` - Balanced
- `facebook/sam2.1-hiera-base-plus` - Higher quality
- `facebook/sam2.1-hiera-large` - Best quality, slower 🎯

## Use Cases

- 🎬 **Video Production**: Remove backgrounds for green screen effects
- **Sports Analysis**: Isolate athletes for motion analysis
- 🎮 **Content Creation**: Extract game characters or objects
- 🔬 **Research**: Track objects in scientific videos
- 📱 **Social Media**: Create engaging content with background removal

## Limitations

- Processing time grows with video length
- A GPU is recommended for videos longer than 10 seconds
- Very fast-moving objects may require multiple annotations
- Extreme lighting changes can affect tracking quality

## Citation

If you use this Space, please cite the SAM2 paper:

```bibtex
@article{ravi2024sam2,
  title={SAM 2: Segment Anything in Images and Videos},
  author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and others},
  journal={arXiv preprint arXiv:2408.00714},
  year={2024}
}
```

## License

Apache 2.0

## Links

- [SAM2 Documentation](https://huggingface.co/docs/transformers/model_doc/sam2_video)
- 🤗 [Model on Hugging Face](https://huggingface.co/facebook/sam2.1-hiera-tiny)
- [Research Paper](https://arxiv.org/abs/2408.00714)
- 💻 [Original Repository](https://github.com/facebookresearch/segment-anything-2)

---

Built with ❤️ using [Transformers](https://github.com/huggingface/transformers) and [Gradio](https://gradio.app)