Mirko Trasciatti committed on
Commit 3cddaf8 · 1 Parent(s): 69f4f56

Deploy SAM2 Video Background Remover with Gradio UI and API

Files changed (5)
  1. .gitignore +64 -0
  2. README.md +233 -7
  3. api_example.py +356 -0
  4. app.py +540 -0
  5. requirements.txt +8 -0
.gitignore ADDED
@@ -0,0 +1,64 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+
+ # Virtual environments
+ .venv
+ venv/
+ ENV/
+ env/
+
+ # IDEs
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # Gradio
+ flagged/
+ gradio_cached_examples/
+
+ # Temporary files
+ *.tmp
+ *.temp
+ tmp/
+ temp/
+
+ # Video files (if you don't want to commit test videos)
+ *.mp4
+ *.avi
+ *.mov
+ *.mkv
+ !example_*.mp4
+
+ # Model cache
+ .cache/
+ models/
+
+ # OS
+ .DS_Store
+ Thumbs.db
+
+ # Logs
+ *.log
+ logs/
+
README.md CHANGED
@@ -1,14 +1,240 @@
  ---
- title: Chaskick
- emoji: 🦀
- colorFrom: purple
- colorTo: green
  sdk: gradio
- sdk_version: 5.49.1
  app_file: app.py
  pinned: false
  license: apache-2.0
- short_description: BKGRMV
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: SAM2 Video Background Remover
+ emoji: 🎥
+ colorFrom: blue
+ colorTo: purple
  sdk: gradio
+ sdk_version: 4.44.0
  app_file: app.py
  pinned: false
  license: apache-2.0
+ tags:
+ - computer-vision
+ - video
+ - segmentation
+ - sam2
+ - background-removal
+ - object-tracking
  ---

+ # 🎥 SAM2 Video Background Remover
+
+ Remove backgrounds from videos by tracking objects using Meta's **Segment Anything Model 2 (SAM2)**.
+
+ ## Features
+
+ ✨ **Background Removal**: Automatically remove backgrounds and keep only tracked objects
+ 🎯 **Object Tracking**: Track multiple objects across video frames
+ 🖥️ **Interactive UI**: Easy-to-use Gradio interface
+ 🔌 **REST API**: Programmatic access via API endpoints
+ ⚡ **GPU Accelerated**: Fast processing with CUDA support
+
+ ## How It Works
+
+ SAM2 is a foundation model for video segmentation that can:
+ 1. **Segment objects** based on point or box annotations
+ 2. **Track objects** automatically across all video frames
+ 3. **Handle occlusions** and object reappearance
+ 4. **Process multiple objects** simultaneously
+
+ ## Usage
+
+ ### 🖱️ Simple Mode (Web UI)
+
+ 1. Upload your video
+ 2. Specify the X,Y coordinates of the object you want to track (in the first frame)
+ 3. Click "Process Video"
+ 4. Download the result with the background removed!
+
+ **Example**: For a 640x480 video with a person in the center, use X=320, Y=240 (the snippet below shows one way to find coordinates in your own video).
+
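+ If you are unsure where your object sits in the frame, a quick local sketch like the following can help. It is not part of the Space itself: it assumes OpenCV is installed on your machine and `input_video.mp4` is a placeholder path, and it mirrors the coordinate helper shipped in `api_example.py`.
+
+ ```python
+ import cv2
+
+ # Grab the first frame of your video and save it as an image,
+ # then read the x,y position of your object off the saved picture.
+ cap = cv2.VideoCapture("input_video.mp4")  # placeholder path
+ ok, frame = cap.read()
+ cap.release()
+
+ if ok:
+     cv2.imwrite("first_frame.jpg", frame)
+     # frame.shape is (height, width, channels); annotations use x (width) then y (height)
+     print(f"Video size: {frame.shape[1]}x{frame.shape[0]} (width x height)")
+ ```
+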
+ ### 🔧 Advanced Mode (JSON Annotations)
+
+ For more control, use JSON annotations:
+
+ ```json
+ [
+   {
+     "frame_idx": 0,
+     "object_id": 1,
+     "points": [[320, 240]],
+     "labels": [1]
+   }
+ ]
+ ```
+
+ **Parameters** (a small helper for building these in Python is sketched below):
+ - `frame_idx`: Frame number to annotate (0 = first frame)
+ - `object_id`: Unique ID for each object (1, 2, 3, ...)
+ - `points`: List of [x, y] coordinates on the object
+ - `labels`: `1` for foreground point, `0` for background point
+
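+ If you generate annotations in code rather than by hand, a tiny helper keeps them consistent; the repository's `api_example.py` ships an equivalent `create_annotation` function. A minimal sketch:
+
+ ```python
+ def create_annotation(frame_idx, object_id, points, labels=None):
+     """Build one annotation dict; labels default to all-foreground (1)."""
+     if labels is None:
+         labels = [1] * len(points)
+     return {"frame_idx": frame_idx, "object_id": object_id, "points": points, "labels": labels}
+
+ # One foreground click on the first frame for object 1
+ annotations = [create_annotation(0, 1, [[320, 240]])]
+ ```
+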
+ ### 📡 API Usage
+
+ You can call this Space programmatically using the Gradio Client:
+
+ #### Python Example
+
+ ```python
+ from gradio_client import Client
+ import json
+
+ # Connect to the Space
+ client = Client("YOUR_USERNAME/sam2-video-bg-remover")
+
+ # Define what to track
+ annotations = [
+     {
+         "frame_idx": 0,
+         "object_id": 1,
+         "points": [[320, 240]],  # x, y coordinates
+         "labels": [1]            # 1 = foreground
+     }
+ ]
+
+ # Process video
+ result = client.predict(
+     video_file="./input_video.mp4",
+     annotations_json=json.dumps(annotations),
+     remove_background=True,
+     max_frames=300,  # Limit frames for faster processing
+     api_name="/segment_video_api"
+ )
+
+ print(f"Output video saved to: {result}")
+ ```
+
+ #### Track Multiple Objects
+
+ ```python
+ annotations = [
+     # First object (person)
+     {
+         "frame_idx": 0,
+         "object_id": 1,
+         "points": [[320, 240]],
+         "labels": [1]
+     },
+     # Second object (ball)
+     {
+         "frame_idx": 0,
+         "object_id": 2,
+         "points": [[500, 300]],
+         "labels": [1]
+     }
+ ]
+ ```
+
+ #### Refine Segmentation with Background Points
+
+ ```python
+ annotations = [
+     {
+         "frame_idx": 0,
+         "object_id": 1,
+         "points": [
+             [320, 240],  # Point ON the object
+             [100, 100]   # Point on background to exclude
+         ],
+         "labels": [1, 0]  # 1=foreground, 0=background
+     }
+ ]
+ ```
+
+ ### 🌐 HTTP API
+
+ You can also call the API directly via HTTP:
+
+ ```bash
+ curl -X POST https://YOUR_USERNAME-sam2-video-bg-remover.hf.space/api/predict \
+   -F "video_file=@input_video.mp4" \
+   -F 'annotations_json=[{"frame_idx":0,"object_id":1,"points":[[320,240]],"labels":[1]}]' \
+   -F "remove_background=true" \
+   -F "max_frames=300"
+ ```
+
+ ## Parameters
+
+ | Parameter | Type | Default | Description |
+ |-----------|------|---------|-------------|
+ | `video_file` | File | - | Input video file (required) |
+ | `annotations_json` | String | - | JSON array of annotations (required) |
+ | `remove_background` | Boolean | `true` | Remove background or just highlight objects |
+ | `max_frames` | Integer | `null` | Limit frames for faster processing |
+
+ ## Tips & Best Practices
+
+ ### 🎯 Getting Good Results
+
+ 1. **Choose Clear Points**: Click on the center/most distinctive part of your object
+ 2. **Add Multiple Points**: For complex objects, add 2-3 points on different parts
+ 3. **Use Background Points**: Add points with `label: 0` on areas you DON'T want
+ 4. **Annotate Key Frames**: If the object changes significantly, add annotations on multiple frames
+
+ ### ⚡ Performance Tips
+
+ 1. **Limit Frames**: Use the `max_frames` parameter for long videos
+ 2. **Use a Smaller Model**: The default is `sam2.1-hiera-tiny` for speed
+ 3. **Process Shorter Clips**: Split long videos into segments
+
+ ### 🐛 Troubleshooting
+
+ | Issue | Solution |
+ |-------|----------|
+ | Object not tracked | Add more points on different parts of the object |
+ | Background leakage | Add background points with `label: 0` |
+ | Slow processing | Reduce `max_frames` or use a shorter video |
+ | Wrong object tracked | Be more precise with point coordinates |
+
+ ## Model Information
+
+ This Space uses **facebook/sam2.1-hiera-tiny** for efficient processing. Other available models (loading sketch after this list):
+
+ - `facebook/sam2.1-hiera-tiny` - Fastest, good quality ⚡
+ - `facebook/sam2.1-hiera-small` - Balanced
+ - `facebook/sam2.1-hiera-base-plus` - Higher quality
+ - `facebook/sam2.1-hiera-large` - Best quality, slower 🎯
+
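+ Swapping checkpoints means changing the `MODEL_NAME` constant in `app.py`; loading follows the same pattern used there. A minimal sketch for the large variant, assuming a `transformers` version that ships the SAM2 video classes (requirements.txt pins >=4.57.0):
+
+ ```python
+ import torch
+ from transformers import Sam2VideoModel, Sam2VideoProcessor
+
+ MODEL_NAME = "facebook/sam2.1-hiera-large"  # any checkpoint from the list above
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+
+ model = Sam2VideoModel.from_pretrained(MODEL_NAME).to(device)
+ processor = Sam2VideoProcessor.from_pretrained(MODEL_NAME)
+ ```
+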
+ ## Use Cases
+
+ - 🎬 **Video Production**: Remove backgrounds for green screen effects
+ - 🏃 **Sports Analysis**: Isolate athletes for motion analysis
+ - 🎮 **Content Creation**: Extract game characters or objects
+ - 🔬 **Research**: Track objects in scientific videos
+ - 📱 **Social Media**: Create engaging content with background removal
+
+ ## Limitations
+
+ - Video length affects processing time (longer = slower)
+ - GPU recommended for videos > 10 seconds
+ - Very fast-moving objects may require multiple annotations
+ - Extreme lighting changes can affect tracking quality
+
+ ## Citation
+
+ If you use this Space, please cite the SAM2 paper:
+
+ ```bibtex
+ @article{ravi2024sam2,
+   title={SAM 2: Segment Anything in Images and Videos},
+   author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and others},
+   journal={arXiv preprint arXiv:2408.00714},
+   year={2024}
+ }
+ ```
+
+ ## License
+
+ Apache 2.0
+
+ ## Links
+
+ - 📚 [SAM2 Documentation](https://huggingface.co/docs/transformers/model_doc/sam2_video)
+ - 🤗 [Model on Hugging Face](https://huggingface.co/facebook/sam2.1-hiera-tiny)
+ - 📄 [Research Paper](https://arxiv.org/abs/2408.00714)
+ - 💻 [Original Repository](https://github.com/facebookresearch/segment-anything-2)
+
+ ---
+
+ Built with ❤️ using [Transformers](https://github.com/huggingface/transformers) and [Gradio](https://gradio.app)
+
api_example.py ADDED
@@ -0,0 +1,356 @@
+ """
+ Example script showing how to use the SAM2 Video Background Remover API.
+
+ This script demonstrates various use cases:
+ 1. Simple single object tracking
+ 2. Multiple object tracking
+ 3. Refined segmentation with background points
+ 4. Batch processing multiple videos
+ """
+
+ from gradio_client import Client
+ import json
+ from pathlib import Path
+
+
+ def example_1_simple_tracking():
+     """
+     Example 1: Track a single object (e.g., person, ball, car)
+     """
+     print("=" * 60)
+     print("Example 1: Simple Single Object Tracking")
+     print("=" * 60)
+
+     # Connect to your Space
+     client = Client("furbola/chaskick")
+
+     # Simple annotation: click on the center of your object in the first frame
+     annotations = [
+         {
+             "frame_idx": 0,          # First frame
+             "object_id": 1,          # First object
+             "points": [[320, 240]],  # x, y coordinates of the object center
+             "labels": [1]            # 1 = this is a foreground point
+         }
+     ]
+
+     # Process the video
+     result = client.predict(
+         video_file="./input_video.mp4",
+         annotations_json=json.dumps(annotations),
+         remove_background=True,
+         max_frames=None,  # Process all frames
+         api_name="/segment_video_api"
+     )
+
+     print(f"✅ Output saved to: {result}")
+
+
+ def example_2_multi_object_tracking():
+     """
+     Example 2: Track multiple objects simultaneously
+     Useful for: tracking player + ball, multiple people, etc.
+     """
+     print("\n" + "=" * 60)
+     print("Example 2: Multi-Object Tracking")
+     print("=" * 60)
+
+     client = Client("furbola/chaskick")
+
+     annotations = [
+         # Object 1: Player
+         {
+             "frame_idx": 0,
+             "object_id": 1,
+             "points": [[320, 240]],
+             "labels": [1]
+         },
+         # Object 2: Ball
+         {
+             "frame_idx": 0,
+             "object_id": 2,
+             "points": [[500, 300]],
+             "labels": [1]
+         },
+         # Object 3: Another player
+         {
+             "frame_idx": 0,
+             "object_id": 3,
+             "points": [[150, 200]],
+             "labels": [1]
+         }
+     ]
+
+     result = client.predict(
+         video_file="./soccer_match.mp4",
+         annotations_json=json.dumps(annotations),
+         remove_background=True,
+         max_frames=300,  # Limit to 300 frames for speed
+         api_name="/segment_video_api"
+     )
+
+     print(f"✅ Tracked 3 objects! Output: {result}")
+
+
+ def example_3_refined_segmentation():
+     """
+     Example 3: Use both foreground AND background points for better accuracy
+     Useful when: object is complex, background is similar color, etc.
+     """
+     print("\n" + "=" * 60)
+     print("Example 3: Refined Segmentation with Negative Points")
+     print("=" * 60)
+
+     client = Client("furbola/chaskick")
+
+     annotations = [
+         {
+             "frame_idx": 0,
+             "object_id": 1,
+             "points": [
+                 [320, 240],  # ✅ Point ON the person's body
+                 [350, 250],  # ✅ Another point on the person
+                 [280, 220],  # ✅ Third point for better coverage
+                 [100, 100],  # ❌ Point on the BACKGROUND to exclude
+                 [600, 400]   # ❌ Another background point
+             ],
+             "labels": [
+                 1,  # foreground
+                 1,  # foreground
+                 1,  # foreground
+                 0,  # background (exclude this area)
+                 0   # background (exclude this area)
+             ]
+         }
+     ]
+
+     result = client.predict(
+         video_file="./person_video.mp4",
+         annotations_json=json.dumps(annotations),
+         remove_background=True,
+         max_frames=None,
+         api_name="/segment_video_api"
+     )
+
+     print(f"✅ Refined segmentation complete: {result}")
+
+
+ def example_4_temporal_annotations():
+     """
+     Example 4: Add annotations on multiple frames
+     Useful when: object changes appearance, camera cuts, occlusions
+     """
+     print("\n" + "=" * 60)
+     print("Example 4: Multi-Frame Annotations")
+     print("=" * 60)
+
+     client = Client("furbola/chaskick")
+
+     annotations = [
+         # Annotate frame 0
+         {
+             "frame_idx": 0,
+             "object_id": 1,
+             "points": [[320, 240]],
+             "labels": [1]
+         },
+         # Annotate frame 50 (object might have moved or changed)
+         {
+             "frame_idx": 50,
+             "object_id": 1,
+             "points": [[450, 300]],
+             "labels": [1]
+         },
+         # Annotate frame 100 (after a camera cut or scene change)
+         {
+             "frame_idx": 100,
+             "object_id": 1,
+             "points": [[200, 180]],
+             "labels": [1]
+         }
+     ]
+
+     result = client.predict(
+         video_file="./long_video.mp4",
+         annotations_json=json.dumps(annotations),
+         remove_background=True,
+         max_frames=None,
+         api_name="/segment_video_api"
+     )
+
+     print(f"✅ Multi-frame tracking complete: {result}")
+
+
+ def example_5_batch_processing():
+     """
+     Example 5: Process multiple videos in batch
+     """
+     print("\n" + "=" * 60)
+     print("Example 5: Batch Processing Multiple Videos")
+     print("=" * 60)
+
+     client = Client("furbola/chaskick")
+
+     # List of videos to process
+     videos = [
+         {"path": "./video1.mp4", "point": [320, 240]},
+         {"path": "./video2.mp4", "point": [400, 300]},
+         {"path": "./video3.mp4", "point": [250, 200]},
+     ]
+
+     results = []
+
+     for i, video in enumerate(videos, 1):
+         print(f"\nProcessing video {i}/{len(videos)}: {video['path']}")
+
+         annotations = [{
+             "frame_idx": 0,
+             "object_id": 1,
+             "points": [video['point']],
+             "labels": [1]
+         }]
+
+         try:
+             result = client.predict(
+                 video_file=video['path'],
+                 annotations_json=json.dumps(annotations),
+                 remove_background=True,
+                 max_frames=200,  # Limit frames for faster batch processing
+                 api_name="/segment_video_api"
+             )
+             results.append({"input": video['path'], "output": result, "status": "✅"})
+             print(f"  ✅ Success: {result}")
+         except Exception as e:
+             results.append({"input": video['path'], "output": None, "status": f"❌ {str(e)}"})
+             print(f"  ❌ Failed: {e}")
+
+     print("\n" + "=" * 60)
+     print("Batch Processing Summary:")
+     print("=" * 60)
+     for r in results:
+         print(f"{r['status']} {r['input']} -> {r['output']}")
+
+
+ def example_6_highlight_mode():
+     """
+     Example 6: Highlight objects instead of removing background
+     Useful for: visualization, debugging, object detection demos
+     """
+     print("\n" + "=" * 60)
+     print("Example 6: Highlight Mode (Keep Background)")
+     print("=" * 60)
+
+     client = Client("furbola/chaskick")
+
+     annotations = [{
+         "frame_idx": 0,
+         "object_id": 1,
+         "points": [[320, 240]],
+         "labels": [1]
+     }]
+
+     result = client.predict(
+         video_file="./input_video.mp4",
+         annotations_json=json.dumps(annotations),
+         remove_background=False,  # Keep background, just highlight the object
+         max_frames=None,
+         api_name="/segment_video_api"
+     )
+
+     print(f"✅ Object highlighted: {result}")
+
+
+ def example_7_find_coordinates():
+     """
+     Example 7: Helper to find coordinates in a video
+     Opens the first frame so you can identify x,y coordinates
+     """
+     print("\n" + "=" * 60)
+     print("Example 7: Find Coordinates Helper")
+     print("=" * 60)
+
+     import cv2
+
+     video_path = "./input_video.mp4"
+
+     # Read first frame
+     cap = cv2.VideoCapture(video_path)
+     ret, frame = cap.read()
+     cap.release()
+
+     if ret:
+         # Save first frame
+         cv2.imwrite("first_frame.jpg", frame)
+         print(f"✅ Saved first frame to: first_frame.jpg")
+         print(f"   Video size: {frame.shape[1]}x{frame.shape[0]} (width x height)")
+         print(f"   Open this image and note the x,y coordinates of your object")
+         print(f"   Then use those coordinates in your annotation!")
+     else:
+         print("❌ Could not read video")
+
+
+ # ============================================================================
+ # UTILITY FUNCTIONS
+ # ============================================================================
+
+ def create_annotation(frame_idx, object_id, points, labels=None):
+     """
+     Helper function to create annotation objects.
+
+     Args:
+         frame_idx: Frame number (0 = first frame)
+         object_id: Unique object ID (1, 2, 3, ...)
+         points: List of [x, y] coordinates, e.g., [[320, 240]]
+         labels: List of labels (1=foreground, 0=background). Defaults to all 1s.
+
+     Returns:
+         Dictionary with annotation
+     """
+     if labels is None:
+         labels = [1] * len(points)
+
+     return {
+         "frame_idx": frame_idx,
+         "object_id": object_id,
+         "points": points,
+         "labels": labels
+     }
+
+
+ def load_annotations_from_file(json_file):
+     """Load annotations from a JSON file."""
+     with open(json_file, 'r') as f:
+         return json.load(f)
+
+
+ def save_annotations_to_file(annotations, json_file):
+     """Save annotations to a JSON file."""
+     with open(json_file, 'w') as f:
+         json.dump(annotations, f, indent=2)
+
+
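+ # Illustrative usage of the helpers above (not executed by this script):
+ #
+ #     anns = [create_annotation(0, 1, [[320, 240]]),
+ #             create_annotation(0, 2, [[500, 300]])]
+ #     save_annotations_to_file(anns, "annotations.json")
+ #     anns = load_annotations_from_file("annotations.json")
+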
+ # ============================================================================
+ # MAIN
+ # ============================================================================
+
+ if __name__ == "__main__":
+     print("""
+     ╔════════════════════════════════════════════════════════════╗
+     ║        SAM2 Video Background Remover - API Examples          ║
+     ║        Choose an example to run or uncomment in the code     ║
+     ╚════════════════════════════════════════════════════════════╝
+     """)
+
+     # Uncomment the examples you want to run:
+
+     # example_1_simple_tracking()
+     # example_2_multi_object_tracking()
+     # example_3_refined_segmentation()
+     # example_4_temporal_annotations()
+     # example_5_batch_processing()
+     # example_6_highlight_mode()
+     # example_7_find_coordinates()
+
+     print("\n✅ Done! Check the output files.")
+     print("\n🎉 Your Space: https://huggingface.co/spaces/furbola/chaskick")
+
app.py ADDED
@@ -0,0 +1,540 @@
+ """
+ SAM2 Video Segmentation Space
+ Removes background from videos by tracking specified objects.
+ Provides both Gradio UI and API endpoints.
+ """
+
+ import gradio as gr
+ import torch
+ import numpy as np
+ import cv2
+ import tempfile
+ import os
+ from pathlib import Path
+ from typing import List, Tuple, Optional, Dict, Any
+ from transformers import Sam2VideoModel, Sam2VideoProcessor
+ from transformers.video_utils import load_video
+ from PIL import Image
+ import json
+
+ # Global model variables
+ MODEL_NAME = "facebook/sam2.1-hiera-tiny"  # Options: tiny, small, base-plus, large
+ device = None
+ model = None
+ processor = None
+
+
+ def initialize_model():
+     """Initialize SAM2 model and processor."""
+     global device, model, processor
+
+     # Determine device
+     if torch.cuda.is_available():
+         device = torch.device("cuda")
+         dtype = torch.float16
+     elif torch.backends.mps.is_available():
+         device = torch.device("mps")
+         dtype = torch.float32
+     else:
+         device = torch.device("cpu")
+         dtype = torch.float32
+
+     print(f"Loading SAM2 model on {device}...")
+
+     # Load model and processor
+     model = Sam2VideoModel.from_pretrained(MODEL_NAME).to(device, dtype=dtype)
+     processor = Sam2VideoProcessor.from_pretrained(MODEL_NAME)
+
+     print("Model loaded successfully!")
+     return device, model, processor
+
+
+ def extract_frames_from_video(video_path: str, max_frames: Optional[int] = None) -> Tuple[List[Image.Image], Dict]:
+     """Extract frames from video file."""
+     video_frames, info = load_video(video_path)
+
+     if max_frames and len(video_frames) > max_frames:
+         # Sample frames uniformly
+         indices = np.linspace(0, len(video_frames) - 1, max_frames, dtype=int)
+         video_frames = [video_frames[i] for i in indices]
+
+     return video_frames, info
+
+
+ def create_output_video(
+     video_frames: List[Image.Image],
+     masks: Dict[int, torch.Tensor],
+     output_path: str,
+     fps: float = 30.0,
+     remove_background: bool = True
+ ) -> str:
+     """
+     Create output video with segmented objects.
+
+     Args:
+         video_frames: Original video frames
+         masks: Dictionary mapping frame_idx to mask tensors
+         output_path: Path to save output video
+         fps: Frames per second
+         remove_background: If True, remove background; if False, highlight objects
+     """
+     if not masks:
+         raise ValueError("No masks provided")
+
+     # Get first frame to determine dimensions
+     first_frame = np.array(video_frames[0])
+     height, width = first_frame.shape[:2]
+
+     # Initialize video writer
+     fourcc = cv2.VideoWriter_fourcc(*'mp4v')
+     out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))
+
+     for frame_idx, frame_pil in enumerate(video_frames):
+         frame = np.array(frame_pil)
+
+         if frame_idx in masks:
+             mask = masks[frame_idx].cpu().numpy()
+
+             # Handle different mask shapes
+             if mask.ndim == 4:  # (batch, num_objects, H, W)
+                 mask = mask[0]  # Take first batch
+             if mask.ndim == 3:  # (num_objects, H, W)
+                 # Combine all object masks
+                 mask = mask.max(axis=0)
+
+             # Resize mask to frame size if needed
+             if mask.shape != (height, width):
+                 mask = cv2.resize(mask, (width, height), interpolation=cv2.INTER_NEAREST)
+
+             # Convert to binary mask
+             mask_binary = (mask > 0.5).astype(np.uint8)
+
+             if remove_background:
+                 # Keep only the tracked objects (remove background)
+                 if frame.shape[2] == 3:  # RGB
+                     # Create RGBA with alpha channel (not written to the MP4, which has no alpha)
+                     result = np.zeros((height, width, 4), dtype=np.uint8)
+                     result[:, :, :3] = frame
+                     result[:, :, 3] = mask_binary * 255
+
+                     # Composite the masked frame onto a black background for the RGB output
+                     background = np.zeros_like(frame)
+                     mask_3d = np.repeat(mask_binary[:, :, np.newaxis], 3, axis=2)
+                     result_rgb = frame * mask_3d + background * (1 - mask_3d)
+                     frame = result_rgb.astype(np.uint8)
+             else:
+                 # Highlight tracked objects (overlay colored mask)
+                 overlay = frame.copy()
+                 overlay[mask_binary > 0] = [0, 255, 0]  # Green overlay
+                 frame = cv2.addWeighted(frame, 0.7, overlay, 0.3, 0)
+
+         # Convert RGB to BGR for OpenCV
+         frame_bgr = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
+         out.write(frame_bgr)
+
+     out.release()
+     return output_path
+
+
+ def segment_video(
+     video_path: str,
+     annotations: List[Dict[str, Any]],
+     remove_background: bool = True,
+     max_frames: Optional[int] = None
+ ) -> str:
+     """
+     Main function to segment video based on annotations.
+
+     Args:
+         video_path: Path to input video
+         annotations: List of annotation dictionaries with format:
+             [
+                 {
+                     "frame_idx": 0,
+                     "object_id": 1,
+                     "points": [[x1, y1], [x2, y2], ...],
+                     "labels": [1, 1, ...]  # 1 for foreground, 0 for background
+                 },
+                 ...
+             ]
+         remove_background: If True, remove background; if False, highlight objects
+         max_frames: Maximum number of frames to process (None = all frames)
+
+     Returns:
+         Path to output video file
+     """
+     global device, model, processor
+
+     if model is None:
+         initialize_model()
+
+     # Load video frames
+     print("Loading video frames...")
+     video_frames, video_info = extract_frames_from_video(video_path, max_frames)
+     fps = video_info.get('fps', 30.0)
+
+     print(f"Processing {len(video_frames)} frames at {fps} FPS")
+
+     # Initialize inference session
+     dtype = torch.float16 if device.type == "cuda" else torch.float32
+     inference_session = processor.init_video_session(
+         video=video_frames,
+         inference_device=device,
+         dtype=dtype,
+     )
+
+     # Add annotations to inference session
+     print("Adding annotations...")
+     for ann in annotations:
+         frame_idx = ann["frame_idx"]
+         obj_id = ann["object_id"]
+         points = ann.get("points", [])
+         labels = ann.get("labels", [1] * len(points))
+
+         if points:
+             # Format points for processor: [[[[x, y], [x, y], ...]]]
+             formatted_points = [[points]]
+             formatted_labels = [[labels]]
+
+             processor.add_inputs_to_inference_session(
+                 inference_session=inference_session,
+                 frame_idx=frame_idx,
+                 obj_ids=obj_id,
+                 input_points=formatted_points,
+                 input_labels=formatted_labels,
+             )
+
+             # Run inference on this frame
+             outputs = model(
+                 inference_session=inference_session,
+                 frame_idx=frame_idx,
+             )
+
+     # Propagate through all frames
+     print("Propagating masks through video...")
+     video_segments = {}
+
+     for sam2_output in model.propagate_in_video_iterator(inference_session):
+         video_res_masks = processor.post_process_masks(
+             [sam2_output.pred_masks],
+             original_sizes=[[inference_session.video_height, inference_session.video_width]],
+             binarize=False
+         )[0]
+         video_segments[sam2_output.frame_idx] = video_res_masks
+
+     print(f"Generated masks for {len(video_segments)} frames")
+
+     # Create output video
+     output_path = tempfile.mktemp(suffix=".mp4")
+     print("Creating output video...")
+     create_output_video(video_frames, video_segments, output_path, fps, remove_background)
+
+     print(f"Output video saved to: {output_path}")
+     return output_path
+
+
+ # ============================================================================
+ # GRADIO INTERFACE
+ # ============================================================================
+
+ def gradio_segment_video(
+     video_file,
+     annotation_json: str,
+     remove_bg: bool = True,
+     max_frames: Optional[int] = None
+ ):
+     """
+     Gradio wrapper for video segmentation.
+
+     Args:
+         video_file: Uploaded video file
+         annotation_json: JSON string with annotations
+         remove_bg: Whether to remove background
+         max_frames: Maximum frames to process
+     """
+     try:
+         # Parse annotations
+         annotations = json.loads(annotation_json)
+
+         if not isinstance(annotations, list):
+             return None, "Error: Annotations must be a list of objects"
+
+         # Process video
+         output_path = segment_video(
+             video_path=video_file,
+             annotations=annotations,
+             remove_background=remove_bg,
+             max_frames=max_frames
+         )
+
+         return output_path, "✅ Video processed successfully!"
+
+     except json.JSONDecodeError as e:
+         return None, f"❌ JSON parsing error: {str(e)}"
+     except Exception as e:
+         return None, f"❌ Error: {str(e)}"
+
+
+ def gradio_simple_segment(
+     video_file,
+     point_x: int,
+     point_y: int,
+     frame_idx: int = 0,
+     remove_bg: bool = True,
+     max_frames: Optional[int] = 300
+ ):
+     """
+     Simple Gradio interface with single point annotation.
+     """
+     try:
+         # Create simple annotation
+         annotations = [{
+             "frame_idx": frame_idx,
+             "object_id": 1,
+             "points": [[point_x, point_y]],
+             "labels": [1]
+         }]
+
+         # Process video
+         output_path = segment_video(
+             video_path=video_file,
+             annotations=annotations,
+             remove_background=remove_bg,
+             max_frames=max_frames
+         )
+
+         return output_path, f"✅ Video processed! Tracked from point ({point_x}, {point_y}) on frame {frame_idx}"
+
+     except Exception as e:
+         return None, f"❌ Error: {str(e)}"
+
+
+ # ============================================================================
+ # API ENDPOINTS (via Gradio API)
+ # ============================================================================
+
+ def api_segment_video(video_file, annotations_json: str, remove_background: bool = True, max_frames: int = None):
+     """
+     API endpoint for video segmentation.
+     Can be called via gradio_client or direct HTTP requests.
+     """
+     annotations = json.loads(annotations_json)
+     output_path = segment_video(video_file, annotations, remove_background, max_frames)
+     return output_path
+
+
+ # ============================================================================
+ # CREATE GRADIO APP
+ # ============================================================================
+
+ def create_interface():
+     """Create the Gradio interface."""
+
+     # Initialize model
+     initialize_model()
+
+     # Create tabs for different interfaces
+     with gr.Blocks(title="SAM2 Video Segmentation - Remove Background") as app:
+         gr.Markdown("""
+ # 🎥 SAM2 Video Background Remover
+
+ Remove backgrounds from videos by tracking objects. Uses Meta's Segment Anything Model 2 (SAM2).
+
+ **Three ways to use this:**
+ 1. **Simple Mode**: Click on an object in the first frame
+ 2. **Advanced Mode**: Provide detailed JSON annotations
+ 3. **API Mode**: Use the API endpoint programmatically
+ """)
+
+         with gr.Tabs():
+             # ===================== SIMPLE MODE =====================
+             with gr.Tab("Simple Mode"):
+                 gr.Markdown("""
+ ### Quick Start
+ 1. Upload a video
+ 2. Specify the coordinates of the object you want to track
+ 3. Click "Process Video"
+
+ **Tip**: Open your video in an image viewer to find the x,y coordinates of your target object in the first frame.
+ """)
+
+                 with gr.Row():
+                     with gr.Column():
+                         simple_video_input = gr.Video(label="Upload Video")
+
+                         with gr.Row():
+                             point_x_input = gr.Number(label="Point X", value=320, precision=0)
+                             point_y_input = gr.Number(label="Point Y", value=240, precision=0)
+
+                         frame_idx_input = gr.Number(label="Frame Index", value=0, precision=0,
+                                                     info="Which frame to annotate (usually 0 for first frame)")
+
+                         remove_bg_simple = gr.Checkbox(label="Remove Background", value=True,
+                                                        info="If checked, removes background. If unchecked, highlights object.")
+
+                         max_frames_simple = gr.Number(label="Max Frames (optional)", value=300, precision=0,
+                                                       info="Limit frames for faster processing. Leave at 0 for all frames.")
+
+                         simple_btn = gr.Button("🎬 Process Video", variant="primary")
+
+                     with gr.Column():
+                         simple_output_video = gr.Video(label="Output Video")
+                         simple_status = gr.Textbox(label="Status", lines=3)
+
+                 simple_btn.click(
+                     fn=gradio_simple_segment,
+                     inputs=[simple_video_input, point_x_input, point_y_input, frame_idx_input,
+                             remove_bg_simple, max_frames_simple],
+                     outputs=[simple_output_video, simple_status]
+                 )
+
+                 gr.Markdown("""
+ ### Example:
+ For a 640x480 video with a person in the center, try: X=320, Y=240, Frame=0
+ """)
+
+             # ===================== ADVANCED MODE =====================
+             with gr.Tab("Advanced Mode (JSON)"):
+                 gr.Markdown("""
+ ### Advanced Annotations
+ Provide detailed JSON annotations for multiple objects and frames.
+
+ **JSON Format:**
+ ```json
+ [
+   {
+     "frame_idx": 0,
+     "object_id": 1,
+     "points": [[x1, y1], [x2, y2]],
+     "labels": [1, 1]
+   }
+ ]
+ ```
+
+ - `frame_idx`: Frame number to annotate
+ - `object_id`: Unique ID for each object (1, 2, 3, ...)
+ - `points`: List of [x, y] coordinates
+ - `labels`: 1 for foreground point, 0 for background point
+ """)
+
+                 with gr.Row():
+                     with gr.Column():
+                         adv_video_input = gr.Video(label="Upload Video")
+
+                         adv_annotation_input = gr.Textbox(
+                             label="Annotations (JSON)",
+                             lines=10,
+                             value='''[
+   {
+     "frame_idx": 0,
+     "object_id": 1,
+     "points": [[320, 240]],
+     "labels": [1]
+   }
+ ]''',
+                             placeholder="Enter JSON annotations here..."
+                         )
+
+                         remove_bg_adv = gr.Checkbox(label="Remove Background", value=True)
+                         max_frames_adv = gr.Number(label="Max Frames (0 = all)", value=0, precision=0)
+
+                         adv_btn = gr.Button("🎬 Process Video", variant="primary")
+
+                     with gr.Column():
+                         adv_output_video = gr.Video(label="Output Video")
+                         adv_status = gr.Textbox(label="Status", lines=3)
+
+                 adv_btn.click(
+                     fn=gradio_segment_video,
+                     inputs=[adv_video_input, adv_annotation_input, remove_bg_adv, max_frames_adv],
+                     outputs=[adv_output_video, adv_status]
+                 )
+
+             # ===================== API INFO =====================
+             with gr.Tab("API Documentation"):
+                 gr.Markdown("""
+ ## 📡 API Usage
+
+ This Space exposes an API that you can call programmatically.
+
+ ### Using Python with `gradio_client`
+
+ ```python
+ from gradio_client import Client
+ import json
+
+ # Connect to the Space
+ client = Client("YOUR_USERNAME/YOUR_SPACE_NAME")
+
+ # Define annotations
+ annotations = [
+     {
+         "frame_idx": 0,
+         "object_id": 1,
+         "points": [[320, 240]],
+         "labels": [1]
+     }
+ ]
+
+ # Call the API
+ result = client.predict(
+     video_file="path/to/video.mp4",
+     annotations_json=json.dumps(annotations),
+     remove_background=True,
+     max_frames=300,
+     api_name="/segment_video_api"
+ )
+
+ print(f"Output video: {result}")
+ ```
+
+ ### Using cURL
+
+ ```bash
+ curl -X POST https://YOUR_USERNAME-YOUR_SPACE_NAME.hf.space/api/predict \\
+   -H "Content-Type: application/json" \\
+   -F "data=@video.mp4" \\
+   -F 'annotations=[{"frame_idx":0,"object_id":1,"points":[[320,240]],"labels":[1]}]'
+ ```
+
+ ### Parameters
+
+ - **video_file**: Video file (required)
+ - **annotations_json**: JSON string with annotations (required)
+ - **remove_background**: Boolean (default: true)
+ - **max_frames**: Integer (default: null, processes all frames)
+
+ ### Response
+
+ Returns the path to the processed video file.
+ """)
+
+         # Add API endpoint
+         api_interface = gr.Interface(
+             fn=api_segment_video,
+             inputs=[
+                 gr.Video(label="Video File"),
+                 gr.Textbox(label="Annotations JSON"),
+                 gr.Checkbox(label="Remove Background", value=True),
+                 gr.Number(label="Max Frames", value=None, precision=0)
+             ],
+             outputs=gr.Video(label="Output Video"),
+             api_name="segment_video_api",
+             visible=False  # Hidden from UI, only accessible via API
+         )
+
+     return app
+
+
+ # ============================================================================
+ # LAUNCH
+ # ============================================================================
+
+ if __name__ == "__main__":
+     app = create_interface()
+     app.launch(
+         server_name="0.0.0.0",
+         server_port=7860,
+         share=False
+     )
+
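For local, UI-free use, the core pipeline in app.py can also be imported directly. A minimal sketch, assuming the repository is cloned, its requirements are installed, the model weights can be downloaded, and `input_video.mp4` is a placeholder path:

```python
from app import segment_video

# Same annotation schema as the API: one foreground click on the first frame.
annotations = [{"frame_idx": 0, "object_id": 1, "points": [[320, 240]], "labels": [1]}]

output_path = segment_video(
    "input_video.mp4",      # placeholder path
    annotations,
    remove_background=True,
    max_frames=200,          # keep small for a quick test
)
print(f"Segmented video written to {output_path}")
```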
requirements.txt ADDED
@@ -0,0 +1,8 @@
+ transformers>=4.57.0
+ torch>=2.0.0
+ gradio>=4.0.0
+ opencv-python-headless>=4.8.0
+ numpy>=1.24.0
+ Pillow>=10.0.0
+ accelerate>=0.20.0
+