---
license: cc-by-4.0
library_name: pytorch
tags:
- computer-vision
- object-tracking
- spiking-neural-networks
- visual-streaming-perception
- energy-efficient
- cvpr-2025
pipeline_tag: object-detection
---

# ViStream: Law-of-Charge-Conservation Inspired Spiking Neural Network for Visual Streaming Perception

**ViStream** is a novel energy-efficient framework for Visual Streaming Perception (VSP) that leverages Spiking Neural Networks (SNNs) with a Law-of-Charge-Conservation (LoCC) property.

## Model Details

### Model Description

- **Developed by:** Kang You, Ziling Wei, Jing Yan, Boning Zhang, Qinghai Guo, Yaoyu Zhang, Zhezhi He
- **Model type:** Spiking Neural Network for Visual Streaming Perception
- **Framework:** PyTorch
- **License:** CC-BY-4.0
- **Paper:** [CVPR 2025](https://openaccess.thecvf.com/content/CVPR2025/papers/You_VISTREAM_Improving_Computation_Efficiency_of_Visual_Streaming_Perception_via_Law-of-Charge-Conservation_CVPR_2025_paper.pdf)
- **Repository:** [GitHub](https://github.com/Intelligent-Computing-Research-Group/ViStream)

### Model Architecture

ViStream introduces two key innovations:
1. **Law of Charge Conservation (LoCC)** property in ST-BIF neurons
2. **Differential Encoding (DiffEncode)** scheme for temporal optimization

The framework significantly reduces computation while matching the accuracy of its ANN counterparts.

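As a rough intuition for the DiffEncode scheme above, the sketch below encodes a video clip as its first frame plus inter-frame differences, so static regions encode to zero, and recovers the frames by cumulative summation. ViStream applies this idea to spike signals inside the network; the function names and tensor shapes here are illustrative assumptions, not the repository's actual API.

```python
import torch

def diff_encode(frames: torch.Tensor) -> torch.Tensor:
    # Keep the first frame as-is; store every later frame as the change
    # from its predecessor, so unchanged pixels encode to zero.
    diffs = frames.clone()
    diffs[1:] = frames[1:] - frames[:-1]
    return diffs

def diff_decode(encoded: torch.Tensor) -> torch.Tensor:
    # The running sum of differences recovers each frame exactly,
    # mirroring how a conserved "charge" accumulates over time.
    return torch.cumsum(encoded, dim=0)

frames = torch.rand(4, 3, 8, 8)  # T x C x H x W video clip
encoded = diff_encode(frames)
decoded = diff_decode(encoded)
print(torch.allclose(decoded, frames, atol=1e-5))  # True
```

Because consecutive frames in a video stream are highly correlated, most entries of the encoded tensor are near zero, which is what lets a spiking network skip computation on unchanged regions.
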
## Uses

### Direct Use

ViStream can be directly used for:
- **Multiple Object Tracking (MOT)**
- **Single Object Tracking (SOT)**
- **Video Object Segmentation (VOS)**
- **Multiple Object Tracking and Segmentation (MOTS)**
- **Pose Tracking**

### Downstream Use

The model can be fine-tuned for various visual streaming perception tasks in:
- Autonomous driving
- UAV navigation
- AR/VR applications
- Real-time surveillance

## Bias, Risks, and Limitations

### Limitations
- Requires hardware-specific optimization to realize the full energy benefits
- Performance may vary across frame rates
- Limited to visual perception tasks

### Recommendations
- Test thoroughly on target hardware before deployment
- Consider the computational constraints of edge devices
- Validate performance on domain-specific datasets

## How to Get Started with the Model

```python
from huggingface_hub import hf_hub_download
import torch

# Download the checkpoint from the Hugging Face Hub
checkpoint_path = hf_hub_download(
    repo_id="AndyBlocker/ViStream",
    filename="checkpoint-90.pth"
)

# Load the checkpoint; instantiating the model itself requires the
# ViStream implementation from the GitHub repository
checkpoint = torch.load(checkpoint_path, map_location="cpu")
```

For complete usage examples, see the [GitHub repository](https://github.com/Intelligent-Computing-Research-Group/ViStream).

## Training Details

### Training Data

The model was trained on multiple datasets covering visual streaming perception tasks, including object tracking, video object segmentation, and pose tracking.

### Training Procedure

- Framework: PyTorch
- Optimization: Energy-efficient SNN training with the Law of Charge Conservation
- Architecture: ResNet-based backbone with spike quantization layers

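The "spike quantization layers" mentioned above can be pictured with a minimal bipolar integrate-and-fire module: input accumulates as membrane potential, and a signed spike fires whenever a threshold is crossed, with the emitted charge subtracted back so nothing is lost. This is a simplified stand-in under assumed semantics (signed spikes, unit threshold), not the ST-BIF implementation from the repository.

```python
import torch
import torch.nn as nn

class SpikeQuantize(nn.Module):
    """Toy bipolar integrate-and-fire layer (illustrative, not ST-BIF).

    Accumulates input as membrane potential and emits +1/-1 spikes when
    the potential crosses +/- threshold; the fired charge is subtracted
    from the potential, so total charge is conserved over time.
    """

    def __init__(self, threshold: float = 1.0):
        super().__init__()
        self.threshold = threshold
        self.potential = None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.potential is None:
            self.potential = torch.zeros_like(x)
        self.potential = self.potential + x
        pos = (self.potential >= self.threshold).float()
        neg = (self.potential <= -self.threshold).float()
        spikes = pos - neg
        # Subtract the emitted charge, leaving the residual potential.
        self.potential = self.potential - spikes * self.threshold
        return spikes

layer = SpikeQuantize(threshold=1.0)
o1 = layer(torch.tensor([0.6, -0.4]))  # potentials 0.6 / -0.4: no spikes
o2 = layer(torch.tensor([0.6, -0.8]))  # potentials 1.2 / -1.2: +1 and -1 spikes
```

Because the residual potential carries over between time steps, the running spike count tracks the accumulated input, which is the charge-conservation intuition behind ViStream's exact ANN-SNN equivalence claim.
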
## Evaluation

The model achieves competitive accuracy across multiple visual streaming perception tasks while delivering significant energy-efficiency gains over traditional ANN-based approaches. Detailed evaluation results are available in the [CVPR 2025 paper](https://openaccess.thecvf.com/content/CVPR2025/papers/You_VISTREAM_Improving_Computation_Efficiency_of_Visual_Streaming_Perception_via_Law-of-Charge-Conservation_CVPR_2025_paper.pdf).

## Model Card Authors

Kang You, Ziling Wei, Jing Yan, Boning Zhang, Qinghai Guo, Yaoyu Zhang, Zhezhi He

## Model Card Contact

For questions about this model, please open an issue in the [GitHub repository](https://github.com/Intelligent-Computing-Research-Group/ViStream).

## Citation

```bibtex
@inproceedings{you2025vistream,
  title={VISTREAM: Improving Computation Efficiency of Visual Streaming Perception via Law-of-Charge-Conservation Inspired Spiking Neural Network},
  author={You, Kang and Wei, Ziling and Yan, Jing and Zhang, Boning and Guo, Qinghai and Zhang, Yaoyu and He, Zhezhi},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={8796--8805},
  year={2025}
}
```