---
license: other
pipeline_tag: image-classification
library_name: pytorch
tags:
- multimodal
- aquaculture
- shrimp
- disease-detection
- computer-vision
- time-series
- sensor-fusion
- uncertainty
---
# ShrimpFusionNet for Real-Time Shrimp Disease Detection Using Trust-Aware Multimodal Fusion

MMSD25 is a real-world multimodal shrimp disease dataset introduced in the ShrimpFusionNet paper.
This repository provides a sanitized public reference subset of MMSD25, together with the benchmark protocol, to support reproducibility and further research.

## 1. Dataset Overview

MMSD25 is designed for shrimp disease detection under real aquaculture conditions, where data are noisy, heterogeneous, asynchronous, and partially missing.

The dataset integrates three modalities:

- RGB shrimp images captured directly in ponds
- Farmer-written textual reports describing shrimp health and pond observations
- Environmental sensor streams, including:
  - Temperature
  - pH
  - Dissolved oxygen
  - Turbidity
  - Salinity

Data were collected from 8 shrimp ponds in the Mekong Delta, Vietnam, under diverse environmental and operational conditions.
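
One way to picture a record that combines these modalities while tolerating gaps is the minimal Python sketch below. The class name `PondSample`, its field names, and the channel keys are assumptions made for illustration; they are not the schema of the released subset.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical channel keys, named after the five sensor streams listed above.
SENSOR_CHANNELS = ("temperature", "ph", "dissolved_oxygen", "turbidity", "salinity")


@dataclass
class PondSample:
    """One multimodal record; any modality may be absent (assumed layout)."""

    pond_id: str
    image_path: Optional[str] = None   # RGB image, if one was captured
    report_text: Optional[str] = None  # farmer-written report, if any
    # channel name -> list of (timestamp_seconds, value) readings
    sensors: Dict[str, List[Tuple[float, float]]] = field(default_factory=dict)
    label: Optional[str] = None

    def available_modalities(self) -> List[str]:
        """Return the modalities that actually carry data for this record."""
        present = []
        if self.image_path is not None:
            present.append("image")
        if self.report_text:
            present.append("text")
        if any(self.sensors.get(ch) for ch in SENSOR_CHANNELS):
            present.append("sensor")
        return present


# A record with a text report and a single pH reading, but no image.
sample = PondSample(
    pond_id="pond_1",
    report_text="shrimp look pale",
    sensors={"ph": [(0.0, 7.8)]},
)
print(sample.available_modalities())
```

Keeping each modality optional lets downstream code decide explicitly how to handle a missing image or an empty sensor stream, rather than failing on incomplete records.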

## 2. Public Release Scope

### What is publicly released

This repository and the associated Hugging Face page provide:

- A **sanitized reference subset** of MMSD25
- The **full benchmark protocol**, including:
  - Data preprocessing procedures

The public subset is intended to demonstrate the data structure.

### What is NOT publicly released

- The **full MMSD25 dataset is NOT publicly available**
- The full raw data are restricted under data governance requirements and farm partner agreements

Access to the full dataset may be considered for **non-commercial academic research only**, subject to a controlled-access agreement.

## 3. Dataset Composition (Full Dataset Description)

The full MMSD25 dataset (described in the paper) consists of:

- 3,625 RGB shrimp images
- 12,404 farmer-generated text descriptions
- Synchronized multi-channel sensor time series
- 5 classes (healthy plus four diseases):
  - Healthy
  - WSSV (white spot syndrome virus)
  - AHPND (acute hepatopancreatic necrosis disease)
  - EHP (Enterocytozoon hepatopenaei)
  - Bacterial necrosis

Each sample is verified by aquaculture experts, with inter-annotator agreement reaching Cohen’s κ = 0.86.
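
For readers unfamiliar with the agreement statistic, Cohen's κ for two annotators is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement rate and p_e is the agreement expected by chance from each annotator's label frequencies. The sketch below computes it for two label lists. The toy labels are made up for illustration and are not MMSD25 annotations, and the function name `cohens_kappa` is an assumption.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each class, the product of the two annotators'
    # marginal frequencies, summed over classes.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if p_e == 1.0:
        # Both annotators used one identical label everywhere.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Toy example (invented labels): the annotators disagree on one of four items.
annotator_1 = ["Healthy", "WSSV", "AHPND", "Healthy"]
annotator_2 = ["Healthy", "WSSV", "EHP", "Healthy"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))
```

Here p_o = 3/4 and p_e = 5/16, so κ = 7/11, roughly 0.636. A value of 0.86, as reported for MMSD25, indicates considerably stronger agreement than this toy case.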

## 4. Train / Validation / Test Split

The benchmark uses a **region-based (pond-level) split** to evaluate generalization:

- Training set: 70% of ponds
- Validation set: 10% of ponds
- Test set: 20% of ponds (unseen ponds)

Because the test ponds are never seen during training, this setup supports zero-shot domain evaluation under real deployment conditions.
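
A pond-level split assigns whole ponds, not individual samples, to each partition. The sketch below shows one way to do this. With only 8 ponds the stated percentages cannot be met exactly, so the rounding rule used here is an assumption and not necessarily the assignment used in the paper; the function names, the `pond_id` key, and the pond identifiers are hypothetical.

```python
import random


def split_ponds(pond_ids, val_frac=0.1, test_frac=0.2, seed=0):
    """Assign whole ponds to train/val/test so no pond spans two splits."""
    ponds = sorted(set(pond_ids))
    if len(ponds) < 3:
        raise ValueError("need at least 3 ponds for three non-empty splits")
    rng = random.Random(seed)
    rng.shuffle(ponds)

    n = len(ponds)
    # Assumed rounding rule: at least one pond each for val and test,
    # and the remainder goes to training.
    n_test = max(1, round(test_frac * n))
    n_val = max(1, round(val_frac * n))
    n_train = n - n_val - n_test
    if n_train < 1:
        raise ValueError("fractions leave no ponds for training")

    return {
        "train": set(ponds[:n_train]),
        "val": set(ponds[n_train:n_train + n_val]),
        "test": set(ponds[n_train + n_val:]),
    }


def assign_split(samples, splits):
    """Look up the split name of each sample from its pond identifier."""
    pond_to_split = {p: name for name, group in splits.items() for p in group}
    return [pond_to_split[s["pond_id"]] for s in samples]


splits = split_ponds([f"pond_{i}" for i in range(1, 9)])
print({name: len(group) for name, group in splits.items()})
```

For 8 ponds this rule yields 5 training ponds, 1 validation pond, and 2 test ponds. Splitting by pond before touching any samples is what prevents images or sensor windows from the same pond from leaking across partitions.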

## 5. Hugging Face Repository

The public reference subset is hosted on Hugging Face:

https://huggingface.co/ducdatit2002/ShrimpFusionNet
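
The repository contents can be fetched programmatically. The sketch below assumes the `huggingface_hub` package is installed and that the files should simply be mirrored to a local folder; the helper name `download_reference_subset` is hypothetical, and the file layout inside the repository is not described here, so inspect the downloaded folder before relying on any particular path.

```python
REPO_ID = "ducdatit2002/ShrimpFusionNet"


def download_reference_subset(local_dir=None):
    """Download all files in the repository and return the local folder path.

    Requires network access and the `huggingface_hub` package.
    """
    # Imported lazily so that defining this helper does not require the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


# Example call (needs network access):
# path = download_reference_subset("./mmsd25_reference")
```

When `local_dir` is left as `None`, the files are stored in the default Hugging Face cache and the returned path points there.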

## 6. Intended Use

MMSD25 is intended for research on:

- Multimodal learning (image + text + sensor)
- Trust-aware and uncertainty-aware fusion
- Robust learning under noisy and missing modalities
- Edge AI and IoT-based aquaculture systems

The dataset is **not intended for commercial use**.

## 7. Limitations

- The public subset is not statistically representative of the full dataset
- Some of the environmental and operational variability present in the full dataset is not included in the public subset
- Results obtained on the public subset should not be interpreted as full benchmark performance

## 8. Citation

If you use MMSD25 or the benchmark protocol, please cite:

```bibtex
@article{shrimpfusionnet2025,
  title={ShrimpFusionNet for Real-Time Shrimp Disease Detection Using Trust-Aware Multimodal Fusion},
  author={Le, Tan Duy and Huynh, Kha Tu and Pham, Duc Dat and Nguyen, Hong Quan and Nguyen, Minh Tu},
  year={2025}
}
```

## 9. License

The public subset of MMSD25 is released for **non-commercial research use only**.

## 10. Contact

For questions or controlled-access requests for the full dataset:

- Duc Dat Pham
- Email: [ducdatit2002@gmail.com](mailto:ducdatit2002@gmail.com)