title: Wearable_TimeSeries_Health_Monitor
emoji: 📟
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 4.44.0
app_file: gradio_app.py
pinned: false

library_name: pytorch
pipeline_tag: time-series-forecasting
language:
- zh
- en
tags:
- anomaly-detection
- time-series
- wearable
- health
- lstm
- transformer
- physiological-monitoring
- hrv
- heart-rate
- real-time
- multi-user
- personalized
- sensor-fusion
- healthcare
- continuous-monitoring
license: apache-2.0
pretty_name: Wearable TimeSeries Health Monitor

Language: Chinese | English

Wearable_TimeSeries_Health_Monitor

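A multi-user health monitoring solution for wearable devices: one model and one configuration are enough to build personalized anomaly detection for different users. The model is based on **Phased LSTM + Temporal Fusion Transformer (TFT)** and integrates adaptive baselines, factor features, and second-level (per-second) sliding-window capability, making it suitable for use as a HuggingFace model or for quick integration into an internal enterprise service.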

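🌟 Model Highlights

| Capability | Description |
| --- | --- |
| Plug-and-play | The built-in `WearableAnomalyDetector` wrapper predicts as soon as the model is loaded; after a single initialization it can continuously monitor multiple users |
| Configuration-driven features | `configs/features_config.json` describes all features, default values, and category mappings; adding or removing features such as blood oxygen or respiratory rate only requires a configuration change |
| Multi-user real-time service | `FeatureCalculator` plus a lightweight `data_storage` cache provide user history management, baseline evolution, and batch inference |
| Real-data validation | The README includes "Real Data Tests" instructions for simulating normal/abnormal users, baseline updates, and multi-day pattern detection |
| Adaptive baseline support | The extensible `UserDataManager` plugs personal/group baselines into the inference pipeline, continuously improving per-user sensitivity |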

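⚡ Core Features & Technical Advantages

🎯 Adaptive Baseline: Fusing Personal and Group Statistics

The model uses an adaptive baseline strategy that picks the baseline according to how much history a user has:

Personal baseline first: when a user has enough history (e.g., ≥7 days), the personal HRV mean/std serves as the baseline, capturing individual differences in physiological rhythm
Group baseline fallback: for new users or sparse data, the model switches automatically to a group statistical baseline, so detection stays stable during cold start
Smooth transition: weighted mixing (e.g., final_mean = α × personal_mean + (1-α) × group_mean) moves gradually from the group baseline to the personal one
Real-time baseline updates: user data accumulates during inference, and the baseline adjusts as the user's state evolves, improving long-term monitoring accuracy

Advantage: compared with fixed thresholds or a pure group baseline, the adaptive baseline balances personalized sensitivity (fewer false positives) with cold-start robustness (usable for new users), which suits multi-user, long-running monitoring.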

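⏱️ Flexible Time Windows & Periods

5-minute granularity: each data point is a 5-minute aggregate, supporting flexible time scales from seconds to hours
Configurable window size: 12 points (1 hour) by default, adjustable to 6 points (30 minutes) or 24 points (2 hours) as needed
Uneven-interval tolerance: the Phased LSTM architecture handles missing data points natively, so inference stays stable even when data is sparse (e.g., the sensor disconnects at night)
Multi-time-scale features: short-term fluctuations (RMSSD), medium-term trends (rolling mean), and long-term patterns (daily/weekly cycles) are extracted together to capture anomaly signals at different time scales

Advantage: adapts to different device sampling rates and wearing habits without forcing timestamp alignment, which reduces preprocessing complexity.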

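🔄 Multi-Channel Data Synergy

The model integrates 4 major feature channels and fuses information across them through factor features and attention:

Physiological channel (HR, HRV series, respiratory rate, blood oxygen)
- Directly reflects cardiovascular and respiratory status
- Factor features: physiological_mean, physiological_std, physiological_max, physiological_min
Activity channel (steps, distance, energy expenditure, acceleration, gyroscope)
- Captures exercise intensity and physical load
- Factor features: activity_mean, activity_std, etc.
Environmental channel (light, time period, data quality)
- Provides context, distinguishing exercise-induced heart rate elevation from resting anomalies
- Categorical feature: time_period_primary (morning/day/evening/night)
Baseline channel (adaptive baseline mean/std, deviation features)
- Provides a personalized reference for relative anomaly indicators such as hrv_deviation_abs and hrv_z_score

Synergy mechanism:

Factor feature aggregation: statistics (mean/std/max/min) of each channel group act as high-level features, letting the model learn association patterns between channels
TFT attention: the Temporal Fusion Transformer's variable selection network automatically identifies which channels matter most at a given time point
Known future features: time features (hour, day of week, is-weekend) help the model understand periodicity and separate normal fluctuations from anomalies

Advantage: multi-channel synergy can markedly reduce single-indicator false positives (e.g., heart rate elevated by exercise) and improves context awareness, which suits multi-sensor fusion on wearable devices.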

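📊 Core Metrics (Short-Term Window)

F1: 0.2819
Precision: 0.1769
Recall: 0.6941
Optimal threshold: 0.53
Window definition: twelve 5-minute data points (a 1-hour window, predicting 0.5 hours ahead)

The model favors recall, which suits "alert first, then human-in-the-loop review" scenarios. Precision and recall can be traded off via the threshold or the sampling strategy.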

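🚀 Quick Start

Try it online in the Hugging Face Space

URL: https://huggingface.co/spaces/oscarzhang/Wearable_TimeSeries_Health_Monitor

Real-time window detection: choose one of four preset windows ("normal", "short-term anomaly", "long-term anomaly", "missing data") and view the model's JSON output and the formatted LLM text.
LLM input example: shows the same Markdown used in the project's training data (system prompt + user input), so it can be copied into another LLM service for verification.
PatchTrAD cases: two built-in pipelines, "platform built-in pre-screening" and "official precheck", show the pre-screening score, case JSON, and LLM input; together with the manifest, new cases can be added quickly.

To use custom data, run locally: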

python simulate_patchad_case_pipeline.py --mode all \
  --data-file data_storage/users/your_case.jsonl \
  --save-dir demo_patchad_cases --sample-name your_case

The generated case then appears directly in the Space's dropdown menu.
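The command above reads a `.jsonl` file. As a minimal sketch, assuming the file holds one data point per line in the same shape as the "Input" example in this README (the README does not spell out the JSONL schema, so treat this layout as an assumption), such a file could be produced like this. `write_case_jsonl` is a hypothetical helper, not part of the repository, and the example writes to a temporary directory rather than `data_storage/users/`.

```python
import json
import tempfile
from datetime import datetime, timedelta
from pathlib import Path


def write_case_jsonl(path, points):
    """Write one JSON object per line (JSON Lines). Hypothetical helper."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        for point in points:
            fh.write(json.dumps(point, ensure_ascii=False) + "\n")
    return path


# Twelve 5-minute records; the feature values are illustrative only.
start = datetime(2024, 1, 1, 8, 0, 0)
points = [
    {
        "timestamp": (start + timedelta(minutes=5 * i)).isoformat(),
        "deviceId": "your_case",
        "features": {"hr": 70.0 + 0.2 * i, "hrv_rmssd": 75.0 - 0.5 * i},
    }
    for i in range(12)
]

out_dir = Path(tempfile.mkdtemp())
case_path = write_case_jsonl(out_dir / "users" / "your_case.jsonl", points)
lines = case_path.read_text(encoding="utf-8").splitlines()
print(len(lines), "records written to", case_path)
```

Pointing `--data-file` at a file like this is the intended use; check the script's own input validation before relying on the assumed layout.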

1. Clone or download the model repository

git clone https://huggingface.co/oscarzhang/Wearable_TimeSeries_Health_Monitor
cd Wearable_TimeSeries_Health_Monitor
pip install -r requirements.txt

2. Call from application code

from wearable_anomaly_detector import WearableAnomalyDetector

detector = WearableAnomalyDetector(
    model_dir="checkpoints/phase2/exp_factor_balanced",
    threshold=0.53,
)

result = detector.predict(data_points, return_score=True, return_details=True)
print(result)

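data_points is the 12 most recent 5-minute records; if static features or device information are missing, the system fills them in automatically from the configuration or cache.

3. Quick real-data simulation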

from datetime import datetime, timedelta
from wearable_anomaly_detector import WearableAnomalyDetector

detector = WearableAnomalyDetector("checkpoints/phase2/exp_factor_balanced", device="cpu")

def make_point(ts, hrv, hr):
    return {
        "timestamp": ts.isoformat(),
        "deviceId": "demo_user",
        "features": {
            "hr": hr,
            "hr_resting": 65,
            "hrv_rmssd": hrv,
            "time_period_primary": "day",
            "data_quality": "high",
            "baseline_hrv_mean": 75.0,
            "baseline_hrv_std": 5.0
        },
        "static_features": {
            "age_group": 2,
            "sex": 0,
            "exercise": 1
        }
    }

start = datetime.now() - timedelta(hours=1)
window = [make_point(start + timedelta(minutes=5*i), 75 - i*0.5, 70 + i*0.2) for i in range(12)]
print(detector.detect_realtime(window))

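The script above builds twelve 5-minute data points and runs one real-time detection. Adjust HRV, HR, or the window size to simulate different scenarios.

🧪 Real Data Tests

The results below come from the example scripts in this README (simulated normal/abnormal users, baseline update, multi-day patterns). Everything ran on CPU.

| Scenario | Data snapshot | Result |
| --- | --- | --- |
| Real-time detection (normal) | HRV≈76 ms, HR≈68 bpm, 12 points | Anomaly score 0.5393 vs. threshold 0.53 (marginal trigger; the model is sensitive to borderline cases) |
| Real-time detection (abnormal) | HRV≈69 ms, HR≈74 bpm, 12 points | Anomaly score 0.4764, below the threshold; needs further observation with multi-day patterns |
| Pattern aggregation (7 days) | First 3 days normal, last 4 days gradually declining | Correctly identified an anomaly pattern lasting 3 days; trend `stable` |
| Baseline storage/update | Initial baseline 75±5, 30 records | Stored successfully; after a new value of 70 ms the mean updates to 74.84, record count 31 |
| Full workflow | Real-time detection → baseline update → LLM text | Whole pipeline ran successfully; produced a 114-character structured anomaly summary |

Copy the "real-data simulation" code above and adjust HRV/HR, window length, or anomaly strength to reproduce the same workflow.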

🔧 Input & Output

Input (single data point)

{
  "timestamp": "2024-01-01T08:00:00",
  "deviceId": "ab60",            # Optional; an anonymous ID is created automatically if missing
  "features": {
    "hr": 72.0,
    "hrv_rmssd": 30.0,
    "time_period_primary": "morning",
    "data_quality": "high",
    ...
  }
}

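Each window requires 12 data points (1 hour by default)
Whether a feature is required is controlled by configs/features_config.json
Missing values fall back automatically to the default or category_mapping value

Output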

{
  "is_anomaly": True,
  "anomaly_score": 0.5760,
  "threshold": 0.5300,
  "details": {
     "window_size": 12,
     "model_output": 0.5760,
     "prediction_confidence": 0.0460
  }
}

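🧱 Model Architecture & Training

Model backbone: Phased LSTM handles unevenly spaced sequences + Temporal Fusion Transformer aggregates temporal context
Anomaly detection head: enhanced attention, multi-layer MLP, optional contrastive-learning/type auxiliary heads
Feature system:
- Physiological: HR, HRV (RMSSD/SDNN/PNN50…)
- Activity: steps, distance, energy expenditure, acceleration, gyroscope
- Environmental: light, day/night label, data quality
- Baseline: adaptive baseline mean/std + deviation features
Label source: high-confidence questionnaire labels + low-confidence adaptive-baseline labels
Training pipeline: Stage1/2/3 data processing ➜ Phase1 self-supervised pre-training ➜ Phase2 supervised fine-tuning ➜ threshold/case calibration

📦 Repository Structure (Partial)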

├─ configs/
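│   └─ features_config.json     # Feature definitions & normalization strategy
├─ wearable_anomaly_detector.py # Core wrapper: loading, prediction, batching
├─ feature_calculator.py        # Config-driven feature construction + user history cache
└─ checkpoints/phase2/...       # Model weights & summary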

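📚 Data Source & License

Training data is based on "A continuous real-world dataset comprising wearable-based heart rate variability alongside sleep diaries" (Baigutanova et al., Scientific Data, 2025) and its Figshare dataset doi:10.1038/s41597-025-05801-3 / dataset link.
The dataset is released under Creative Commons Attribution 4.0 (CC BY 4.0): it may be freely used, modified, and distributed, provided attribution is kept and a link to the license is included.
This repository follows the CC BY 4.0 requirements for the original data; if you further process or publish work based on it, please keep the attribution and license notice above.
Code/models may use MIT/Apache or other licenses as needed, but anything involving the data must still follow CC BY 4.0.

🤝 Contributions & Extensions

Contributions are welcome:

New features or data sources ⇒ update features_config.json and submit a PR
New user data management/baseline strategies ⇒ extend FeatureCalculator or contribute a UserDataManager
Feedback on cases or real deployment experience ⇒ open an Issue or Discussion

📄 License

Model & code: Apache-2.0. Free to use, modify, and distribute as long as copyright and license notices are retained.
Training data: the original wearable HRV dataset uses CC BY 4.0; keep the author attribution and license information when reusing it.

🔖 Citation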

@software{Wearable_TimeSeries_Health_Monitor,
  title  = {Wearable\_TimeSeries\_Health\_Monitor},
  author = {oscarzhang},
  year   = {2025},
  url    = {https://huggingface.co/oscarzhang/Wearable_TimeSeries_Health_Monitor}
}

Wearable_TimeSeries_Health_Monitor

A multi-user health monitoring solution for wearable devices: one model and one configuration enable personalized anomaly detection for different users. The model is based on Phased LSTM + Temporal Fusion Transformer (TFT) and integrates adaptive baselines, factor features, and second-level (per-second) sliding-window capability. It is suitable for deployment as a HuggingFace model or for rapid integration into enterprise services.

🌟 Model Highlights

| Capability | Description |
| --- | --- |
| Plug-and-Play | The built-in `WearableAnomalyDetector` wrapper predicts as soon as the model is loaded and supports continuous monitoring of multiple users after a single initialization |
| Configuration-Driven Features | `configs/features_config.json` defines all features, default values, and category mappings; adding or removing features such as blood oxygen or respiratory rate only requires configuration changes |
| Multi-User Real-Time Service | `FeatureCalculator` plus a lightweight `data_storage` cache enable user history management, baseline evolution, and batch inference |
| Real-World Validation | The README ships with a "Real Data Tests" section plus sample simulation code, so you can mimic normal/abnormal users in minutes |
| Adaptive Baseline Support | The extensible `UserDataManager` integrates personal/group baselines into the inference pipeline, continuously improving per-user sensitivity |

⚡ Core Features & Technical Advantages

🎯 Adaptive Baseline: Intelligent Fusion of Personal and Group

The model employs an adaptive baseline strategy that selects the baseline according to how much historical data a user has:

Personal Baseline Priority: When a user has sufficient historical data (e.g., ≥7 days), the personal HRV mean/std is used as the baseline to capture individual differences in physiological rhythm
Group Baseline Fallback: For new users or sparse data, the model automatically switches to a group statistical baseline, ensuring stable detection even during cold start
Smooth Transition Mechanism: Weighted mixing (e.g., final_mean = α × personal_mean + (1-α) × group_mean) provides gradual adaptation from the group baseline to the personal one
Real-Time Baseline Updates: User data accumulates continuously during inference, and the baseline adjusts as the user's state evolves, improving long-term monitoring accuracy

Advantage: Compared to fixed thresholds or pure group baselines, adaptive baselines balance personalized sensitivity (fewer false positives) and cold-start robustness (usable for new users), which suits multi-user, long-term monitoring scenarios.
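The weighted-mixing formula can be sketched in a few lines. The README gives the blend itself but not how α is chosen, so the linear ramp below (α grows from 0 with no history to 1 at `full_trust_days`) is an assumption for illustration, and `blend_baseline` is a hypothetical name rather than a function from the repository.

```python
def blend_baseline(personal_mean, group_mean, days_of_history, full_trust_days=7):
    """Blend personal and group baseline means.

    alpha = 0 -> pure group baseline (cold start)
    alpha = 1 -> pure personal baseline (enough history)
    The linear ramp for alpha is an illustrative assumption.
    """
    if personal_mean is None or days_of_history <= 0:
        return group_mean, 0.0
    alpha = min(days_of_history / full_trust_days, 1.0)
    final_mean = alpha * personal_mean + (1 - alpha) * group_mean
    return final_mean, alpha


# Illustrative values: personal HRV mean 70 ms, group HRV mean 80 ms.
for days in (0, 3.5, 7, 30):
    mean, alpha = blend_baseline(70.0, 80.0, days)
    print(f"days={days}: alpha={alpha:.2f}, final_mean={mean:.1f}")
```

With half of the assumed trust period elapsed (3.5 days), the blended mean sits halfway between the two baselines; after 7 days the personal baseline is used on its own.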

⏱️ Flexible Time Windows & Periods

5-Minute Granularity: Each data point is a 5-minute aggregate, supporting flexible time scales from seconds to hours
Configurable Window Size: 12 points (1 hour) by default, adjustable to 6 points (30 minutes) or 24 points (2 hours) based on business needs
Uneven Interval Tolerance: The Phased LSTM architecture handles missing data points natively, so inference stays stable even with sparse data (e.g., sensor disconnection at night)
Multi-Time-Scale Features: Short-term fluctuations (RMSSD), medium-term trends (rolling mean), and long-term patterns (daily/weekly cycles) are extracted together, capturing anomaly signals at different time scales

Advantage: Adapts to different device sampling frequencies and user wearing habits without forcing timestamp alignment, which reduces data preprocessing complexity.
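To make the window and gap handling concrete, here is a small sketch that takes the most recent 12 records and reports the time gap before each one instead of resampling onto a fixed grid. How the repository actually passes time information to the Phased LSTM is not documented here, so `latest_window` and the gap representation are assumptions.

```python
from datetime import datetime, timedelta


def latest_window(points, window_size=12):
    """Return the newest `window_size` points (time-ordered) and the gap,
    in minutes, between each point and its predecessor (0 for the first)."""
    ordered = sorted(points, key=lambda p: datetime.fromisoformat(p["timestamp"]))
    window = ordered[-window_size:]
    times = [datetime.fromisoformat(p["timestamp"]) for p in window]
    gaps = [0.0] + [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]
    return window, gaps


# Fourteen nominal 5-minute slots with two records dropped (sensor off).
start = datetime(2024, 1, 1, 8, 0)
minutes = [m for m in range(0, 70, 5) if m not in (20, 25)]
points = [{"timestamp": (start + timedelta(minutes=m)).isoformat()} for m in minutes]

window, gaps = latest_window(points)
print(len(window), "points; gaps in minutes:", gaps)
```

The dropped records show up as a single 15-minute gap rather than as padded or interpolated rows, which is the kind of irregular spacing the text says the model tolerates.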

🔄 Multi-Channel Data Synergy

The model integrates 4 major feature channels, achieving cross-channel information fusion through factor features and attention mechanisms:

Physiological Channel (HR, HRV series, respiratory rate, blood oxygen)
- Directly reflects cardiovascular and respiratory system status
- Factor features: physiological_mean, physiological_std, physiological_max, physiological_min
Activity Channel (steps, distance, energy consumption, acceleration, gyroscope)
- Captures exercise intensity and body load
- Factor features: activity_mean, activity_std, etc.
Environmental Channel (light, time period, data quality)
- Provides contextual information, distinguishing exercise-induced heart rate elevation vs. resting anomalies
- Categorical features: time_period_primary (morning/day/evening/night)
Baseline Channel (adaptive baseline mean/std, deviation features)
- Provides personalized reference baseline, calculating relative anomaly indicators like hrv_deviation_abs, hrv_z_score
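A rough sketch of how the factor features and the baseline-relative indicators named above could be computed. The channel grouping, the assumption that inputs are already normalized to comparable scales, and the helper names are all illustrative; the real definitions live in `configs/features_config.json` and `feature_calculator.py`.

```python
from statistics import mean, pstdev

# Hypothetical grouping; the actual channel membership comes from the config.
CHANNELS = {
    "physiological": ["hr", "hrv_rmssd"],
    "activity": ["steps", "distance"],
}


def factor_features(features, channels=CHANNELS):
    """Aggregate each channel's available (assumed normalized) values."""
    out = {}
    for channel, names in channels.items():
        values = [features[n] for n in names if features.get(n) is not None]
        if not values:
            continue  # nothing recorded for this channel
        out[f"{channel}_mean"] = mean(values)
        out[f"{channel}_std"] = pstdev(values)
        out[f"{channel}_max"] = max(values)
        out[f"{channel}_min"] = min(values)
    return out


def hrv_deviation(hrv, baseline_mean, baseline_std):
    """Deviation of an HRV reading from the user's baseline."""
    z = (hrv - baseline_mean) / baseline_std if baseline_std > 0 else 0.0
    return {"hrv_deviation_abs": abs(hrv - baseline_mean), "hrv_z_score": z}


factors = factor_features({"hr": 1.0, "hrv_rmssd": 3.0, "steps": 4.0})
deviation = hrv_deviation(65.0, baseline_mean=75.0, baseline_std=5.0)
print(factors)
print(deviation)
```

A reading of 65 ms against a 75±5 baseline is two standard deviations low, which is the sort of relative signal the baseline channel is described as providing.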

Synergy Mechanism:

Factor Feature Aggregation: Use statistical measures (mean/std/max/min) of similar channels as high-level features, enabling the model to learn association patterns between channels
TFT Attention: Temporal Fusion Transformer's variable selection network automatically identifies which channels are most important at specific time points
Known Future Features: Time features (hour, day of week, is_weekend) help the model understand periodicity, distinguishing normal fluctuations from anomalies
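The "known future" time features can be derived from the timestamp alone, which is why they are available for time steps the model has not observed yet. A minimal sketch follows; the output key names are assumptions, not the repository's feature names.

```python
from datetime import datetime


def time_features(timestamp):
    """Calendar features derived purely from an ISO-8601 timestamp."""
    ts = datetime.fromisoformat(timestamp)
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),          # Monday = 0 ... Sunday = 6
        "is_weekend": int(ts.weekday() >= 5),  # Saturday or Sunday
    }


monday = time_features("2024-01-01T08:00:00")
saturday = time_features("2024-01-06T22:30:00")
print(monday)
print(saturday)
```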

Advantage: Multi-channel synergy significantly reduces single-indicator false positives (e.g., exercise-induced heart rate elevation) and improves context-aware anomaly detection, especially suitable for multi-sensor fusion scenarios in wearable devices.

📊 Core Metrics (Short-Term Window)

F1: 0.2819
Precision: 0.1769
Recall: 0.6941
Optimal Threshold: 0.53
Window Definition: twelve 5-minute data points (a 1-hour window, predicting 0.5 hours ahead)

The model favors recall, which suits "alert first, then human-in-the-loop review" scenarios. Precision and recall can be traded off via the threshold or the sampling strategy.
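The reported F1 follows from the precision and recall above via F1 = 2PR / (P + R), which evaluates to about 0.2819. The sketch below checks that and shows, on a made-up toy set of scores and labels (not model output), how moving the threshold trades precision against recall.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_at(scores, labels, threshold):
    """Precision and recall when flagging every score >= threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


print(round(f1_score(0.1769, 0.6941), 4))

# Toy data for illustration only.
scores = [0.40, 0.51, 0.54, 0.60, 0.52, 0.70]
labels = [0, 0, 1, 1, 1, 1]
for threshold in (0.50, 0.53):
    print(threshold, precision_recall_at(scores, labels, threshold))
```

On the toy data, raising the threshold from 0.50 to 0.53 removes the one false positive but also misses one true anomaly, the same trade-off the paragraph above describes.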

🚀 Quick Start

1. Clone or Download the Model Repository

git clone https://huggingface.co/oscarzhang/Wearable_TimeSeries_Health_Monitor
cd Wearable_TimeSeries_Health_Monitor
pip install -r requirements.txt

2. Run the Official Inference Script

python run_official_inference.py \
  --window-file test_data/example_window.json \
  --model-dir checkpoints/phase2/exp_factor_balanced

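The script will:

Read test_data/example_window.json (a window of 12 data points in the real format)
Call WearableAnomalyDetector.detect_realtime
Print the full JSON result
Use AnomalyFormatter to output Markdown text that an LLM can consume directly

To test your own window, just change the --window-file path. The script injects no random noise, and its output matches the production API.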
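Before pointing `--window-file` at your own data, a quick structural check can catch obvious mistakes. This sketch assumes the window file is a JSON array of data points shaped like the "Input" example in this README; that layout and the `validate_window` helper are assumptions, so compare against test_data/example_window.json first.

```python
import json


def validate_window(window, expected_size=12):
    """Return a list of problems; an empty list means the window looks usable."""
    if not isinstance(window, list):
        return ["window must be a list of data points"]
    problems = []
    if len(window) != expected_size:
        problems.append(f"expected {expected_size} points, got {len(window)}")
    for i, point in enumerate(window):
        if not isinstance(point, dict):
            problems.append(f"point {i}: not an object")
            continue
        if "timestamp" not in point:
            problems.append(f"point {i}: missing 'timestamp'")
        if not isinstance(point.get("features"), dict):
            problems.append(f"point {i}: missing 'features' object")
    return problems


# Simulate loading a window file from a JSON string.
raw = json.dumps(
    [{"timestamp": f"2024-01-01T08:{5 * i:02d}:00", "features": {"hr": 72.0}}
     for i in range(12)]
)
window = json.loads(raw)
print(validate_window(window))
print(validate_window(window[:11]))
```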

3. Call in Business Code

from wearable_anomaly_detector import WearableAnomalyDetector

detector = WearableAnomalyDetector(
    model_dir="checkpoints/phase2/exp_factor_balanced",
    threshold=0.53,
)

result = detector.predict(data_points, return_score=True, return_details=True)
print(result)

data_points should be 12 latest 5-minute records; if static features/device information are missing, the system will automatically fill from configuration/cache.

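4. Quick Simulation Script (Optional)

python test_quickstart.py

This script covers more demo scenarios (random noise, a pronounced 7-day anomaly, missing/low-quality data). It first runs inference on the example file, then prints normal/abnormal windows, pattern aggregation, and fault-tolerance samples. Note: to expose boundary behavior, the script temporarily sets the threshold to 0.50 and injects random perturbations; it is intended for demonstration only.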

🧪 Real Data Tests

The following results were reproduced with the sample code above (normal vs. abnormal users, multi-day trend, baseline update, end-to-end workflow). All tests ran on CPU; the first scenario loads test_data/example_window.json directly.

| Scenario | Data Snapshot | Outcome |
| --- | --- | --- |
| Real-time (sample file) | HRV≈72 ms, HR≈71 bpm, 12 points | Score ≈0.526 vs. threshold 0.50 (demo threshold) |
| Real-time (normal) | HRV≈76 ms, HR≈68 bpm, 12 points | Score 0.5393 vs. threshold 0.53 (marginal trigger) |
| Real-time (abnormal) | HRV≈69 ms, HR≈74 bpm | Score 0.4764 < threshold, requires multi-day confirmation |
| Pattern aggregation | 7 days, last 3 days gradually down | Detected 3-day continuous anomaly, trend `stable` |
| Baseline storage/update | Start 75 ± 5, 30 records | After new value 70 ms ⇒ mean 74.84, records 31 |
| Missing data tolerance | 40% features removed + static info missing | Still flags anomaly (score ≈0.50) thanks to fallback defaults |
| Full workflow | Detect → Baseline update → LLM text | Completed successfully; 114-char structured summary |

Feel free to adapt test_data/example_window.json or the simulation logic in the script: adjust the HRV/HR curves, window size, or missing-data ratio and observe how the output changes.

The quickstart script temporarily lowers the threshold to 0.50 by default so that boundary cases are observable. Reset it to suit your use case in a real deployment.
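The baseline row in the table (mean 75 over 30 records, new value 70 ms, updated mean 74.84 over 31 records) is consistent with a plain incremental mean. Whether the repository's baseline storage uses exactly this update is an assumption; the sketch only reproduces the arithmetic.

```python
def update_baseline_mean(mean, count, new_value):
    """Incremental mean: fold one new observation into a running average."""
    new_count = count + 1
    new_mean = mean + (new_value - mean) / new_count
    return new_mean, new_count


mean, count = update_baseline_mean(75.0, 30, 70.0)
print(round(mean, 2), count)
```

The incremental form avoids keeping all past readings in memory, which matters when baselines are tracked for many users.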

🔧 Input & Output

Input (Single Data Point)

{
  "timestamp": "2024-01-01T08:00:00",
  "deviceId": "ab60",            # Optional, anonymous ID will be created if missing
  "features": {
    "hr": 72.0,
    "hrv_rmssd": 30.0,
    "time_period_primary": "morning",
    "data_quality": "high",
    ...
  }
}

Each window requires 12 data points (default 1 hour)
Whether features are required is controlled by configs/features_config.json
Missing values automatically fall back to default or category_mapping defined values
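The fallback rules can be pictured with a small sketch. The config schema below (`required`, `default`, `category_mapping` keys) and the specific default values are invented for illustration; consult `configs/features_config.json` for the real structure.

```python
# Hypothetical config fragment; keys and values are illustrative assumptions.
FEATURES_CONFIG = {
    "hr": {"required": True},
    "hrv_rmssd": {"required": True},
    "spo2": {"required": False, "default": 97.0},
    "data_quality": {
        "required": False,
        "default": "medium",
        "category_mapping": {"low": 0, "medium": 1, "high": 2},
    },
}


def resolve_features(raw, config=FEATURES_CONFIG):
    """Fill missing optional features with defaults and encode categories."""
    resolved = {}
    for name, spec in config.items():
        value = raw.get(name)
        if value is None:
            if spec.get("required"):
                raise ValueError(f"missing required feature: {name}")
            value = spec.get("default")
        mapping = spec.get("category_mapping")
        if mapping is not None:
            value = mapping[value]  # categorical label -> numeric code
        resolved[name] = value
    return resolved


full = resolve_features({"hr": 72.0, "hrv_rmssd": 30.0, "data_quality": "high"})
sparse = resolve_features({"hr": 72.0, "hrv_rmssd": 30.0})
print(full)
print(sparse)
```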

Output

{
  "is_anomaly": True,
  "anomaly_score": 0.5760,
  "threshold": 0.5300,
  "details": {
     "window_size": 12,
     "model_output": 0.5760,
     "prediction_confidence": 0.0460
  }
}
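In the sample output, `prediction_confidence` equals the distance between the score and the threshold (0.5760 − 0.5300 = 0.0460). The sketch below rebuilds the result under that reading; the relationship is inferred from this one example, and the use of `>=` for the decision is likewise an assumption rather than confirmed behavior.

```python
def build_result(score, threshold=0.53, window_size=12):
    """Assemble a result dict shaped like the README's sample output."""
    return {
        "is_anomaly": score >= threshold,  # assumed comparison
        "anomaly_score": round(score, 4),
        "threshold": round(threshold, 4),
        "details": {
            "window_size": window_size,
            "model_output": round(score, 4),
            # Inferred: distance of the score from the decision boundary.
            "prediction_confidence": round(abs(score - threshold), 4),
        },
    }


sample = build_result(0.576)
print(sample)

# The same score can flip depending on the threshold in use.
print(build_result(0.526, threshold=0.50)["is_anomaly"])
print(build_result(0.526, threshold=0.53)["is_anomaly"])
```

The last two lines mirror the sample-file scenario: a score of about 0.526 triggers at the demo threshold of 0.50 but not at the default 0.53.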

🧱 Model Architecture & Training

Model Backbone: Phased LSTM handles unevenly-spaced sequences + Temporal Fusion Transformer aggregates temporal context
Anomaly Detection Head: Enhanced attention, multi-layer MLP, optional contrastive learning/type auxiliary head
Feature System:
- Physiological: HR, HRV (RMSSD/SDNN/PNN50…)
- Activity: Steps, distance, energy consumption, acceleration, gyroscope
- Environmental: Light, day/night labels, data quality
- Baseline: Adaptive baseline mean/std + deviation features
Label Source: High-confidence questionnaire labels + low-confidence adaptive baseline labels
Training Pipeline: Stage1/2/3 data processing ➜ Phase1 self-supervised pre-training ➜ Phase2 supervised fine-tuning ➜ Threshold/case calibration
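For readers unfamiliar with Phased LSTM, its tolerance of uneven spacing comes from a time gate that opens periodically as a function of the actual timestamp. Below is a scalar sketch of the gate as defined in the original Phased LSTM paper (Neil et al., 2016); the parameter values are arbitrary, and this repository's implementation may differ in detail.

```python
def time_gate(t, tau, shift=0.0, r_on=0.05, alpha=0.001):
    """Phased LSTM time gate k_t for timestamp t.

    tau:   oscillation period
    shift: phase shift
    r_on:  fraction of the period during which the gate is open
    alpha: small leak while the gate is closed
    """
    phi = ((t - shift) % tau) / tau   # phase in [0, 1)
    if phi < r_on / 2:
        return 2 * phi / r_on          # opening half
    if phi < r_on:
        return 2 - 2 * phi / r_on      # closing half
    return alpha * phi                 # closed, slight leak


def gated_update(prev_state, candidate, k):
    """Blend the candidate state with the previous state through the gate."""
    return k * candidate + (1 - k) * prev_state


# Irregular timestamps in minutes; the gate depends on time, not on step index.
for t in (0.0, 7.5, 15.0, 45.0):
    print(t, time_gate(t, tau=60.0, r_on=0.5))
```

Because the gate is evaluated at the true timestamp, a missing sample simply means no update at that time, and the state carries over through the `(1 - k)` term.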

📦 Repository Structure (Partial)

├─ configs/
│   └─ features_config.json     # Feature definitions & normalization strategies
├─ wearable_anomaly_detector.py # Core wrapper: loading, prediction, batch processing
├─ feature_calculator.py        # Configuration-driven feature construction + user history cache
└─ checkpoints/phase2/...       # Model weights & summary

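🧾 API Documentation

API_USAGE.md: lists the parameters and input/output examples for core interfaces such as WearableAnomalyDetector, AnomalyFormatter, and BaselineStorage.
test_quickstart.py: a self-check script that runs as-is, useful for verifying interface behavior.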

📚 Data Source & License

Training data is based on "A continuous real-world dataset comprising wearable-based heart rate variability alongside sleep diaries" (Baigutanova et al., Scientific Data, 2025) and its Figshare dataset doi:10.1038/s41597-025-05801-3 / dataset link.
This dataset is released under Creative Commons Attribution 4.0 (CC BY 4.0) license, allowing free use, modification, and distribution, but attribution and license link must be retained.
This repository follows CC BY 4.0 requirements for original data; if you further process or publish based on this, please continue to retain the above attribution and license information.
Code/models can use MIT/Apache or other licenses as needed, but any parts involving data must still follow CC BY 4.0.

🤝 Contributions & Extensions

Contributions are welcome:

Add new features or data sources ⇒ Update features_config.json + submit PR
Integrate new user data management/baseline strategies ⇒ Extend FeatureCalculator or contribute UserDataManager
Provide feedback on cases or real deployment experiences ⇒ Open Issues or Discussions

📄 License

Model & Code: Apache-2.0. Can be used/modified/distributed freely while retaining copyright and license notices.
Training Data: Original wearable HRV dataset uses CC BY 4.0; please continue to retain author attribution and license information when reusing.

🔖 Citation

@software{Wearable_TimeSeries_Health_Monitor,
  title  = {Wearable\_TimeSeries\_Health\_Monitor},
  author = {oscarzhang},
  year   = {2025},
  url    = {https://huggingface.co/oscarzhang/Wearable_TimeSeries_Health_Monitor}
}