metadata
title: Wearable_TimeSeries_Health_Monitor
emoji: 📟
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 4.44.0
app_file: gradio_app.py
pinned: false

library_name: pytorch
pipeline_tag: time-series-forecasting
language:
- zh
- en
tags:
- anomaly-detection
- time-series
- wearable
- health
- lstm
- transformer
- physiological-monitoring
- hrv
- heart-rate
- real-time
- multi-user
- personalized
- sensor-fusion
- healthcare
- continuous-monitoring
license: apache-2.0
pretty_name: Wearable TimeSeries Health Monitor

Language / 语言: 中文 | English


Wearable_TimeSeries_Health_Monitor

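A multi-user health monitoring solution for wearable devices: one model and one configuration are enough to build personalized anomaly detection for different users. The model is based on **Phased LSTM + Temporal Fusion Transformer (TFT)** and integrates adaptive baselines, factor features, and sliding-window processing at second-level granularity, making it suitable for use as a HuggingFace model or for quick integration into an internal enterprise service.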


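🌟 Model Highlights

| Capability | Description |
| --- | --- |
| Plug-and-play | Built-in WearableAnomalyDetector wrapper: load the model and predict; after a single initialization it can keep monitoring multiple users |
| Configuration-driven features | configs/features_config.json describes all features, default values, and category mappings; adding or removing features such as blood oxygen or respiratory rate only requires a config change |
| Multi-user real-time service | FeatureCalculator plus a lightweight data_storage cache provide user history management, baseline evolution, and batch inference |
| Real-data validation | The README includes "Real Data Tests" instructions for simulating normal/abnormal users, baseline updates, and multi-day pattern detection in one step |
| Adaptive baseline support | The extensible UserDataManager plugs personal/group baselines into the inference pipeline, continuously improving per-user sensitivity |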

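⚡ Core Features & Technical Advantages

🎯 Adaptive Baseline: Fusing Personal and Group Baselines

The model uses an adaptive baseline strategy that picks the most suitable baseline based on how much history a user has:

  • Personal baseline first: once a user has enough history (e.g., ≥7 days), the personal HRV mean/standard deviation serves as the baseline, capturing individual differences in physiological rhythm
  • Group baseline as fallback: for new users or sparse data, the model automatically switches to a group-level statistical baseline so that detection stays stable during cold start
  • Smooth transition: weighted mixing (e.g., final_mean = α × personal_mean + (1-α) × group_mean) moves gradually from the group baseline to the personal one
  • Real-time baseline updates: user data keeps accumulating during inference, so the baseline adjusts as the user's state evolves, improving long-term monitoring accuracy

Advantage: compared with fixed thresholds or a pure group baseline, the adaptive baseline balances personalized sensitivity (fewer false positives) with cold-start robustness (usable for new users), which suits multi-user, long-term monitoring.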

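⏱️ Flexible Time Windows & Periods

  • 5-minute granularity: each data point is a 5-minute aggregate, supporting flexible time scales from seconds to hours
  • Configurable window size: 12 points (1 hour) by default, adjustable to 6 points (30 minutes) or 24 points (2 hours) as the business requires
  • Tolerance for uneven intervals: the Phased LSTM architecture handles missing data points natively, so inference stays stable even with sparse data (e.g., the sensor disconnects at night)
  • Multi-time-scale features: short-term fluctuations (RMSSD), medium-term trends (rolling mean), and long-term patterns (daily/weekly cycles) are extracted together to capture anomaly signals at different time scales

Advantage: adapts to different device sampling rates and wearing habits without forcing timestamp alignment, which reduces preprocessing complexity.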

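🔄 Multi-Channel Data Synergy

The model integrates 4 major feature channels and fuses information across them through factor features and attention mechanisms:

  1. Physiological channel (HR, HRV series, respiratory rate, blood oxygen)

    • Directly reflects the state of the cardiovascular and respiratory systems
    • Factor features: physiological_mean, physiological_std, physiological_max, physiological_min
  2. Activity channel (steps, distance, energy consumption, acceleration, gyroscope)

    • Captures exercise intensity and physical load
    • Factor features: activity_mean, activity_std
  3. Environmental channel (light, time period, data quality)

    • Provides context, distinguishing exercise-induced heart rate elevation from resting anomalies
    • Categorical feature: time_period_primary (morning/day/evening/night)
  4. Baseline channel (adaptive baseline mean/standard deviation, deviation features)

    • Provides a personalized reference and yields relative anomaly indicators such as hrv_deviation_abs and hrv_z_score

Synergy mechanism

  • Factor feature aggregation: statistics (mean/standard deviation/extremes) of channels of the same kind become high-level features, letting the model learn association patterns between channels
  • TFT attention: the variable selection network of the Temporal Fusion Transformer automatically identifies which channels matter most at a given time point
  • Known future features: time features (hour, day of week, is-weekend) help the model understand periodicity and separate normal fluctuations from anomalies

Advantage: multi-channel synergy markedly reduces false positives driven by a single indicator (e.g., heart rate rising during exercise) and improves the context awareness of anomaly detection, which suits multi-sensor fusion on wearable devices.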


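📊 Core Metrics (Short-Term Window)

  • F1: 0.2819
  • Precision: 0.1769
  • Recall: 0.6941
  • Optimal threshold: 0.53
  • Window definition: 12 points of 5-minute data (1-hour window, predicting 0.5 hours ahead)

The model favors recall, which suits "alert on anomalies first, then review with a human in the loop" scenarios. Precision and recall can be traded off through the threshold or the sampling strategy.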


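🚀 Quick Start

Try It Online in the Hugging Face Space

URL: https://huggingface.co/spaces/oscarzhang/Wearable_TimeSeries_Health_Monitor

  • Real-time window detection: pick one of four preset windows ("normal / short-term anomaly / long-term anomaly / missing data") to view the model's JSON output and the formatted LLM text.
  • LLM input example: shows the same Markdown used in the project's training data (system prompt + user input), easy to copy into other LLM services for verification.
  • PatchTrAD cases: two built-in pipelines, "platform built-in pre-screening" and "official precheck", display the pre-screening score, Case JSON, and LLM input; together with the manifest, new cases can be added quickly.

To use custom data, run locally: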

python simulate_patchad_case_pipeline.py --mode all \
  --data-file data_storage/users/your_case.jsonl \
  --save-dir demo_patchad_cases --sample-name your_case

The generated cases appear directly in the Space's dropdown menu.

1. Clone or download the model repository

git clone https://huggingface.co/oscarzhang/Wearable_TimeSeries_Health_Monitor
cd Wearable_TimeSeries_Health_Monitor
pip install -r requirements.txt

2. Call from your application code

from wearable_anomaly_detector import WearableAnomalyDetector

detector = WearableAnomalyDetector(
    model_dir="checkpoints/phase2/exp_factor_balanced",
    threshold=0.53,
)

result = detector.predict(data_points, return_score=True, return_details=True)
print(result)

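data_points is the 12 most recent 5-minute records; if static features or device information are missing, the system fills them in automatically from the configuration/cache.

3. Quick simulation with realistic data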

from datetime import datetime, timedelta
from wearable_anomaly_detector import WearableAnomalyDetector

detector = WearableAnomalyDetector("checkpoints/phase2/exp_factor_balanced", device="cpu")

def make_point(ts, hrv, hr):
    return {
        "timestamp": ts.isoformat(),
        "deviceId": "demo_user",
        "features": {
            "hr": hr,
            "hr_resting": 65,
            "hrv_rmssd": hrv,
            "time_period_primary": "day",
            "data_quality": "high",
            "baseline_hrv_mean": 75.0,
            "baseline_hrv_std": 5.0
        },
        "static_features": {
            "age_group": 2,
            "sex": 0,
            "exercise": 1
        }
    }

start = datetime.now() - timedelta(hours=1)
window = [make_point(start + timedelta(minutes=5*i), 75 - i*0.5, 70 + i*0.2) for i in range(12)]
print(detector.detect_realtime(window))

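The script above builds 12 five-minute data points and runs one real-time detection. Adjust the HRV, HR, or window size to simulate different scenarios.


🧪 Real Data Tests

The results below come from the example scripts in this README (simulating normal/abnormal users, baseline updates, and multi-day patterns). Everything ran on CPU.

| Scenario | Data overview | Result |
| --- | --- | --- |
| Real-time detection (normal) | HRV≈76 ms, HR≈68 bpm, 12 points | Anomaly score 0.5393 vs. threshold 0.53 (marginal trigger; the model is sensitive to borderline anomalies) |
| Real-time detection (abnormal) | HRV≈69 ms, HR≈74 bpm, 12 points | Anomaly score 0.4764, below the threshold; needs further observation with multi-day patterns |
| Pattern aggregation (7 days) | First 3 days normal, last 4 days gradually declining | Correctly identified an anomaly pattern lasting 3 days; trend: stable |
| Baseline storage/update | Initial baseline 75±5, 30 records | Stored successfully; after a new value of 70 ms the mean updates to 74.84, record count 31 |
| Full workflow | Real-time detection → baseline update → LLM text | Whole pipeline ran successfully, producing a 114-character structured anomaly summary |

Copy the "quick simulation" code above and adjust HRV/HR, window length, or anomaly strength as needed to reproduce the same workflow.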


🔧 Input & Output

Input (single data point)

{
  "timestamp": "2024-01-01T08:00:00",
  "deviceId": "ab60",            # Optional; an anonymous ID is created automatically if missing
  "features": {
    "hr": 72.0,
    "hrv_rmssd": 30.0,
    "time_period_primary": "morning",
    "data_quality": "high",
    ...
  }
}
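  • Each window needs 12 data points (1 hour by default)
  • Whether a feature is required is controlled by configs/features_config.json
  • Missing values fall back automatically to the default or to the value defined in category_mapping

Output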

{
  "is_anomaly": True,
  "anomaly_score": 0.5760,
  "threshold": 0.5300,
  "details": {
     "window_size": 12,
     "model_output": 0.5760,
     "prediction_confidence": 0.0460
  }
}

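🧱 Model Architecture & Training

  • Model backbone: Phased LSTM handles unevenly spaced sequences + Temporal Fusion Transformer aggregates temporal context
  • Anomaly detection head: enhanced attention, multi-layer MLP, optional contrastive learning/type auxiliary head
  • Feature system
    • Physiological: HR, HRV (RMSSD/SDNN/PNN50…)
    • Activity: steps, distance, energy consumption, acceleration, gyroscope
    • Environmental: light, day/night labels, data quality
    • Baseline: adaptive baseline mean/standard deviation + deviation features
  • Label sources: high-confidence questionnaire labels + low-confidence adaptive baseline labels
  • Training pipeline: Stage1/2/3 data processing ➜ Phase1 self-supervised pre-training ➜ Phase2 supervised fine-tuning ➜ threshold/case calibration

📦 Repository Structure (Partial)

├─ configs/
│   └─ features_config.json     # Feature definitions & normalization strategies
├─ wearable_anomaly_detector.py # Core wrapper: loading, prediction, batch processing
├─ feature_calculator.py        # Configuration-driven feature construction + user history cache
└─ checkpoints/phase2/...       # Model weights & summary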

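📚 Data Source & License

  • Training data is based on "A continuous real-world dataset comprising wearable-based heart rate variability alongside sleep diaries" (Baigutanova et al., Scientific Data, 2025) and its Figshare dataset doi:10.1038/s41597-025-05801-3 / dataset link
  • The dataset is released under Creative Commons Attribution 4.0 (CC BY 4.0): it may be used, modified, and distributed freely, provided attribution is kept and a link to the license is included.
  • This repository follows the CC BY 4.0 requirements for the original data; if you process or publish further work based on it, please keep the attribution and license notice above.
  • Code/models may use MIT/Apache or other licenses as needed, but anything involving the data must still follow CC BY 4.0.

🤝 Contributions & Extensions

Contributions are welcome:

  1. Add new features or data sources ⇒ update features_config.json + submit a PR
  2. Integrate new user data management/baseline strategies ⇒ extend FeatureCalculator or contribute a UserDataManager
  3. Share cases or real deployment experience ⇒ open an Issue or Discussion

📄 License

  • Model & code: Apache-2.0. May be used, modified, and distributed freely as long as copyright and license notices are retained.
  • Training data: the original wearable HRV dataset uses CC BY 4.0; keep the author attribution and license information when reusing it.

🔖 Citation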

@software{Wearable_TimeSeries_Health_Monitor,
  title  = {Wearable\_TimeSeries\_Health\_Monitor},
  author = {oscarzhang},
  year   = {2025},
  url    = {https://huggingface.co/oscarzhang/Wearable_TimeSeries_Health_Monitor}
}

Wearable_TimeSeries_Health_Monitor

A multi-user health monitoring solution for wearable devices: one model and one configuration enable personalized anomaly detection for different users. The model is based on Phased LSTM + Temporal Fusion Transformer (TFT) and integrates adaptive baselines, factor features, and sliding-window processing at second-level granularity, making it suitable for deployment as a HuggingFace model or for rapid integration into enterprise services.


🌟 Model Highlights

| Capability | Description |
| --- | --- |
| Plug-and-Play | Built-in WearableAnomalyDetector wrapper: load the model and start predicting; supports continuous monitoring of multiple users after a single initialization |
| Configuration-Driven Features | configs/features_config.json defines all features, default values, and category mappings; adding/removing features like blood oxygen or respiratory rate only requires configuration changes |
| Multi-User Real-Time Service | FeatureCalculator + lightweight data_storage cache enables user history management, baseline evolution, and batch inference |
| Real-World Validation | README ships with a "Real Data Tests" section plus sample simulation code so you can mimic normal/abnormal users in minutes |
| Adaptive Baseline Support | Extensible UserDataManager integrates personal/group baselines into the inference pipeline, continuously improving individual sensitivity |

⚡ Core Features & Technical Advantages

🎯 Adaptive Baseline: Intelligent Fusion of Personal and Group

The model employs an adaptive baseline strategy that dynamically selects the optimal baseline based on user historical data volume:

  • Personal Baseline Priority: When a user has sufficient historical data (e.g., ≥7 days), the personal HRV mean/std is used as the baseline to capture individual differences in physiological rhythm
  • Group Baseline Fallback: For new users or sparse data, the model automatically switches to a group statistical baseline, keeping detection stable even during cold start
  • Smooth Transition Mechanism: Weighted mixing (e.g., final_mean = α × personal_mean + (1-α) × group_mean) gives a gradual adaptation from the group baseline to the personal one
  • Real-Time Baseline Updates: User data accumulates continuously during inference, so the baseline adjusts as the user's state evolves, improving long-term monitoring accuracy

Advantage: Compared to fixed thresholds or pure group baselines, adaptive baselines balance personalized sensitivity (reducing false positives) and cold-start robustness (usable for new users), especially suitable for multi-user, long-term monitoring scenarios.
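The weighted mixing described above can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: the `Baseline` class, the linear `blend_weight` schedule, and the choice to blend the standard deviation the same way as the mean are all assumptions; only the formula final_mean = α × personal_mean + (1-α) × group_mean comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    mean: float
    std: float


def blend_weight(days_of_history: float, full_personal_days: float = 7.0) -> float:
    """Hypothetical schedule: alpha grows linearly from 0 to 1 over `full_personal_days`."""
    if full_personal_days <= 0:
        raise ValueError("full_personal_days must be positive")
    return max(0.0, min(1.0, days_of_history / full_personal_days))


def blend_baseline(personal: Baseline, group: Baseline, alpha: float) -> Baseline:
    """final = alpha * personal + (1 - alpha) * group (std blending is an assumption)."""
    return Baseline(
        mean=alpha * personal.mean + (1.0 - alpha) * group.mean,
        std=alpha * personal.std + (1.0 - alpha) * group.std,
    )


group = Baseline(mean=75.0, std=5.0)      # group-level statistics (made-up values)
personal = Baseline(mean=68.0, std=4.0)   # one user's statistics (made-up values)
for days in (0, 3.5, 7, 14):
    alpha = blend_weight(days)
    print(days, alpha, blend_baseline(personal, group, alpha))
```

With this schedule a brand-new user gets the pure group baseline (α = 0), and a user with a week or more of data gets the pure personal baseline (α = 1).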

⏱️ Flexible Time Windows & Periods

  • 5-Minute Granularity: Each data point represents 5-minute aggregation, supporting flexible time scales from seconds to hours
  • Configurable Window Size: Default 12 points (1 hour), adjustable to 6 points (30 minutes) or 24 points (2 hours) based on business needs
  • Uneven Interval Tolerance: Phased LSTM architecture naturally handles missing data points, stable inference even with sparse data (e.g., sensor disconnection at night)
  • Multi-Time-Scale Features: Simultaneously extract short-term fluctuations (RMSSD), medium-term trends (rolling mean), and long-term patterns (daily/weekly cycles), capturing anomaly signals at different time scales

Advantage: Adapts to different device sampling frequencies and user wearing habits, no need to force timestamp alignment, reducing data preprocessing complexity.
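As a rough sketch of what "no forced timestamp alignment" can look like on the caller's side, the helper below takes the newest 12 points and reports the elapsed minutes between neighbours instead of resampling them. The helper names are hypothetical; only the `timestamp` and `features` keys follow the input format documented in this README.

```python
from datetime import datetime, timedelta


def latest_window(points, window_size=12):
    """Return the newest `window_size` points, sorted by timestamp."""
    ordered = sorted(points, key=lambda p: datetime.fromisoformat(p["timestamp"]))
    return ordered[-window_size:]


def gaps_in_minutes(window):
    """Minutes elapsed between consecutive points; 5.0 means no gap."""
    times = [datetime.fromisoformat(p["timestamp"]) for p in window]
    return [(b - a).total_seconds() / 60.0 for a, b in zip(times, times[1:])]


start = datetime(2024, 1, 1, 8, 0)
# 14 five-minute slots with slots 4 and 5 missing (e.g. the sensor dropped out).
points = [
    {"timestamp": (start + timedelta(minutes=5 * i)).isoformat(), "features": {"hr": 70.0}}
    for i in range(14)
    if i not in (4, 5)
]

window = latest_window(points, window_size=12)
gaps = gaps_in_minutes(window)
print(gaps)  # one 15-minute gap among otherwise 5-minute steps
```

The gaps can be passed along as time deltas, which is the kind of input a time-aware recurrent layer such as Phased LSTM is designed to use.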

🔄 Multi-Channel Data Synergy

The model integrates 4 major feature channels, achieving cross-channel information fusion through factor features and attention mechanisms:

  1. Physiological Channel (HR, HRV series, respiratory rate, blood oxygen)

    • Directly reflects cardiovascular and respiratory system status
    • Factor features: physiological_mean, physiological_std, physiological_max, physiological_min
  2. Activity Channel (steps, distance, energy consumption, acceleration, gyroscope)

    • Captures exercise intensity and body load
    • Factor features: activity_mean, activity_std, etc.
  3. Environmental Channel (light, time period, data quality)

    • Provides contextual information, distinguishing exercise-induced heart rate elevation vs. resting anomalies
    • Categorical features: time_period_primary (morning/day/evening/night)
  4. Baseline Channel (adaptive baseline mean/std, deviation features)

    • Provides personalized reference baseline, calculating relative anomaly indicators like hrv_deviation_abs, hrv_z_score

Synergy Mechanism:

  • Factor Feature Aggregation: Use statistical measures (mean/std/max/min) of similar channels as high-level features, enabling the model to learn association patterns between channels
  • TFT Attention: Temporal Fusion Transformer's variable selection network automatically identifies which channels are most important at specific time points
  • Known Future Features: Time features (hour, day of week, is_weekend) help the model understand periodicity, distinguishing normal fluctuations from anomalies

Advantage: Multi-channel synergy significantly reduces single-indicator false positives (e.g., exercise-induced heart rate elevation) and improves context-aware anomaly detection, especially suitable for multi-sensor fusion scenarios in wearable devices.
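The factor features and baseline-relative indicators named above might be computed along these lines. This is a sketch under stated assumptions: the `CHANNELS` grouping is hypothetical (the real one lives in configs/features_config.json), the inputs are assumed to be already normalized so that averaging across features is meaningful, and the population standard deviation is an arbitrary choice.

```python
from statistics import mean, pstdev

# Hypothetical channel grouping, for illustration only.
CHANNELS = {
    "physiological": ["hr", "hrv_rmssd"],
    "activity": ["steps", "distance"],
}


def factor_features(features, channels=CHANNELS):
    """Aggregate each channel's available values into <channel>_mean/std/max/min."""
    out = {}
    for name, keys in channels.items():
        values = [float(features[k]) for k in keys if k in features]
        if not values:
            continue  # channel entirely missing: emit nothing for it
        out[f"{name}_mean"] = mean(values)
        out[f"{name}_std"] = pstdev(values)
        out[f"{name}_max"] = max(values)
        out[f"{name}_min"] = min(values)
    return out


def hrv_deviation(hrv, baseline_mean, baseline_std):
    """Deviation of the current HRV from the user's baseline, absolute and as a z-score."""
    deviation = hrv - baseline_mean
    z_score = deviation / baseline_std if baseline_std > 0 else 0.0
    return {"hrv_deviation_abs": abs(deviation), "hrv_z_score": z_score}


normalized = {"hr": 0.2, "hrv_rmssd": 0.6}  # made-up normalized values
factors = factor_features(normalized)
relative = hrv_deviation(hrv=65.0, baseline_mean=75.0, baseline_std=5.0)
print(factors)
print(relative)
```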


📊 Core Metrics (Short-Term Window)

  • F1: 0.2819
  • Precision: 0.1769
  • Recall: 0.6941
  • Optimal Threshold: 0.53
  • Window Definition: 12 data points of 5-minute intervals (1-hour time window, predicting 0.5 hours ahead)

The model favors recall, suitable for "anomaly-first alert, human-machine collaborative review" scenarios. Precision and recall can be adjusted through threshold/sampling strategies.
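The reported F1 follows from the precision and recall above, and the threshold trade-off can be explored with a small helper. The functions below are a generic sketch, not code from the repository; the toy scores and labels are invented, and treating `score >= threshold` as a positive is an assumption.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_at(scores, labels, threshold):
    """Precision and recall when every score >= threshold is flagged as an anomaly."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Consistency check on the metrics reported above.
print(round(f1_score(0.1769, 0.6941), 4))

# Toy data: lowering the threshold from 0.53 to 0.50 recovers the missed anomaly.
scores = [0.20, 0.40, 0.52, 0.55, 0.60, 0.90]
labels = [False, False, True, False, True, True]
for threshold in (0.53, 0.50):
    print(threshold, precision_recall_at(scores, labels, threshold))
```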


🚀 Quick Start

1. Clone or Download the Model Repository

git clone https://huggingface.co/oscarzhang/Wearable_TimeSeries_Health_Monitor
cd Wearable_TimeSeries_Health_Monitor
pip install -r requirements.txt

2. Run the Official Inference Script

python run_official_inference.py \
  --window-file test_data/example_window.json \
  --model-dir checkpoints/phase2/exp_factor_balanced

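The script will:

  • Read test_data/example_window.json (a 12-point window in the real data format)
  • Call WearableAnomalyDetector.detect_realtime
  • Print the full JSON result
  • Use AnomalyFormatter to output Markdown text that an LLM can consume directly

To test your own window, just replace the --window-file path; the script injects no random noise, and its output matches the production API.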

3. Call in Business Code

from wearable_anomaly_detector import WearableAnomalyDetector

detector = WearableAnomalyDetector(
    model_dir="checkpoints/phase2/exp_factor_balanced",
    threshold=0.53,
)

result = detector.predict(data_points, return_score=True, return_details=True)
print(result)

data_points should be 12 latest 5-minute records; if static features/device information are missing, the system will automatically fill from configuration/cache.

4. Quick Simulation Script (Optional)

python test_quickstart.py

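This script covers more demo scenarios (random noise, a pronounced 7-day anomaly, missing/low-quality data). The log first runs inference on the sample file, then prints normal/abnormal windows, pattern aggregation, and fault-tolerance examples. Note: to observe boundary behavior, the script temporarily sets the threshold to 0.50 and adds random perturbations; it is meant for exploration only.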


🧪 Real Data Tests

The following results were reproduced with the sample code above (normal vs. abnormal users, multi-day trend, baseline update, end-to-end workflow). All tests ran on CPU; the first scenario loads test_data/example_window.json directly.

| Scenario | Data Snapshot | Outcome |
| --- | --- | --- |
| Real-time (sample file) | HRV≈72 ms, HR≈71 bpm, 12 points | Score ≈0.526 vs. threshold 0.50 (demo threshold) |
| Real-time (normal) | HRV≈76 ms, HR≈68 bpm, 12 points | Score 0.5393 vs. threshold 0.53 (marginal trigger) |
| Real-time (abnormal) | HRV≈69 ms, HR≈74 bpm | Score 0.4764 < threshold, requires multi-day confirmation |
| Pattern aggregation | 7 days, last 3 days gradually down | Detected 3-day continuous anomaly, trend stable |
| Baseline storage/update | Start 75 ± 5, 30 records | After new value 70 ms ⇒ mean 74.84, records 31 |
| Missing data tolerance | 40% features removed + static info missing | Still flags anomaly (score ≈0.50) thanks to fallback defaults |
| Full workflow | Detect → Baseline update → LLM text | Completed successfully; 114-char structured summary |

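Feel free to adapt test_data/example_window.json or the simulation logic in the scripts: adjust the HRV/HR curves, window size, or missing-data ratio and observe how the output changes.


The quickstart script temporarily lowers the threshold to 0.50 by default so that borderline scenarios are easier to observe. For real deployments, set it again according to your business needs.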
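The baseline update row in the table (mean 75 over 30 records, new value 70 ms, mean 74.84 over 31 records) is consistent with a plain incremental mean. The function below is a sketch of that arithmetic only; whether the repository's baseline storage updates its mean exactly this way is an assumption.

```python
def update_running_mean(mean, count, new_value):
    """Incremental mean: fold one new observation into an existing mean."""
    new_count = count + 1
    new_mean = mean + (new_value - mean) / new_count
    return new_mean, new_count


new_mean, new_count = update_running_mean(mean=75.0, count=30, new_value=70.0)
print(round(new_mean, 2), new_count)  # 74.84 31
```

The incremental form avoids keeping every historical value in memory, which matters when many users are tracked at once.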

🔧 Input & Output

Input (Single Data Point)

{
  "timestamp": "2024-01-01T08:00:00",
  "deviceId": "ab60",            # Optional, anonymous ID will be created if missing
  "features": {
    "hr": 72.0,
    "hrv_rmssd": 30.0,
    "time_period_primary": "morning",
    "data_quality": "high",
    ...
  }
}
  • Each window requires 12 data points (default 1 hour)
  • Whether features are required is controlled by configs/features_config.json
  • Missing values automatically fall back to default or category_mapping defined values
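To make the fallback behaviour concrete, here is a small sketch of config-driven feature resolution. The config schema (`required`, `default`, `category_mapping` keys under `features`), the `spo2` feature name, and the `resolve_features` helper are all hypothetical; the actual layout of configs/features_config.json may differ.

```python
import json

# Hypothetical config fragment, NOT the repository's actual schema.
CONFIG_JSON = """
{
  "features": {
    "hr": {"required": true},
    "hrv_rmssd": {"required": true},
    "spo2": {"required": false, "default": 97.0},
    "time_period_primary": {
      "required": false,
      "default": "day",
      "category_mapping": {"morning": 0, "day": 1, "evening": 2, "night": 3}
    }
  }
}
"""


def resolve_features(raw, config):
    """Fill missing optional features with defaults and encode categorical values."""
    resolved = {}
    for name, spec in config["features"].items():
        value = raw.get(name)
        if value is None:
            if spec.get("required", False):
                raise ValueError(f"missing required feature: {name}")
            value = spec.get("default")
        mapping = spec.get("category_mapping")
        if mapping is not None:
            # Unknown categories fall back to the encoded default.
            value = mapping.get(value, mapping.get(spec.get("default")))
        resolved[name] = value
    return resolved


config = json.loads(CONFIG_JSON)
resolved = resolve_features(
    {"hr": 72.0, "hrv_rmssd": 30.0, "time_period_primary": "morning"}, config
)
print(resolved)
```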

Output

{
  "is_anomaly": True,
  "anomaly_score": 0.5760,
  "threshold": 0.5300,
  "details": {
     "window_size": 12,
     "model_output": 0.5760,
     "prediction_confidence": 0.0460
  }
}
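In the sample output, prediction_confidence (0.0460) equals the distance between the score (0.5760) and the threshold (0.5300). The sketch below reproduces the sample under that reading; it is inferred from the example values only, the `build_result` helper is hypothetical, and the repository may compute these fields differently.

```python
def build_result(score, threshold=0.53, window_size=12):
    """Assemble an output dict shaped like the sample above (inferred semantics)."""
    return {
        "is_anomaly": score >= threshold,
        "anomaly_score": round(score, 4),
        "threshold": round(threshold, 4),
        "details": {
            "window_size": window_size,
            "model_output": round(score, 4),
            # Assumption: confidence is the margin between score and threshold.
            "prediction_confidence": round(abs(score - threshold), 4),
        },
    }


result = build_result(0.5760)
print(result)
```

Read this way, a small prediction_confidence marks a borderline decision, such as the 0.5393 vs. 0.53 case in the Real Data Tests.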

🧱 Model Architecture & Training

  • Model Backbone: Phased LSTM handles unevenly-spaced sequences + Temporal Fusion Transformer aggregates temporal context
  • Anomaly Detection Head: Enhanced attention, multi-layer MLP, optional contrastive learning/type auxiliary head
  • Feature System:
    • Physiological: HR, HRV (RMSSD/SDNN/PNN50…)
    • Activity: Steps, distance, energy consumption, acceleration, gyroscope
    • Environmental: Light, day/night labels, data quality
    • Baseline: Adaptive baseline mean/std + deviation features
  • Label Source: High-confidence questionnaire labels + low-confidence adaptive baseline labels
  • Training Pipeline: Stage1/2/3 data processing ➜ Phase1 self-supervised pre-training ➜ Phase2 supervised fine-tuning ➜ Threshold/case calibration

📦 Repository Structure (Partial)

├─ configs/
│   └─ features_config.json     # Feature definitions & normalization strategies
├─ wearable_anomaly_detector.py # Core wrapper: loading, prediction, batch processing
├─ feature_calculator.py        # Configuration-driven feature construction + user history cache
└─ checkpoints/phase2/...       # Model weights & summary

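🧾 API Documentation

  • API_USAGE.md: lists parameters and input/output examples for core interfaces such as WearableAnomalyDetector, AnomalyFormatter, and BaselineStorage.
  • test_quickstart.py: a directly runnable self-check script for verifying interface behavior.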

📚 Data Source & License

  • Training data is based on "A continuous real-world dataset comprising wearable-based heart rate variability alongside sleep diaries" (Baigutanova et al., Scientific Data, 2025) and its Figshare dataset doi:10.1038/s41597-025-05801-3 / dataset link.
  • This dataset is released under Creative Commons Attribution 4.0 (CC BY 4.0) license, allowing free use, modification, and distribution, but attribution and license link must be retained.
  • This repository follows CC BY 4.0 requirements for original data; if you further process or publish based on this, please continue to retain the above attribution and license information.
  • Code/models can use MIT/Apache or other licenses as needed, but any parts involving data must still follow CC BY 4.0.

🤝 Contributions & Extensions

Contributions are welcome:

  1. Add new features or data sources ⇒ Update features_config.json + submit PR
  2. Integrate new user data management/baseline strategies ⇒ Extend FeatureCalculator or contribute UserDataManager
  3. Provide feedback on cases or real deployment experiences ⇒ Open Issues or Discussions

📄 License

  • Model & Code: Apache-2.0. Can be used/modified/distributed freely while retaining copyright and license notices.
  • Training Data: Original wearable HRV dataset uses CC BY 4.0; please continue to retain author attribution and license information when reusing.

🔖 Citation

@software{Wearable_TimeSeries_Health_Monitor,
  title  = {Wearable\_TimeSeries\_Health\_Monitor},
  author = {oscarzhang},
  year   = {2025},
  url    = {https://huggingface.co/oscarzhang/Wearable_TimeSeries_Health_Monitor}
}