devjas1 committed · 7602d7b
1 Parent(s): e31bab9
(CHORE)[cleanup]: Remove deprecated files and documentation, including validation protocols and preprocessing scripts for FTIR data. This cleanup streamlines the repository by eliminating outdated resources and refocusing on current implementations.
- AGENT_PROCESS.md +0 -284
- POLYMEROS_GUIDE.md +0 -238
- VALIDATION_PROTOCOLS.md +0 -423
- audit-access-check.txt +0 -1
- preprocess_ftir_legacy.py +0 -85
- train_ftir_model.py +0 -139
- train_ftir_model_cv.py +0 -201
- train_model.py +0 -135
AGENT_PROCESS.md
DELETED
@@ -1,284 +0,0 @@

# POLYMEROS Agent Development Process

## Mission Context

This document chronicles the decision-making process, alternatives explored, and insights gained while implementing the POLYMEROS Development Protocol for the polymer-aging-ml repository. It serves both as a record of research methodology and as a guide for future AI-assisted scientific software development.

## Initial Analysis and Strategic Planning

### Repository Assessment (Phase 1)

**Timestamp**: Initial exploration
**Objective**: Understand current capabilities and identify transformation opportunities

**Current State Discovered**:

- Streamlit-based web application for polymer classification
- Three CNN architectures: Figure2CNN, ResNet1D, ResNet18Vision
- Raman spectroscopy focus with binary classification (Stable vs. Weathered)
- Modular architecture with separation of concerns
- Existing comprehensive analysis report with detailed roadmap

**Key Insights**:

1. **Foundation Strength**: The solid modular structure provides a good base for expansion
2. **Scope Limitation**: Current binary classification is too narrow for research-grade applications
3. **Educational Gap**: No interactive learning components despite educational goals
4. **Collaboration Absence**: No features for team-based research or peer validation

**Decision Point 1**: Transform vs. Rebuild
**Choice**: Transform the existing architecture rather than rebuild from scratch
**Rationale**: Preserve working components while adding sophisticated capabilities

### Framework Design Philosophy

**Core Assumption Challenge**:
Original assumption: "Software supports users"
POLYMEROS principle: "The platform enhances human decision-making and discovery"

**This shift drives several key design decisions**:

1. **Explanatory AI**: Every prediction must include a reasoning chain
2. **Adaptive Learning**: The system improves through user interactions
3. **Hypothesis Generation**: AI suggests research directions, not just classifications
4. **Community Integration**: Built-in collaboration and validation tools

## Implementation Strategy and Decision Rationale

### Phase 1: Foundation Building

#### Enhanced Data Management System

**Challenge**: Simple file-based input limits research potential
**Solution**: Contextual knowledge networks with metadata preservation

**Implementation Decision**: Extend the existing `utils/preprocessing.py` rather than replace it
**Alternative Considered**: Complete rewrite of the data pipeline
**Rationale**: Preserve proven preprocessing while adding advanced capabilities

**Key Enhancements Planned**:

```python
class EnhancedDataManager:
    # MetadataExtractor: automatic experimental-condition tracking
    # ProvenanceTracker: complete data-lineage recording
    # KnowledgeGraph: relationship mapping between samples
    # QualityAssessment: automated data validation
```
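
The components listed above are still an outline. As an illustration of what the provenance piece might look like, here is a minimal, self-contained sketch; the `ProvenanceTracker` name comes from the outline, while its API and the hashing scheme are assumptions:

```python
import hashlib
import json
from dataclasses import dataclass, field

@dataclass
class ProvenanceTracker:
    """Minimal data-lineage recorder: each processing step is logged with
    its parameters and a short hash of the resulting data."""
    records: list = field(default_factory=list)

    def log_step(self, step_name, params, data):
        # Hash the serialized data so any later change to the artifact is detectable.
        digest = hashlib.sha256(json.dumps(data).encode()).hexdigest()[:12]
        self.records.append({"step": step_name, "params": params, "hash": digest})

    def lineage(self):
        return [r["step"] for r in self.records]

tracker = ProvenanceTracker()
spectrum = [0.1, 0.4, 0.9, 0.3]
tracker.log_step("load", {"source": "sample_001.csv"}, spectrum)
normalized = [x / max(spectrum) for x in spectrum]
tracker.log_step("normalize", {"method": "max"}, normalized)
```

A real implementation would hash file contents and record environment versions as well, but the shape of the audit trail is the same.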

#### Transparent AI Core

**Challenge**: Black-box predictions are unsuitable for scientific research
**Solution**: Multi-layered explainability with uncertainty quantification

**Design Decision**: Wrapper approach around existing models
**Alternative**: Replace the existing models entirely
**Rationale**: Maintain compatibility while adding sophisticated capabilities

#### Educational Framework Foundation

**Challenge**: No learning progression or skill assessment
**Solution**: Adaptive tutorial system with competency tracking

### Phase 2: Advanced Integration

#### Multi-Modal Spectroscopy Engine

**Current Limitation**: Raman-only analysis
**Enhancement Target**: FTIR + Raman fusion with attention mechanisms

**Technical Decision**: Attention-based fusion over simple concatenation
**Justification**: Attention lets the model focus on relevant spectral regions automatically
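
To make the fusion choice concrete, here is a minimal sketch of attention-weighted fusion over two modality feature vectors. The feature values and fixed scores are illustrative; in a trained model the attention scores would be learned:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(modal_features, modal_scores):
    """Fuse per-modality feature vectors with softmax attention weights
    instead of plain concatenation."""
    weights = softmax(modal_scores)
    dim = len(modal_features[0])
    return [sum(w * f[i] for w, f in zip(weights, modal_features)) for i in range(dim)]

raman = [0.2, 0.8, 0.1]   # illustrative Raman-derived features
ftir = [0.6, 0.4, 0.9]    # illustrative FTIR-derived features
fused = attention_fuse([raman, ftir], modal_scores=[2.0, 1.0])
```

Because the weights sum to one, each fused feature stays inside the range spanned by the modalities, and a higher score pulls the fusion toward that modality.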

#### Physics-Informed AI

**Innovation**: Incorporate physical laws into neural network training
**Implementation**: Physics-Informed Neural Networks (PINNs) for constraint enforcement
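
The PINN idea amounts to adding a physics-residual term to the training loss. As a toy sketch of that idea, the constraint below (a degradation index that should not decrease over aging time) is an illustrative assumption, not the repository's actual loss:

```python
def mse(pred, target):
    """Plain data-fit term: mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def monotonicity_penalty(pred):
    """Physics prior: the predicted degradation index should not decrease
    over aging time, so penalize any negative increment."""
    return sum(max(0.0, pred[i] - pred[i + 1]) ** 2 for i in range(len(pred) - 1))

def physics_informed_loss(pred, target, lam=1.0):
    """Total loss = data fit + weighted physics residual."""
    return mse(pred, target) + lam * monotonicity_penalty(pred)

increasing = [0.1, 0.2, 0.3]   # respects the constraint
wiggly = [0.1, 0.3, 0.2]       # violates it between steps 2 and 3
target = [0.1, 0.2, 0.3]
```

During training the same penalty would be applied to a differentiable model's outputs, so gradient descent trades data fit against constraint violation.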

### Phase 3: Evolutionary Capabilities

#### Self-Improving Systems

**Concept**: A platform that enhances itself through usage patterns
**Implementation**: Meta-learning on user interactions and feedback

#### Community-Driven Development

**Vision**: The research community collaboratively improving the platform
**Tools**: Peer-review integration, collaborative model development

## Technical Implementation Insights

### Challenge 1: Model Transparency

**Problem**: The existing CNN models provide predictions without explanations
**Solution**: Integrated SHAP analysis with domain-specific interpretation

**Code Pattern**:

```python
class TransparentPredictor:
    def predict_with_explanation(self, spectrum):
        prediction = self.model(spectrum)
        explanation = self.explainer.explain(spectrum)
        confidence = self.uncertainty_estimator(spectrum)
        return {
            'prediction': prediction,
            'explanation': explanation,
            'confidence': confidence,
            'reasoning_chain': self.generate_reasoning(explanation),
        }
```

### Challenge 2: Educational Progression

**Problem**: How to assess user knowledge and adapt accordingly
**Solution**: Bayesian knowledge tracing with competency mapping

**Insight**: Traditional linear tutorials fail for diverse backgrounds
**Innovation**: Adaptive pathways based on demonstrated understanding
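
Standard Bayesian knowledge tracing maintains a mastery probability per skill and updates it after each observed answer. A minimal sketch of one update step, with illustrative slip/guess/learn parameters:

```python
def bkt_update(p_known, correct, slip=0.1, guess=0.2, learn=0.3):
    """One Bayesian Knowledge Tracing step: Bayes posterior over mastery
    given the observed answer, then a learning-transition update."""
    if correct:
        post = p_known * (1 - slip) / (p_known * (1 - slip) + (1 - p_known) * guess)
    else:
        post = p_known * slip / (p_known * slip + (1 - p_known) * (1 - guess))
    return post + (1 - post) * learn

# Trace a learner's estimated mastery across a short answer sequence.
p = 0.3
for answer in [True, True, False, True]:
    p = bkt_update(p, answer)
```

Adaptive pathways then follow from the estimate: skills whose mastery probability crosses a threshold are skipped, the rest are revisited.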

### Challenge 3: Community Validation

**Problem**: Ensuring scientific rigor in a collaborative environment
**Solution**: Weighted consensus with expertise tracking

**Design Pattern**:

```python
class CommunityValidator:
    def validate_claim(self, claim, user_submissions):
        expertise_weights = self.calculate_expertise(user_submissions)
        consensus_score = self.weighted_consensus(claim, expertise_weights)
        confidence = self.uncertainty_estimation(consensus_score)
        evidence_trail = [s.evidence for s in user_submissions]
        return ValidationResult(consensus_score, confidence, evidence_trail)
```
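
The helper methods in the pattern above are left abstract, but the core weighted-consensus step is simple. A minimal sketch with illustrative votes and expertise weights:

```python
def weighted_consensus(votes, expertise):
    """Consensus score in [0, 1]: expertise-weighted mean of binary
    endorse (1) / reject (0) votes on a claim."""
    total = sum(expertise)
    return sum(v * w for v, w in zip(votes, expertise)) / total

votes = [1, 1, 0]            # two endorsements, one rejection
expertise = [0.9, 0.5, 0.2]  # illustrative reviewer reputation weights
score = weighted_consensus(votes, expertise)
```

Here the two endorsers carry most of the weight, so the claim scores 0.875; flip the weights toward the rejector and the same votes fall below 0.5.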

## Lessons Learned and Pattern Recognition

### Pattern 1: Incremental Sophistication

**Observation**: Users prefer gradual capability introduction over a complete overhaul
**Application**: Progressive disclosure of advanced features
**Implementation**: Feature flags and user-controlled complexity levels

### Pattern 2: Explanation Hierarchy

**Discovery**: Different users need different levels of explanation detail
**Solution**: Layered explanations, from high-level summaries down to molecular detail
**Code Structure**: Hierarchical explanation system with drill-down capability

### Pattern 3: Community Dynamics

**Insight**: Scientific collaboration requires trust and reputation systems
**Implementation**: Transparent expertise tracking with contribution history
**Balance**: Encouraging participation while maintaining quality standards

## Novel AI Methodology Observations

### Emergent Behavior in Scientific AI

**Observation**: AI systems trained on scientific data exhibit different patterns than general ML
**Specific Findings**:

1. **Uncertainty Awareness**: Scientific AI must communicate confidence more precisely
2. **Explanation Requirements**: Scientific users demand mechanistic understanding, not just statistical correlations
3. **Collaboration Patterns**: Scientific AI benefits from ensemble human-AI reasoning

### Transfer Learning in Domain Science

**Discovery**: Standard transfer-learning approaches often fail in scientific domains
**Reason**: Distribution shifts in scientific data are more complex than in natural images
**Solution**: Physics-informed transfer learning with domain adaptation

### Interactive Learning Dynamics

**Finding**: Users learn the AI system's capabilities while the AI learns user preferences
**Implication**: Co-evolution of human and AI capabilities
**Design Response**: Adaptive interfaces that grow with user expertise

## Validation Methodology

### Scientific Validation Approach

1. **Benchmark Testing**: Against established polymer classification datasets
2. **Expert Review**: Validation of AI-generated hypotheses by domain experts
3. **Reproducibility Testing**: Independent replication of results across institutions
4. **Long-term Studies**: Tracking research outcomes achieved using the platform

### Educational Effectiveness Metrics

1. **Learning Velocity**: Time to demonstrate competency on standard tasks
2. **Knowledge Retention**: Long-term recall testing
3. **Transfer Capability**: Application of learned concepts to novel problems
4. **Engagement Sustainability**: Continued platform usage over time

### Community Impact Assessment

1. **Collaboration Frequency**: Inter-institutional project initiation rates
2. **Knowledge Sharing**: Community contribution quality and quantity
3. **Innovation Metrics**: Novel research directions discovered through platform use
4. **Adoption Patterns**: Spread to related scientific domains

## Future Research Directions

### AI-Assisted Scientific Discovery

**Question**: Can AI systems generate novel scientific hypotheses that lead to discoveries?
**Approach**: Hypothesis-generation algorithms with experimental validation tracking
**Success Metric**: Number of AI-suggested experiments leading to published findings

### Adaptive Scientific Interfaces

**Question**: How should scientific software interfaces evolve with user expertise?
**Investigation**: Interface adaptation algorithms based on competency assessment
**Measurement**: Task-completion efficiency and user satisfaction across expertise levels

### Community-Driven Model Development

**Question**: Can research communities collaboratively improve AI models?
**Framework**: Distributed model training with contribution attribution
**Validation**: Model performance improvement rates and community engagement levels

## Reflection on AI Development Process

### Meta-Learning Insights

**Observation**: The process of developing AI for science reveals patterns applicable to AI development generally
**Key Insight**: Domain-expertise integration is more critical than algorithmic sophistication
**Application**: Prioritize domain-expert involvement over purely technical optimization

### Human-AI Collaboration Patterns

**Discovery**: The most effective scientific AI systems augment rather than replace human reasoning
**Implementation**: Design for human-AI symbiosis, not automation
**Evidence**: Better outcomes when AI provides evidence and humans make decisions

### Sustainability Considerations

**Challenge**: Ensuring long-term platform viability and community engagement
**Strategy**: Design for community ownership and contribution
**Approach**: Open development with stakeholder involvement from inception

## Conclusion

The POLYMEROS development process demonstrates that successful scientific AI requires deep integration of domain knowledge, community needs, and technical capabilities. The key to success lies not in maximizing any single metric, but in creating adaptive systems that enhance human scientific reasoning.

**Primary Insights**:

1. **Transparency Over Performance**: Scientific users prioritize understanding over raw accuracy
2. **Community Over Technology**: Long-term success depends on user-community engagement
3. **Evolution Over Perfection**: Adaptive systems that improve through use outperform static "optimal" solutions

**Next Steps**:

1. Implement the core framework components identified in this analysis
2. Begin user testing with domain experts to validate design decisions
3. Establish community feedback loops for continuous improvement
4. Document emergent behaviors and unexpected usage patterns

This process record serves as both a methodology guide and a research artifact, contributing to the broader understanding of AI development for scientific applications.

---

_This document will be updated throughout the development process to capture new insights, pattern recognition, and methodology refinements._
POLYMEROS_GUIDE.md
DELETED
@@ -1,238 +0,0 @@

# POLYMEROS Development Framework Guide

## Executive Summary

The POLYMEROS (Polymer Research Operating System) framework represents a transformative approach to polymer science research, integrating AI-driven analysis, advanced materials-data management, and interactive educational tools into a cohesive, adaptive system. This guide outlines the technical architecture, implementation strategy, and evolutionary roadmap for building a next-generation polymer research platform.

## Framework Overview

### Core Philosophy

POLYMEROS operates on the principle that scientific research tools should be:

- **Adaptive**: Self-improving through use and feedback
- **Transparent**: Providing clear explanations for all predictions and recommendations
- **Educational**: Facilitating knowledge transfer from novice to expert levels
- **Collaborative**: Enabling seamless teamwork across disciplines and institutions

### Three-Pillar Architecture

#### 1. AI for Scientific Analysis

- **Reasoning-Focused Models**: Beyond prediction to hypothesis generation and testing
- **Transparent Decision Pathways**: Every prediction includes confidence intervals and reasoning chains
- **Multi-Modal Integration**: Seamless fusion of Raman, FTIR, and emerging spectroscopy techniques
- **Uncertainty Quantification**: Bayesian approaches for robust confidence estimation

#### 2. Materials Science Data Handling

- **Contextual Knowledge Networks**: Data structures that preserve scientific context and relationships
- **Intelligent Metadata Management**: Automatic extraction and tracking of experimental conditions
- **Synthetic Data Generation**: Physics-informed augmentation for limited-dataset scenarios
- **Version Control for Science**: Complete provenance tracking for reproducibility

#### 3. Educational Tools for Knowledge Building

- **Interactive Exploration**: Guided discovery rather than passive information delivery
- **Adaptive Learning Paths**: Personalized progression based on user expertise and interests
- **Virtual Laboratory**: Hands-on experimentation in simulated environments
- **Collaborative Learning**: Peer-to-peer knowledge sharing and validation

## Technical Design

### Data Architecture

```python
class PolymerosDataEngine:
    """
    Adaptive data management system that treats data as structured knowledge
    networks rather than simple input files.
    """
    # SpectroscopyManager: multi-modal spectral data handling with metadata
    # KnowledgeGraph: relationship mapping between materials, conditions, and properties
    # ProvenanceTracker: complete audit trail for scientific reproducibility
    # SyntheticGenerator: physics-informed data augmentation
```

### AI Reasoning Engine

```python
class PolymerosAI:
    """
    Transparent AI system that provides explanations alongside predictions.
    """
    # HypothesisGenerator: automated scientific hypothesis creation from patterns
    # ExplanationEngine: SHAP-based feature importance with domain context
    # UncertaintyEstimator: Bayesian confidence intervals for all predictions
    # BiasDetector: automated identification of potential systematic errors
```
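
The uncertainty component above is still an outline. One lightweight way to attach confidence intervals to a reported metric is a percentile bootstrap — a frequentist stand-in for the Bayesian intervals named there; the outcome values and sample size below are illustrative:

```python
import random
import statistics

def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean: resample
    with replacement, then take the alpha/2 and 1-alpha/2 quantiles."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

correct = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]  # per-sample prediction outcomes
lo, hi = bootstrap_ci(correct)
```

Reporting `(lo, hi)` alongside the point accuracy makes clear how much of the number is sample noise, which matters at these small test-set sizes.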

### Educational Framework

```python
class PolymerosEducation:
    """
    Interactive learning system that adapts to user expertise and learning goals.
    """
    # KnowledgeAssessment: dynamic evaluation of user understanding
    # LearningPathOptimizer: personalized curriculum generation
    # VirtualLab: simulated experimentation environment
    # CollaborationHub: peer learning and expert mentorship tools
```

## Implementation Timeline

### Phase 1: Foundation (Months 1-3)

**Core Infrastructure Development**

1. **Enhanced Data Pipeline**

   - Implement contextual metadata extraction
   - Build knowledge graph foundations
   - Create provenance tracking system
   - Develop physics-informed synthetic data generator

2. **Transparent AI Core**

   - Integrate SHAP explainability
   - Implement Bayesian uncertainty quantification
   - Build hypothesis-generation algorithms
   - Create bias detection framework

3. **Educational Foundation**

   - Design adaptive assessment system
   - Build interactive tutorial framework
   - Create virtual laboratory environment
   - Implement collaborative tools

### Phase 2: Integration (Months 4-6)

**System Convergence and Enhancement**

1. **Multi-Modal Spectroscopy**

   - FTIR integration with attention-based fusion
   - Advanced preprocessing with automated parameter optimization
   - Cross-modal validation and consistency checking

2. **Advanced AI Capabilities**

   - Transformer-based architectures for spectral analysis
   - Physics-Informed Neural Networks (PINNs) integration
   - Transfer learning for cross-domain applications

3. **Research Tools**

   - Automated literature integration
   - Citation tracking and methodology validation
   - Collaborative research project management

### Phase 3: Evolution (Months 7-12)

**Self-Improving and Community-Driven Development**

1. **Adaptive Systems**

   - Machine learning on user interactions for system improvement
   - Automated hyperparameter optimization
   - Dynamic model selection based on data characteristics

2. **Community Features**

   - Peer-review integration with AI assistance
   - Community-driven dataset expansion
   - Collaborative model development tools

3. **Research Acceleration**

   - Automated research-direction suggestions
   - Cross-disciplinary connection identification
   - Novel material property prediction

## Expected Outcomes

### Quantitative Targets (Year 1)

- **Publications**: 50+ peer-reviewed papers using the platform
- **Validated Discoveries**: 5+ independently confirmed novel findings
- **User Engagement**: 2x faster concept mastery for beginners
- **Research Efficiency**: 30% reduction in time-to-discovery

### Qualitative Indicators

- **Novel Question Types**: Users asking questions not previously considered
- **AI-Human Collaboration**: AI suggestions matching expert-level insights
- **Cross-Disciplinary Adoption**: Platform use beyond polymer science
- **Educational Impact**: Students contributing original research within months

## Success Metrics and Validation

### Scientific Validation

- Benchmarking against state-of-the-art polymer classification systems
- Cross-validation with independent experimental datasets
- Peer review of AI-generated hypotheses
- Reproducibility testing across different research groups

### Educational Effectiveness

- Learning-outcome assessments compared to traditional methods
- User satisfaction and engagement metrics
- Time-to-competency measurements
- Knowledge-retention studies

### Platform Evolution

- Community contribution rates
- Feature adoption and usage patterns
- System performance improvements over time
- Novel use cases discovered by users

## Risk Mitigation

### Technical Risks

- **Model Reliability**: Extensive validation protocols and uncertainty quantification
- **Data Quality**: Automated quality assessment and provenance tracking
- **System Complexity**: Modular architecture with clear interfaces

### Scientific Risks

- **Reproducibility**: Complete provenance tracking and version control
- **Bias Introduction**: Automated bias detection and diverse training data
- **Overfitting**: Cross-validation and independent test sets

### Adoption Risks

- **Learning Curve**: Progressive disclosure and adaptive tutorials
- **Integration Challenges**: API-first design and standard data formats
- **Community Resistance**: Transparent development and user involvement

## Future Directions

### Emerging Technologies Integration

- **Quantum Computing**: Quantum-enhanced molecular simulation
- **Edge Computing**: Real-time analysis on portable spectrometers
- **AR/VR**: Immersive molecular visualization and manipulation

### Expanding Domains

- **Materials Beyond Polymers**: Metals, ceramics, composites
- **Process Optimization**: Manufacturing and recycling workflows
- **Environmental Monitoring**: Pollution detection and remediation

### Advanced AI Capabilities

- **Causal Inference**: Understanding cause-effect relationships in material properties
- **Few-Shot Learning**: Rapid adaptation to new material classes
- **Autonomous Experimentation**: AI-designed and executed experiments

## Conclusion

The POLYMEROS framework represents a paradigm shift from traditional research tools to intelligent, adaptive systems that enhance human scientific capabilities. By integrating advanced AI, sophisticated data management, and interactive education, POLYMEROS aims to accelerate scientific discovery while democratizing access to cutting-edge research tools.

The success of POLYMEROS will be measured not just by its technical capabilities, but by its impact on the scientific community's ability to understand, predict, and design new materials that address global challenges in sustainability, energy, and health.

---

_This document serves as a living guide that will evolve with the platform's development and user feedback. Regular updates will incorporate lessons learned, technological advances, and community contributions._
VALIDATION_PROTOCOLS.md
DELETED
@@ -1,423 +0,0 @@

# POLYMEROS Validation Protocols

## Overview

This document establishes comprehensive validation protocols for the POLYMEROS platform, ensuring scientific rigor, educational effectiveness, and system reliability. The validation framework operates on multiple levels: technical performance, scientific accuracy, educational impact, and community adoption.

## Scientific Validation Framework

### 1. Model Accuracy and Reliability

#### Benchmark Testing Protocol

**Objective**: Establish platform performance against established datasets and state-of-the-art methods

**Implementation**:

1. **Standard Dataset Validation**

   - NIST Polymer Database cross-validation
   - FTIR-Plastics Database compatibility testing
   - Independent laboratory dataset validation
   - Cross-instrument validation (multiple Raman/FTIR systems)

2. **Performance Metrics**

   ```python
   validation_metrics = {
       "accuracy": ">95% on standard datasets",
       "precision": ">90% per polymer class",
       "recall": ">90% per polymer class",
       "f1_score": ">90% overall",
       "uncertainty_calibration": "Brier score <0.1",
   }
   ```
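
The Brier score named in the calibration target is straightforward to compute: the mean squared gap between predicted probabilities and binary outcomes, with lower meaning better calibrated. A short sketch with illustrative predictions:

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

probs = [0.9, 0.8, 0.7, 0.2, 0.1]  # illustrative model confidences
outcomes = [1, 1, 1, 0, 0]          # observed ground truth
score = brier_score(probs, outcomes)
```

A perfectly confident, perfectly correct model scores 0.0; the example above comes in at 0.038, under the <0.1 target.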

3. **Robustness Testing**

   - Noise resilience (SNR degradation testing)
   - Baseline drift tolerance
   - Instrumental variation effects
   - Sample preparation differences
|
| 41 |
-
|
| 42 |
-
#### Statistical Validation
|
| 43 |
-
|
| 44 |
-
**Protocol**: Bayesian validation with uncertainty quantification
|
| 45 |
-
|
| 46 |
-
**Requirements**:
|
| 47 |
-
|
| 48 |
-
- Confidence intervals for all predictions
|
| 49 |
-
- Cross-validation with stratified sampling
|
| 50 |
-
- Bootstrap resampling for stability assessment
|
| 51 |
-
- Multiple independent test sets
|
| 52 |
-
|
| 53 |
-
### 2. Scientific Hypothesis Validation
|
| 54 |
-
|
| 55 |
-
#### AI-Generated Hypothesis Testing
|
| 56 |
-
|
| 57 |
-
**Objective**: Validate the scientific merit of AI-generated hypotheses
|
| 58 |
-
|
| 59 |
-
**Protocol**:
|
| 60 |
-
|
| 61 |
-
1. **Expert Review Process**
|
| 62 |
-
|
| 63 |
-
- Panel of 5+ polymer science experts
|
| 64 |
-
- Blind evaluation of AI vs. human-generated hypotheses
|
| 65 |
-
- Scoring criteria: novelty, testability, scientific rigor
|
| 66 |
-
|
| 67 |
-
2. **Experimental Validation**
|
| 68 |
-
|
| 69 |
-
- Select top 10% of AI hypotheses for experimental testing
|
| 70 |
-
- Collaborate with research institutions for validation
|
| 71 |
-
- Track hypothesis-to-discovery conversion rate
|
| 72 |
-
|
| 73 |
-
3. **Literature Integration**
|
| 74 |
-
- Automatic citation checking for hypothesis claims
|
| 75 |
-
- Novelty verification against existing literature
|
| 76 |
-
- Impact tracking through subsequent research
|
| 77 |
-
|
| 78 |
-
#### Reproducibility Standards
|
| 79 |
-
|
| 80 |
-
**Requirements**:
|
| 81 |
-
|
| 82 |
-
- Complete provenance tracking for all analyses
|
| 83 |
-
- Reproducible computational environments
|
| 84 |
-
- Version control for all algorithms and datasets
|
| 85 |
-
- Independent replication by external groups
|
| 86 |
-
|
| 87 |
-
### 3. Data Quality and Integrity
|
| 88 |
-
|
| 89 |
-
#### Metadata Validation
|
| 90 |
-
|
| 91 |
-
**Protocol**: Comprehensive metadata verification
|
| 92 |
-
|
| 93 |
-
**Implementation**:
|
| 94 |
-
|
| 95 |
-
```python
|
| 96 |
-
metadata_validation = {
|
| 97 |
-
"completeness": "90% of critical fields populated",
|
| 98 |
-
"accuracy": "Cross-reference with instrument logs",
|
| 99 |
-
"consistency": "Temporal and spatial consistency checks",
|
| 100 |
-
"traceability": "Complete chain of custody documentation"
|
| 101 |
-
}
|
| 102 |
-
```
|
| 103 |
-
|
| 104 |
-
#### Quality Assessment Metrics
|
| 105 |
-
|
| 106 |
-
1. **Spectral Quality Indicators**
|
| 107 |
-
|
| 108 |
-
- Signal-to-noise ratio thresholds
|
| 109 |
-
- Baseline stability measures
|
| 110 |
-
- Peak resolution requirements
|
| 111 |
-
- Calibration accuracy verification
|
| 112 |
-
|
| 113 |
-
2. **Automated Quality Control**
|
| 114 |
-
- Real-time quality assessment during upload
|
| 115 |
-
- Automated flagging of problematic spectra
|
| 116 |
-
- Quality score integration into model training
|
| 117 |
-
- User feedback loop for quality improvement
|
| 118 |
-
|
| 119 |
-
## Educational Effectiveness Validation
|
| 120 |
-
|
| 121 |
-
### 1. Learning Outcome Assessment
|
| 122 |
-
|
| 123 |
-
#### Competency Measurement Protocol
|
| 124 |
-
|
| 125 |
-
**Objective**: Validate educational effectiveness across user groups
|
| 126 |
-
|
| 127 |
-
**Methodology**:
|
| 128 |
-
|
| 129 |
-
1. **Pre/Post Assessment Design**
|
| 130 |
-
|
| 131 |
-
- Standardized competency tests
|
| 132 |
-
- Practical skill evaluations
|
| 133 |
-
- Knowledge retention testing (1, 3, 6 months)
|
| 134 |
-
- Transfer learning assessment
|
| 135 |
-
|
| 136 |
-
2. **Control Group Studies**
|
| 137 |
-
- Traditional learning methods comparison
|
| 138 |
-
- Platform vs. textbook learning outcomes
|
| 139 |
-
- Instructor-led vs. self-guided comparison
|
| 140 |
-
- Long-term retention studies
|
| 141 |
-
|
| 142 |
-
#### Learning Analytics Validation
|
| 143 |
-
|
| 144 |
-
**Metrics**:
|
| 145 |
-
|
| 146 |
-
```python
|
| 147 |
-
educational_metrics = {
|
| 148 |
-
"learning_velocity": "Time to competency achievement",
|
| 149 |
-
"knowledge_retention": "Long-term recall accuracy",
|
| 150 |
-
"skill_transfer": "Novel problem-solving capability",
|
| 151 |
-
"engagement_sustainability": "Continued platform usage rates"
|
| 152 |
-
}
|
| 153 |
-
```
|
| 154 |
-
|
| 155 |
-
### 2. Adaptive Learning System Validation
|
| 156 |
-
|
| 157 |
-
#### Personalization Effectiveness
|
| 158 |
-
|
| 159 |
-
**Protocol**: A/B testing of adaptive vs. static learning paths
|
| 160 |
-
|
| 161 |
-
**Measurements**:
|
| 162 |
-
|
| 163 |
-
- Learning outcome improvements with personalization
|
| 164 |
-
- User satisfaction and engagement metrics
|
| 165 |
-
- Completion rates across different user types
|
| 166 |
-
- Expert validation of learning path recommendations
|
| 167 |
-
|
| 168 |
-
#### Assessment Accuracy
|
| 169 |
-
|
| 170 |
-
**Validation Process**:
|
| 171 |
-
|
| 172 |
-
- Expert validation of competency assessments
|
| 173 |
-
- Cross-validation with external skill tests
|
| 174 |
-
- Bias detection in assessment algorithms
|
| 175 |
-
- Fairness across demographic groups
|
| 176 |
-
|
| 177 |
-
### 3. Virtual Laboratory Validation
|
| 178 |
-
|
| 179 |
-
#### Simulation Accuracy
|
| 180 |
-
|
| 181 |
-
**Requirements**:
|
| 182 |
-
|
| 183 |
-
- Physics-based validation of simulations
|
| 184 |
-
- Comparison with real experimental data
|
| 185 |
-
- Expert review of simulation parameters
|
| 186 |
-
- User feedback on realism and educational value
|
| 187 |
-
|
| 188 |
-
#### Learning Effectiveness
|
| 189 |
-
|
| 190 |
-
**Protocol**:
|
| 191 |
-
|
| 192 |
-
- Pre/post knowledge testing
|
| 193 |
-
- Hands-on skill transfer to real laboratories
|
| 194 |
-
- Expert evaluation of virtual experiment designs
|
| 195 |
-
- Long-term learning impact assessment
|
| 196 |
-
|
| 197 |
-
## System Performance Validation
|
| 198 |
-
|
| 199 |
-
### 1. Technical Performance
|
| 200 |
-
|
| 201 |
-
#### Scalability Testing
|
| 202 |
-
|
| 203 |
-
**Protocol**: Load testing and performance benchmarking
|
| 204 |
-
|
| 205 |
-
**Requirements**:
|
| 206 |
-
|
| 207 |
-
```python
|
| 208 |
-
performance_requirements = {
|
| 209 |
-
"response_time": "<2 seconds for predictions",
|
| 210 |
-
"concurrent_users": "1000+ simultaneous users",
|
| 211 |
-
"data_throughput": "100+ spectra/minute processing",
|
| 212 |
-
"uptime": "99.5% availability",
|
| 213 |
-
"data_integrity": "Zero data loss tolerance"
|
| 214 |
-
}
|
| 215 |
-
```
|
| 216 |
-
|
| 217 |
-
#### Security Validation
|
| 218 |
-
|
| 219 |
-
**Components**:
|
| 220 |
-
|
| 221 |
-
- Data encryption verification
|
| 222 |
-
- User authentication testing
|
| 223 |
-
- Access control validation
|
| 224 |
-
- Privacy compliance (GDPR, FERPA)
|
| 225 |
-
- Vulnerability scanning and penetration testing
|
| 226 |
-
|
| 227 |
-
### 2. User Experience Validation
|
| 228 |
-
|
| 229 |
-
#### Usability Testing
|
| 230 |
-
|
| 231 |
-
**Protocol**: Multi-group usability studies
|
| 232 |
-
|
| 233 |
-
**Participants**:
|
| 234 |
-
|
| 235 |
-
- Novice users (students, new researchers)
|
| 236 |
-
- Expert users (experienced scientists)
|
| 237 |
-
- Educators and instructors
|
| 238 |
-
- Industry professionals
|
| 239 |
-
|
| 240 |
-
**Metrics**:
|
| 241 |
-
|
| 242 |
-
- Task completion rates
|
| 243 |
-
- Time to task completion
|
| 244 |
-
- Error rates and recovery
|
| 245 |
-
- User satisfaction scores
|
| 246 |
-
- Feature adoption rates
|
| 247 |
-
|
| 248 |
-
#### Accessibility Validation
|
| 249 |
-
|
| 250 |
-
**Requirements**:
|
| 251 |
-
|
| 252 |
-
- WCAG 2.1 AA compliance
|
| 253 |
-
- Screen reader compatibility
|
| 254 |
-
- Keyboard navigation support
|
| 255 |
-
- Color blindness accommodation
|
| 256 |
-
- Multi-language support validation
|
| 257 |
-
|
| 258 |
-
## Community Impact Validation
|
| 259 |
-
|
| 260 |
-
### 1. Adoption and Usage Metrics
|
| 261 |
-
|
| 262 |
-
#### Platform Adoption
|
| 263 |
-
|
| 264 |
-
**Tracking Metrics**:
|
| 265 |
-
|
| 266 |
-
```python
|
| 267 |
-
adoption_metrics = {
|
| 268 |
-
"user_growth": "Monthly active user increase",
|
| 269 |
-
"institution_adoption": "Academic/industry organization uptake",
|
| 270 |
-
"geographic_spread": "Global usage distribution",
|
| 271 |
-
"retention_rate": "Long-term user engagement",
|
| 272 |
-
"feature_utilization": "Platform capability usage patterns"
|
| 273 |
-
}
|
| 274 |
-
```
|
| 275 |
-
|
| 276 |
-
#### Research Impact
|
| 277 |
-
|
| 278 |
-
**Validation**:
|
| 279 |
-
|
| 280 |
-
- Publications citing platform use
|
| 281 |
-
- Novel discoveries enabled by platform
|
| 282 |
-
- Research collaboration facilitation
|
| 283 |
-
- Cross-disciplinary adoption tracking
|
| 284 |
-
- Industry application validation
|
| 285 |
-
|
| 286 |
-
### 2. Community Contribution Validation
|
| 287 |
-
|
| 288 |
-
#### Peer Review System
|
| 289 |
-
|
| 290 |
-
**Protocol**: Community-driven validation mechanisms
|
| 291 |
-
|
| 292 |
-
**Implementation**:
|
| 293 |
-
|
| 294 |
-
- Expert reviewer credentialing
|
| 295 |
-
- Contribution quality scoring
|
| 296 |
-
- Consensus mechanism validation
|
| 297 |
-
- Bias detection in peer review
|
| 298 |
-
- Reputation system accuracy
|
| 299 |
-
|
| 300 |
-
#### Knowledge Sharing Effectiveness
|
| 301 |
-
|
| 302 |
-
**Metrics**:
|
| 303 |
-
|
| 304 |
-
- Community-generated content quality
|
| 305 |
-
- Knowledge transfer success rates
|
| 306 |
-
- Collaborative project outcomes
|
| 307 |
-
- Cross-institutional collaboration frequency
|
| 308 |
-
- Innovation emergence tracking
|
| 309 |
-
|
| 310 |
-
## Continuous Validation Process
|
| 311 |
-
|
| 312 |
-
### 1. Ongoing Monitoring
|
| 313 |
-
|
| 314 |
-
#### Real-Time Validation
|
| 315 |
-
|
| 316 |
-
**System**: Continuous monitoring and alerting
|
| 317 |
-
|
| 318 |
-
**Components**:
|
| 319 |
-
|
| 320 |
-
- Model performance drift detection
|
| 321 |
-
- Data quality degradation alerts
|
| 322 |
-
- User experience issue identification
|
| 323 |
-
- Security threat monitoring
|
| 324 |
-
- Educational outcome tracking
|
| 325 |
-
|
| 326 |
-
#### Feedback Integration
|
| 327 |
-
|
| 328 |
-
**Process**: Systematic incorporation of validation results
|
| 329 |
-
|
| 330 |
-
**Implementation**:
|
| 331 |
-
|
| 332 |
-
- Regular validation report generation
|
| 333 |
-
- Stakeholder feedback collection
|
| 334 |
-
- Improvement priority assessment
|
| 335 |
-
- Update validation protocols
|
| 336 |
-
- Community validation participation
|
| 337 |
-
|
| 338 |
-
### 2. Validation Reporting
|
| 339 |
-
|
| 340 |
-
#### Transparency Requirements
|
| 341 |
-
|
| 342 |
-
**Documentation**:
|
| 343 |
-
|
| 344 |
-
- Public validation reports (quarterly)
|
| 345 |
-
- Methodology transparency
|
| 346 |
-
- Limitation acknowledgment
|
| 347 |
-
- Improvement roadmap publication
|
| 348 |
-
- Community feedback incorporation
|
| 349 |
-
|
| 350 |
-
#### Stakeholder Communication
|
| 351 |
-
|
| 352 |
-
**Audiences**:
|
| 353 |
-
|
| 354 |
-
- Scientific community updates
|
| 355 |
-
- Educational institution reports
|
| 356 |
-
- Industry partner briefings
|
| 357 |
-
- Regulatory compliance documentation
|
| 358 |
-
- Public transparency reports
|
| 359 |
-
|
| 360 |
-
## Validation Success Criteria
|
| 361 |
-
|
| 362 |
-
### Year 1 Targets
|
| 363 |
-
|
| 364 |
-
```python
|
| 365 |
-
year_1_targets = {
|
| 366 |
-
"scientific_accuracy": {
|
| 367 |
-
"benchmark_performance": ">95% accuracy on standard datasets",
|
| 368 |
-
"hypothesis_validation": "20% of AI hypotheses experimentally confirmed",
|
| 369 |
-
"reproducibility": "100% of analyses reproducible by independent groups"
|
| 370 |
-
},
|
| 371 |
-
"educational_effectiveness": {
|
| 372 |
-
"learning_improvement": "2x faster competency achievement vs. traditional methods",
|
| 373 |
-
"knowledge_retention": "90% retention after 6 months",
|
| 374 |
-
"user_satisfaction": "4.5/5 average user rating"
|
| 375 |
-
},
|
| 376 |
-
"platform_performance": {
|
| 377 |
-
"uptime": "99.5% system availability",
|
| 378 |
-
"response_time": "<2 seconds average",
|
| 379 |
-
"user_growth": "1000+ active monthly users"
|
| 380 |
-
},
|
| 381 |
-
"community_impact": {
|
| 382 |
-
"publications": "50+ papers citing platform use",
|
| 383 |
-
"discoveries": "5+ validated novel findings",
|
| 384 |
-
"collaborations": "100+ inter-institutional collaborations"
|
| 385 |
-
}
|
| 386 |
-
}
|
| 387 |
-
```
|
| 388 |
-
|
| 389 |
-
### Long-Term Success Indicators
|
| 390 |
-
|
| 391 |
-
- Platform becomes standard tool in polymer research community
|
| 392 |
-
- Educational curricula integrate platform usage
|
| 393 |
-
- Industry adoption for quality control and R&D
|
| 394 |
-
- Emergence of unexpected use cases and applications
|
| 395 |
-
- Self-sustaining community-driven development
|
| 396 |
-
|
| 397 |
-
## Risk Mitigation
|
| 398 |
-
|
| 399 |
-
### Validation Risks
|
| 400 |
-
|
| 401 |
-
1. **Confirmation Bias**: Multiple independent validation groups
|
| 402 |
-
2. **Overfitting to Benchmarks**: Regular benchmark updates and novel test sets
|
| 403 |
-
3. **Selection Bias**: Diverse user group representation
|
| 404 |
-
4. **Temporal Drift**: Longitudinal validation studies
|
| 405 |
-
5. **Cultural Bias**: International collaboration and validation
|
| 406 |
-
|
| 407 |
-
### Mitigation Strategies
|
| 408 |
-
|
| 409 |
-
- Independent validation committees
|
| 410 |
-
- Adversarial testing protocols
|
| 411 |
-
- Diverse stakeholder involvement
|
| 412 |
-
- Transparent methodology publication
|
| 413 |
-
- Regular protocol updates based on emerging best practices
|
| 414 |
-
|
| 415 |
-
## Conclusion
|
| 416 |
-
|
| 417 |
-
The POLYMEROS validation framework ensures that the platform meets the highest standards of scientific rigor, educational effectiveness, and system reliability. Through comprehensive, multi-layered validation protocols, we aim to build trust in the platform's capabilities while continuously improving its performance and impact.
|
| 418 |
-
|
| 419 |
-
Success in validation will be measured not only by meeting technical benchmarks but by the platform's ability to accelerate scientific discovery, enhance educational outcomes, and foster collaboration within the polymer research community.
|
| 420 |
-
|
| 421 |
-
---
|
| 422 |
-
|
| 423 |
-
_This validation protocol will be updated regularly based on emerging best practices, community feedback, and technological advances. All validation results will be transparently reported to maintain scientific integrity and community trust._
|
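For reference, the "Brier score <0.1" calibration target quoted in the deleted protocol is straightforward to compute. The sketch below is not from the repository; it assumes predicted probabilities for the positive class are available alongside the true binary labels:

```python
import numpy as np

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(np.mean((y_prob - y_true) ** 2))

# Example: well-calibrated, confident predictions score low
y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.1, 0.8, 0.95, 0.2]
print(f"Brier score: {brier_score(y_true, y_prob):.4f}")  # → Brier score: 0.0205
```

A score of 0.25 corresponds to always predicting 0.5, so the <0.1 target demands predictions that are both accurate and confident.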
audit-access-check.txt
DELETED

@@ -1 +0,0 @@

chatgpt-access-check: 2025-09-02
preprocess_ftir_legacy.py
DELETED

@@ -1,85 +0,0 @@

```python
import sys
import os
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))
import numpy as np
import pandas as pd
from scipy.interpolate import interp1d
from scipy.signal import savgol_filter
from sklearn.preprocessing import minmax_scale

def parse_ftir_file(filepath, num_points=500):
    df = pd.read_csv(filepath, skiprows=5, header=None, names=["Wavenumber", "Transmittance"])
    df.dropna(inplace=True)
    x = df["Wavenumber"].values
    y = df["Transmittance"].values

    if x[0] > x[-1]:  # Ensure increasing order
        x = x[::-1]
        y = y[::-1]

    f_interp = interp1d(x, y, kind="linear", fill_value="extrapolate")
    x_uniform = np.linspace(x.min(), x.max(), num_points)
    y_uniform = f_interp(x_uniform)

    return y_uniform

def label_from_filename(filename):
    """Assign label based on sample prefix:
    - '2015a', '2015b', '2015c', '2016a', '2016b' => 1 (Aged)
    - All other number prefixes (1..19) => 0 (Unaged)
    """
    lower_filename = filename.lower()
    if lower_filename.startswith(("2015a", "2015b", "2015c", "2016a", "2016b")):
        return 1  # Aged
    else:
        return 0  # Unaged

def remove_baseline(y):
    x = np.arange(len(y))
    coeffs = np.polyfit(x, y, deg=2)
    baseline = np.polyval(coeffs, x)
    return y - baseline

def normalize_spectrum(y):
    return minmax_scale(y)

def smooth_spectrum(y, window_length=11, polyorder=2):
    return savgol_filter(y, window_length, polyorder)

def preprocess_ftir(directory, target_len=500, baseline_correction=False, apply_smoothing=False, normalize=False):
    X, y = [], []

    for root, _, files in os.walk(directory):
        for file in files:
            if file.endswith(".csv") and "fcl" in file.lower():
                filepath = os.path.join(root, file)
                try:
                    spectrum = parse_ftir_file(filepath, num_points=target_len)

                    if baseline_correction:
                        spectrum = remove_baseline(spectrum)
                    if apply_smoothing:
                        spectrum = smooth_spectrum(spectrum)
                    if normalize:
                        spectrum = normalize_spectrum(spectrum)

                    label = label_from_filename(file)
                    X.append(spectrum)
                    y.append(label)
                except Exception as e:
                    print(f"Failed to process {file}: {e}")
                    with open("scripts/ftir_failures.log", "a", encoding="utf-8") as log_file:
                        log_file.write(f"{file} - {e}\n")

    X = np.array(X)
    y = np.array(y)

    print(f"Processed {len(X)} FTIR samples.")
    return X, y

if __name__ == "__main__":
    data_dir = os.path.join("datasets", "ftir")
    X, y = preprocess_ftir(data_dir)
    print(f"X shape: {X.shape}")
    print(f"y shape: {y.shape}")
    print(f"Label distribution: {np.bincount(y)}")
```
train_ftir_model.py
DELETED

@@ -1,139 +0,0 @@

```python
"""
Module: train_ftir_model
------------------------
This module trains the Figure2CNN model on preprocessed FTIR (Fourier-transform infrared spectroscopy) data.
It includes data loading, preprocessing, splitting into training and testing sets, and a training loop
with loss and accuracy tracking.

Workflow:
1. Load preprocessed FTIR data using `preprocess_ftir`.
2. Split the data into training and testing sets (80% train, 20% test).
3. Convert the data into PyTorch tensors and wrap them in DataLoaders.
4. Instantiate the Figure2CNN model and define the loss function and optimizer.
5. Train the model over a specified number of epochs, tracking loss and accuracy.
6. Evaluate the model on the test set and report accuracy.

Dependencies:
- os
- numpy
- torch
- sklearn.model_selection (for train-test split)
- preprocess_ftir (for data preprocessing)
- models.figure2_cnn (for the CNN model)

Usage:
Run this script directly to train the Figure2CNN model on the FTIR dataset.
The script outputs training loss and accuracy for each epoch, as well as test set accuracy.
"""
import sys
import os
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))
import numpy as np
import torch
from torch import optim
from torch import nn
from sklearn.model_selection import train_test_split
from torch.utils.data import TensorDataset, DataLoader
from preprocess_ftir import preprocess_ftir
from models.figure2_cnn import Figure2CNN

# Load preprocessed FTIR data
data_dir = os.path.join("datasets", "ftir")
X, y = preprocess_ftir(data_dir)

# Print shape for confirmation
print(f"Total Samples: {X.shape[0]}")
print(f"Feature Shape per Sample: {X.shape[1]}")

# Split into 80% train, 20% test
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42,  # for reproducibility
    stratify=y,  # balances label distribution (e.g., 0s & 1s)
)

# Confirm results
print(f"Training Samples: {X_train.shape[0]}")
print(f"Test Samples: {X_test.shape[0]}")

# Convert to PyTorch tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32).unsqueeze(1)  # shape (N, 1, 500)
y_train_tensor = torch.tensor(y_train, dtype=torch.long)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32).unsqueeze(1)
y_test_tensor = torch.tensor(y_test, dtype=torch.long)

# Wrap in TensorDataset
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
test_dataset = TensorDataset(X_test_tensor, y_test_tensor)

# Create DataLoaders
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=16)

# Instantiate model
model = Figure2CNN(input_length=500)

# Set to training mode before the loop
model.train()

# Define loss function
criterion = nn.CrossEntropyLoss()

# Optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 10

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for inputs, labels in train_loader:
        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, labels)

        # Backpropagation + optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Track metrics
        running_loss += loss.item()
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    # Epoch summary
    epoch_loss = running_loss / len(train_loader)
    epoch_acc = correct / total * 100

    print(
        f"Epoch [{epoch+1}/{num_epochs}] - Loss: {epoch_loss:.4f} - Accuracy: {epoch_acc:.2f}%"
    )


# -------------------------------
# Step 6.1.5 — Evaluation on Test Set
# -------------------------------

model.eval()
correct = 0
total = 0

with torch.no_grad():
    for inputs, labels in test_loader:
        outputs = model(inputs)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

test_acc = correct / total * 100
print(f"\nTest Set Accuracy: {test_acc:.2f}%")
```
train_ftir_model_cv.py
DELETED

@@ -1,201 +0,0 @@

```python
"""
This script performs 10-fold cross-validation on FTIR data using a CNN model.
It includes optional preprocessing steps such as baseline correction, smoothing, and normalization.
"""

import os
import sys
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))
import json
import argparse
from datetime import datetime
import warnings

import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import confusion_matrix

from models.figure2_cnn import Figure2CNN
from depracated_scripts.preprocess_ftir import preprocess_ftir

# Argument parser
parser = argparse.ArgumentParser(
    description="Run 10-fold CV on FTIR data with optional preprocessing.")
parser.add_argument(
    "--target-len", type=int, default=500,
    help="Number of points to resample spectra to"
)
parser.add_argument(
    "--baseline", action="store_true",
    help="Apply baseline correction"
)
parser.add_argument(
    "--smooth", action="store_true",
    help="Apply Savitzky-Golay smoothing"
)
parser.add_argument(
    "--normalize", action="store_true",
    help="Apply min-max normalization"
)
parser.add_argument(
    "--batch-size", type=int, default=16,
    help="Batch size for training"
)
parser.add_argument(
    "--epochs", type=int, default=10,
    help="Number of training epochs."
)
parser.add_argument(
    "--learning-rate", type=float,
    default=1e-3, help="Learning rate for optimizer."
)

args = parser.parse_args()


# Print configuration
print("Preprocessing Configuration:")
print(f"  Resample to     : {args.target_len} points")
print(f"  Baseline Correct: {'✅' if args.baseline else '❌'}")
print(f"  Smoothing       : {'✅' if args.smooth else '❌'}")
print(f"  Normalization   : {'✅' if args.normalize else '❌'}")

# Constants
DATASET_PATH = 'datasets/ftir'
BATCH_SIZE = args.batch_size
EPOCHS = args.epochs
NUM_FOLDS = 10
LEARNING_RATE = args.learning_rate
DEVICE = torch.device("cuda" if torch.cuda.is_available() else 'cpu')

# Load and preprocess dataset
print("🔄 Loading and preprocessing FTIR data ...")
X, y = preprocess_ftir(
    DATASET_PATH,
    target_len=args.target_len,
    baseline_correction=args.baseline,
    apply_smoothing=args.smooth,
    normalize=args.normalize
)
X = np.array(X, dtype=np.float32)
y = np.array(y, dtype=np.int64)
print(f"✅ Data Loaded: {X.shape[0]} samples, {X.shape[1]} features each.")
input_channels = 4

# Cross-validation setup
skf = StratifiedKFold(n_splits=NUM_FOLDS, shuffle=True, random_state=42)
fold_accuracies = []
all_conf_matrices = []

# Cross-validation loop
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=UserWarning)
    for fold, (train_idx, val_idx) in enumerate(skf.split(X, y), 1):
        print(f"\n🔁 Fold {fold}/{NUM_FOLDS} Training...")

        X_train, X_val = X[train_idx], X[val_idx]
        y_train, y_val = y[train_idx], y[val_idx]

        train_dataset = TensorDataset(torch.tensor(X_train), torch.tensor(y_train))
        val_dataset = TensorDataset(torch.tensor(X_val), torch.tensor(y_val))
        train_loader = DataLoader(
            train_dataset, batch_size=BATCH_SIZE, shuffle=True)
        val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE)

        model = Figure2CNN(input_length=args.target_len, input_channels=input_channels).to(DEVICE)
        model.describe_model()
        optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
        criterion = torch.nn.CrossEntropyLoss()

        # Training
        for epoch in range(EPOCHS):
            model.train()
            RUNNING_LOSS = 0.0
            for inputs, labels in train_loader:
                inputs = inputs.view(inputs.size(0), input_channels, args.target_len).to(DEVICE)
                labels = labels.to(DEVICE)

                optimizer.zero_grad()
                outputs = model(inputs)
                loss = criterion(outputs, labels)
                loss.backward()
                optimizer.step()
                RUNNING_LOSS += loss.item()

        # After the epoch loop (once per fold), print one line:
        print(f"✅ Fold {fold} completed. Final training loss: {RUNNING_LOSS:.4f}")

        torch.save(model.state_dict(), "outputs/ftir_model.pth")

        # Evaluation
        model.eval()
        all_true = []
        all_pred = []
        with torch.no_grad():
            for inputs, labels in val_loader:
                inputs = inputs.view(inputs.size(0), input_channels, args.target_len).to(DEVICE)
                labels = labels.to(DEVICE)
                outputs = model(inputs)
                _, predicted = torch.max(outputs, 1)
                all_true.extend(labels.cpu().numpy())
                all_pred.extend(predicted.cpu().numpy())

        accuracy = 100 * np.mean(np.array(all_true) == np.array(all_pred))
        fold_accuracies.append(accuracy)
        conf_mat = confusion_matrix(all_true, all_pred)
        all_conf_matrices.append(conf_mat)

        print(f"✅ Fold {fold} Accuracy: {accuracy:.2f}%")
        print(f"Confusion Matrix (Fold {fold}):\n{conf_mat}")

# Final summary
print("\n📊 Final Cross-Validation Results:")
for i, acc in enumerate(fold_accuracies, 1):
    print(f"Fold {i}: {acc:.2f}%")

mean_acc = np.mean(fold_accuracies)
```
|
| 160 |
-
std_acc = np.std(fold_accuracies)
|
| 161 |
-
print(f"\n✅ Mean Accuracy: {mean_acc:.2f}% ± {std_acc:.2f}%")
|
| 162 |
-
print("✅ Model saved to outputs/ftir_model.pth")
|
| 163 |
-
|
| 164 |
-
|
| 165 |
-
# Diagnostics log saving
|
| 166 |
-
def save_diagnostics_log(fold_accs, conf_matrices, config_args, output_path="logs/ftir_cv_diagnostics.json"):
|
| 167 |
-
fold_metrics = [
|
| 168 |
-
{
|
| 169 |
-
"fold": i + 1,
|
| 170 |
-
"accuracy": acc,
|
| 171 |
-
"confusion_matrix": cm.tolist()
|
| 172 |
-
}
|
| 173 |
-
for i, (acc, cm) in enumerate(zip(fold_accs, conf_matrices))
|
| 174 |
-
]
|
| 175 |
-
log = {
|
| 176 |
-
"timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
| 177 |
-
"preprocessing": {
|
| 178 |
-
"target_len": config_args.target_len,
|
| 179 |
-
"baseline_correction": config_args.baseline,
|
| 180 |
-
"smoothin": config_args.smooth,
|
| 181 |
-
"normalization": config_args.normalize,
|
| 182 |
-
},
|
| 183 |
-
"fold_metrics": fold_metrics,
|
| 184 |
-
"overall": {
|
| 185 |
-
"mean_accuracy": float(np.mean(fold_accs)),
|
| 186 |
-
"std_accuracy": float(np.std(fold_accs)),
|
| 187 |
-
"num_folds": len(fold_accs),
|
| 188 |
-
"batch_size": BATCH_SIZE,
|
| 189 |
-
"epochs": EPOCHS,
|
| 190 |
-
"learning_rate": LEARNING_RATE,
|
| 191 |
-
"device": str(DEVICE)
|
| 192 |
-
}
|
| 193 |
-
}
|
| 194 |
-
os.makedirs("logs", exist_ok=True)
|
| 195 |
-
with open(output_path, "w", encoding="utf-8") as f:
|
| 196 |
-
json.dump(log, f, indent=2)
|
| 197 |
-
print(f"🧠 Diagnostics written to {output_path}")
|
| 198 |
-
|
| 199 |
-
|
| 200 |
-
# Run diagnostics save
|
| 201 |
-
save_diagnostics_log(fold_accuracies, all_conf_matrices, args)
|
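The diagnostics file written by `save_diagnostics_log` is plain JSON, so downstream tooling can inspect it without PyTorch or sklearn installed. A minimal sketch of consuming that schema, e.g. to sum the per-fold confusion matrices into an overall matrix (the fold values below are illustrative, not real results):

```python
import json

# Illustrative log matching save_diagnostics_log's schema (values are made up)
log = {
    "fold_metrics": [
        {"fold": 1, "accuracy": 92.5, "confusion_matrix": [[10, 1], [0, 9]]},
        {"fold": 2, "accuracy": 90.0, "confusion_matrix": [[9, 2], [0, 9]]},
    ],
    "overall": {"mean_accuracy": 91.25, "std_accuracy": 1.25, "num_folds": 2},
}

# Round-trip through JSON, as a file consumer would
parsed = json.loads(json.dumps(log, indent=2))

# Element-wise sum of the per-fold confusion matrices
n = len(parsed["fold_metrics"][0]["confusion_matrix"])
total_cm = [[0] * n for _ in range(n)]
for fm in parsed["fold_metrics"]:
    for i, row in enumerate(fm["confusion_matrix"]):
        for j, v in enumerate(row):
            total_cm[i][j] += v

print(total_cm)  # → [[19, 3], [0, 18]]
```

Summing matrices across folds this way gives a single aggregate view of where the classifier confuses classes over the whole cross-validation run.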
train_model.py
DELETED
@@ -1,135 +0,0 @@
```python
"""
Module: train_model
-------------------
This module trains the Figure2CNN model on preprocessed Raman spectroscopy data.
It includes data loading, preprocessing, splitting into training and testing sets,
and a training loop with loss and accuracy tracking.

Workflow:
1. Load preprocessed Raman data using `preprocess_dataset`.
2. Split the data into training and testing sets (80% train, 20% test).
3. Convert the data into PyTorch tensors and wrap them in DataLoaders.
4. Instantiate the Figure2CNN model and define the loss function and optimizer.
5. Train the model over a specified number of epochs, tracking loss and accuracy.

Dependencies:
- os
- numpy
- torch
- sklearn.model_selection (for train-test split)
- preprocess_dataset (for data preprocessing)
- models.figure2_cnn (for the CNN model)

Usage:
Run this script directly to train the Figure2CNN model on the Raman dataset.
The script outputs training loss and accuracy for each epoch.
"""
import sys
import os

sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader
from sklearn.model_selection import train_test_split

from preprocess_dataset import preprocess_dataset
from models.figure2_cnn import Figure2CNN


# Load preprocessed Raman data
data_dir = os.path.join("datasets", "rdwp")
X, y = preprocess_dataset(data_dir)

# Print shape for confirmation
print(f"Total Samples: {X.shape[0]}")
print(f"Feature Shape per Sample: {X.shape[1]}")

# Split into 80% train, 20% test
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42,  # for reproducibility
    stratify=y        # balances label distribution (e.g., 0s & 1s)
)

# Confirm results
print(f"Training Samples: {X_train.shape[0]}")
print(f"Test Samples: {X_test.shape[0]}")

# Convert to PyTorch tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32).unsqueeze(1)  # shape (N, 1, 500)
y_train_tensor = torch.tensor(y_train, dtype=torch.long)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32).unsqueeze(1)
y_test_tensor = torch.tensor(y_test, dtype=torch.long)

# Wrap in TensorDataset
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
test_dataset = TensorDataset(X_test_tensor, y_test_tensor)

# Create DataLoaders
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=16)

# Instantiate model
model = Figure2CNN(input_length=500)

# Set to training mode
model.train()

# Define loss function
criterion = nn.CrossEntropyLoss()

# Optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 10

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for inputs, labels in train_loader:
        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, labels)

        # Backpropagation + optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Track metrics
        running_loss += loss.item()
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    # Epoch summary
    epoch_loss = running_loss / len(train_loader)
    epoch_acc = correct / total * 100
    print(f"Epoch [{epoch+1}/{num_epochs}] - Loss: {epoch_loss:.4f} - Accuracy: {epoch_acc:.2f}%")


# -------------------------------
# Evaluation on test set
# -------------------------------
model.eval()
correct = 0
total = 0

with torch.no_grad():
    for inputs, labels in test_loader:
        outputs = model(inputs)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

test_acc = correct / total * 100
print(f"\nTest Set Accuracy: {test_acc:.2f}%")
```
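The accuracy bookkeeping in both the training and evaluation loops reduces to the same arithmetic: count argmax predictions that match the labels, then report `correct / total * 100`. A minimal sketch of that computation in isolation (the `epoch_accuracy` helper and the sample values are illustrative, not part of the deleted script):

```python
def epoch_accuracy(predicted, labels):
    """Percent of matching entries, mirroring `correct / total * 100`
    from the training and test loops above."""
    correct = sum(int(p == t) for p, t in zip(predicted, labels))
    return correct / len(labels) * 100

# Hypothetical class-index predictions vs. ground truth
preds = [0, 1, 1, 0, 1]
truth = [0, 1, 0, 0, 1]
print(f"Accuracy: {epoch_accuracy(preds, truth):.2f}%")  # → Accuracy: 80.00%
```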