# Dimensionality Reduction: Comprehensive Implementation and Analysis
A comprehensive implementation and analysis of dimensionality reduction techniques including PCA, t-SNE, UMAP, and Autoencoders. This repository demonstrates the theory, implementation, and evaluation of these methods on standard datasets.
## Overview
Dimensionality reduction is crucial in machine learning for:
- Data Visualization: Projecting high-dimensional data to 2D/3D for human interpretation
- Computational Efficiency: Reducing feature space for faster processing
- Noise Reduction: Eliminating redundant or noisy features
- Storage Optimization: Compressing data while preserving essential information
This project provides a complete suite of dimensionality reduction methods with detailed explanations, implementations, and performance comparisons.
## Methods Implemented
### 1. Principal Component Analysis (PCA)
- Type: Linear dimensionality reduction
- Key Feature: Finds directions of maximum variance
- Best For: Data with linear structure, feature compression
- Results:
- Iris: 97.5% accuracy retention with 2 components
- Digits: 52.4% accuracy retention with 2 components
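
A minimal sketch of the PCA workflow (using scikit-learn; the notebook's exact preprocessing and parameters may differ):

```python
# Project the Iris data onto its first two principal components (illustrative only).
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)   # PCA is sensitive to feature scale

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)             # shape: (150, 2)
```

The fitted `pca` object can also project previously unseen samples via `pca.transform(...)`, which is what makes PCA usable beyond visualization.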
### 2. t-SNE (t-Distributed Stochastic Neighbor Embedding)
- Type: Non-linear manifold learning
- Key Feature: Preserves local neighborhood structure
- Best For: Data visualization, clustering analysis
- Results:
- Iris: 105.0% accuracy retention
- Digits: 100.4% accuracy retention
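
A minimal t-SNE sketch (scikit-learn; the perplexity and other settings shown here are illustrative assumptions, not necessarily the notebook's values):

```python
# Embed the 64-dimensional Digits features in 2-D with t-SNE.
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)
X_2d = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(X)
# X_2d is an embedding of the fitted data only; t-SNE offers no transform() for new samples.
```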
### 3. UMAP (Uniform Manifold Approximation and Projection)
- Type: Non-linear manifold learning
- Key Feature: Preserves both local and global structure
- Best For: Balanced visualization, scalable to large datasets
- Results:
- Iris: 102.5% accuracy retention
- Digits: 99.2% accuracy retention
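
A minimal UMAP sketch (umap-learn; the `n_neighbors` and `min_dist` values are illustrative):

```python
# Embed the Digits features in 2-D with UMAP.
import umap
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)
reducer = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1, random_state=42)
X_2d = reducer.fit_transform(X)
# Unlike t-SNE, the fitted reducer can also embed unseen samples via reducer.transform(...).
```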
### 4. Autoencoder (Neural Network)
- Type: Non-linear neural network approach
- Key Feature: Learns optimal encoding through reconstruction
- Best For: Complex non-linear relationships, customizable architectures
- Architecture: Input → 128 → 64 → Encoding → 64 → 128 → Output
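
A sketch of that symmetric architecture in PyTorch (layer widths follow the README; the activation choices and training details are assumptions):

```python
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim: int, encoding_dim: int):
        super().__init__()
        # Encoder: Input -> 128 -> 64 -> Encoding
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, encoding_dim),
        )
        # Decoder mirrors the encoder: Encoding -> 64 -> 128 -> Output
        self.decoder = nn.Sequential(
            nn.Linear(encoding_dim, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)      # low-dimensional code used as the reduced representation
        return self.decoder(z)   # reconstruction of the input

# Example: compress the 64-dimensional Digits features to a 2-D code.
model = Autoencoder(input_dim=64, encoding_dim=2)
loss_fn = nn.MSELoss()           # trained by minimizing reconstruction error
```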
## Project Structure

```
dimensionality-reduction/
├── implementation.ipynb              # Complete Jupyter notebook with theory and code
├── dimensionality_reduction.log     # Detailed execution logs
├── models/                           # Saved trained models
│   ├── pca_iris.pkl
│   ├── pca_digits.pkl
│   ├── umap_iris.pkl
│   ├── umap_digits.pkl
│   ├── autoencoder_iris.pth
│   └── autoencoder_digits.pth
├── results/                          # Analysis results
│   └── dimensionality_reduction_summary.json
├── visualizations/                   # Generated plots and comparisons
│   ├── pca_explained_variance.png
│   ├── iris_comparison.png
│   └── digits_comparison.png
└── README.md                         # This file
```
## Quick Start

### Prerequisites

```bash
pip install numpy pandas scikit-learn matplotlib seaborn plotly umap-learn torch torchvision
```

### Running the Analysis

1. Clone the repository:
   ```bash
   git clone https://github.com/GruheshKurra/dimensionality-reduction.git
   cd dimensionality-reduction
   ```
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Run the complete analysis:
   ```bash
   jupyter notebook implementation.ipynb
   ```
   Or execute the main script:
   ```bash
   python main.py
   ```
## Results Summary
### Dataset Information
- Iris Dataset: 150 samples, 4 features, 3 classes
- Digits Dataset: 1797 samples, 64 features, 10 classes
### Performance Comparison (Accuracy Retention)
| Method | Iris Dataset | Digits Dataset |
|---|---|---|
| PCA | 97.5% | 52.4% |
| t-SNE | 105.0% | 100.4% |
| UMAP | 102.5% | 99.2% |
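
Accuracy retention compares a classifier's test accuracy on the reduced features with its accuracy on the original features. A sketch of one such protocol (a k-NN classifier and train/test split are assumptions here; the notebook defines the exact setup):

```python
# Compute accuracy retention for PCA on Iris (illustrative protocol).
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

baseline = KNeighborsClassifier().fit(X_tr, y_tr)
acc_full = accuracy_score(y_te, baseline.predict(X_te))

pca = PCA(n_components=2).fit(X_tr)
reduced = KNeighborsClassifier().fit(pca.transform(X_tr), y_tr)
acc_reduced = accuracy_score(y_te, reduced.predict(pca.transform(X_te)))

print(f"Accuracy retention: {100 * acc_reduced / acc_full:.1f}%")
```

Values above 100% simply mean the classifier performed better on the embedding than on the raw features, which can happen when the reduction sharpens class separation.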
### Key Insights
- PCA works well for low-dimensional data (Iris) but struggles with high-dimensional complex patterns (Digits)
- t-SNE excels at preserving local structure, sometimes even improving classification performance
- UMAP provides excellent balance between local and global structure preservation
- Autoencoders offer flexibility but require careful tuning
## Detailed Analysis

### PCA Explained Variance
- Iris: First 2 components explain 95.8% of variance
- Digits: First 2 components explain only 21.6% of variance
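
The cumulative explained-variance curve behind `pca_explained_variance.png` can be computed along these lines (an illustrative sketch; the notebook's preprocessing may change the exact numbers):

```python
# Cumulative explained variance for the Digits dataset.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)
pca = PCA().fit(X)                                    # keep all components
cumulative = np.cumsum(pca.explained_variance_ratio_)
print(f"First 2 components: {cumulative[1]:.1%} of total variance")
```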
### Method Characteristics
| Aspect | PCA | t-SNE | UMAP | Autoencoder |
|---|---|---|---|---|
| Linearity | Linear | Non-linear | Non-linear | Non-linear |
| Speed | Fast | Slow | Medium | Medium |
| Deterministic | Yes | No | Yes* | Yes* |
| New Data | ✓ | ✗ | ✓ | ✓ |
| Interpretability | High | Low | Medium | Low |
*With fixed random seed
## Educational Content
The implementation.ipynb notebook includes:
- Theory Explanation: Mathematical foundations and intuitive explanations
- Step-by-step Implementation: Detailed code with comprehensive comments
- Visual Comparisons: Side-by-side plots showing method differences
- Performance Evaluation: Classification accuracy retention analysis
- Best Practices: When to use each method and parameter selection
## Technical Details

### Dependencies

- `numpy`: Numerical computing
- `pandas`: Data manipulation
- `scikit-learn`: Machine learning algorithms
- `matplotlib`, `seaborn`: Data visualization
- `umap-learn`: UMAP implementation
- `torch`: Neural network autoencoder
- `plotly`: Interactive visualizations
### Key Features
- Comprehensive Logging: Detailed execution logs for reproducibility
- Model Persistence: Save and load trained models
- Evaluation Framework: Systematic performance comparison
- Visualization Suite: Publication-quality plots
- Structured Results: JSON summary for further analysis
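
The artifacts under `models/` can be produced and reloaded roughly as follows (using `joblib` for the scikit-learn/UMAP objects and `torch.save` for the autoencoder weights is an assumption about the serialization used):

```python
import joblib
import torch
import torch.nn as nn
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Fit and persist a scikit-learn model (stored as .pkl).
X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2).fit(X)
joblib.dump(pca, "pca_iris.pkl")
pca_restored = joblib.load("pca_iris.pkl")

# Persist PyTorch weights (stored as .pth); the architecture must be rebuilt before loading.
net = nn.Sequential(nn.Linear(4, 2), nn.Linear(2, 4))
torch.save(net.state_dict(), "autoencoder_iris.pth")
net.load_state_dict(torch.load("autoencoder_iris.pth"))
```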
## Learning Outcomes
After working through this project, you will understand:
- Mathematical Foundations: How each method works mathematically
- Implementation Details: How to implement these methods from scratch
- Performance Trade-offs: When to use each method
- Evaluation Strategies: How to assess dimensionality reduction quality
- Practical Applications: Real-world use cases and considerations
## Contributing
Contributions are welcome! Please feel free to:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Links
- GitHub Repository: dimensionality-reduction
- Hugging Face Space: karthik-2905/dimensionality-reduction
- Documentation: Implementation Notebook
## Contact
For questions or feedback, please:
- Open an issue on GitHub
- Contact the maintainer: Karthik
**Note:** This is an educational project designed to demonstrate dimensionality reduction techniques. The implementations prioritize clarity and understanding over performance optimization.