<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>AIDK — Building the Kernel of Autonomous Industry</title>
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;600;700&family=Outfit:wght@400;700&display=swap" rel="stylesheet">
<style>
:root {
--bg: #0f172a;
--card: #1e293b;
--primary: #38bdf8;
--secondary: #818cf8;
--text-dim: #94a3b8;
--text-bright: #f8fafc;
--accent: #f472b6;
--highlight: #0ea5e9;
}
body {
background: var(--bg);
color: var(--text-bright);
font-family: 'Inter', sans-serif;
max-width: 850px;
margin: auto;
padding: 60px 20px;
line-height: 1.8;
}
h1, h2, h3 {
font-family: 'Outfit', sans-serif;
letter-spacing: -0.02em;
}
h1 {
text-align: center;
font-size: 3.5rem;
background: linear-gradient(to right, var(--primary), var(--secondary));
-webkit-background-clip: text;
background-clip: text;
-webkit-text-fill-color: transparent;
margin-bottom: 20px;
}
.subtitle {
text-align: center;
font-size: 1.25rem;
color: var(--text-dim);
margin-bottom: 50px;
font-weight: 300;
}
h2 {
margin-top: 60px;
font-size: 2.2rem;
color: var(--primary);
border-bottom: 1px solid #334155;
padding-bottom: 12px;
}
p {
color: #cbd5e1;
margin-bottom: 25px;
font-size: 1.05rem;
}
b, strong {
color: var(--primary);
}
blockquote {
border-left: 4px solid var(--secondary);
padding: 24px;
background: var(--card);
border-radius: 0 16px 16px 0;
margin: 45px 0;
font-size: 1.2rem;
font-style: italic;
color: var(--primary);
line-height: 1.6;
}
.highlight-box {
background: linear-gradient(135deg, #1e293b 0%, #0f172a 100%);
border: 2px solid var(--highlight);
border-radius: 20px;
padding: 35px;
margin: 55px 0;
box-shadow: 0 0 40px rgba(14, 165, 233, 0.2);
}
.highlight-box h3 {
margin-top: 0;
color: var(--highlight);
font-size: 1.7rem;
display: flex;
align-items: center;
gap: 12px;
}
.nav-strip {
display: flex;
justify-content: center;
gap: 20px;
margin-bottom: 50px;
flex-wrap: wrap;
}
.nav-link {
display: inline-flex;
align-items: center;
padding: 12px 24px;
background: var(--card);
border: 1px solid #334155;
border-radius: 10px;
text-decoration: none;
color: var(--text-bright);
font-weight: 600;
font-size: 0.95rem;
transition: all 0.3s ease;
}
.nav-link:hover {
border-color: var(--primary);
transform: translateY(-3px);
box-shadow: 0 8px 25px rgba(56, 189, 248, 0.25);
}
.nav-link.active {
background: var(--primary);
color: var(--bg);
}
.capabilities-grid {
display: grid;
grid-template-columns: 1fr 1fr;
gap: 25px;
margin: 40px 0;
}
.cap-card {
background: var(--card);
padding: 25px;
border-radius: 14px;
border: 1px solid #334155;
}
.cap-card h4 {
margin-top: 0;
font-size: 1.2rem;
margin-bottom: 15px;
}
.cap-card.success h4 { color: #4ade80; }
.cap-card.challenge h4 { color: #f472b6; }
.results-container {
margin: 45px 0;
overflow-x: auto;
}
table {
width: 100%;
border-collapse: collapse;
background: var(--card);
border-radius: 14px;
overflow: hidden;
border: 1px solid #334155;
font-size: 1rem;
}
th, td {
padding: 20px;
text-align: center;
border-bottom: 1px solid #334155;
}
th {
background: #1e293b;
color: var(--primary);
text-transform: uppercase;
font-size: 0.9rem;
letter-spacing: 0.12em;
font-weight: 700;
}
.val-bad { color: #f87171; }
.val-good { color: #4ade80; font-weight: 700; }
img {
max-width: 100%;
border-radius: 20px;
margin: 45px auto;
display: block;
box-shadow: 0 25px 60px rgba(0,0,0,0.6);
border: 1px solid #334155;
}
.badge-strip {
display: flex;
justify-content: center;
gap: 12px;
margin-bottom: 45px;
}
.badge {
padding: 5px 15px;
background: #334155;
border-radius: 25px;
font-size: 0.85rem;
color: var(--primary);
}
hr {
border: 0;
height: 1px;
background: linear-gradient(to right, transparent, #334155, transparent);
margin: 65px 0;
}
ul {
list-style: none;
padding-left: 0;
}
li::before {
content: "→";
color: var(--primary);
margin-right: 15px;
font-weight: bold;
}
li {
margin-bottom: 14px;
color: #cbd5e1;
}
footer {
text-align: center;
padding-top: 90px;
color: var(--text-dim);
font-size: 0.95rem;
}
</style>
</head>
<body>
<header>
<h1>AIDK</h1>
<p class="subtitle">The Autonomous Industrial Decision Kernel</p>
<div class="nav-strip">
<a href="https://huggingface.co/spaces/bdurgaprasadreddy/Navigation_env" class="nav-link" target="_blank">HF Space</a>
<a href="https://github.com/Durgaprasad-Developer/AIDK" class="nav-link" target="_blank">GitHub Repo</a>
<a href="https://colab.research.google.com/drive/1vlCSJAViWl9ZVAwPJb-AcgBZ4nSTTphA?usp=sharing" class="nav-link active" target="_blank">Train in Colab</a>
</div>
<div class="badge-strip">
<span class="badge">Multi-Agent RL</span>
<span class="badge">V15 High-Performance</span>
<span class="badge">Industrial Strength</span>
</div>
</header>
<section class="story-section">
<h2>The Vision: From Robots to Decision Kernels</h2>
<p>
My journey into AI started with a simple question: <i>How do warehouse robots actually coordinate when the environment is chaos?</i> I realized that physical hardware is only as good as the kernel that drives it.
</p>
<p>
I set out to build a multi-agent warehouse coordination environment with real-world constraints: energy management, collision hazards, and dynamic task pools. <b>AIDK</b> is my contribution to the field of robotics and autonomous systems, born from the ambition to see RL master the complexities of modern fulfillment centers.
</p>
<blockquote>
"We didn't just build an environment. We built a stress-test for Reinforcement Learning."
</blockquote>
</section>
<div class="highlight-box">
<h3>🔍 The Complexity of Chaos: Industrial-Grade Discipline</h3>
<p>
Building for a warehouse means building for <b>resilience</b>. In AIDK, every episode triggers a stochastic map generator: obstacles, task origins, and delivery goals are all randomized. Memorization is impossible; only <b>generalized logic</b> survives.
</p>
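<p>
As a rough illustration of per-episode randomization, the sketch below samples obstacles, task origins, and delivery goals on a grid. The function name <code>generate_map</code>, the 8x8 grid, and the counts are assumptions made for this example, not AIDK's actual generator.
</p>
<pre><code>
```python
import random


def generate_map(width, height, n_obstacles, n_tasks, rng):
    """Sample a fresh layout: obstacles, task origins, and delivery goals.

    Hypothetical helper; grid size and counts are illustrative assumptions.
    """
    cells = [(x, y) for x in range(width) for y in range(height)]
    # Draw distinct cells so obstacles, origins, and goals never overlap.
    picked = rng.sample(cells, n_obstacles + 2 * n_tasks)
    obstacles = set(picked[:n_obstacles])
    origins = picked[n_obstacles:n_obstacles + n_tasks]
    goals = picked[n_obstacles + n_tasks:]
    # Each task is an (origin, goal) pair.
    return {"obstacles": obstacles, "tasks": list(zip(origins, goals))}


# Two episodes seeded differently get different layouts.
layout_a = generate_map(8, 8, 10, 3, random.Random(1))
layout_b = generate_map(8, 8, 10, 3, random.Random(2))
```
</code></pre>
<p>
Passing an explicit random generator keeps each layout reproducible for debugging while still varying from episode to episode.
</p>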
<p>
<b>The Reward Discipline:</b> We implemented a reward system that mirrors the harsh reality of industrial automation. It is designed to kill "lazy" or "exploitative" behavior:
</p>
<ul>
<li><b>Step Penalty:</b> Agents are penalized (-0.1) for every movement to discourage wandering and ensure energy efficiency.</li>
<li><b>Anti-Oscillation:</b> Agents are heavily penalized (-1.0) if they loop back and forth between states. They MUST move with purpose.</li>
<li><b>Inactivity Penalty:</b> Choosing to "Stay" or stalling burns energy and incurs a penalty. Idle time is wasted industrial throughput.</li>
<li><b>Collision Zero-Tolerance:</b> A global penalty (-5.0) hits the system for every impact. Coordination is not a luxury; it is the only way to stay profitable.</li>
</ul>
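<p>
The penalties listed above could be combined into a per-step reward roughly as follows. This is a minimal sketch, not the AIDK source: the names <code>RewardConfig</code> and <code>step_reward</code>, the idle penalty value, and the oscillation rule (returning to the cell occupied two steps earlier) are assumptions. The step, oscillation, collision, and delivery values are the ones quoted on this page.
</p>
<pre><code>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardConfig:
    step: float = -0.1         # per movement (value quoted on this page)
    oscillation: float = -1.0  # looping between states (value quoted on this page)
    collision: float = -5.0    # per impact (value quoted on this page)
    delivery: float = 10.0     # successful delivery (value quoted on this page)
    idle: float = -0.5         # ASSUMPTION: the page gives no number for this


def step_reward(action, prev_positions, new_position, collided, delivered,
                cfg=RewardConfig()):
    """Reward for one agent on one step (hypothetical helper).

    prev_positions holds the agent's recent positions, oldest first.
    """
    # ASSUMPTION: staying put costs the idle penalty instead of the step penalty.
    reward = cfg.idle if action == "stay" else cfg.step
    # ASSUMPTION: oscillation means moving back to the cell from two steps ago.
    if action != "stay" and len(prev_positions) >= 2 and new_position == prev_positions[-2]:
        reward += cfg.oscillation
    # In a shared-reward setup this term would be added for every agent.
    if collided:
        reward += cfg.collision
    if delivered:
        reward += cfg.delivery
    return reward
```
</code></pre>
<p>
Keeping every value in one frozen config object makes the reward scheme easy to audit and to vary between experiments.
</p>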
<p>
<b>The Proof:</b> This isn't just theory. Our agents are forced to learn coordination because the "cheap" ways to get reward don't exist. They learn that careful, collaborative movement is the only path to a positive delivery bonus.
</p>
</div>
<section class="story-section">
<h2>The Architecture: Why Tabular Q-Learning?</h2>
<p>
In industrial environments, <b>purity</b> and <b>predictability</b> are everything. A "black-box" neural network can be a liability in a warehouse with human workers nearby.
</p>
<p>
We chose <b>Tabular Q-Learning</b> for the AIDK kernel because:
</p>
<ul>
<li><b>Total Transparency:</b> Every decision can be traced to a specific value in the Q-table. No hallucinations.</li>
<li><b>Sample Efficiency:</b> For structured, discrete industrial states, the agent learns robust patterns much faster.</li>
<li><b>Verification:</b> You can verify that there is NO "reward farming" by auditing the learned Q-values directly.</li>
</ul>
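<p>
To make the transparency argument concrete, here is a minimal sketch of tabular Q-learning with the standard one-step update. The class name <code>TabularQ</code>, the action set, and the hyperparameters are assumptions for illustration and may differ from the AIDK kernel.
</p>
<pre><code>
```python
import random
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right", "stay"]  # ASSUMPTION: illustrative action set


class TabularQ:
    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
        # Hyperparameters are illustrative assumptions, not AIDK's settings.
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)  # (state, action) -> value, default 0.0
        self.rng = random.Random(seed)

    def best_action(self, state):
        # Greedy choice: the action with the highest stored value.
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def act(self, state):
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
        if self.epsilon > self.rng.random():
            return self.rng.choice(ACTIONS)
        return self.best_action(state)

    def update(self, state, action, reward, next_state, done):
        # One-step Q-learning target: reward plus discounted best next value.
        future = 0.0 if done else max(self.q[(next_state, a)] for a in ACTIONS)
        target = reward + self.gamma * future
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

    def explain(self, state):
        # Audit trail: the exact numbers behind a decision in this state.
        return {a: self.q[(state, a)] for a in ACTIONS}


agent = TabularQ()
agent.update("s0", "right", 10.0, "s1", done=True)
print(agent.best_action("s0"), agent.explain("s0"))
```
</code></pre>
<p>
The <code>explain</code> method is the point of the exercise: any chosen action can be justified by reading a handful of table entries.
</p>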
</section>
<section class="story-section">
<h2>Learning Proofs & The "Negative Reward" Mystery</h2>
<p>
Look closely at the learning curves below. You might notice that even the "Expert" agent operates with a <b>negative reward</b> throughout its journey.
</p>
<img src="https://raw.githubusercontent.com/Durgaprasad-Developer/AIDK/main/assets/reward_curve.png" alt="Reward Progress Curve">
<p>
<b>Why are rewards negative?</b> In AIDK, we follow strict industrial safety rules. Every step a robot takes burns energy (Step Penalty: -0.1). If it stalls or oscillates, it burns more. While a successful delivery gives a large positive reward (+10.0), the <i>cumulative</i> cost of careful, safe navigation in a stochastic world results in a negative sum.
</p>
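<p>
A quick back-of-the-envelope check shows how the sum ends up negative. The sketch below uses the Expert averages from the results table on this page and assumes that the episode reward is simply delivery bonuses plus penalties. On those numbers, deliveries contribute +28 per episode on average, so penalties must total about -240.16.
</p>
<pre><code>
```python
from decimal import Decimal

DELIVERY_BONUS = Decimal("10.0")  # per delivery (value quoted on this page)
STEP_PENALTY = Decimal("-0.1")    # per movement (value quoted on this page)

# Expert averages from the results table on this page.
avg_reward = Decimal("-212.16")
avg_deliveries = Decimal("2.80")

# ASSUMPTION: reward = delivery bonuses + penalties, with no other terms.
delivery_income = avg_deliveries * DELIVERY_BONUS
penalty_total = avg_reward - delivery_income

# Upper bound on movement steps IF every penalty point were a step penalty.
# Collisions, oscillation, and idling also contribute, so the true count is lower.
max_steps = penalty_total / STEP_PENALTY

print(delivery_income, penalty_total, max_steps)
```
</code></pre>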
<p>
The "Expert" is the agent that has learned to <b>minimize this industrial loss</b> while maximizing deliveries. The learning signal isn't about getting "points"; it's about learning the <b>most efficient path to task completion</b>.
</p>
<img src="https://raw.githubusercontent.com/Durgaprasad-Developer/AIDK/main/assets/loss_curve.png" alt="Loss Convergence Curve">
<p>
We have verified this kernel across <b>15,000 episodes</b> locally, and our architecture is engineered to scale effectively to over a <b>million episodes</b>, ensuring it never hits a performance ceiling in complex terrains.
</p>
</section>
<section class="story-section">
<h2>The Results: Quantitative Comparison</h2>
<p>
The data confirms the transition from entropic movement to industrial precision. The difference between the baseline and our trained kernel represents the bridge between chaos and automation.
</p>
<div class="results-container">
<table>
<thead>
<tr>
<th>Agent Profile</th>
<th>Avg. Episode Reward</th>
<th>Avg. Deliveries</th>
<th>System Health</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Random Baseline</strong></td>
<td class="val-bad">-426.80</td>
<td class="val-bad">0.10</td>
<td>Erratic</td>
</tr>
<tr>
<td><strong>AIDK Expert (V15)</strong></td>
<td class="val-good">-212.16</td>
<td class="val-good">2.80</td>
<td class="val-good">Optimized</td>
</tr>
</tbody>
</table>
</div>
<p>
Our Expert achieves a <b>28-fold increase in deliveries</b> (0.10 to 2.80 per episode, a 2,700% increase) and a large reduction in cumulative energy wastage (an average reward improvement of over 214 points).
</p>
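<p>
The headline figures can be recomputed from the table. The short sketch below uses exact decimal arithmetic; the variable names are illustrative. Note that a 28-fold ratio corresponds to a 2,700% increase, since the baseline itself counts as 100%.
</p>
<pre><code>
```python
from decimal import Decimal

# Values copied from the results table.
baseline = {"reward": Decimal("-426.80"), "deliveries": Decimal("0.10")}
expert = {"reward": Decimal("-212.16"), "deliveries": Decimal("2.80")}

# Ratio of average deliveries, and the same change as a percentage increase.
fold = expert["deliveries"] / baseline["deliveries"]
percent_increase = (fold - 1) * 100

# Improvement in average episode reward.
reward_gain = expert["reward"] - baseline["reward"]

print(fold, percent_increase, reward_gain)
```
</code></pre>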
</section>
<section class="story-section">
<h2>The Efficiency Frontier: Strengths & Challenges</h2>
<p>
No environment is universal. To build trust in AI, we must be clear about where it excels and where the current frontier lies.
</p>
<div class="capabilities-grid">
<div class="cap-card success">
<h4>Where AIDK Succeeds</h4>
<p>Our kernel is world-class at <b>discrete industrial coordination</b>. It masters energy-aware routing, shared task-pool prioritization, and long-horizon planning where the goal is distant and sparse. It turns "unpredictability" into a training advantage.</p>
</div>
<div class="cap-card challenge">
<h4>Where Challenges Remain</h4>
<p>Currently, the tabular architecture is optimized for <b>coordinating pairs</b>. In scenarios requiring thousands of simultaneous agents in continuous, non-grid spaces, the system would require a transition to Deep RL to manage the "curse of dimensionality."</p>
</div>
</div>
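<p>
The "curse of dimensionality" mentioned above can be made tangible with a simplified count. The sketch assumes a joint Q-table whose state is every agent's grid position, on a hypothetical 100-cell grid with 5 actions per agent. AIDK's actual state encoding may differ, and energy levels or task status would enlarge the table further.
</p>
<pre><code>
```python
def joint_table_entries(n_cells, n_agents, n_actions):
    """Entries in a joint Q-table over all agents' positions and actions.

    Simplified model (an assumption): ignores energy levels and task status.
    """
    joint_states = n_cells ** n_agents
    joint_actions = n_actions ** n_agents
    return joint_states * joint_actions


# ASSUMPTION: a 10x10 grid (100 cells) and 5 actions per agent.
for agents in (1, 2, 4, 8):
    print(agents, joint_table_entries(n_cells=100, n_agents=agents, n_actions=5))
```
</code></pre>
<p>
Each added agent multiplies the table size by the number of cells times the number of actions, which is why a pair is tractable in a table while large fleets call for function approximation.
</p>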
</section>
<section class="story-section">
<h2>Contribution to the Frontier</h2>
<p>
AIDK contributes to the frontier of RL by demonstrating that <b>Multi-Agent Coordination</b> in long-horizon tasks doesn't require massive compute—it requires <b>precise environmental design</b>.
</p>
<p>
By focusing on <b>Reward Hardening</b> and <b>Energy Constraints</b>, I am providing a template for how RL should be applied to real robotics: with safety, transparency, and industrial efficiency at the core.
</p>
</section>
<hr>
<footer style="margin-bottom: 40px;">
<h3>Ready to Train?</h3>
<p>Experience the full learning lifecycle in our dedicated simulation notebook.</p>
<a href="https://colab.research.google.com/drive/1vlCSJAViWl9ZVAwPJb-AcgBZ4nSTTphA?usp=sharing" class="nav-link active" target="_blank">Launch AIDK Training Lab</a>
</footer>
<footer>
<p>RLEnvironment by Durgaprasad | Built for the Future of Autonomous Decision Making</p>
</footer>
</body>
</html>