Taiping Wang PRO

tpwang199655

AI & ML interests

None yet

Recent Activity

posted an update about 1 hour ago

[Methodology] Establishing a "Signal-to-Noise" Standard for Long-Context Windows (Upper-Bounded by DeepSeek 1M) Overview Following our empirical stress test of DeepSeek's 1M context model, this post introduces a quantitative framework to evaluate data quality within ultra-long windows. While length is the new frontier, effective information density remains the bottleneck. Using DeepSeek's 1M context as the experimental upper bound, we propose a standard to measure and optimize the Signal-to-Noise Ratio (SNR) in long-context tasks. Key Findings: Structural Noise Quantification: Empirical analysis reveals that raw long-context inputs contain 25%–65% structural noise (redundancy, irrelevant details), which dilutes reasoning efficiency without adding cognitive value. Three-Stage Purification Framework: We developed an L1-L2-L3 convergence pipeline to systematically filter noise: L1 (Coarse Pruning): Statistical redundancy removal. L2 (Structural Extraction): Logic graph skeletonization. L3 (Semantic Refinement): High-fidelity information retention. Effective Cognitive Baseline: Applying this framework establishes a quantifiable baseline for "usable context," demonstrating that purity > length for complex reasoning tasks. Evidence The attached chart illustrates the "Three-Stage Convergence" curve, showing the sharp decline in noise ratio and the corresponding rise in task accuracy across L1, L2, and L3 stages. Resources Full methodology reports (EN/CN PDFs), the purification codebase, and processed datasets are open-sourced at: 🔗 Project Page: https://tpwang-lab.github.io Welcome community feedback on the SNR metrics and reproduction attempts! Tags: #DeepSeek #LongContext #DataQuality #SignalToNoise #LLM #Benchmark #AIResearch

replied to their post 2 days ago

[Empirical Study] DeepSeek's New 1M Context Model: Full-Window Stress Test & Cognitive Emergence Overview This post shares an empirical study on DeepSeek's new long-context model (released Feb 2026, web/mobile version), which extends the context window to 1,000,000 tokens. We conducted a full-window stress test, pushing the limit to ~1.53M tokens, and analyzed the model's behavior across three key dimensions: Key Findings: Interaction Token Budget: A complete project lifecycle consumes 1.2M–1.6M tokens, varying by input format and internal sparse attention mechanisms. Long-Range Recall & Synthesis: The model demonstrates high-fidelity memory across the entire context, capable of retrieving initial instructions and synthesizing comprehensive reports without external RAG. Emergence of Collaborative Cognition: Beyond a certain threshold, the model shifts from a "Q&A Engine" to a "Cognitive Partner", adopting user reasoning styles and maintaining global coherence—a capability absent in standard 128k windows. Evidence The test reached the hard limit at 1,536,000 tokens (see attached screenshot: "Conversation length limit reached"). Resources Full reports (EN/CN PDFs), source code, and detailed data analysis are open-sourced at: 🔗 Project Page: https://tpwang-lab.github.io 🔗 GitHub Repo: https://github.com/tpwang-lab/deepseek-million-token Welcome feedback and reproduction attempts from the community! Tags: #DeepSeek #LLM #LongContext #EmpiricalStudy #AI

posted an update 2 days ago

View all activity

Organizations

Posts 3

Post

[Methodology] Establishing a "Signal-to-Noise" Standard for Long-Context Windows (Upper-Bounded by DeepSeek 1M)

Overview

Following our empirical stress test of DeepSeek's 1M context model, this post introduces a quantitative framework to evaluate data quality within ultra-long windows. While length is the new frontier, effective information density remains the bottleneck.
Using DeepSeek's 1M context as the experimental upper bound, we propose a standard to measure and optimize the Signal-to-Noise Ratio (SNR) in long-context tasks.
Key Findings:
Structural Noise Quantification: Empirical analysis reveals that raw long-context inputs contain 25%–65% structural noise (redundancy, irrelevant details), which dilutes reasoning efficiency without adding cognitive value.

Three-Stage Purification Framework: We developed an L1-L2-L3 convergence pipeline to systematically filter noise:
L1 (Coarse Pruning): Statistical redundancy removal.
L2 (Structural Extraction): Logic graph skeletonization.
L3 (Semantic Refinement): High-fidelity information retention.
Effective Cognitive Baseline: Applying this framework establishes a quantifiable baseline for "usable context," demonstrating that purity > length for complex reasoning tasks.
Evidence

The attached chart illustrates the "Three-Stage Convergence" curve, showing the sharp decline in noise ratio and the corresponding rise in task accuracy across L1, L2, and L3 stages.

Resources
Full methodology reports (EN/CN PDFs), the purification codebase, and processed datasets are open-sourced at:
🔗 Project Page: https://tpwang-lab.github.io

Welcome community feedback on the SNR metrics and reproduction attempts!
Tags: #DeepSeek #LongContext #DataQuality #SignalToNoise #LLM #Benchmark #AIResearch

Post

174

[Empirical Study] DeepSeek's New 1M Context Model: Full-Window Stress Test & Cognitive Emergence
Overview
This post shares an empirical study on DeepSeek's new long-context model (released Feb 2026, web/mobile version), which extends the context window to 1,000,000 tokens.
We conducted a full-window stress test, pushing the limit to ~1.53M tokens, and analyzed the model's behavior across three key dimensions:
Key Findings:
Interaction Token Budget: A complete project lifecycle consumes 1.2M–1.6M tokens, varying by input format and internal sparse attention mechanisms.
Long-Range Recall & Synthesis: The model demonstrates high-fidelity memory across the entire context, capable of retrieving initial instructions and synthesizing comprehensive reports without external RAG.
Emergence of Collaborative Cognition: Beyond a certain threshold, the model shifts from a "Q&A Engine" to a "Cognitive Partner", adopting user reasoning styles and maintaining global coherence—a capability absent in standard 128k windows.
Evidence
The test reached the hard limit at 1,536,000 tokens (see attached screenshot: "Conversation length limit reached").

Resources
Full reports (EN/CN PDFs), source code, and detailed data analysis are open-sourced at:
🔗 Project Page: https://tpwang-lab.github.io
🔗 GitHub Repo: https://github.com/tpwang-lab/deepseek-million-token
Welcome feedback and reproduction attempts from the community!
Tags: #DeepSeek #LLM #LongContext #EmpiricalStudy #AI

View all Posts