Recent Activity
Posted an update about 5 hours ago
[Methodology] Establishing a "Signal-to-Noise" Standard for Long-Context Windows (Upper-Bounded by DeepSeek 1M)
Overview
Following our empirical stress test of DeepSeek's 1M context model, this post introduces a quantitative framework to evaluate data quality within ultra-long windows. While length is the new frontier, effective information density remains the bottleneck.
Using DeepSeek's 1M context as the experimental upper bound, we propose a standard to measure and optimize the Signal-to-Noise Ratio (SNR) in long-context tasks.
Key Findings:
Structural Noise Quantification: Empirical analysis reveals that raw long-context inputs contain 25%–65% structural noise (redundancy, irrelevant details), which dilutes reasoning efficiency without adding cognitive value.
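As a minimal sketch of what a structural-noise measurement could look like, the snippet below counts the share of characters sitting in blank or exactly duplicated lines. This is an illustrative proxy only; the function name and the duplicate-line heuristic are assumptions, not the post's actual metric.

```python
from collections import Counter

def structural_noise_ratio(lines):
    """Estimate structural noise as the fraction of characters in blank or
    exactly duplicated lines. A crude illustrative proxy, not the
    methodology's metric."""
    counts = Counter(line.strip() for line in lines)
    total = sum(len(line) for line in lines) or 1  # avoid division by zero
    noisy = sum(len(line) for line in lines
                if not line.strip() or counts[line.strip()] > 1)
    return noisy / total

doc = ["header", "data point A", "header", "", "data point B"]
ratio = structural_noise_ratio(doc)  # duplicated "header" lines count as noise
```

A real pipeline would replace exact-duplicate matching with fuzzy or semantic redundancy detection, but the ratio-of-noisy-content idea carries over.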
Three-Stage Purification Framework: We developed an L1-L2-L3 convergence pipeline to systematically filter noise:
L1 (Coarse Pruning): Statistical redundancy removal.
L2 (Structural Extraction): Logic graph skeletonization.
L3 (Semantic Refinement): High-fidelity information retention.
Effective Cognitive Baseline: Applying this framework establishes a quantifiable baseline for "usable context," demonstrating that purity > length for complex reasoning tasks.
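The three stages above can be sketched as a chain of filters. Everything below is a hypothetical toy implementation: the function names, the heading regex in L2, and the keyword match standing in for semantic refinement in L3 are all assumptions made for illustration.

```python
import re

def l1_coarse_prune(lines):
    # L1 (Coarse Pruning): statistical redundancy removal --
    # drop blank lines and exact duplicates
    seen, kept = set(), []
    for line in lines:
        key = line.strip()
        if key and key not in seen:
            seen.add(key)
            kept.append(line)
    return kept

def l2_structural_extract(lines):
    # L2 (Structural Extraction): keep structural markers (headings,
    # bullets, numbered items) and lines long enough to carry content
    return [line for line in lines
            if re.match(r"^\s*(#|\d+\.|[-*])", line) or len(line.split()) > 3]

def l3_semantic_refine(lines, keywords):
    # L3 (Semantic Refinement): retain task-relevant lines; a keyword
    # match stands in for an embedding-similarity filter here
    return [line for line in lines
            if any(k.lower() in line.lower() for k in keywords)]

def purify(lines, keywords):
    return l3_semantic_refine(l2_structural_extract(l1_coarse_prune(lines)),
                              keywords)

doc = [
    "# Report",
    "# Report",                                # duplicate -> dropped at L1
    "the model recalls early instructions",
    "ok",                                      # too short -> dropped at L2
    "irrelevant chatter about weather today",  # off-topic -> dropped at L3
]
result = purify(doc, keywords=["model", "recall"])
```

Each stage shrinks the context while preserving the lines that carry reasoning signal, which is the convergence behavior the L1-L2-L3 curve in the Evidence section describes.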
Evidence
The attached chart illustrates the "Three-Stage Convergence" curve, showing the sharp decline in noise ratio and the corresponding rise in task accuracy across L1, L2, and L3 stages.
Resources
Full methodology reports (EN/CN PDFs), the purification codebase, and processed datasets are open-sourced at:
🔗 Project Page: https://tpwang-lab.github.io
We welcome community feedback on the SNR metrics, as well as reproduction attempts!
Tags: #DeepSeek #LongContext #DataQuality #SignalToNoise #LLM #Benchmark #AIResearch
Replied to their post 2 days ago
[Empirical Study] DeepSeek's New 1M Context Model: Full-Window Stress Test & Cognitive Emergence
Overview
This post shares an empirical study on DeepSeek's new long-context model (released Feb 2026, web/mobile version), which extends the context window to 1,000,000 tokens.
We conducted a full-window stress test, pushing the limit to ~1.53M tokens, and analyzed the model's behavior across three key dimensions:
Key Findings:
Interaction Token Budget: A complete project lifecycle consumes 1.2M–1.6M tokens, varying by input format and internal sparse attention mechanisms.
Long-Range Recall & Synthesis: The model demonstrates high-fidelity memory across the entire context, capable of retrieving initial instructions and synthesizing comprehensive reports without external RAG.
Emergence of Collaborative Cognition: Beyond a certain threshold, the model shifts from a "Q&A Engine" to a "Cognitive Partner," adopting the user's reasoning style and maintaining global coherence, a capability absent in standard 128k windows.
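The long-range recall finding above is typically probed with a "needle-in-a-haystack" test: plant an instruction at the start of the window, pad to the target length, and check whether the model can reproduce it at the end. The sketch below shows one way such a probe could be built; the function names and the verbatim-match pass criterion are assumptions for illustration, not the study's actual harness.

```python
import random

def build_probe_context(needle, filler_vocab, n_filler_tokens, seed=0):
    """Place a 'needle' instruction at the very start of a long filler
    context, mimicking a retrieve-initial-instructions stress test."""
    rng = random.Random(seed)  # seeded for reproducible filler
    filler = " ".join(rng.choice(filler_vocab)
                      for _ in range(n_filler_tokens))
    return needle + "\n" + filler

def recall_hit(model_answer, needle_key):
    # Pass/fail check: the answer must reproduce the key phrase verbatim
    return needle_key in model_answer

ctx = build_probe_context("INSTRUCTION: the code word is AURORA",
                          ["lorem", "ipsum", "dolor"], 200)
```

Scaling `n_filler_tokens` toward the window limit and sweeping the needle's position through the context turns this into a position-by-length recall grid.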
Evidence
The test reached the hard limit at 1,536,000 tokens (see attached screenshot: "Conversation length limit reached").
Resources
Full reports (EN/CN PDFs), source code, and detailed data analysis are open-sourced at:
🔗 Project Page: https://tpwang-lab.github.io
🔗 GitHub Repo: https://github.com/tpwang-lab/deepseek-million-token
We welcome feedback and reproduction attempts from the community!
Tags: #DeepSeek #LLM #LongContext #EmpiricalStudy #AI