CEM888.AI MemoryAgentBench Results

99.9% AR · 77.2% BEAM: filesystem-native memory agent on MemoryAgentBench (ICLR 2026).

Scores

Benchmark       CEM888 (Vetta)   Best Published
AR Retrieval    99.9%            71.5% (Hindsight)
BEAM Memory     77.2%            64.1% (Hindsight honest)
  • AR: 2,000 retrieval questions, 2 misses out of 2,000
  • BEAM: 200 multi-category memory questions
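The AR figure follows directly from the miss count stated above. A minimal sketch of that arithmetic, where the function name `accuracy_pct` is an illustrative assumption and not part of the benchmark tooling:

```python
def accuracy_pct(total: int, misses: int) -> float:
    """Return the percentage of questions answered correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= misses <= total:
        raise ValueError("misses must be between 0 and total")
    return 100.0 * (total - misses) / total


# AR: 2 misses out of 2,000 questions (figures taken from the list above).
ar_accuracy = accuracy_pct(total=2000, misses=2)
print(f"AR accuracy: {ar_accuracy:.1f}%")
```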

Architecture

  • Model: DeepSeek V4 Pro
  • Retrieval: Filesystem-first, deterministic search; no RAG, no embeddings, no vector DB
  • Memory: Agent-native sovereign vault; the filesystem is ground truth
  • Deployment: Fully local. No cloud. No data leakage.
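To illustrate what filesystem-first, deterministic search can look like in general, here is a rough sketch. It is not the Vetta implementation, which this card does not describe. The function name `search_vault`, the `.md` suffix, and the case-insensitive substring match are all assumptions made for the example.

```python
from pathlib import Path


def search_vault(root: Path, query: str, suffix: str = ".md") -> list[tuple[str, int, str]]:
    """Return (relative path, line number, line text) for each matching line.

    Hypothetical helper: files are visited in sorted path order and matched
    with a plain case-insensitive substring test, so the same vault and
    query always give the same result list. No embeddings or index are used.
    """
    needle = query.casefold()
    hits: list[tuple[str, int, str]] = []
    for path in sorted(root.rglob(f"*{suffix}")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line.casefold():
                rel = path.relative_to(root).as_posix()
                hits.append((rel, lineno, line.strip()))
    return hits
```

Calling `search_vault(Path("vault"), "blue")` on a local directory would return every matching line with its file and line number. An agent could then read those lines directly from disk, treating the files themselves as ground truth.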

Contents

  • AR-Results-99.9pct.md: Full AR breakdown with all categories
  • Vetta-BEAM-Honest-77.2pct.md: BEAM methodology and per-category scores
  • vetta_beam_v9_results.jsonl: All 200 BEAM questions with scores
  • vetta_live_results.jsonl: All 2,000 AR questions with scores
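The two .jsonl files can be checked against the headline numbers by averaging their per-question scores. The sketch below assumes each line is a JSON object with a numeric field named `score`. That field name is a guess, since the schema is not documented here, so inspect a record and adjust `score_field` before relying on it.

```python
import json
from pathlib import Path


def mean_score(jsonl_path: Path, score_field: str = "score") -> float:
    """Average a numeric per-question score over a JSON Lines file.

    `score_field` is a hypothetical key name; the real files may use
    a different one.
    """
    scores: list[float] = []
    with jsonl_path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            scores.append(float(record[score_field]))
    if not scores:
        raise ValueError(f"no records found in {jsonl_path}")
    return sum(scores) / len(scores)
```

For example, `100 * mean_score(Path("vetta_live_results.jsonl"))` would give a percentage to compare with the AR score, provided the scores are stored on a 0 to 1 scale.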

Links


Building this solo. Looking for sponsors and collaborators.

