CoffeeBench: Benchmarking Long-Horizon LLM Agents in Heterogeneous Multi-Agent Economies Paper • 2606.16613 • Published 18 days ago • 9
EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements Paper • 2506.08762 • Published Jun 10, 2025 • 1
llm-jp-modernbert: A ModernBERT Model Trained on a Large-Scale Japanese Corpus with Long Context Length Paper • 2504.15544 • Published Apr 22, 2025 • 1
HakushoBench: A Japanese Chart and Table VQA Benchmark from Governmental White Papers Paper • 2606.01132 • Published May 31 • 6