Lijun Wu

apeters

https://apeterswu.github.io/

AI & ML interests

None yet

Recent Activity

submitted a paper 14 days ago

Unlocking Data Value in Finance: A Study on Distillation and Difficulty-Aware Training

upvoted a collection 2 days ago

ODA-Scored

ODA-Scored-Data, scored with multiple implemented data scores. • 2 items • Updated 2 days ago • 3

upvoted a paper 14 days ago

Unlocking Data Value in Finance: A Study on Distillation and Difficulty-Aware Training

Paper • 2603.07223 • Published 17 days ago • 13

upvoted a paper 29 days ago

GLM-5: from Vibe Coding to Agentic Engineering

Paper • 2602.15763 • Published Feb 17 • 117

upvoted a collection about 2 months ago

MMFineReason

High-quality STEM reasoning dataset for Multimodal LLM post-training. • 8 items • Updated 22 days ago • 22

upvoted 3 papers about 2 months ago

MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods

Paper • 2601.21821 • Published Jan 29 • 61

Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility

Paper • 2601.17027 • Published Jan 17 • 42

ChartVerse: Scaling Chart Reasoning via Reliable Programmatic Synthesis from Scratch

Paper • 2601.13606 • Published Jan 20 • 11

upvoted 3 collections 2 months ago

BioT5

BioT5 and BioT5+ collections • 18 items • Updated Oct 23, 2025 • 3

ODA-Mixture

High-quality mixture datasets for post-training covering multiple domains. • 7 items • Updated Jan 17 • 5

ODA-Math

High-quality mathematical datasets for post-training. • 5 items • Updated Jan 17 • 1

upvoted a paper 2 months ago

Closing the Data Loop: Using OpenDataArena to Engineer Superior Training Datasets

Paper • 2601.09733 • Published Dec 30, 2025 • 9

upvoted 2 papers 3 months ago

Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience

Paper • 2512.17260 • Published Dec 19, 2025 • 52

OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value

Paper • 2512.14051 • Published Dec 16, 2025 • 47

upvoted a paper 4 months ago

Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights

Paper • 2512.01816 • Published Dec 1, 2025 • 94

upvoted 4 papers 6 months ago

Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning

Paper • 2510.04081 • Published Oct 5, 2025 • 23

ScaleDiff: Scaling Difficult Problems for Advanced Mathematical Reasoning

Paper • 2509.21070 • Published Sep 25, 2025 • 9

Sequential Diffusion Language Models

Paper • 2509.24007 • Published Sep 28, 2025 • 46

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

Paper • 2509.22186 • Published Sep 26, 2025 • 153

upvoted 2 papers 8 months ago

Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning

Paper • 2507.17512 • Published Jul 23, 2025 • 37

REST: Stress Testing Large Reasoning Models by Asking Multiple Problems at Once

Paper • 2507.10541 • Published Jul 14, 2025 • 30