Shichengf posted an update 3 days ago
We are happy to share our new survey:

Scaling LLM Agent Learning with Data Synthesis: A Comprehensive Survey

LLM agents are moving from passive chatbots to interactive systems that use memory, tools, planning, and external environments. Scaling agent learning requires more than input-output pairs: agents need synthetic tasks, trajectories, feedback signals, and environments that can support long-horizon interaction.

In this survey, we organize data synthesis for LLM agent learning around four core artifacts:

- Task-level synthesis
- Trajectory-level synthesis
- Feedback-level synthesis
- Environment-level synthesis

We also review quality control, learning frameworks, synthetic evaluation data, and applications across software engineering, agentic search, AI for science, social good, and AI safety/security.

Links:

ResearchGate: [https://www.researchgate.net/publication/406488336_Scaling_LLM_Agent_Learning_with_Data_Synthesis_A_Comprehensive_Survey](https://www.researchgate.net/publication/406488336_Scaling_LLM_Agent_Learning_with_Data_Synthesis_A_Comprehensive_Survey)

OpenReview: [https://openreview.net/forum?id=pQYwkpYmLy](https://openreview.net/forum?id=pQYwkpYmLy)

arXiv: under moderation, link coming soon

#LLMAgents #DataSynthesis #AgentLearning #LLM #Survey