MLEvolve
#1 on MLE-bench in 12 Hours, Now Open Source
MLEvolve is an automated machine learning engineering system for Kaggle-style competitions. It combines progressive Monte Carlo Graph Search (MCGS), multi-agent collaboration, and experience-driven memory into a full loop from planning to coding, validation, and iterative optimization.
🚀 Open Source: InternScience / MLEvolve
Main Idea
In long-horizon automation tasks, a system should not stop at writing one solution. It needs to continuously search, validate, and refine. MLEvolve turns Plan → Build → Evaluate → Evolve into a repeatable optimization loop so agents can approach better solutions under limited budgets.
Core Innovations
- 🌲 Progressive MCGS Search: Instead of following a single trial-and-error path, MLEvolve explores multiple candidate branches in parallel on a graph. When progress stalls, it performs cross-branch fusion, recombining useful strategies from the top-performing nodes. Budget-aware explore/exploit switching moves the search smoothly from broad exploration to focused refinement, improving convergence speed and robustness (a selection sketch follows this list).
- 🧠 Experience-Driven Global Memory: Every attempt is stored as a retrievable quadruple of plan, code, metrics, and a success/failure tag. Future nodes can reuse proven patterns and steer around known failure routes, reducing repeated mistakes. As memory grows at runtime, the system becomes increasingly task-aware and self-improving (see the memory sketch below).
- 🛠️ Multi-Mode Adaptive Planning: MLEvolve follows a plan-code decoupled workflow and dynamically selects among Base / Stepwise / Diff modes based on task state. Base quickly builds full solutions, Stepwise decomposes long reasoning chains, and Diff applies targeted incremental patches. The modes can be chained for efficient iteration from broad strategy to precise fixes (see the mode-dispatch sketch below).
- 🔁 Closed-Loop Validation and Optimization: MLEvolve connects proposal generation, code execution, metric feedback, and strategy updates into an automated loop. Each round of feedback directly adjusts the search and planning priorities for the next round, turning the system from a simple code generator into a result-driven decision maker (see the loop sketch below).
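To make budget-aware explore/exploit switching concrete, here is a minimal Python sketch of a UCT-style selection rule whose exploration weight decays as the runtime budget is consumed. Everything here (`uct_score`, `exploration_weight`, the `c_max`/`c_min` constants, the node fields) is an illustrative assumption, not MLEvolve's actual implementation.

```python
import math

def uct_score(total_value: float, visits: int, parent_visits: int, c: float) -> float:
    """Standard UCT: exploit high average value, explore under-visited branches."""
    if visits == 0:
        return float("inf")  # always try an unvisited branch first
    return total_value / visits + c * math.sqrt(math.log(parent_visits) / visits)

def exploration_weight(budget_used: float, budget_total: float,
                       c_max: float = 1.4, c_min: float = 0.2) -> float:
    """Budget-aware schedule: broad exploration early, focused refinement late."""
    frac = min(budget_used / budget_total, 1.0)
    return c_max - (c_max - c_min) * frac

# Example: choosing a branch halfway through a 12-hour budget (in seconds).
branches = [
    {"value": 3.2, "visits": 4},  # summed validation scores / visit count
    {"value": 0.9, "visits": 1},
]
c = exploration_weight(budget_used=6 * 3600, budget_total=12 * 3600)
best = max(branches, key=lambda b: uct_score(b["value"], b["visits"], parent_visits=5, c=c))
```

Annealing the exploration weight toward zero is one simple way to realize the "broad exploration to focused refinement" behavior the bullet describes: late in the budget, the score is dominated by the exploitation term.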
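The attempt quadruple from the memory bullet might look like the following sketch. The `Attempt` and `GlobalMemory` names and the `val_score` metric key are hypothetical, shown only to make "reuse proven patterns, avoid known failure routes" concrete.

```python
from dataclasses import dataclass

@dataclass
class Attempt:
    plan: str      # natural-language strategy
    code: str      # the solution script
    metrics: dict  # e.g. {"val_score": 0.87}
    success: bool  # success/failure tag

class GlobalMemory:
    """Append-only store of attempts, queried when planning new nodes."""

    def __init__(self) -> None:
        self.attempts: list[Attempt] = []

    def record(self, attempt: Attempt) -> None:
        self.attempts.append(attempt)

    def proven_patterns(self, top_k: int = 3) -> list[Attempt]:
        # Reuse: the best-scoring successful attempts.
        wins = [a for a in self.attempts if a.success]
        return sorted(wins, key=lambda a: a.metrics.get("val_score", 0.0),
                      reverse=True)[:top_k]

    def known_failures(self) -> list[str]:
        # Avoid: plans that have already failed.
        return [a.plan for a in self.attempts if not a.success]
```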
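A toy dispatch for the three planning modes, assuming a deliberately simplified notion of task state; MLEvolve's real selection criteria are richer than these two signals.

```python
from enum import Enum

class Mode(Enum):
    BASE = "base"          # build a full solution from scratch
    STEPWISE = "stepwise"  # decompose a long reasoning chain into steps
    DIFF = "diff"          # patch an existing solution incrementally

def select_mode(has_working_solution: bool, plan_steps: int) -> Mode:
    """Toy dispatch on task state (both input signals are illustrative)."""
    if has_working_solution:
        return Mode.DIFF       # a targeted incremental fix is cheapest
    if plan_steps > 5:
        return Mode.STEPWISE   # long chains benefit from decomposition
    return Mode.BASE           # otherwise, draft a full solution in one pass
```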
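Finally, the closed loop itself, sketched as one possible control flow. Here `task`, `search`, and `memory` stand for hypothetical interfaces, and every method name (`select`, `propose`, `implement`, `run_and_score`, `record`, `update`) is an assumption made for illustration.

```python
import time

def evolve(task, search, memory, budget_seconds: float):
    """One possible shape of the Plan -> Build -> Evaluate -> Evolve loop."""
    start, best = time.time(), None
    while time.time() - start < budget_seconds:
        node = search.select()              # pick a branch to extend
        plan = node.propose(task, memory)   # Plan: draft a strategy, informed by memory
        code = node.implement(plan)         # Build: turn the plan into a script
        metrics = task.run_and_score(code)  # Evaluate: execute and read back metrics
        memory.record(plan, code, metrics)  # store the attempt quadruple
        search.update(node, metrics)        # Evolve: reweight search priorities
        if best is None or metrics["val_score"] > best["val_score"]:
            best = {**metrics, "code": code}
    return best
```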
Results and Impact
MLEvolve ranks first on MLE-bench with an Any Medal rate of 61.33% using only a 12-hour runtime budget. It also serves as a key optimization engine in InternAgent 1.5 for longer-horizon scientific discovery workflows.