MLEvolve | #1 on MLE-bench with 65.3% Medal Rate

About MLEvolve

MLEvolve is a self-evolving multi-agent system that automatically solves Kaggle-style ML competitions through Monte Carlo Graph Search (MCGS). It combines progressive search, experience-driven memory, and multi-mode adaptive code generation into a closed-loop optimization framework.

Beyond ML engineering, MLEvolve generalizes to open mathematical optimization problems, matching or surpassing purpose-built optimization frameworks like AlphaEvolve.

Core Innovations

Progressive MCGS Search

Explores multiple branches in parallel with budget-aware explore/exploit switching and cross-branch fusion when progress stalls.
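As a rough illustration of budget-aware explore/exploit switching, the sketch below uses the standard UCT rule from Monte Carlo tree search and shrinks its exploration constant as the time budget is consumed. This is not MLEvolve's actual selection rule: the `Node` class, the linear `exploration_weight` schedule, and the constants `c_max` and `c_min` are all assumptions made for the example, and cross-branch fusion is not modeled.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical search node; MLEvolve's real node type is not shown in the source."""
    name: str
    visits: int = 0
    total_reward: float = 0.0
    children: list = field(default_factory=list)


def exploration_weight(elapsed: float, budget: float,
                       c_max: float = 1.4, c_min: float = 0.1) -> float:
    """Assumed schedule: exploration shrinks linearly as the budget runs out."""
    remaining = max(0.0, 1.0 - elapsed / budget)
    return c_min + (c_max - c_min) * remaining


def uct_score(child: Node, parent_visits: int, c: float) -> float:
    """Standard UCT: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def select_child(parent: Node, elapsed: float, budget: float) -> Node:
    """Pick the child with the highest UCT score under the current budget."""
    c = exploration_weight(elapsed, budget)
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits, c))


root = Node("root", visits=20)
root.children = [
    Node("well_tested", visits=15, total_reward=9.0),    # mean reward 0.6
    Node("under_explored", visits=5, total_reward=2.5),  # mean reward 0.5
]
early = select_child(root, elapsed=0.0, budget=12.0)   # full budget left
late = select_child(root, elapsed=12.0, budget=12.0)   # budget exhausted
print(early.name, late.name)
```

With the full budget remaining, the large exploration bonus favors the less-visited branch; once the budget is spent, the branch with the higher mean reward is chosen instead.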

Experience-Driven Memory

Stores plan, code, metrics, and success/failure labels per node. BM25 + FAISS retrieval reinforces proven strategies.
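A minimal sketch of the lexical half of such a memory follows, assuming each record holds the four fields named above and that retrieval ranks plan text with Okapi BM25. The `MemoryRecord` class, the whitespace tokenizer, and the `k1` and `b` defaults are assumptions for illustration; the dense FAISS side is deliberately left out, and nothing here reflects MLEvolve's real storage schema.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class MemoryRecord:
    """Hypothetical per-node record: plan, code, metric, success label."""
    plan: str
    code: str
    metric: float
    success: bool


def tokenize(text: str) -> list:
    return text.lower().split()


def bm25_rank(query: str, records: list, k1: float = 1.5, b: float = 0.75) -> list:
    """Return (score, record) pairs sorted by Okapi BM25 score over plan text."""
    if not records:
        return []
    docs = [tokenize(r.plan) for r in records]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: number of plans containing each term.
    df = Counter(term for d in docs for term in set(d))
    scored = []
    for rec, doc in zip(records, docs):
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((score, rec))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


memory = [
    MemoryRecord("gradient boosting with target encoding", "...", 0.91, True),
    MemoryRecord("convolutional network with mixup augmentation", "...", 0.88, True),
    MemoryRecord("linear model with one-hot encoding", "...", 0.70, False),
]
# Retrieve only from nodes labeled successful, so proven strategies are reused.
proven = [r for r in memory if r.success]
ranked = bm25_rank("gradient boosting features", proven)
top_score, top_record = ranked[0]
print(top_record.plan)
```

A hybrid retriever would typically combine these lexical scores with nearest-neighbor scores from an embedding index; how MLEvolve weights the two is not stated in the source.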

Multi-Mode Code Generation

Dynamically selects among Base, Stepwise, and Diff generation modes based on task state, keeping iterations efficient whether the change is a new strategy or a precise fix.

Closed-Loop Optimization

Connects code execution, metric feedback, and strategy updates into an automated loop for result-driven decisions.

MLE-bench Results

On the full MLE-bench set (75 tasks), MLEvolve achieves a 65.3% medal rate within a 12-hour runtime budget, ranking #1 among all methods.

Complexity	Medal rate (%)
Low	80.3 ± 1.5
Medium	64.0 ± 0.9
High	46.7 ± 0.0
All	65.3 ± 0.8
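The overall figure can be sanity-checked as a task-weighted average of the three splits. The split sizes used below (22 low, 38 medium, 15 high) are an assumption taken from the published MLE-bench complexity split rather than from this page; the medal rates are the means reported above.

```python
# (number of tasks, mean medal rate in %) per complexity split.
# Task counts are assumed from the MLE-bench split; rates come from the results above.
SPLITS = {
    "Low": (22, 80.3),
    "Medium": (38, 64.0),
    "High": (15, 46.7),
}

total_tasks = sum(n for n, _ in SPLITS.values())
overall = sum(n * rate for n, rate in SPLITS.values()) / total_tasks
print(total_tasks, round(overall, 1))  # 75 65.3
```

Under that assumption the weighted average reproduces the reported 65.3% on all 75 tasks.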

Mathematical Optimization

On 15 tasks from the AlphaEvolve benchmark, MLEvolve achieves competitive or superior results against purpose-built optimization frameworks including AlphaEvolve and AlphaEvolve-v2.

Problem	Dir	AlphaEvolve	MLEvolve
Hex packing	↓	3.930092	3.928476
Circle/square	↑	2.635863	2.635983
Circle/rectangle	↑	2.365832	2.365832
Heilbronn convex	↑	0.030937	0.030937
Heilbronn triangles	↑	0.036530	0.036530
Kissing number d11	↑	593	592
Sum-diff 1	↑	1.147989	1.190177
Sum-diff 2	↑	1.158417	1.158546
Uncertainty inequality	↓	0.352099	0.352099
Autocorrelation 1st	↓	1.505294	1.502863
Autocorrelation 3rd (v)	↓	1.468762	1.458770
Autocorrelation 3rd	↓	1.455643	1.454851
Autocorrelation 2nd	↑	0.896280	0.905422
Max-to-min ratios	↓	12.889266	12.889230
Minimum overlap	↓	0.380923	0.380897

↑ higher is better, ↓ lower is better. MLEvolve matches or surpasses AlphaEvolve on 14 of 15 tasks (10 improvements and 4 ties); AlphaEvolve leads only on Kissing number d11.
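The win, tie, and loss counts can be recomputed directly from the table. The short script below copies the table values verbatim and applies the direction column; the `outcome` helper and the "up"/"down" encoding of the arrows are just conventions chosen for this example.

```python
from collections import Counter

# (problem, direction, AlphaEvolve, MLEvolve), copied from the table above.
RESULTS = [
    ("Hex packing", "down", 3.930092, 3.928476),
    ("Circle/square", "up", 2.635863, 2.635983),
    ("Circle/rectangle", "up", 2.365832, 2.365832),
    ("Heilbronn convex", "up", 0.030937, 0.030937),
    ("Heilbronn triangles", "up", 0.036530, 0.036530),
    ("Kissing number d11", "up", 593, 592),
    ("Sum-diff 1", "up", 1.147989, 1.190177),
    ("Sum-diff 2", "up", 1.158417, 1.158546),
    ("Uncertainty inequality", "down", 0.352099, 0.352099),
    ("Autocorrelation 1st", "down", 1.505294, 1.502863),
    ("Autocorrelation 3rd (v)", "down", 1.468762, 1.458770),
    ("Autocorrelation 3rd", "down", 1.455643, 1.454851),
    ("Autocorrelation 2nd", "up", 0.896280, 0.905422),
    ("Max-to-min ratios", "down", 12.889266, 12.889230),
    ("Minimum overlap", "down", 0.380923, 0.380897),
]


def outcome(direction: str, baseline: float, candidate: float) -> str:
    """Classify the candidate (MLEvolve) against the baseline (AlphaEvolve)."""
    if candidate == baseline:
        return "tie"
    better = candidate > baseline if direction == "up" else candidate < baseline
    return "win" if better else "loss"


tally = Counter(outcome(d, a, m) for _, d, a, m in RESULTS)
losses = [name for name, d, a, m in RESULTS if outcome(d, a, m) == "loss"]
print(tally["win"], tally["tie"], tally["loss"])  # 10 4 1
print(losses)  # ['Kissing number d11']
```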

Resources

Paper (arXiv)

MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery

HuggingFace Paper

Community discussion and upvotes

GitHub Repository

InternScience/MLEvolve

Citation

@article{du2026mlevolve,
  title={MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery},
  author={Du, Shangheng and Yan, Xiangchao and Shi, Jinxin and Cao, Zongsheng and Feng, Shiyang
          and Liang, Zichen and Sun, Boyuan and Peng, Tianshuo and Zhou, Yifan
          and Li, Xin and Zhou, Jie and He, Liang and Zhang, Bo and Bai, Lei},
  journal={arXiv preprint arXiv:2606.06473},
  year={2026}
}

@article{du2025automlgen,
  title={AutoMLGen: Navigating Fine-Grained Optimization for Coding Agents},
  author={Du, Shangheng and Yan, Xiangchao and Jiang, Dengyang and Yuan, Jiakang
          and Hu, Yusong and Li, Xin and He, Liang and Zhang, Bo and Bai, Lei},
  journal={arXiv preprint arXiv:2510.08511},
  year={2025}
}

@article{feng2026internagent,
  title={InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery},
  author={Feng, Shiyang and Ma, Runmin and Yan, Xiangchao and Fan, Yue and Hu, Yusong and others},
  journal={arXiv preprint arXiv:2602.08990},
  year={2026}
}

Acknowledgments

The authors would like to thank Yunfeng Zhao and Yazhou Li from The Heart of The Machine (Beijing) Technology Co., Ltd. for their support in the application and promotion of MLEvolve.