ResearchClawBench

Evaluating AI Agents for Automated Research from Re-Discovery to New-Discovery

Frontier

The best score on each task across all agents. A score of 50 matches the original paper; 100 surpasses it.
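As a minimal sketch of how a "best score per task across all agents" view could be computed, the snippet below reduces a list of scored runs to one entry per task. The record shape `(agent, task, score)`, the function name `frontier`, and the sample agent and task names are all hypothetical; this is not the benchmark's actual implementation.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (agent name, task id, score).
Run = Tuple[str, str, float]


def frontier(runs: Iterable[Run]) -> Dict[str, Tuple[str, float]]:
    """Return, for each task, the (agent, score) pair with the highest score.

    Ties keep the run that appears first in the input.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for agent, task, score in runs:
        if task not in best or score > best[task][1]:
            best[task] = (agent, score)
    return best


# Made-up sample data, for illustration only.
sample_runs = [
    ("agent_a", "task_1", 42.0),
    ("agent_b", "task_1", 55.0),
    ("agent_a", "task_2", 30.0),
]
print(frontier(sample_runs))
# {'task_1': ('agent_b', 55.0), 'task_2': ('agent_a', 30.0)}
```

With no scored runs, `frontier([])` returns an empty dictionary, which corresponds to an empty frontier.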

Leaderboard

No scored runs yet.