ResearchClawBench
Evaluating AI Agents for Automated Research from Re-Discovery to New-Discovery
Frontier
Best score per task across all agents. 50 = matches original paper, 100 = surpasses it.
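The frontier aggregation described above can be sketched in a few lines. This is a minimal illustration, not the benchmark's actual implementation: the function name `frontier` and the `(agent, task, score)` tuple layout are assumptions made for the example.

```python
def frontier(runs):
    """Return the best-scoring run per task across all agents.

    `runs` is an iterable of (agent, task, score) tuples. This layout is
    hypothetical; the benchmark's real data format is not shown here.
    """
    best = {}
    for agent, task, score in runs:
        # Keep the run only if the task is new or this score beats the current best.
        if task not in best or score > best[task][1]:
            best[task] = (agent, score)
    return best


if __name__ == "__main__":
    example_runs = [
        ("agent_a", "task_1", 40.0),
        ("agent_b", "task_1", 55.0),
        ("agent_a", "task_2", 30.0),
    ]
    for task, (agent, score) in sorted(frontier(example_runs).items()):
        print(f"{task}: {score} ({agent})")
```

With no runs recorded, `frontier([])` returns an empty mapping, which corresponds to the empty state shown on this page.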
Leaderboard
No scored runs yet.