Academic foundations and open-source projects of the SGIWorld evaluation ecosystem
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows. 1000+ expert-curated tasks based on Science's 125 Big Questions.
Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning. 830 expert-verified VQA pairs.
An Open-source Evaluation Toolkit for Scientific General Intelligence. Unified benchmarking across 6 scientific domains.
End-to-End Auto-Research Benchmark. Evaluating AI agents for automated research, from Re-Discovery to New-Discovery, with an interactive task browser and leaderboard.
Dynamic AI Research Knowledge Graph. Interactive visualization and cross-benchmark analysis for tracking AI capabilities across scientific domains.