Latest developments from the SGIWorld ecosystem
Self-evolving research agent framework released! MarkScientist features a three-agent workflow (Proposer → Solver → Reviewer) with a JudgeBuddy system for scenario-aware evaluation. It ships with 15 research scenarios and 12 reviewer personas with built-in taste learning.
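As a rough sketch of how a Proposer → Solver → Reviewer loop could be wired up (the class names, toy outputs, and accept/revise stopping rule below are illustrative assumptions, not MarkScientist's actual API):

```python
# Minimal sketch of a Proposer -> Solver -> Reviewer loop.
# All names and the accept/revise protocol are assumptions for
# illustration; see the MarkScientist repo for the real interfaces.
from dataclasses import dataclass

@dataclass
class Review:
    score: float   # reviewer's quality score in [0, 1]
    feedback: str  # natural-language critique fed back to the solver

class Proposer:
    def propose(self, scenario: str) -> str:
        return f"Hypothesis for scenario: {scenario}"

class Solver:
    def solve(self, hypothesis: str, feedback: str = "") -> str:
        return f"Solution to '{hypothesis}' (addressing: {feedback or 'n/a'})"

class Reviewer:
    def review(self, solution: str) -> Review:
        return Review(score=0.9, feedback="Tighten the experimental controls.")

def run_workflow(scenario: str, threshold: float = 0.8, max_rounds: int = 3) -> str:
    proposer, solver, reviewer = Proposer(), Solver(), Reviewer()
    hypothesis = proposer.propose(scenario)
    feedback = ""
    for _ in range(max_rounds):
        solution = solver.solve(hypothesis, feedback)
        review = reviewer.review(solution)
        if review.score >= threshold:  # reviewer accepts: stop iterating
            return solution
        feedback = review.feedback     # otherwise revise with the critique
    return solution

print(run_workflow("protein folding under thermal stress"))
```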
End-to-end auto-research evaluation benchmark launched! ResearchClawBench measures AI agents' ability to conduct complete research workflows, from literature review and hypothesis generation through experimental execution to paper writing. Now accepting task submissions from all research domains.
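For illustration, a submitted task covering those stages might be declared along these lines; every field name below is a hypothetical assumption, so consult the ResearchClawBench submission guidelines for the actual schema:

```python
# Hypothetical ResearchClawBench task definition; field names are
# illustrative assumptions, not the benchmark's real submission format.
task = {
    "task_id": "bio-lit-review-001",
    "domain": "biology",
    "stages": [
        "literature_review",
        "hypothesis_generation",
        "experimental_execution",
        "paper_writing",
    ],
    "inputs": {"topic": "CRISPR off-target effects"},
    "rubric": {"per_stage_scores": True, "max_score": 100},
}
```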
SGI-Bench leaderboard now features 30+ models! The Scientific General Intelligence Benchmark continues to evaluate frontier models across multi-disciplinary scientific tasks. New model submissions welcome.
SGI-Bench also serves as a unified evaluation toolkit and leaderboard for rigorously assessing the scientific intelligence of LLMs and VLMs, covering 7 core capability dimensions across 6 scientific disciplines. Now integrated with OpenCompass for standardized evaluation.
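To make the 7 × 6 scoring grid concrete, here is a minimal sketch of how a leaderboard entry might average per-dimension, per-discipline accuracies into one number; the dimension and discipline names and the unweighted-mean aggregation are illustrative assumptions, not SGI-Bench's actual scheme:

```python
# Hypothetical leaderboard aggregation over a capability-by-discipline grid.
# Dimension/discipline names and the plain-mean aggregation are assumptions;
# SGI-Bench's real scoring scheme may differ.
from statistics import mean

DIMENSIONS = ["knowledge", "reasoning", "calculation", "perception",
              "experiment_design", "data_analysis", "literature_use"]  # 7 assumed
DISCIPLINES = ["physics", "chemistry", "biology",
               "materials", "earth_science", "astronomy"]              # 6 assumed

def overall_score(cell_accuracy: dict[tuple[str, str], float]) -> float:
    """Average accuracy over every (dimension, discipline) cell."""
    return mean(cell_accuracy[(dim, disc)]
                for dim in DIMENSIONS for disc in DISCIPLINES)

# Toy example: a model that scores 0.5 on every cell.
scores = {(d, s): 0.5 for d in DIMENSIONS for s in DISCIPLINES}
print(f"overall: {overall_score(scores):.3f}")  # -> 0.500
```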
SFE dataset now available on Hugging Face! Comprehensive evaluation of frontier scientific knowledge across physics, chemistry, biology, materials science, and more. Designed to probe the cutting edge of AI scientific understanding.
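Once you have the official repository id, loading the data is a one-liner with the Hugging Face `datasets` library; the repo id and split name below are placeholder assumptions, so substitute the ids from the release announcement:

```python
# Minimal loading sketch; the repository id and split name are
# placeholder assumptions -- use the official SFE identifiers instead.
from datasets import load_dataset

sfe = load_dataset("InternScience/SFE")  # hypothetical repo id
print(sfe)             # available splits and sizes
print(sfe["test"][0])  # inspect one example (split name assumed)
```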
Follow InternScience on GitHub for the latest announcements, releases, and research updates.