Research · 5h ago
EduArt: An educational-level benchmark for evaluating art history knowledge in large language models
Researchers have introduced EduArt, a new benchmark that tests large language models specifically on their knowledge of art history. Because general-purpose benchmarks have become saturated, EduArt uses academic-level questions to evaluate more precisely how well models handle complex, domain-specific information.
Covered by 1 source
- arXiv CS.AI: Gianmarco Spinaci, Lukas Klic, Giovanni Colavizza · 5h ago