Research · 5h ago
IsoSci: A Benchmark of Isomorphic Cross-Domain Science Problems for Evaluating Reasoning versus Knowledge Retrieval in LLMs
Researchers have introduced IsoSci, a new benchmark that tests large language models using pairs of science problems with identical logical structures but different subject matter. By decoupling reasoning capabilities from domain-specific knowledge, the benchmark aims to provide a clearer measurement of whether a model is genuinely solving problems or simply recalling information from its training data.
Covered by 1 source
- arXiv CS.AI: Samir Abdaljalil, Erchin Serpedin, Hasan Kurban (5h ago)