Research · 5h ago

IsoSci: A Benchmark of Isomorphic Cross-Domain Science Problems for Evaluating Reasoning versus Knowledge Retrieval in LLMs

Researchers have introduced IsoSci, a new benchmark that tests large language models on pairs of science problems that share an identical logical structure but differ in subject matter. By decoupling reasoning ability from domain-specific knowledge, the benchmark aims to measure more clearly whether a model is genuinely solving a problem or simply recalling information from its training data.

Covered by 1 source
