Research · Mar 31

ProText: A Benchmark Dataset for Measuring (Mis)gendering in Long-Form Texts

We introduce ProText, a dataset for measuring gendering and misgendering in stylistically diverse long-form English texts. ProText spans three dimensions: Theme nouns (names, occupations, titles, kinship terms), Theme category (stereotypically male, stereotypically female, gender-neutral/non-gendered), and Pronoun category (masculine, feminine, gender-neutral, none). The dataset is designed to probe (mis)gendering in text transformations such as summarization and rewrites using state-of-the-art Large Language Models, extending beyond traditional pronoun resolution benchmarks and beyond the…
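The three dimensions named above form a simple grid of conditions. The Python sketch below is hypothetical and is not ProText's released schema or evaluation code. It models the dimensions as enums, enumerates every combination (4 theme-noun types × 3 theme categories × 4 pronoun categories = 48 cells, assuming all are populated), and adds a deliberately naive, English-only pronoun check. The helper `introduces_new_pronoun_category` flags a rewrite or summary that uses a pronoun category absent from the source. The lexicon, the helper names, and the decision rule are all illustrative assumptions. In particular, the check cannot tell singular "they" from plural "they", and it cannot tell which referent a pronoun points to.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product
import re


class ThemeNoun(Enum):
    NAME = "name"
    OCCUPATION = "occupation"
    TITLE = "title"
    KINSHIP = "kinship term"


class ThemeCategory(Enum):
    STEREO_MALE = "stereotypically male"
    STEREO_FEMALE = "stereotypically female"
    NEUTRAL = "gender-neutral/non-gendered"


class PronounCategory(Enum):
    MASCULINE = "masculine"
    FEMININE = "feminine"
    NEUTRAL = "gender-neutral"
    NONE = "none"


@dataclass(frozen=True)
class Cell:
    """One (theme noun, theme category, pronoun category) condition."""
    noun: ThemeNoun
    category: ThemeCategory
    pronoun: PronounCategory


def all_cells() -> list[Cell]:
    """Enumerate the full cross product of the three dimensions."""
    return [Cell(n, c, p) for n, c, p in product(ThemeNoun, ThemeCategory, PronounCategory)]


# Hypothetical, naive pronoun lexicon (English only; not from ProText).
PRONOUNS = {
    PronounCategory.MASCULINE: {"he", "him", "his", "himself"},
    PronounCategory.FEMININE: {"she", "her", "hers", "herself"},
    PronounCategory.NEUTRAL: {"they", "them", "their", "theirs", "themself", "themselves"},
}


def pronoun_categories(text: str) -> set[PronounCategory]:
    """Return the pronoun categories present in text, or {NONE} if there are none."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    found = {cat for cat, words in PRONOUNS.items() if tokens & words}
    return found or {PronounCategory.NONE}


def introduces_new_pronoun_category(source: str, rewrite: str) -> bool:
    """True if the rewrite uses a pronoun category that the source text does not."""
    new = pronoun_categories(rewrite) - pronoun_categories(source)
    return bool(new - {PronounCategory.NONE})
```

For example, `introduces_new_pronoun_category("The nurse said they would call.", "The nurse said she would call.")` returns `True`, while a rewrite that simply drops a pronoun is not flagged. A real evaluation would need coreference information tying each pronoun to its theme noun.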
