Open Source · 1d ago

Reported Confidence in LLMs Tracks Commitment More Than Correctness

A new study suggests that large language models express confidence based on their internal commitment to a specific output rather than an objective calculation of accuracy. This finding indicates that verbal confidence scores may be unreliable indicators of whether a model's response is actually correct.

Covered by 1 source

arXiv CS.AI · Dharshan Kumaran · 1d ago
