4 · Open Source · 1d ago
Reported Confidence in LLMs Tracks Commitment More Than Correctness
A new study suggests that large language models express confidence based on their internal commitment to a specific output rather than an objective calculation of accuracy. This finding indicates that verbal confidence scores may be unreliable indicators of whether a model's response is actually correct.
Covered by 1 source
- arXiv CS.AI · Dharshan Kumaran · 1d ago