Hardware · Apr 16

Calibrated Speculative Decoding: Frequency-Guided Candidate Selection for Efficient Inference

arXiv:2604.13634v1

Abstract: Speculative decoding accelerates autoregressive generation by letting draft tokens bypass full verification, but conventional frameworks suffer from frequent false rejections, particularly when draft models produce semantically correct but lexically divergent outputs. In this paper, we present Calibrated Speculative Decoding (CSD), a training-free framework that recovers valid tokens discarded by standard verification. Guided by the principle of "Frequency-Guided Candidate Selection and Probability-Guarded Acceptance," CSD incorporates two lightweight modules: Online Correction Memory, which aggregates historical rejections to propose recurring divergence patterns as rescue candidates, and Semantic Consistency Gating, which verifies candidate admissibility using probability ratios instead of exact token matching. Our evaluation across diverse large language models demonstrates that CSD outperforms existing methods, achieving a peak throughput speedup of 2.33x. CSD preserves model accuracy across all tasks while further boosting performance on complex reasoning datasets. These results establish CSD as a highly effective, lightweight solution for practical…
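The abstract names the two modules but gives no algorithmic details. The following is a minimal Python sketch, under our own assumptions and not the authors' implementation, of how frequency-guided candidate selection and probability-guarded acceptance might combine in one greedy verification step. We assume a divergence pattern is a (draft token, target token) pair, that a pattern becomes a rescue candidate once it has been seen `min_count` times, and that the gate accepts a rejected draft token when its target-model probability is at least `ratio_threshold` times that of the target's top token. `CorrectionMemory`, `verify_step`, `min_count`, and `ratio_threshold` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Tuple


class CorrectionMemory:
    """Hypothetical online memory of draft/target divergences (assumed design).

    Each divergence is keyed as a (draft_token, target_token) pair; a pair becomes
    a rescue candidate once it has been observed at least `min_count` times.
    """

    def __init__(self, min_count: int = 2) -> None:
        self.min_count = min_count
        self.counts: Counter = Counter()

    def record(self, draft_tok: int, target_tok: int) -> None:
        self.counts[(draft_tok, target_tok)] += 1

    def is_recurring(self, draft_tok: int, target_tok: int) -> bool:
        # Counter returns 0 for unseen keys without inserting them.
        return self.counts[(draft_tok, target_tok)] >= self.min_count


def verify_step(
    draft_tok: int,
    target_probs: Dict[int, float],
    memory: CorrectionMemory,
    ratio_threshold: float = 0.5,
) -> Tuple[int, bool]:
    """Greedy verification of one draft token with an assumed rescue path.

    Returns (emitted_token, draft_accepted).
    """
    target_tok = max(target_probs, key=target_probs.get)
    if draft_tok == target_tok:
        return draft_tok, True  # exact match: standard acceptance

    # Standard greedy verification would reject here. Rescue is considered only
    # for divergence patterns that recurred in earlier steps.
    rescued = False
    if memory.is_recurring(draft_tok, target_tok):
        # Probability-ratio gate in place of exact token matching.
        ratio = target_probs.get(draft_tok, 0.0) / target_probs[target_tok]
        rescued = ratio >= ratio_threshold

    memory.record(draft_tok, target_tok)  # update history after deciding
    return (draft_tok, True) if rescued else (target_tok, False)


if __name__ == "__main__":
    mem = CorrectionMemory(min_count=2)
    probs = {7: 0.55, 9: 0.40, 3: 0.05}  # toy target distribution; target prefers 7
    for _ in range(3):
        print(verify_step(9, probs, mem))
    # (7, False), (7, False), then (9, True) once the (9, 7) pattern has recurred twice
```

In this toy run, token 9 is rescued on the third step because 0.40 / 0.55 ≈ 0.73 clears the 0.5 gate, while token 3 (ratio ≈ 0.09) would never be rescued however often it recurs. A full decoder would apply such a check position by position across the draft and stop at the first unrescued rejection; the abstract does not say how CSD sets its thresholds or handles sampling-based verification.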

Covered by 3 sources

  • arXiv CS.AI · Xuwen Zhou, Fangxin Liu, Chao Wang, Xiao Zheng, Hao Zheng, Min He, Li Jiang, Haibing Guan · Apr 16
  • arXiv CS.AI · Zihong Zhang, Zuchao Li, Lefei Zhang, Ping Wang, Hai Zhao · Apr 17
  • arXiv CS.AI · Kiran Purohit, Ramasuri Narayanam, Soumyabrata Pal · Apr 17
