4 · Research · 5h ago
SPARCLE: SPeaker-aware Aligned Representations via Contrastive Language Embeddings
Researchers have introduced SPARCLE, a new method for speech synthesis that aligns text and audio representations without relying on traditional phoneme conversion. By using contrastive learning to map graphemes directly to acoustic features, the system improves the naturalness of synthesized speech while reducing the errors typically introduced by intermediary linguistic processing.
Covered by 1 source
- arXiv CS.AI: Priyam Mazumdar, Yurii Halychanskyi, Steven Guo, Mark Hasegawa-Johnson, Volodymyr Kindratenko (5h ago)