LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec

Yiwei Guo, Zhihan Li, Chenpeng Du, Hankun Wang, Xie Chen, Kai Yu · October 21, 2024

Summary

LSCodec is a low-bitrate discrete speech codec trained with a three-stage unsupervised framework that relies on speaker perturbation. The first stage establishes a continuous information bottleneck, the second applies vector quantization to produce a discrete, speaker-decoupled space, and the third trains a discrete token vocoder for acoustic refinement. The codec demonstrates superior intelligibility, audio quality, and speaker disentanglement while using a small vocabulary size and the lowest bitrate (0.25 kbps). Evaluated on LibriTTS, it outperforms the baselines in WER, GPE, and MOS, showing that high-quality reconstruction is achievable at low bitrates.
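To make the bitrate figure concrete, the sketch below computes the bitrate of a discrete token stream as frame rate times codebooks times bits per token. The function name `codec_bitrate_bps` is hypothetical, and the frame rate and vocabulary size in the usage line are illustrative assumptions, not values stated in this summary; they are just one configuration that happens to yield 0.25 kbps.

```python
import math


def codec_bitrate_bps(frame_rate_hz: float, vocab_size: int, num_codebooks: int = 1) -> float:
    """Return the bitrate in bits per second of a discrete token stream.

    Each token drawn from a vocabulary of size V carries log2(V) bits, so the
    bitrate is frame_rate * num_codebooks * log2(V).
    """
    if frame_rate_hz <= 0 or vocab_size < 2 or num_codebooks < 1:
        raise ValueError("frame rate must be > 0, vocab size >= 2, codebooks >= 1")
    return frame_rate_hz * num_codebooks * math.log2(vocab_size)


# Assumed, illustrative configuration (not taken from the summary):
# one codebook of 1024 entries emitted at 25 tokens per second.
bps = codec_bitrate_bps(frame_rate_hz=25, vocab_size=1024)
print(f"{bps / 1000:.2f} kbps")  # prints "0.25 kbps"
```

Under this formula, halving the frame rate or shrinking the vocabulary both lower the bitrate, which is why a single small codebook is a natural fit for a low-bitrate codec.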
