Abstract: Zero-shot text-to-speech (TTS) has recently achieved remarkable performance by leveraging a speech prompt instead of a speaker embedding, as it provides richer information. However, ...
Abstract: Achieving high fidelity and speaker similarity in text-to-speech speaker adaptation with limited amount of data is a challenging task. Most existing methods only consider adapting to the ...