The SVASR System for Text-dependent Speaker Verification (TdSV) AAIC Challenge 2024
Mohammadreza Molavi, Reza Khodadadi · November 25, 2024
Summary
The paper introduces an efficient pipeline for text-dependent speaker verification (TdSV) that uses a Fast-Conformer-based automatic speech recognition (ASR) module. For the speaker side, it proposes a feature fusion approach that combines speaker embeddings from the wav2vec-BERT and ReDimNet models. The system achieves competitive results on the TDSV 2024 Challenge test set, with a normalized min-DCF of 0.0452 (rank 2). In text-dependent speaker verification, a trial is accepted only when both the speaker's identity and the spoken phrase match. The system overview describes a dual-head strategy and the use of a single speech recognition model to improve results.
Introduction
Background
Overview of text-dependent speaker verification (TdSV)
Importance of TdSV in various applications
Objective
Aim of the research: developing an efficient TdSV pipeline
Highlighting the use of a Fast-Conformer-based ASR module
Method
Data Collection
Sources of data for TdSV
Characteristics of the collected data
Data Preprocessing
Techniques for preparing the data for the pipeline
Importance of data quality in TdSV
Feature Fusion
Description of the feature fusion approach
Integration of speaker embeddings from wav2vec-BERT and ReDimNet models
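The following minimal Python sketch shows one common way to fuse two speaker embeddings. Each embedding is L2-normalized, scaled by a weight, and the two are concatenated. The function names, the `weight_a` parameter, and the embedding sizes in the usage lines are hypothetical placeholders. The paper's exact fusion recipe may differ.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale a vector to unit Euclidean length (eps guards against zero vectors)."""
    return x / max(float(np.linalg.norm(x)), eps)


def fuse_embeddings(emb_a: np.ndarray, emb_b: np.ndarray,
                    weight_a: float = 0.5) -> np.ndarray:
    """Fuse two speaker embeddings by weighted concatenation (hypothetical recipe).

    Both inputs are L2-normalized first so that neither model dominates
    because of its scale. Scaling by sqrt(weight) keeps the result unit-length.
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    a = np.sqrt(weight_a) * l2_normalize(emb_a)
    b = np.sqrt(1.0 - weight_a) * l2_normalize(emb_b)
    return np.concatenate([a, b])


# Usage with random stand-ins for the two models' outputs (sizes are placeholders).
rng = np.random.default_rng(0)
fused = fuse_embeddings(rng.normal(size=1024), rng.normal(size=192))
print(fused.shape)  # (1216,)
```

The square-root weights keep the fused vector at unit length. As a result, the dot product of two fused vectors equals the weighted sum of the per-model cosine similarities, so the weight directly controls how much each model contributes to the score.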
Dual-Head Strategy
Explanation of the dual-head approach
Benefits of using a single speech recognition model for improved performance
System Overview
Fast-Conformer-based ASR Module
Description of the Fast-Conformer architecture
Role in the TdSV pipeline
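In a TdSV pipeline, the ASR module confirms that the test utterance contains the expected phrase. The sketch below assumes the Fast-Conformer transcript is already available as a string; the model call itself is omitted. It compares that transcript to the expected phrase using a character error rate. The normalization rules, the `max_cer` tolerance, and the `verify_trial` helper are hypothetical choices for illustration.

```python
import re


def normalize_text(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def phrase_matches(transcript: str, expected: str, max_cer: float = 0.1) -> bool:
    """Accept the phrase if the character error rate is at most max_cer."""
    hyp, ref = normalize_text(transcript), normalize_text(expected)
    if not ref:
        return not hyp
    return edit_distance(hyp, ref) / len(ref) <= max_cer


def verify_trial(transcript: str, expected: str, speaker_score: float,
                 speaker_threshold: float, max_cer: float = 0.1) -> bool:
    """Accept only when both the phrase check and the speaker check pass."""
    return (phrase_matches(transcript, expected, max_cer)
            and speaker_score >= speaker_threshold)


print(phrase_matches("Open sesame!", "open sesame"))  # True
print(verify_trial("open sesame", "open sesame",
                   speaker_score=0.2, speaker_threshold=0.5))  # False
```

The transcript check runs first, so a trial with the wrong phrase is rejected no matter how closely the voice matches.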
Speaker Embeddings
Overview of wav2vec-BERT and ReDimNet models
How speaker embeddings contribute to the verification process
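A common way for speaker embeddings to feed the verification decision is cosine scoring against an enrollment model. This minimal sketch averages the length-normalized enrollment embeddings and scores a test embedding by cosine similarity. It is an assumed backend for illustration; the summary does not confirm it.

```python
import numpy as np


def cosine_score(enroll_embs: np.ndarray, test_emb: np.ndarray) -> float:
    """Cosine similarity between the mean enrollment embedding and a test embedding.

    enroll_embs has shape (num_enroll_utterances, dim); test_emb has shape (dim,).
    """
    # Length-normalize each enrollment embedding before averaging.
    unit = enroll_embs / np.linalg.norm(enroll_embs, axis=1, keepdims=True)
    model = unit.mean(axis=0)
    return float(np.dot(model, test_emb)
                 / (np.linalg.norm(model) * np.linalg.norm(test_emb)))


enroll = np.array([[1.0, 0.0], [0.8, 0.6]])
print(round(cosine_score(enroll, np.array([1.0, 0.0])), 3))  # 0.949
```

A higher score indicates the same speaker. The final accept or reject decision compares this score against a threshold tuned on development data.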
Normalized Min-DCF
Explanation of the metric used for evaluating the system's performance
Importance in the context of TdSV
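The minimum detection cost can be computed by sweeping the decision threshold over the observed scores. The sketch uses the standard definition DCF(t) = C_miss * P_miss(t) * P_target + C_fa * P_fa(t) * (1 - P_target). It then normalizes this by min(C_miss * P_target, C_fa * (1 - P_target)). The default cost and prior values are placeholders; the challenge's evaluation plan defines the official ones.

```python
import numpy as np


def normalized_min_dcf(target_scores, nontarget_scores,
                       p_target: float = 0.01, c_miss: float = 1.0,
                       c_fa: float = 1.0) -> float:
    """Minimum normalized detection cost over all score thresholds.

    A trial is accepted when its score is >= the threshold.
    """
    tar = np.sort(np.asarray(target_scores, dtype=float))
    non = np.sort(np.asarray(nontarget_scores, dtype=float))
    # Candidate thresholds: every observed score (the smallest one accepts all
    # trials) plus +inf, which rejects all trials.
    thresholds = np.concatenate([np.unique(np.concatenate([tar, non])), [np.inf]])
    # Count of scores strictly below each threshold, via binary search.
    p_miss = np.searchsorted(tar, thresholds, side="left") / tar.size
    p_fa = 1.0 - np.searchsorted(non, thresholds, side="left") / non.size
    dcf = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
    # Normalize by the cost of the best trivial system (always accept or always reject).
    c_default = min(c_miss * p_target, c_fa * (1.0 - p_target))
    return float(dcf.min() / c_default)


print(normalized_min_dcf([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))  # 0.0
```

Under this normalization, a score of 1.0 is no better than always accepting or always rejecting. The reported 0.0452 therefore corresponds to roughly 4.5% of that trivial cost.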
Results
TDSV 2024 Challenge
Participation and ranking of the proposed system
Achieved normalized min-DCF score (0.0452, rank 2)
Conclusion
Summary of Contributions
Recap of the system's innovative aspects
Future Work
Potential areas for further research and development
Impact
Discussion on the broader implications of the research