RBA-FE: A Robust Brain-Inspired Audio Feature Extractor for Depression Diagnosis

Yu-Xuan Wu, Ziyan Huang, Bin Hu, Zhi-Hong Guan · June 08, 2025

Summary

RBA-FE is a brain-inspired audio feature extractor for depression diagnosis. It uses an improved hierarchical network built on six acoustic features, and its architecture incorporates temporal convolution, multi-head attention, and bidirectional LSTM. An adaptive rate smooth leaky integrate-and-fire (ARSLIF) neuron model boosts robustness in noisy conditions. RBA-FE outperforms existing models on the MODMA dataset and surpasses state-of-the-art models on the DAIC-WOZ dataset, demonstrating strong generalization. The model scores 0.89 AUC and an F1 of 68.08%, outperforming competing models. Future work focuses on elucidating the model's operational principles.
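
To make the neuron terminology concrete, the following is a minimal sketch of a standard leaky integrate-and-fire (LIF) neuron, the textbook model that ARSLIF builds on by name. It is not the ARSLIF model itself: this summary does not specify the adaptive-rate or smoothing mechanisms, so they are left out. The function name `simulate_lif` and every parameter value (`tau`, `v_threshold`, `dt`, and so on) are illustrative assumptions.

```python
def simulate_lif(inputs, tau=10.0, v_rest=0.0, v_threshold=1.0, v_reset=0.0, dt=1.0):
    """Simulate a plain LIF neuron with forward-Euler steps.

    Assumption: this is the textbook LIF model, not the paper's ARSLIF variant.
    Returns (spikes, potentials), each with one entry per input sample.
    """
    v = v_rest
    spikes, potentials = [], []
    for current in inputs:
        # Leak toward the resting potential while integrating the input current.
        v += (dt / tau) * (-(v - v_rest) + current)
        if v >= v_threshold:
            # Threshold crossed: emit a spike and reset the membrane potential.
            spikes.append(1)
            v = v_reset
        else:
            spikes.append(0)
        potentials.append(v)
    return spikes, potentials


# Usage: a strong constant input drives periodic spiking; a weak one never fires.
strong_spikes, _ = simulate_lif([2.0] * 50)
weak_spikes, weak_potentials = simulate_lif([0.5] * 50)
print(sum(strong_spikes), sum(weak_spikes))
```

With a constant input, the membrane potential relaxes toward the input level, so the neuron fires only when that level lies above the threshold. This leak is what lets an LIF unit ignore small, transient fluctuations in its input.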

Introduction
  Background
    Overview of depression diagnosis challenges
    Importance of accurate and efficient diagnostic tools
    Introduction to brain-inspired computing and its applications in AI
  Objective
    To present RBA-FE, a novel brain-inspired audio feature extractor
    Highlight its performance in depression diagnosis
    Discuss its superiority over existing models on MODMA and DAIC-WOZ datasets
Method
  Data Collection
    Description of the MODMA and DAIC-WOZ datasets
    Process of collecting audio data for depression diagnosis
  Data Preprocessing
    Techniques used for cleaning and preparing the audio data
  Model Architecture
    Overview of the RBA-FE model
    Components: improved hierarchical network, six acoustic features, ARSLIF neuron model
    Detailed explanation of the adaptive rate smooth leaky integrate-and-fire neuron model
  Training and Evaluation
    Training process of the RBA-FE model
    Evaluation metrics: AUC, F1 score
    Comparison with state-of-the-art models on DAIC-WOZ dataset
Results
  Performance on MODMA Dataset
    Detailed results and analysis
  Performance on DAIC-WOZ Dataset
    Comparative analysis with state-of-the-art models
    Discussion on the model's generalization capabilities
Model Components
  Temporal Convolution
    Function and implementation in RBA-FE
  Multi-Head Attention
    Explanation and role in enhancing model performance
  Bidirectional LSTM
    Description and integration into the RBA-FE architecture
Robustness and Generalization
  ARSLIF Neuron Model
    Description and how it contributes to the model's robustness
  Handling Noisy Conditions
    Discussion on the model's performance in noisy environments
Future Work
  Model Interpretation
    Plans to understand the operational principles of RBA-FE
  Enhancements and Applications
    Potential improvements and future research directions
  Integration with Other Technologies
    Exploration of combining RBA-FE with other AI techniques or hardware
Conclusion
  Summary of Contributions
  Implications for Depression Diagnosis
  Call for Further Research
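
The outline names AUC and F1 score as the evaluation metrics. As a reminder of what these measure, here is a small dependency-free sketch. The function names `f1_score` and `roc_auc` and the toy labels are assumptions for illustration; they are not taken from the paper's evaluation code, and the numbers are not the paper's results.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels (1 = positive class), computed from TP/FP/FN counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # No true positives: precision or recall is zero.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made-up labels and scores).
labels = [1, 1, 0, 0]
predictions = [1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(f1_score(labels, predictions), roc_auc(labels, scores))
```

F1 depends on a fixed decision threshold, while AUC summarizes ranking quality across all thresholds, which is why the two are often reported together.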
Basic info
papers
sound
audio and speech processing
artificial intelligence
Insights
What are the key components and architecture of the RBA-FE model used for depression diagnosis?
What future work is planned to better understand the operational principles of the RBA-FE model?
What are the six acoustic features used in the improved hierarchical network of the RBA-FE model?
How does the Adaptive Rate Smooth Leaky Integrate-and-Fire (ARSLIF) neuron model enhance the robustness of the RBA-FE model, especially in noisy conditions?