XMusic: Towards a Generalized and Controllable Symbolic Music Generation Framework

Sida Tian, Can Zhang, Wei Yuan, Wei Tan, Wenjie Zhu · January 15, 2025

Summary

XMusic is a symbolic music generation framework that accepts flexible prompts and produces emotionally controllable, high-quality music. It has two components: XProjector, which parses prompts into symbolic music elements, and XComposer, which generates music using a Generator and a Selector. The Selector evaluates music quality through multi-task learning. XMusic uses XMIDI, a dataset of 108,023 MIDI files annotated with emotion and genre labels. The authors report that XMusic significantly outperforms current methods in music quality. Separately, it was recognized as one of the nine highlights at WAIC 2023.
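
The project, generate, and select flow described in the summary can be illustrated with a toy sketch. This is not the authors' implementation: every name here (`project`, `generate`, `score`, `compose`, `Candidate`) is hypothetical, the "Generator" just samples random pitches from a scale, and the "Selector" is a hand-written smoothness heuristic rather than a learned multi-task model. It only shows the control flow of sampling several candidates and keeping the one the scorer rates highest.

```python
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    notes: list      # toy symbolic sequence of MIDI pitch numbers
    emotion: str     # emotion label the candidate was conditioned on


def project(prompt: str) -> dict:
    # Hypothetical stand-in for XProjector: map a text prompt to symbolic controls.
    emotion = "happy" if "happy" in prompt.lower() else "sad"
    return {"emotion": emotion}


def generate(controls: dict, n: int, rng: random.Random) -> list:
    # Hypothetical stand-in for the Generator: sample n short note sequences.
    # The scale choice per emotion is an arbitrary toy rule, not from the paper.
    major = [60, 62, 64, 65, 67, 69, 71]
    minor = [60, 62, 63, 65, 67, 68, 70]
    scale = major if controls["emotion"] == "happy" else minor
    return [
        Candidate([rng.choice(scale) for _ in range(8)], controls["emotion"])
        for _ in range(n)
    ]


def score(candidate: Candidate) -> float:
    # Hypothetical stand-in for the Selector: prefer small melodic steps.
    leaps = [abs(a - b) for a, b in zip(candidate.notes, candidate.notes[1:])]
    return -sum(leaps) / len(leaps)


def compose(prompt: str, n: int = 16, seed: int = 0) -> Candidate:
    # Project the prompt, sample n candidates, and keep the highest-scoring one.
    rng = random.Random(seed)
    controls = project(prompt)
    candidates = generate(controls, n, rng)
    return max(candidates, key=score)


if __name__ == "__main__":
    best = compose("a happy tune")
    print(best.emotion, best.notes, round(score(best), 2))
```

Separating generation from selection means the scorer can be swapped (for example, for a learned quality model) without touching the sampling code.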

Introduction
  Background
    Overview of symbolic music generation
    Importance of emotional control in music creation
    Current challenges in music generation frameworks
  Objective
    To present XMusic, a novel framework for symbolic music generation
    To highlight its capabilities in creating emotionally controllable and high-quality music
Method
  XProjector
    Functionality of XProjector in parsing prompts
    How it converts prompts into symbolic elements
  XComposer
    Overview of XComposer's role in music generation
    Integration of a Generator and Selector within XComposer
    Explanation of the Selector's multi-task learning approach for music quality evaluation
  Data Utilization
    Description of the XMIDI dataset
    How XMusic leverages the XMIDI dataset for training and evaluation
Results
  Performance Metrics
    Key performance indicators for XMusic
    Comparison with current music generation methods
  Recognition and Awards
    XMusic's recognition at WAIC 2023
    Highlighting its achievements and impact in the field
Conclusion
  Future Directions
    Potential areas for further research and development
    Expected advancements in XMusic's capabilities
Basic info
papers
sound
audio and speech processing
artificial intelligence
Advanced features
Insights
How does XMusic evaluate the quality of generated music?
What is XMusic and what does it support in terms of symbolic music generation?
What are the two main components of XMusic and what are their functions?
What dataset does XMusic use for training and what does it contain?