UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos
Yuting Mei, Linli Yao, Qin Jin · June 24, 2024
Summary
The paper introduces UBiSS, a unified framework for bimodal semantic summarization of videos that addresses the limitations of unimodal approaches. UBiSS generates both a textual summary (TM-Summary) and a visual summary (VM-Summary), balancing global context with visual detail. The paper also presents BIDS, a large-scale dataset with paired video annotations, and a model that captures saliency in both modalities. The authors propose a ranking-based optimization objective to improve highlight identification and a novel metric, NDCGMS, to evaluate joint performance. In experiments, UBiSS outperforms multi-stage methods and produces more comprehensive and coherent summaries, advancing video summarization toward multimodal output.
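As a rough illustration of the kind of ranking measure the NDCGMS name suggests, the sketch below computes standard NDCG@k over per-segment saliency scores. It is not the paper's NDCGMS, whose exact definition is not given in this summary. The function names `dcg` and `ndcg_at_k`, and the framing of a video as a list of segments with ground-truth saliency values, are assumptions made for illustration.

```python
import math


def dcg(gains):
    """Discounted cumulative gain for gains listed in ranked order."""
    # Rank 0 is discounted by log2(2) = 1, rank 1 by log2(3), and so on.
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains))


def ndcg_at_k(predicted_scores, true_saliency, k):
    """Standard NDCG@k (illustrative; not the paper's NDCGMS).

    predicted_scores: model score per video segment (hypothetical input).
    true_saliency:    ground-truth saliency per segment, used as the gain.
    """
    if len(predicted_scores) != len(true_saliency):
        raise ValueError("score and saliency lists must have the same length")

    # Rank segments by predicted score, highest first.
    order = sorted(
        range(len(predicted_scores)),
        key=lambda i: predicted_scores[i],
        reverse=True,
    )
    ranked_gains = [true_saliency[i] for i in order[:k]]

    # The ideal ranking orders segments by their true saliency.
    ideal_gains = sorted(true_saliency, reverse=True)[:k]
    ideal = dcg(ideal_gains)

    # A video with no salient segments gets 0 by convention here.
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    saliency = [3, 0, 1]
    print(ndcg_at_k([0.9, 0.1, 0.5], saliency, k=3))  # ranking matches ground truth
    print(ndcg_at_k([0.1, 0.9, 0.5], saliency, k=3))  # ranking is inverted
```

A score of 1 means the predicted ordering of segments matches the ground-truth saliency ordering within the top k. Because the measure depends only on the ordering, it pairs naturally with a ranking-based training objective.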