Exploring Response Uncertainty in MLLMs: An Empirical Evaluation under Misleading Scenarios

Yunkai Dang, Mengxi Gao, Yibo Yan, Xin Zou, Yanggan Gu, Aiwei Liu, Xuming Hu · November 05, 2024

Summary

The study introduces a two-stage pipeline for evaluating the response uncertainty of Multimodal Large Language Models (MLLMs) under misleading scenarios. It establishes the Multimodal Uncertainty Benchmark (MUB), which uses explicit and implicit misleading instructions to assess MLLMs' vulnerability across diverse domains. Experiments show that both open-source and closed-source MLLMs are highly susceptible to misleading instructions, with an average misleading rate exceeding 86%. Fine-tuning MLLMs on misleading data improves their robustness.
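The headline number rests on a misleading rate: roughly, the share of questions a model first answers correctly but then gets wrong once a misleading instruction is injected. The sketch below illustrates that computation under assumed data structures; `EvalRecord` and its fields are hypothetical and not the benchmark's actual schema.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class EvalRecord:
    """One question evaluated twice: before and after a misleading instruction.
    Field names are illustrative, not the paper's actual format."""
    correct_before: bool  # model answered correctly without the misleading input
    correct_after: bool   # model's answer after the misleading instruction was added

def misleading_rate(records: List[EvalRecord]) -> float:
    """Fraction of initially correct answers that flip to incorrect
    once a misleading instruction is injected."""
    initially_correct = [r for r in records if r.correct_before]
    if not initially_correct:
        return 0.0
    flipped = sum(1 for r in initially_correct if not r.correct_after)
    return flipped / len(initially_correct)

# Example: 3 of 4 initially correct answers flip -> 75% misleading rate
records = [
    EvalRecord(correct_before=True,  correct_after=False),
    EvalRecord(correct_before=True,  correct_after=False),
    EvalRecord(correct_before=True,  correct_after=True),
    EvalRecord(correct_before=True,  correct_after=False),
    EvalRecord(correct_before=False, correct_after=False),  # ignored: not initially correct
]
print(f"Misleading rate: {misleading_rate(records):.0%}")
```

Restricting the denominator to initially correct answers isolates the effect of the misleading instruction itself, rather than conflating it with questions the model could not answer in the first place.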
