TinyAlign: Boosting Lightweight Vision-Language Models by Mitigating Modal Alignment Bottlenecks

Yuanze Hu, Zhaoxin Fan, Xinyu Wang, Gen Li, Ye Qiu, Zhichao Yang, Wenjun Wu, Kejian Wu, Yifan Sun, Xiaotie Deng, Jin Dong · May 19, 2025

Summary

TinyAlign improves lightweight vision-language models (VLMs) by mitigating the modal alignment bottleneck: it increases the mutual information between visual and textual representations while requiring less training data. A memory bank supplies retrieved context that enriches the model's inputs, making the approach well suited to resource-constrained applications. Experiments show faster convergence, stronger alignment, and better downstream task performance than conventional methods. The related work spans multimodal and language models, including MM-Vet, TinyGPT-V, MMMU, sigmoid loss (SigLIP), TinyLlama, and MiniGPT-4, as well as Kun Zhu et al.'s work on an information bottleneck for noise filtering in retrieval-augmented generation.
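The two mechanisms the summary names can be illustrated concretely. The sketch below is hypothetical, not the authors' implementation: it shows (1) a memory bank that enriches a query embedding with its nearest stored context vectors, and (2) an InfoNCE-style objective, a standard lower bound on the mutual information between paired image and text embeddings. All function names, shapes, and the temperature value are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def retrieve_context(query, memory_bank, k=2):
    """Return the k memory entries most similar to the query (cosine)."""
    q = query / np.linalg.norm(query)
    m = memory_bank / np.linalg.norm(memory_bank, axis=1, keepdims=True)
    top = np.argsort(m @ q)[-k:]               # indices of the k nearest entries
    return memory_bank[top]

def infonce_loss(img, txt, temperature=0.07):
    """InfoNCE over a batch of paired image/text embeddings (rows are pairs)."""
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # pairwise similarity matrix
    labels = np.arange(len(img))                # i-th image matches i-th text
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_softmax[labels, labels].mean()  # -log p(correct pair)

memory_bank = rng.normal(size=(16, 8))          # 16 stored 8-dim context vectors
query = rng.normal(size=8)
context = retrieve_context(query, memory_bank, k=2)
enriched = np.concatenate([query, context.mean(axis=0)])  # context-enriched input

img_emb = rng.normal(size=(4, 8))
loss = infonce_loss(img_emb, img_emb)           # identical pairs → low loss
```

Minimizing the InfoNCE loss pushes matched pairs together and mismatched pairs apart, which is one common way to operationalize "improving mutual information" between modalities.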

Advanced features