AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction
Zhen Xing, Qi Dai, Zejia Weng, Zuxuan Wu, Yu-Gang Jiang · June 10, 2024
Summary
The paper presents AID, a method for adapting pretrained image-to-video diffusion models to instruction-guided video prediction. AID addresses limitations of prior techniques by transferring the video priors of the pretrained model, injecting instruction control through a multimodal large language model (MLLM) paired with a DQFormer, and fine-tuning only lightweight adapters for efficient adaptation. The model outperforms the state of the art on four datasets, improving both video prediction quality and controllability in settings such as robotic manipulation and virtual reality applications.
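To make the adapter component of the summary concrete, here is a minimal sketch of adapter-based adaptation: a small bottleneck module is trained alongside a frozen backbone block, so the pretrained video prior stays intact while only a few new parameters are updated. The module names, dimensions, and the stand-in backbone block are illustrative assumptions, not the paper's implementation.

# Minimal sketch of adapter-based adaptation (illustrative; not the paper's code).
# A small bottleneck adapter is added next to a frozen backbone layer so that
# only the adapter parameters are updated during fine-tuning.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual add."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()
        nn.init.zeros_(self.up.weight)  # start as an identity mapping for stable training
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


class AdaptedBlock(nn.Module):
    """Wraps a frozen backbone block and routes its output through a trainable adapter."""

    def __init__(self, frozen_block: nn.Module, dim: int):
        super().__init__()
        self.block = frozen_block
        for p in self.block.parameters():  # keep the pretrained video prior intact
            p.requires_grad_(False)
        self.adapter = Adapter(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.block(x))


if __name__ == "__main__":
    # nn.Linear stands in for a block of a video diffusion backbone (hypothetical).
    block = AdaptedBlock(nn.Linear(320, 320), dim=320)
    tokens = torch.randn(2, 16, 320)  # (batch, tokens, channels)
    out = block(tokens)
    trainable = [name for name, p in block.named_parameters() if p.requires_grad]
    print(out.shape, trainable)  # only the adapter weights remain trainable

The design choice this sketch illustrates is the one the summary attributes to AID: the full backbone is never updated, so adaptation is cheap and the transferred video prior is preserved.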
I. Introduction
A. Background on image-to-video generation
B. Limitations of previous methods
C. Purpose of AID: addressing challenges and improving controllability
II. Methodology
A. Video Prior Transfer
1. Extracting video style and structure
2. Incorporating priors into the adaptation process
B. MLLM with DQFormer for Instruction Control
1. Multimodal large language model (MLLM)
2. DQFormer: query-based instruction understanding
3. Integration of MLLM and DQFormer for precise control (see the conditioning sketch after this outline)
C. Adapter-based Adaptation
1. Adapters for efficient fine-tuning
2. Adapter design and training
3. Adapting to new tasks and domains
III. Experiments and Results
A. Datasets and Evaluation Metrics
B. Comparison with State-of-the-Art
1. Robotic manipulation
2. Virtual reality applications
C. Improved Video Prediction and Controllability
1. Quantitative analysis
2. Qualitative examples
IV. Applications and Implications
A. Real-world use cases
B. Advantages over previous methods
C. Future directions and potential improvements
V. Conclusion
A. Summary of AID's achievements
B. Significance for the field of video generation
C. Open questions and future research directions
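As referenced in the methodology outline, below is an illustrative sketch of query-based instruction conditioning: a fixed set of learnable queries cross-attends to instruction features produced by a multimodal LLM and yields a compact set of conditioning tokens for the diffusion backbone. This is a generic query-transformer pattern written under stated assumptions, not the paper's DQFormer; the class name, dimensions, and the stand-in instruction features are hypothetical.

# Illustrative sketch of query-based instruction conditioning (not the paper's DQFormer).
# Learnable queries cross-attend to instruction features from a multimodal LLM;
# the resulting tokens condition the video diffusion backbone.
import torch
import torch.nn as nn


class InstructionQueryFormer(nn.Module):
    def __init__(self, dim: int = 768, num_queries: int = 16, num_heads: int = 8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, instruction_feats: torch.Tensor) -> torch.Tensor:
        # instruction_feats: (batch, seq_len, dim) hidden states from an MLLM text encoder
        batch = instruction_feats.size(0)
        q = self.queries.unsqueeze(0).expand(batch, -1, -1)
        attn_out, _ = self.cross_attn(q, instruction_feats, instruction_feats)
        q = self.norm1(q + attn_out)
        q = self.norm2(q + self.mlp(q))
        return q  # (batch, num_queries, dim) conditioning tokens for the diffusion model


if __name__ == "__main__":
    feats = torch.randn(2, 77, 768)  # stand-in for MLLM instruction embeddings (hypothetical)
    cond = InstructionQueryFormer()(feats)
    print(cond.shape)  # torch.Size([2, 16, 768])

In a full pipeline of the kind the summary describes, these conditioning tokens would be injected into the adapted diffusion blocks (for example through cross-attention), which is how the instruction-control branch and the adapters fit together.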
Basic info
Subjects: Computer Vision and Pattern Recognition; Computation and Language; Multimedia; Machine Learning; Artificial Intelligence