LaViC: Adapting Large Vision-Language Models to Visually-Aware Conversational Recommendation

Hyunsik Jeon, Satoshi Koide, Yu Wang, Zhankui He, Julian McAuley · March 30, 2025

Summary

LaViC integrates visual information into dialogue-based recommendation through a two-stage process: visual knowledge self-distillation followed by recommendation prompt tuning. It outperforms both text-only methods and vision-language baselines, showing that visual data captures product attributes that text alone misses and underscoring the value of visual signals in visually oriented domains. The paper situates its approach among 35 related works on conversational recommendation, spanning conversational AI, multimodal interaction, and user preference modeling, and draws on technologies such as Sentence-BERT, LLaMA, Llama 2, DualGNN, and LLaVA-v1.6 from research on conversational recommendation, multimedia recommendation, and large language models.

Introduction
Background
Overview of dialogue-based recommendation systems
Importance of visual data in product recommendation
Objective
To integrate visual data into dialogue-based recommendation systems via a two-stage process: visual knowledge self-distillation followed by recommendation prompt tuning
Method
Two-Stage Process
Stage 1: Visual Knowledge Self-Distillation
Techniques for extracting visual knowledge from images
Methods for distilling visual information into a form usable by recommendation systems (a minimal self-distillation sketch follows this outline)
Stage 2: Recommendation Prompt Tuning
Strategies for refining recommendation prompts with visual inputs
Integration of visual data to enhance the relevance and personalization of recommendations (a prompt-assembly sketch also follows below)
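The outline above names visual knowledge self-distillation without spelling out its mechanics. One plausible reading, and a minimal sketch under that assumption, is that the backbone stays frozen and acts as its own teacher: it scores a self-generated image description while seeing the full set of patch embeddings, and a small trainable compressor is optimized so that a handful of visual tokens reproduce those scores. Everything below (TokenCompressor, TinyLM, all dimensions) is a hypothetical stand-in, not LaViC's actual architecture, which builds on LLaVA-v1.6.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, n_patches, n_compressed, vocab = 256, 64, 4, 1000


class TokenCompressor(nn.Module):
    """Compress many visual patch embeddings into a few learned tokens via
    cross-attention (hypothetical design, not the paper's module)."""

    def __init__(self):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_compressed, d_model))
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

    def forward(self, patches):                      # patches: (B, n_patches, d_model)
        q = self.queries.unsqueeze(0).expand(patches.size(0), -1, -1)
        out, _ = self.attn(q, patches, patches)      # (B, n_compressed, d_model)
        return out


class TinyLM(nn.Module):
    """Stand-in for the frozen language backbone: scores caption tokens given
    visual tokens prepended to the text embeddings."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab)

    def forward(self, visual_tokens, caption_ids):   # caption_ids: (B, T)
        h = torch.cat([visual_tokens, self.embed(caption_ids)], dim=1)
        h = self.encoder(h)
        return self.lm_head(h[:, visual_tokens.size(1):])  # logits at caption positions


lm = TinyLM()
for p in lm.parameters():          # self-distillation: the backbone stays frozen
    p.requires_grad_(False)
compressor = TokenCompressor()
opt = torch.optim.AdamW(compressor.parameters(), lr=1e-4)

# One toy step: the teacher pass sees all patch embeddings, the student pass
# only the compressed tokens; the student matches the teacher's distribution
# over a (here random) self-generated image description.
patches = torch.randn(2, n_patches, d_model)          # pretend vision-encoder output
caption = torch.randint(0, vocab, (2, 16))            # pretend generated description
with torch.no_grad():
    teacher_logits = lm(patches, caption)
student_logits = lm(compressor(patches), caption)

loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                F.softmax(teacher_logits, dim=-1), reduction="batchmean")
loss.backward()
opt.step()
opt.zero_grad()
print(f"distillation loss: {loss.item():.3f}")
```

The point of the sketch is the gradient flow: only the compressor is updated, so visual knowledge is condensed without touching the backbone; the same mechanics apply when teacher and student are the same large model run with different visual-token budgets.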
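For the second stage, recommendation prompt tuning, the model receives the dialogue plus candidate items and is trained to output the target item. The template below is purely illustrative: the placeholder token, instruction wording, and candidate format are assumptions rather than the paper's actual prompt, and in LaViC the image slot would plausibly be filled by the compressed visual tokens produced in stage 1.

```python
# Hypothetical prompt assembly for recommendation prompt tuning.
IMAGE_PLACEHOLDER = "<item_image>"   # stands in for an item's compressed visual tokens


def build_prompt(dialogue, candidates):
    """dialogue: list of (speaker, utterance) pairs; candidates: list of dicts
    with an id and a title. Each candidate carries an image slot filled with
    its visual tokens at training/inference time."""
    lines = ["You are a conversational recommender. Given the dialogue and the",
             "candidate items below, output the id of the best item.", "",
             "Dialogue:"]
    lines += [f"{speaker}: {utterance}" for speaker, utterance in dialogue]
    lines += ["", "Candidates:"]
    lines += [f"[{c['id']}] {c['title']} {IMAGE_PLACEHOLDER}" for c in candidates]
    lines += ["", "Answer with a single candidate id."]
    return "\n".join(lines)


prompt = build_prompt(
    dialogue=[("User", "I need a gift for a friend who loves minimalist decor."),
              ("Assistant", "Any preferred colors or materials?"),
              ("User", "Neutral tones, maybe ceramic.")],
    candidates=[{"id": "B01", "title": "Matte ceramic vase, sand beige"},
                {"id": "B02", "title": "Neon LED wall sign"},
                {"id": "B03", "title": "Stoneware candle holder, off-white"}])
print(prompt)   # the supervised target during tuning would be the held-out item id, e.g. "B01"
```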
Performance Evaluation
Comparison with text-only methods
Benchmarking against vision-language baselines
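Benchmarking against text-only and vision-language baselines presupposes a ranking metric over held-out target items. A hit-rate-at-k computation of the kind commonly reported for conversational recommenders is sketched below; the metric name and protocol are common practice and assumed here, not quoted from the paper.

```python
def hit_rate_at_k(ranked_lists, targets, k):
    """Fraction of test dialogues whose ground-truth item appears in the top k
    of the model's ranked candidate list (common CRS metric; the paper's exact
    protocol may differ)."""
    hits = sum(target in ranked[:k] for ranked, target in zip(ranked_lists, targets))
    return hits / len(targets)


# Toy example: three dialogues, each with a ranked candidate list and one target.
ranked = [["B01", "B03", "B02"], ["B07", "B05", "B06"], ["B09", "B08", "B04"]]
truth = ["B03", "B06", "B04"]
print(hit_rate_at_k(ranked, truth, k=1))   # 0.0
print(hit_rate_at_k(ranked, truth, k=2))   # ~0.33
print(hit_rate_at_k(ranked, truth, k=3))   # 1.0
```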
Visual Data's Role
Capturing Product Attributes
How visual data contributes to a deeper understanding of product characteristics
Enhancing Recommendation Quality
The impact of visual information on improving the accuracy and relevance of recommendations
Contributions and References
Relevant Works
35 works on conversational recommendation systems
Advances in conversational AI, multimodal interactions, and user preference modeling
Key Technologies
Sentence-BERT, LLaMA, Llama 2, DualGNN, and LLaVA-v1.6
Contributions to conversational recommendation, multimedia recommendation, and large language models
Conclusion
Summary of LaViC's Achievements
Future Directions
Potential improvements and extensions of LaViC's approach
Ongoing research challenges in integrating visual data into dialogue-based recommendation systems
Basic info
computer vision and pattern recognition
artificial intelligence
Insights
What innovative approaches does LaViC employ to enhance recommendation quality in visually oriented domains?
How does LaViC compare to text-only methods and vision-language baselines in terms of recommendation quality?
How does LaViC integrate visual data into dialogue-based recommendation systems?
What are the key contributions of LaViC in the context of conversational recommendation systems?