Is AI fun? HumorDB: a curated dataset and benchmark to investigate graphical humor

Veedant Jain, Felipe dos Santos Alves Feitosa, Gabriel Kreiman · June 19, 2024

Summary

HumorDB is an image dataset for investigating visual humor understanding in computer vision. It consists of 3,545 image pairs with varying humor ratings, designed to test whether models can pick up on subtle humor and context. Traditional vision-only models struggle, while vision-language models, particularly those built on large language models such as LLaVA and GPT-4, perform better. The study evaluates models on tasks such as binary classification, humor ranking, and image comparison, and examines the effect of pretraining and multimodal inputs. Human evaluations involving 550 participants support the dataset's reliability and highlight the importance of originality in humor perception. The dataset is released under CC BY 4.0 and serves as a benchmark for humor detection and abstract concept comprehension in AI systems, with implications for content moderation and future research.
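
The evaluation tasks named in the summary (binary classification, humor ranking, image comparison) could be scored along the following lines. This is a minimal sketch, not the paper's evaluation code: the summary does not state which metrics were used, so the function names, the choice of pairwise concordance for ranking, and the toy ratings in the usage note are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Sequence, Tuple


def binary_accuracy(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Fraction of images whose funny / not-funny label is predicted correctly."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def pairwise_concordance(pred_scores: Sequence[float],
                         human_ratings: Sequence[float]) -> float:
    """Fraction of image pairs that the model orders the same way as humans.

    Pairs with identical human ratings carry no ordering and are skipped.
    (Hypothetical ranking metric; a rank correlation would be an alternative.)
    """
    concordant = total = 0
    for i, j in combinations(range(len(human_ratings)), 2):
        human_diff = human_ratings[i] - human_ratings[j]
        if human_diff == 0:
            continue
        total += 1
        if (pred_scores[i] - pred_scores[j]) * human_diff > 0:
            concordant += 1
    return concordant / total if total else 0.0


def comparison_accuracy(choices: Sequence[int],
                        pairs: Sequence[Tuple[int, int]],
                        human_ratings: Sequence[float]) -> float:
    """Fraction of pairs where the model picks the image humans rated funnier.

    choices[k] is the image index the model selected from pairs[k].
    """
    correct = 0
    for choice, (a, b) in zip(choices, pairs):
        funnier = a if human_ratings[a] > human_ratings[b] else b
        correct += choice == funnier
    return correct / len(pairs)
```

With made-up human ratings `[8.0, 2.0, 6.5, 3.0]`, a model that outputs scores `[0.9, 0.1, 0.4, 0.5]` orders five of the six image pairs the same way as the human raters, so `pairwise_concordance` returns 5/6.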

Introduction
  Background
    Emergence of humor in computer vision research
    Challenges for traditional vision-only models
  Objective
    To investigate visual humor understanding in AI systems
    Evaluate the impact of vision-language models and multimodal inputs
Method
  Data Collection
    Image pair selection: 3,545 diverse and humor-rated examples
    Context and subtlety: Purposeful variety in humor types
  Data Preprocessing
    Humor ratings: Annotation process and rating scale
    Pairing methodology: Ensuring balanced and diverse samples
  Model Evaluation
    Binary classification: Identifying humorous vs. non-humorous images
    Humor ranking: Assessing models' ability to rank humor levels
    Image comparison: Evaluating model comprehension of humor differences
  Human Evaluation
    550 participants: Reliability and originality in humor perception
    Ground truth validation: Human judgment as benchmark
  Multimodal Inputs and Pretraining
    Impact of LLaVA and GPT-4: Performance comparison
    Role of multimodal fusion in humor understanding
Dataset Characteristics
  CC BY 4.0 license: Accessibility and attribution requirements
  Applications: Content moderation and abstract concept comprehension
Conclusion
  Benchmark for humor detection in AI
  Future research directions in humor understanding and AI systems
  Implications for real-world applications and ethical considerations