At First Sight: Zero-Shot Classification of Astronomical Images with Large Multimodal Models
Dimitrios Tanoglidis, Bhuvnesh Jain·June 24, 2024
Summary
This research investigates the zero-shot classification capabilities of large Vision-Language Models (VLMs), GPT-4 and LLaVA-NeXT, in astronomy. Without any additional training, GPT-4 achieves high accuracy (83% at separating low-surface-brightness galaxies, LSBGs, from imaging artifacts), while LLaVA-NeXT performs less well. The study sidesteps the scarcity of labeled data by steering the models with natural-language prompts alone. It evaluates the models on two tasks: distinguishing LSBGs from artifacts, and classifying galaxy morphologies (round, cigar-shaped, edge-on, spiral) on the GalaxyMNIST dataset. The authors conclude that, although VLMs do not yet outperform custom-built or fine-tuned models, they show promise for astronomy and could serve as teaching tools. Future work should focus on domain-specific fine-tuning and on the remaining difficulty of distinguishing visually similar galaxy types.
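The prompting setup described above can be sketched in a few lines of Python. The snippet below is a minimal illustration, not the authors' exact pipeline: the model name (`gpt-4o`), the prompt wording, and the file path are assumptions chosen for the example, and it uses the standard OpenAI chat-completions API to request a one-word label for a single image cutout.

```python
import base64

from openai import OpenAI  # pip install openai

client = OpenAI()  # expects OPENAI_API_KEY in the environment


def classify_cutout(image_path: str) -> str:
    """Ask a vision-language model to label a galaxy cutout zero-shot."""
    # Encode the local image as a base64 data URL, the format the
    # chat-completions vision input expects.
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")

    # Illustrative prompt; the paper's exact wording may differ.
    prompt = (
        "You are an expert astronomer. The image is a survey cutout. "
        "Classify it as either a low-surface-brightness galaxy (LSBG) "
        "or an imaging artifact. Answer with exactly one word: "
        "LSBG or ARTIFACT."
    )

    response = client.chat.completions.create(
        model="gpt-4o",  # stand-in; the paper evaluated GPT-4 (vision)
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
        max_tokens=5,  # we only want the one-word label back
    )
    return response.choices[0].message.content.strip()


print(classify_cutout("cutout_0001.png"))  # e.g. "LSBG"
```

Constraining the answer to a fixed vocabulary ("LSBG or ARTIFACT") makes the free-form model output easy to score against ground-truth labels; the same pattern extends to the four-way morphology task by listing the four class names in the prompt.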