At First Sight: Zero-Shot Classification of Astronomical Images with Large Multimodal Models

Dimitrios Tanoglidis, Bhuvnesh Jain · June 24, 2024

Summary

This research investigates the zero-shot classification capabilities of two large Vision-Language Models (VLMs), GPT-4o and LLaVA-NeXT, on astronomical images. Without any additional training, GPT-4o achieves high accuracy (83% at separating low surface brightness galaxies, LSBGs, from imaging artifacts), while LLaVA-NeXT performs noticeably worse. The study addresses the scarcity of labeled astronomical data by relying on natural language prompts for zero-shot learning, and evaluates the models on two tasks: distinguishing LSBGs from artifacts, and classifying galaxy morphologies (round, cigar-shaped, edge-on, spiral) on the GalaxyMNIST dataset. The authors conclude that VLMs, although not yet competitive with custom-built or fine-tuned models, show promise for astronomy and could also serve as a teaching tool. Future work should focus on domain-specific fine-tuning and on the difficulty of distinguishing visually similar galaxy types.
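
To make the prompting approach concrete, below is a minimal sketch of zero-shot classification of a single image cutout with the OpenAI Python SDK. The model name "gpt-4o", the helper name classify_cutout, the prompt wording, and the label set are illustrative assumptions; the paper's exact prompts and evaluation setup may differ.

    # Minimal sketch: zero-shot classification of an astronomical cutout with a VLM.
    # Assumptions: model name, prompt wording, and labels are illustrative only.
    import base64
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def classify_cutout(image_path: str) -> str:
        """Ask the model to assign exactly one label to an image cutout."""
        with open(image_path, "rb") as f:
            b64 = base64.b64encode(f.read()).decode("utf-8")

        prompt = (
            "You are looking at a cutout image from an astronomical survey. "
            "Classify it with exactly one of these labels: "
            "'low surface brightness galaxy' or 'artifact'. "
            "Answer with the label only."
        )

        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                ],
            }],
        )
        return response.choices[0].message.content.strip()

    # Example usage:
    # print(classify_cutout("candidate_cutout.png"))

The same prompt-plus-image pattern extends to the four-way GalaxyMNIST morphology task by swapping in the list of shape labels.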

Key findings

  • Zero-shot GPT-4o separates low surface brightness galaxies from imaging artifacts with roughly 83% accuracy; LLaVA-NeXT performs noticeably worse on the same tasks.
  • Both models struggle to distinguish visually similar galaxy morphologies (e.g., round vs. cigar-shaped) on GalaxyMNIST.
  • Neither model yet matches custom-built or fine-tuned classifiers, but natural language prompting requires no labeled training data.