A Survey of Large Language Model Agents for Question Answering

Murong Yue · March 24, 2025

Summary

LLM-based agents have advanced question answering well beyond traditional pipeline systems. This survey analyzes the stages of the QA process, the design of LLM-based agents, and directions for future research. Drawing on recent NLP literature, it covers agent theories, generative agent models, and knowledge-graph QA, and it reviews benchmarks and frameworks such as OlympiadBench, BIG-Bench, FOLIO, and AgentVerse, along with work on text summarization, model evaluation, and reliability enhancement. Key open problems include uncertainty quantification, intrinsic representations, synthetic data generation, and LLM self-training.

Introduction
  Background
    Evolution of question answering systems
    Role of LLMs in modern AI
  Objective
    To analyze the stages of question answering with LLM-based agents
    To explore the design principles of LLM agents
    To outline future research directions in the field
Question Answering Stages with LLM-based Agents
  Pre-Processing
    Data collection and preparation
    Context understanding and extraction
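
To make the pre-processing stage concrete, here is a minimal sketch of context extraction: source documents are split into chunks, and the passages most relevant to the question are kept. Word overlap stands in for a real dense retriever, and all names (`chunk`, `retrieve_context`, the toy corpus) are illustrative, not from the survey.

```python
# Minimal context-extraction sketch: split documents into chunks and keep
# the passages most relevant to the question. Word overlap stands in for a
# real dense retriever.

def chunk(text: str, size: int = 50) -> list[str]:
    """Split a document into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def overlap_score(question: str, passage: str) -> float:
    """Fraction of question words that also appear in the passage."""
    q = set(question.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / (len(q) or 1)

def retrieve_context(question: str, documents: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most relevant to the question."""
    chunks = [c for doc in documents for c in chunk(doc)]
    return sorted(chunks, key=lambda c: overlap_score(question, c), reverse=True)[:k]
```
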
  Model Selection and Training
    Choice of LLM architectures
    Training methodologies and datasets
  Inference and Answer Generation
    Answer retrieval and synthesis
    Post-processing and refinement
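
As a rough illustration of the inference stage, the sketch below assembles retrieved passages into a prompt, queries a model, and post-processes the raw generation into a short answer. `generate` is a placeholder for whatever LLM client a system uses, and the prompt format is an assumption, not the survey's.

```python
from typing import Callable

def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble retrieved passages and the question into a single prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )

def answer(question: str, passages: list[str],
           generate: Callable[[str], str]) -> str:
    """Query the model, then refine the raw output to a one-line answer."""
    raw = generate(build_prompt(question, passages))
    first_line = raw.strip().splitlines()[0] if raw.strip() else ""
    return first_line.removeprefix("Answer:").strip()
```
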
  Evaluation and Feedback
    Metrics for performance assessment
    Iterative improvement strategies
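
Two metrics that dominate extractive and short-answer QA evaluation are exact match and token-level F1 (the SQuAD convention). A self-contained implementation:

```python
import re
import string
from collections import Counter

def normalize(s: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    s = "".join(ch for ch in s.lower() if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(pred: str, gold: str) -> bool:
    """True if prediction and gold answer match after normalization."""
    return normalize(pred) == normalize(gold)

def token_f1(pred: str, gold: str) -> float:
    """Harmonic mean of token precision and recall against the gold answer."""
    p, g = normalize(pred).split(), normalize(gold).split()
    common = sum((Counter(p) & Counter(g)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)
```
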
Design of LLM-based Agents
  Theoretical Frameworks
    Agent theories and their application
  Generative Agent Models
    Architectures for generative QA
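
One widely used architecture for generative QA agents is a ReAct-style loop, in which the model interleaves reasoning with tool calls until it commits to an answer. The sketch below assumes a hypothetical `generate` callable and a tool registry; the `Action:`/`Observation:` transcript format is illustrative, not the survey's.

```python
from typing import Callable

def agent_loop(question: str,
               generate: Callable[[str], str],
               tools: dict[str, Callable[[str], str]],
               max_steps: int = 5) -> str:
    """Alternate model reasoning and tool calls until a final answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = generate(transcript).strip()   # model emits thought/action/answer
        transcript += step + "\n"
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        if step.startswith("Action:"):        # e.g. "Action: search[LLM agents]"
            name, _, arg = step.removeprefix("Action:").strip().partition("[")
            tool = tools.get(name.strip(), lambda _: "unknown tool")
            transcript += f"Observation: {tool(arg.rstrip(']'))}\n"
    return "No answer within the step budget."
```
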
  Knowledge Graph Integration
    Utilization of knowledge graphs in QA
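
A toy illustration of how an agent can ground answers in a knowledge graph: facts stored as (subject, relation, object) triples, with a one-hop lookup exposed as a tool. Production systems would instead query an endpoint such as Wikidata via SPARQL; the triples here are made-up examples.

```python
# Facts as (subject, relation, object) triples; the lookup is the "tool"
# an agent would call.
TRIPLES = {
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
    ("France", "located_in", "Europe"),
}

def subjects(relation: str, obj: str) -> list[str]:
    """One-hop lookup: all s with (s, relation, obj) in the graph."""
    return [s for (s, r, o) in TRIPLES if r == relation and o == obj]

print(subjects("capital_of", "France"))  # ['Paris']
```
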
  Multi-Modal Approaches
    Handling text, images, and other media
  Contextual Understanding
    Enhancing context-aware responses
Recent Advancements and Benchmarks
  Benchmark Datasets
    Overview of benchmarks and frameworks such as OlympiadBench, BIG-Bench, FOLIO, and AgentVerse
  Model Evaluations
    Comparative analysis of QA models
  Performance Metrics
    Key performance indicators in QA
Challenges and Future Research
  Uncertainty Quantification
    Assessing model confidence in answers
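
A common, model-agnostic way to estimate answer confidence is self-consistency: sample several generations at nonzero temperature and treat the vote share of the modal answer as a confidence score. A sketch, with `sample_answer` standing in for a stochastic LLM call:

```python
from collections import Counter
from typing import Callable

def self_consistency(question: str,
                     sample_answer: Callable[[str], str],
                     n: int = 10) -> tuple[str, float]:
    """Sample n answers; return the modal answer and its vote share."""
    votes = Counter(sample_answer(question) for _ in range(n))
    best, count = votes.most_common(1)[0]
    return best, count / n

# A return of ("Paris", 0.9) means 9 of 10 samples agreed on "Paris".
```
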
  Intrinsic Representations
    Understanding model decision-making processes
  Synthetic Data Generation
    Creating diverse training data
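
Synthetic QA data is often created by prompting a model to write question-answer pairs grounded in a passage and then filtering out unsupported pairs. A minimal sketch under that assumption; `generate` and the prompt format are placeholders:

```python
from typing import Callable, Optional

PROMPT = ("Write one question and its answer based on this passage.\n"
          "Passage: {passage}\nFormat: Q: ... A: ...")

def make_pair(passage: str,
              generate: Callable[[str], str]) -> Optional[tuple[str, str]]:
    """Ask the model for a QA pair; keep it only if the passage supports it."""
    out = generate(PROMPT.format(passage=passage))
    if "Q:" not in out:
        return None                      # malformed generation: discard
    body = out.split("Q:", 1)[1]
    if "A:" not in body:
        return None
    q, a = (part.strip() for part in body.split("A:", 1))
    # Grounding filter: the answer string must appear in the source passage.
    return (q, a) if a.lower() in passage.lower() else None
```
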
  LLM Self-Training
    Enhancing model learning through self-supervised tasks
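
Self-training is typically framed as a loop: the model answers unlabeled questions, high-confidence outputs are kept as pseudo-labels, and the model is fine-tuned on them. A skeleton of that loop, reusing the self-consistency estimator above and a placeholder `fine_tune`:

```python
def self_train(model, questions, confidence, fine_tune,
               threshold: float = 0.8, rounds: int = 3):
    """Pseudo-label high-confidence answers and fine-tune on them."""
    for _ in range(rounds):
        pseudo_labels = []
        for q in questions:
            answer, score = confidence(q, model)
            if score >= threshold:       # keep only answers the model agrees on
                pseudo_labels.append((q, answer))
        model = fine_tune(model, pseudo_labels)
    return model
```
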
  Reliability and Robustness
    Improving model reliability in real-world scenarios
Conclusion
  Summary of Findings
  Implications for Future Research
  Open Questions and Opportunities