A Survey of Large Language Model Agents for Question Answering

Murong Yue · March 24, 2025

Summary

LLM-based agents have advanced question answering well beyond traditional pipeline systems. This survey analyzes the stages of the QA process, the design of LLM-based agents, and directions for future research. Drawing on recent NLP literature, it covers agent theories, generative agent models, and knowledge-graph QA, and it reviews benchmarks and frameworks such as OlympiadBench, BIG-Bench, FOLIO, and AgentVerse, along with work on text summarization, model evaluation, and reliability enhancement. Key open problems include uncertainty quantification, intrinsic representations, synthetic data generation, and LLM self-training.

Introduction
  Background
    Evolution of question answering systems
    Role of LLMs in modern AI
  Objective
    To analyze the stages of question answering with LLM-based agents
    To explore the design principles of LLM agents
    To outline future research directions in the field
Question Answering Stages with LLM-based Agents
  Pre-Processing
    Data collection and preparation
    Context understanding and extraction
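
To make the pre-processing stage concrete, here is a minimal sketch of context extraction: source documents are split into chunks, and the passages most relevant to the question are kept. Word overlap stands in for a real dense retriever, and all names (`chunk`, `retrieve_context`, the toy corpus) are illustrative, not from the survey.

```python
# Minimal context-extraction sketch: split documents into chunks and keep
# the passages most relevant to the question. Word overlap stands in for a
# real dense retriever.

def chunk(text: str, size: int = 50) -> list[str]:
    """Split a document into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def overlap_score(question: str, passage: str) -> float:
    """Fraction of question words that also appear in the passage."""
    q = set(question.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / (len(q) or 1)

def retrieve_context(question: str, documents: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most relevant to the question."""
    chunks = [c for doc in documents for c in chunk(doc)]
    return sorted(chunks, key=lambda c: overlap_score(question, c), reverse=True)[:k]
```
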
  Model Selection and Training
    Choice of LLM architectures
    Training methodologies and datasets
  Inference and Answer Generation
    Answer retrieval and synthesis
    Post-processing and refinement
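
As a rough illustration of the inference stage, the sketch below assembles retrieved passages into a prompt, queries a model, and post-processes the raw generation into a short answer. `generate` is a placeholder for whatever LLM client a system uses, and the prompt format is an assumption, not the survey's.

```python
from typing import Callable

def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble retrieved passages and the question into a single prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )

def answer(question: str, passages: list[str],
           generate: Callable[[str], str]) -> str:
    """Query the model, then refine the raw output to a one-line answer."""
    raw = generate(build_prompt(question, passages))
    first_line = raw.strip().splitlines()[0] if raw.strip() else ""
    return first_line.removeprefix("Answer:").strip()
```
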
  Evaluation and Feedback
    Metrics for performance assessment
    Iterative improvement strategies
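
Two metrics that dominate extractive and short-answer QA evaluation are exact match and token-level F1 (the SQuAD convention). A self-contained implementation:

```python
import re
import string
from collections import Counter

def normalize(s: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    s = "".join(ch for ch in s.lower() if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(pred: str, gold: str) -> bool:
    """True if prediction and gold answer match after normalization."""
    return normalize(pred) == normalize(gold)

def token_f1(pred: str, gold: str) -> float:
    """Harmonic mean of token precision and recall against the gold answer."""
    p, g = normalize(pred).split(), normalize(gold).split()
    common = sum((Counter(p) & Counter(g)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)
```
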
Design of LLM-based Agents
  Theoretical Frameworks
    Agent theories and their application
  Generative Agent Models
    Architectures for generative QA
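
One widely used architecture for generative QA agents is a ReAct-style loop, in which the model interleaves reasoning with tool calls until it commits to an answer. The sketch below assumes a hypothetical `generate` callable and a tool registry; the `Action:`/`Observation:` transcript format is illustrative, not the survey's.

```python
from typing import Callable

def agent_loop(question: str,
               generate: Callable[[str], str],
               tools: dict[str, Callable[[str], str]],
               max_steps: int = 5) -> str:
    """Alternate model reasoning and tool calls until a final answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = generate(transcript).strip()   # model emits thought/action/answer
        transcript += step + "\n"
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        if step.startswith("Action:"):        # e.g. "Action: search[LLM agents]"
            name, _, arg = step.removeprefix("Action:").strip().partition("[")
            tool = tools.get(name.strip(), lambda _: "unknown tool")
            transcript += f"Observation: {tool(arg.rstrip(']'))}\n"
    return "No answer within the step budget."
```
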
  Knowledge Graph Integration
    Utilization of knowledge graphs in QA
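
A toy illustration of how an agent can ground answers in a knowledge graph: facts stored as (subject, relation, object) triples, with a one-hop lookup exposed as a tool. Production systems would instead query an endpoint such as Wikidata via SPARQL; the triples here are made-up examples.

```python
# Facts as (subject, relation, object) triples; the lookup is the "tool"
# an agent would call.
TRIPLES = {
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
    ("France", "located_in", "Europe"),
}

def subjects(relation: str, obj: str) -> list[str]:
    """One-hop lookup: all s with (s, relation, obj) in the graph."""
    return [s for (s, r, o) in TRIPLES if r == relation and o == obj]

print(subjects("capital_of", "France"))  # ['Paris']
```
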
  Multi-Modal Approaches
    Handling text, images, and other media
  Contextual Understanding
    Enhancing context-aware responses
Recent Advancements and Benchmarks
  Benchmark Datasets
    Overview of benchmarks and frameworks such as OlympiadBench, BIG-Bench, FOLIO, and AgentVerse
  Model Evaluations
    Comparative analysis of QA models
  Performance Metrics
    Key performance indicators in QA
Challenges and Future Research
  Uncertainty Quantification
    Assessing model confidence in answers
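
A common, model-agnostic way to estimate answer confidence is self-consistency: sample several generations at nonzero temperature and treat the vote share of the modal answer as a confidence score. A sketch, with `sample_answer` standing in for a stochastic LLM call:

```python
from collections import Counter
from typing import Callable

def self_consistency(question: str,
                     sample_answer: Callable[[str], str],
                     n: int = 10) -> tuple[str, float]:
    """Sample n answers; return the modal answer and its vote share."""
    votes = Counter(sample_answer(question) for _ in range(n))
    best, count = votes.most_common(1)[0]
    return best, count / n

# A return of ("Paris", 0.9) means 9 of 10 samples agreed on "Paris".
```
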
  Intrinsic Representations
    Understanding model decision-making processes
  Synthetic Data Generation
    Creating diverse training data
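
Synthetic QA data is often created by prompting a model to write question-answer pairs grounded in a passage and then filtering out unsupported pairs. A minimal sketch under that assumption; `generate` and the prompt format are placeholders:

```python
from typing import Callable, Optional

PROMPT = ("Write one question and its answer based on this passage.\n"
          "Passage: {passage}\nFormat: Q: ... A: ...")

def make_pair(passage: str,
              generate: Callable[[str], str]) -> Optional[tuple[str, str]]:
    """Ask the model for a QA pair; keep it only if the passage supports it."""
    out = generate(PROMPT.format(passage=passage))
    if "Q:" not in out:
        return None                      # malformed generation: discard
    body = out.split("Q:", 1)[1]
    if "A:" not in body:
        return None
    q, a = (part.strip() for part in body.split("A:", 1))
    # Grounding filter: the answer string must appear in the source passage.
    return (q, a) if a.lower() in passage.lower() else None
```
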
  LLM Self-Training
    Enhancing model learning through self-supervised tasks
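
Self-training is typically framed as a loop: the model answers unlabeled questions, high-confidence outputs are kept as pseudo-labels, and the model is fine-tuned on them. A skeleton of that loop, reusing the self-consistency estimator above and a placeholder `fine_tune`:

```python
def self_train(model, questions, confidence, fine_tune,
               threshold: float = 0.8, rounds: int = 3):
    """Pseudo-label high-confidence answers and fine-tune on them."""
    for _ in range(rounds):
        pseudo_labels = []
        for q in questions:
            answer, score = confidence(q, model)
            if score >= threshold:       # keep only answers the model agrees on
                pseudo_labels.append((q, answer))
        model = fine_tune(model, pseudo_labels)
    return model
```
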
  Reliability and Robustness
    Improving model reliability in real-world scenarios
Conclusion
  Summary of Findings
  Implications for Future Research
  Open Questions and Opportunities