Beyond Scores: A Modular RAG-Based System for Automatic Short Answer Scoring with Feedback
Menna Fateen, Bo Wang, Tsunenori Mine · September 30, 2024
Summary
ASAS-F, a modular retrieval-augmented generation (RAG) system for automatic short answer scoring with feedback, is introduced. It uses large language models within the RAG framework to produce scores together with detailed, explainable feedback, addressing limitations of existing methods. Compared with fine-tuning, the system improves scoring accuracy by 9% on unseen questions while avoiding extensive fine-tuning itself, yielding a scalable, cost-effective, and computationally efficient solution that adapts to new questions. The code, model outputs, and evaluations are shared on GitHub for transparency and reproducibility.
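The summary describes the pipeline only at a high level. The sketch below shows one way such a retrieval-augmented scoring loop could be assembled: retrieve previously scored answers similar to the student's response, build a few-shot prompt from them, and ask an LLM for a score and feedback. The dict-based example records, the prompt wording, and the `retriever` and `llm` callables are illustrative assumptions, not the paper's implementation.

```python
def build_prompt(question, student_answer, examples):
    """Assemble a few-shot prompt from retrieved scored examples.

    Each example is a dict with 'question', 'answer', 'score', and
    'feedback' keys (an assumed record format, not the paper's schema).
    """
    shots = "\n\n".join(
        f"Question: {ex['question']}\nAnswer: {ex['answer']}\n"
        f"Score: {ex['score']}\nFeedback: {ex['feedback']}"
        for ex in examples
    )
    return (
        "You are grading short answers. Return a numeric score and brief feedback.\n\n"
        f"{shots}\n\n"
        f"Question: {question}\nAnswer: {student_answer}\nScore and feedback:"
    )


def score_answer(question, student_answer, retriever, llm, k=3):
    """Retrieve similar graded answers, then prompt the LLM for a score and feedback."""
    examples = retriever(question, student_answer, k=k)  # any retrieval backend
    prompt = build_prompt(question, student_answer, examples)
    return llm(prompt)  # hypothetical LLM client: prompt string -> completion string
```

Because the scored examples travel in the prompt rather than in model weights, the same loop can run on unseen questions without retraining, which is the setting where the summary reports the 9% accuracy gain over fine-tuning.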
Introduction
Background
Overview of automatic short answer scoring systems
Challenges in current systems: score-only output with little explainable feedback and reliance on costly fine-tuning
Objective
Aim of ASAS-F: improving scoring accuracy and providing detailed feedback
Key features: modular design, use of large language models, and RAG framework
Method
Data Collection
Source of training data
Data preprocessing steps
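The outline does not spell out the preprocessing steps. As a placeholder, the snippet below shows the kind of light normalization commonly applied to short-answer datasets (whitespace cleanup, lowercasing, dropping empty or unscored responses); the record format and the specific rules are assumptions, not the paper's pipeline.

```python
import re


def normalize_answer(text: str) -> str:
    """Collapse whitespace and lowercase a student response."""
    return re.sub(r"\s+", " ", text).strip().lower()


def preprocess(records):
    """Keep only non-empty answers that carry a numeric score.

    `records` is assumed to be a list of dicts with 'answer' and 'score' keys.
    """
    cleaned = []
    for rec in records:
        answer = normalize_answer(rec.get("answer", ""))
        if answer and isinstance(rec.get("score"), (int, float)):
            cleaned.append({**rec, "answer": answer})
    return cleaned
```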
Model Architecture
Components of ASAS-F: large language models, RAG framework
Integration of retrieval and generation processes
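This outline does not say how ASAS-F retrieves reference material before generation. The sketch below is a simple stand-in: a TF-IDF retriever over previously scored answers, using scikit-learn's cosine similarity, that plugs into the `retriever` slot of the earlier scoring sketch; a dense-embedding retriever could replace it without changing the interface.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


class ExampleRetriever:
    """TF-IDF baseline: return the k scored examples most similar to a query.

    `examples` is a list of dicts with 'question', 'answer', 'score', and
    'feedback' keys (the same assumed record format as in the scoring sketch).
    """

    def __init__(self, examples):
        self.examples = examples
        self.vectorizer = TfidfVectorizer()
        texts = [f"{ex['question']} {ex['answer']}" for ex in examples]
        self.matrix = self.vectorizer.fit_transform(texts)

    def __call__(self, question, student_answer, k=3):
        query = self.vectorizer.transform([f"{question} {student_answer}"])
        sims = cosine_similarity(query, self.matrix)[0]
        top = sims.argsort()[::-1][:k]          # indices of the k most similar examples
        return [self.examples[i] for i in top]
```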
Training and Evaluation
Training methodology: fine-tuning vs. ASAS-F
Metrics for scoring accuracy and feedback quality
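The outline lists metrics only by name. Exact-match accuracy and quadratic weighted kappa (QWK) are the usual choices for short answer scoring, so the snippet below computes both with scikit-learn; whether the paper reports exactly these metrics is an assumption here.

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score


def scoring_metrics(y_true, y_pred):
    """Exact-match accuracy and quadratic weighted kappa over integer scores."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "qwk": cohen_kappa_score(y_true, y_pred, weights="quadratic"),
    }


# Toy example: human scores vs. model scores on five answers
print(scoring_metrics([2, 1, 0, 2, 1], [2, 1, 1, 2, 0]))
```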
Results
Performance Comparison
ASAS-F vs. fine-tuning on unseen questions
Improvement in scoring accuracy (9%)
Feedback Quality
Detailed and explainable feedback provided by ASAS-F
Evaluation of feedback quality and relevance
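Feedback quality can be judged by human raters, LLM judges, or overlap with reference feedback; the outline does not say which the paper uses. As one minimal, standard-library illustration, the snippet below computes a unigram-overlap F1 between generated and reference feedback.

```python
from collections import Counter


def unigram_f1(generated: str, reference: str) -> float:
    """Token-level F1 overlap between generated and reference feedback."""
    gen, ref = Counter(generated.lower().split()), Counter(reference.lower().split())
    overlap = sum((gen & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(unigram_f1("The answer omits the definition of osmosis.",
                 "Missing: a definition of osmosis and an example."))
```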
Implementation and Scalability
Computational Efficiency
ASAS-F's approach to efficient scoring and feedback generation
Adaptability to varying question complexities
Cost-effectiveness
Comparison with other scoring systems in terms of cost
Scalability for handling large volumes of questions
Case Studies and Applications
Real-world Scenarios
Examples of ASAS-F in educational settings or automated assessment
Future Directions
Potential improvements and extensions of ASAS-F
Conclusion
Summary of ASAS-F's contributions
Recommendations for further research
Appendix
Code and Model Outputs
Availability on GitHub for transparency and reproducibility
Detailed Evaluations
Additional metrics and analyses not covered in the main text
Basic info
Categories: Computation and Language; Artificial Intelligence
Insights
How does ASAS-F utilize large language models within a RAG framework to provide feedback?
What is the main focus of the ASAS-F system introduced in the document?
What improvements does ASAS-F offer in terms of scoring accuracy compared to fine-tuning methods?
What are the key features of ASAS-F that contribute to its scalability, cost-effectiveness, and computational efficiency?