Beyond Scores: A Modular RAG-Based System for Automatic Short Answer Scoring with Feedback
Menna Fateen, Bo Wang, Tsunenori Mine · September 30, 2024
Summary
ASAS-F, a modular retrieval-augmented generation (RAG) system for automatic short answer scoring with feedback, is introduced. It uses large language models within the RAG framework to produce scores together with detailed, explainable feedback, addressing limitations of existing methods. Compared with fine-tuning, the system improves scoring accuracy by 9% on unseen questions while avoiding extensive fine-tuning itself, yielding a scalable, cost-effective, and computationally efficient solution that adapts to new questions. The code, model outputs, and evaluations are shared on GitHub for transparency and reproducibility.
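The summary describes the pipeline only at a high level. The sketch below shows one way such a retrieval-augmented scoring loop could be assembled: retrieve previously scored answers similar to the student's response, build a few-shot prompt from them, and ask an LLM for a score and feedback. The dict-based example records, the prompt wording, and the `retriever` and `llm` callables are illustrative assumptions, not the paper's implementation.

```python
def build_prompt(question, student_answer, examples):
    """Assemble a few-shot prompt from retrieved scored examples.

    Each example is a dict with 'question', 'answer', 'score', and
    'feedback' keys (an assumed record format, not the paper's schema).
    """
    shots = "\n\n".join(
        f"Question: {ex['question']}\nAnswer: {ex['answer']}\n"
        f"Score: {ex['score']}\nFeedback: {ex['feedback']}"
        for ex in examples
    )
    return (
        "You are grading short answers. Return a numeric score and brief feedback.\n\n"
        f"{shots}\n\n"
        f"Question: {question}\nAnswer: {student_answer}\nScore and feedback:"
    )


def score_answer(question, student_answer, retriever, llm, k=3):
    """Retrieve similar graded answers, then prompt the LLM for a score and feedback."""
    examples = retriever(question, student_answer, k=k)  # any retrieval backend
    prompt = build_prompt(question, student_answer, examples)
    return llm(prompt)  # hypothetical LLM client: prompt string -> completion string
```

Because the scored examples travel in the prompt rather than in model weights, the same loop can run on unseen questions without retraining, which is the setting where the summary reports the 9% accuracy gain over fine-tuning.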
Introduction
Background
Overview of automatic short answer scoring systems
Challenges in current systems: score-only output with little explainable feedback and reliance on costly fine-tuning
Objective
Aim of ASAS-F: improving scoring accuracy and providing detailed feedback
Key features: modular design, use of large language models, and RAG framework
Method
Data Collection
Source of training data
Data preprocessing steps
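The outline does not spell out the preprocessing steps. As a placeholder, the snippet below shows the kind of light normalization commonly applied to short-answer datasets (whitespace cleanup, lowercasing, dropping empty or unscored responses); the record format and the specific rules are assumptions, not the paper's pipeline.

```python
import re


def normalize_answer(text: str) -> str:
    """Collapse whitespace and lowercase a student response."""
    return re.sub(r"\s+", " ", text).strip().lower()


def preprocess(records):
    """Keep only non-empty answers that carry a numeric score.

    `records` is assumed to be a list of dicts with 'answer' and 'score' keys.
    """
    cleaned = []
    for rec in records:
        answer = normalize_answer(rec.get("answer", ""))
        if answer and isinstance(rec.get("score"), (int, float)):
            cleaned.append({**rec, "answer": answer})
    return cleaned
```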
Model Architecture
Components of ASAS-F: large language models, RAG framework
Integration of retrieval and generation processes
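This outline does not say how ASAS-F retrieves reference material before generation. The sketch below is a simple stand-in: a TF-IDF retriever over previously scored answers, using scikit-learn's cosine similarity, that plugs into the `retriever` slot of the earlier scoring sketch; a dense-embedding retriever could replace it without changing the interface.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


class ExampleRetriever:
    """TF-IDF baseline: return the k scored examples most similar to a query.

    `examples` is a list of dicts with 'question', 'answer', 'score', and
    'feedback' keys (the same assumed record format as in the scoring sketch).
    """

    def __init__(self, examples):
        self.examples = examples
        self.vectorizer = TfidfVectorizer()
        texts = [f"{ex['question']} {ex['answer']}" for ex in examples]
        self.matrix = self.vectorizer.fit_transform(texts)

    def __call__(self, question, student_answer, k=3):
        query = self.vectorizer.transform([f"{question} {student_answer}"])
        sims = cosine_similarity(query, self.matrix)[0]
        top = sims.argsort()[::-1][:k]          # indices of the k most similar examples
        return [self.examples[i] for i in top]
```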
Training and Evaluation
Training methodology: fine-tuning vs. ASAS-F
Metrics for scoring accuracy and feedback quality
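The outline lists metrics only by name. Exact-match accuracy and quadratic weighted kappa (QWK) are the usual choices for short answer scoring, so the snippet below computes both with scikit-learn; whether the paper reports exactly these metrics is an assumption here.

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score


def scoring_metrics(y_true, y_pred):
    """Exact-match accuracy and quadratic weighted kappa over integer scores."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "qwk": cohen_kappa_score(y_true, y_pred, weights="quadratic"),
    }


# Toy example: human scores vs. model scores on five answers
print(scoring_metrics([2, 1, 0, 2, 1], [2, 1, 1, 2, 0]))
```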
Results
Performance Comparison
ASAS-F vs. fine-tuning on unseen questions
Improvement in scoring accuracy (9%)
Feedback Quality
Detailed and explainable feedback provided by ASAS-F
Evaluation of feedback quality and relevance
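Feedback quality can be judged by human raters, LLM judges, or overlap with reference feedback; the outline does not say which the paper uses. As one minimal, standard-library illustration, the snippet below computes a unigram-overlap F1 between generated and reference feedback.

```python
from collections import Counter


def unigram_f1(generated: str, reference: str) -> float:
    """Token-level F1 overlap between generated and reference feedback."""
    gen, ref = Counter(generated.lower().split()), Counter(reference.lower().split())
    overlap = sum((gen & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(unigram_f1("The answer omits the definition of osmosis.",
                 "Missing: a definition of osmosis and an example."))
```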
Implementation and Scalability
Computational Efficiency
ASAS-F's approach to efficient scoring and feedback generation
Adaptability to varying question complexities
Cost-effectiveness
Comparison with other scoring systems in terms of cost
Scalability for handling large volumes of questions
Case Studies and Applications
Real-world Scenarios
Examples of ASAS-F in educational settings or automated assessment
Future Directions
Potential improvements and extensions of ASAS-F
Conclusion
Summary of ASAS-F's contributions
Recommendations for further research
Appendix
Code and Model Outputs
Availability on GitHub for transparency and reproducibility
Detailed Evaluations
Additional metrics and analyses not covered in the main text
Basic info
Categories: Computation and Language; Artificial Intelligence
Insights
How does ASAS-F utilize large language models within a RAG framework to provide feedback?
What is the main focus of the ASAS-F system introduced in the document?
What improvements does ASAS-F offer in terms of scoring accuracy compared to fine-tuning methods?
What are the key features of ASAS-F that contribute to its scalability, cost-effectiveness, and computational efficiency?