Descriptive answer script evaluation involves assessing students' long-form textual answers and remains a critical and challenging task in educational assessment. The conventional approach of manual grading has notable shortcomings: it is time-consuming and subjective. To overcome these drawbacks, automated and semi-automated evaluation systems have been introduced. This review presents an in-depth examination of techniques used for descriptive answer script evaluation, including rule-based methods, machine learning-based methods, deep learning-based methods, transformer-based models, and hybrid approaches. Recent advancements in transformer architectures and large language models (LLMs) have greatly improved contextual understanding, scalability, and grading accuracy. In this review, we gather findings from recent studies and discuss model performance, computational efficiency, and strategies to reduce bias. Despite these advances, significant issues remain concerning generalizability, fairness, explainability, multimodal processing, dataset availability, and pedagogical integration. Addressing these limitations calls for domain-agnostic LLM frameworks, fairness-aware learning, multimodal evaluation, and real-time feedback systems. This review aims to provide researchers and educators with a cohesive viewpoint on current progress, research gaps, and future directions in the automated evaluation of descriptive answer scripts.
Keywords
Automated Grading, Deep Learning, Large Language Models, Transformer Models