Jaeuk PARK

The Hong Kong University of Science and Technology (GZ)


Sessions

Presentation: Can ChatGPT Grade Like a Human? Examining the Reliability and Validity of AI-Assisted Assessment in Academic Writing

Recent research has explored the use of generative AI for writing assessment, yet evidence regarding its reliability and validity remains mixed. This study examines whether ChatGPT (GPT-4o) can function as an analytic assessment tool for source-based academic essays written by postgraduate research students. A dataset of 122 essays, originally scored by two experienced human raters, was reevaluated by ChatGPT using a standardized analytic rubric (e.g., Idea Presentation, Academic Style, Citation, and Mechanics) and a zero-shot prompting approach. Non-parametric analyses and descriptive statistics were used to examine score alignment, ranking patterns, and domain-specific differences. Results show that while human and AI scores occupied a similar overall range, ChatGPT consistently awarded higher scores and did not rank essays in ways that aligned with human judgment. Significant differences emerged across most rubric domains: human raters scored higher on idea presentation, whereas ChatGPT assigned higher scores for academic style and citation practices; no significant difference was found for mechanics. Repeated AI scoring demonstrated high internal consistency, with variability concentrated in meaning-dependent domains such as argument clarity and source integration. Overall, the findings indicate that generative AI shows promise for reliable form-focused assessment but remains limited in evaluating rhetorical and conceptual quality.
