#4686

Presentation Machine Learning in CALL

Investigating the Validity of Accessible Automated Pronunciation Assessment Using Classroom and Corpus Data

Time not set

Assessing pronunciation accuracy, fluency, and prosody is challenging due to substantial variability in human perceptions of speech production. Automated pronunciation assessment tools have therefore been proposed as scalable tools to support both assessment and speaking development. Among these tools, Azure Pronunciation Assessment provides automated scoring across 33 languages at relatively low cost. This study examines the convergent, predictive, and construct validity of Azure’s measures of pronunciation accuracy, fluency, and prosody, with prosody operationalized according to Azure’s definition of naturalness in speech, encompassing stress, intonation, speaking rate, and rhythm.

Analyses of approximately 3,510 speech samples from the ICNALE dataset show that all three measures are strongly associated with CEFR proficiency levels and rank among the strongest CEFR predictors when compared with established indices of lexical diversity and syntactic complexity. In addition, analyses of classroom speech data from 66 learners in Korea, Japan, and China reveal moderate to strong correlations between human ratings and Azure scores across all three constructs.

These findings suggest that Azure Pronunciation Assessment can provide valid, fine-grained feedback to support pronunciation-focused instruction and learning. However, the classroom analyses rely on a fixed reference transcript (“Please Call Stella”), which may limit generalizability across task types, accents, and speaking contexts.