Richard Rose

About

No profile

Sessions

Presentation: Investigating the Validity of Accessible Automated Pronunciation Assessment Using Classroom and Corpus Data

Assessing pronunciation accuracy, fluency, and prosody is challenging due to substantial variability in human perceptions of speech production. Automated pronunciation assessment tools have therefore been proposed as scalable supports for both assessment and speaking development. Among these tools, Azure Pronunciation Assessment provides automated scoring across 33 languages at relatively low cost. This study examines the convergent, predictive, and construct validity of Azure’s measures of pronunciation accuracy, fluency, and prosody, with prosody operationalized according to Azure’s definition of naturalness in speech, including stress, intonation, speaking rate, and rhythm. Analyses of approximately 3,510 speech samples from the ICNALE dataset show that all three measures are strongly associated with CEFR proficiency levels and rank among the strongest CEFR predictors when compared with established indices of lexical diversity and syntactic complexity. In addition, analyses of classroom speech data from 66 learners in Korea, Japan, and China reveal moderate to strong correlations between human ratings and Azure scores across all three constructs. These findings suggest that Azure Pronunciation Assessment can provide valid, fine-grained feedback to support pronunciation-focused instruction and learning. However, the analyses rely on a fixed reference transcript (“Please Call Stella”), which may limit generalizability across task types, accents, and speaking contexts.
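The abstract does not include code, but as a hedged illustration, the following minimal sketch shows how scripted read-aloud scores of the kind described (accuracy, fluency, prosody against a fixed reference transcript) could be obtained with the Azure Speech SDK for Python. The subscription key, region, and audio filename are placeholders, and this is not presented as the study's actual pipeline; the prosody call is only available in recent SDK releases.

```python
# Minimal sketch: scoring one recording against a fixed reference transcript
# with the Azure Speech SDK for Python (azure-cognitiveservices-speech).
# Key, region, and audio path are placeholders, not values from the study.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")
audio_config = speechsdk.audio.AudioConfig(filename="learner_sample.wav")

# Fixed reference transcript for scripted read-aloud scoring.
reference_text = "Please call Stella."

pron_config = speechsdk.PronunciationAssessmentConfig(
    reference_text=reference_text,
    grading_system=speechsdk.PronunciationAssessmentGradingSystem.HundredMark,
    granularity=speechsdk.PronunciationAssessmentGranularity.Word,
)
# Prosody scoring (stress, intonation, speaking rate, rhythm) is opt-in and
# may not exist in older SDK versions.
pron_config.enable_prosody_assessment()

recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config,
                                         audio_config=audio_config)
pron_config.apply_to(recognizer)

result = recognizer.recognize_once()
assessment = speechsdk.PronunciationAssessmentResult(result)

print("Accuracy:", assessment.accuracy_score)
print("Fluency:", assessment.fluency_score)
print("Prosody:", getattr(assessment, "prosody_score", None))  # None if unsupported
```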

Presentation: Identifying LLM-Generated Writing Through Authorship Familiarity

Large language models (LLMs) have created new opportunities for writing support while simultaneously challenging the integrity of text-based educational assessment. Existing authorship verification methods, such as stylometric analyses and automated classifiers, provide probabilistic judgments that may allow plausible deniability for claimed authors. However, true authors are typically familiar with their text in ways that surrogate authors are not. Building on this insight, the present study introduces the Content Restoration Authorship Familiarity Test (CRAFT), which assesses authorship by asking claimed authors to recall and reconstruct elements of texts they identify as their own. The CRAFT battery was piloted with 60 university students in Seoul. Participants wrote a 16-sentence handwritten text in class, and an LLM then generated a second text based on that content. About 30 minutes later, participants completed two CRAFT tests, one for each text. In each text, four sentences were inserted and five words were replaced with synonyms, and participants attempted to identify the inserted material or restore the original wording. Responses were scored on a 14-point rubric that allowed partial credit for morphologically related forms. Descriptive analyses showed non-overlapping performance distributions between the human-authored and LLM-generated conditions, suggesting that authorship familiarity can provide a reliable behavioral signal for distinguishing genuine authorship from AI-assisted text generation.
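The rubric itself is not reproduced in the abstract, so the sketch below is only a hypothetical illustration of how partial-credit scoring for the word-restoration items might look. The point values, the prefix-overlap heuristic for "morphologically related forms", and the function names are assumptions for illustration, not the authors' scoring procedure.

```python
# Hypothetical sketch of partial-credit scoring for CRAFT word-restoration items.
# Point values and the relatedness heuristic are illustrative assumptions only.
def same_stem(a: str, b: str, min_overlap: int = 4) -> bool:
    """Crude morphological-relatedness check: shared prefix of at least min_overlap characters."""
    a, b = a.lower(), b.lower()
    prefix = 0
    for x, y in zip(a, b):
        if x != y:
            break
        prefix += 1
    return prefix >= min(min_overlap, len(a), len(b))

def score_restorations(originals: list[str], responses: list[str]) -> float:
    """2 points for an exact restoration, 1 for a morphologically related form, 0 otherwise."""
    total = 0.0
    for orig, resp in zip(originals, responses):
        if resp.strip().lower() == orig.lower():
            total += 2.0
        elif same_stem(orig, resp.strip()):
            total += 1.0
    return total

# Example: the replaced word "decision" restored as "decide" earns partial credit.
print(score_restorations(["decision", "rapidly"], ["decide", "slowly"]))  # 1.0
```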
