Presentation Second language acquisition (SLA) theory and CALL
Developing an N-gram–Based Spoken Lecture Corpus Tool to Support Non-Native EMI Teachers
Many non-native English-speaking instructors face linguistic challenges when delivering English-medium instruction (EMI), particularly in using discipline-appropriate spoken academic phraseology. This paper presents the development of an n-gram–based spoken lecture corpus tool that draws on large-scale lecture data to support EMI teachers.
The corpus was compiled from approximately 1,100 open-source academic lecture transcripts from multiple universities and disciplines, totaling about eight million words. The tool enables users to search four- to six-word n-grams extracted from authentic lectures. Given a target word or phrase, the system generates frequent n-gram patterns and provides multiple contextualized examples from real lecture transcripts, allowing users to observe how academic language is used in spoken teaching contexts.
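The search behavior described above can be illustrated with a minimal Python sketch. It is not the tool's actual implementation, which the abstract does not specify: the tokenizer, the function names (build_index, search, contexts), the in-memory index, and the three sample sentences are all assumptions made for illustration only.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed tokenizer: lowercase words, keeping apostrophes (e.g. "let's").
    return re.findall(r"[a-z']+", text.lower())


def build_index(transcripts, n_min=4, n_max=6):
    """Count every n-gram of n_min..n_max words and record where it occurs."""
    docs = [tokenize(t) for t in transcripts]
    counts = Counter()
    positions = defaultdict(list)  # n-gram -> list of (doc_id, start_index)
    for doc_id, tokens in enumerate(docs):
        for n in range(n_min, n_max + 1):
            for i in range(len(tokens) - n + 1):
                gram = tuple(tokens[i:i + n])
                counts[gram] += 1
                positions[gram].append((doc_id, i))
    return docs, counts, positions


def search(query, counts, top_k=5):
    """Return the most frequent n-grams that contain the query word or phrase."""
    q = tuple(tokenize(query))

    def contains(gram):
        return any(gram[i:i + len(q)] == q for i in range(len(gram) - len(q) + 1))

    hits = [(g, c) for g, c in counts.items() if contains(g)]
    hits.sort(key=lambda item: (-item[1], item[0]))  # by frequency, then alphabetically
    return [(" ".join(g), c) for g, c in hits[:top_k]]


def contexts(gram_text, docs, positions, window=2, limit=3):
    """Return up to `limit` occurrences of an n-gram with `window` words on each side."""
    gram = tuple(tokenize(gram_text))
    out = []
    for doc_id, start in positions.get(gram, [])[:limit]:
        tokens = docs[doc_id]
        lo = max(0, start - window)
        hi = start + len(gram) + window
        out.append(" ".join(tokens[lo:hi]))
    return out


# Hypothetical sample data standing in for lecture transcripts.
sample = [
    "So let's take a look at the first example.",
    "Now let's take a look at the data.",
    "We will take a look at this later.",
]
docs, counts, positions = build_index(sample)
for phrase, freq in search("take a look", counts):
    print(freq, phrase, contexts(phrase, docs, positions)[:1])
```

An in-memory counter is enough for a toy example; a corpus of about eight million words would more plausibly use a database or an on-disk index, though that design choice is likewise an assumption.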
The tool was introduced to a group of university instructors teaching EMI courses. Informal feedback indicates that the system was perceived as intuitive and as useful for preparing lectures, selecting phrases, and building confidence in English delivery. Participants particularly valued access to spoken academic patterns that are rarely addressed in conventional EMI training materials.
Although a large-scale evaluation has not yet been conducted, this study demonstrates the potential of repurposing open lecture transcripts into practical corpus-based support tools and highlights the pedagogical value of n-gram exploration for EMI teacher development.