Workshop: Second language acquisition (SLA) theory and CALL
CBLP 2.0: Moving from Open-Ended Prompts to "Hyper-Local" Corpora in Language Learning
For decades, Corpus-Based Language Pedagogy (CBLP) has promised to turn learners into researchers who discover language patterns inductively. However, CBLP projects often "fail" to gain widespread classroom traction because of unintuitive interfaces and high technical literacy requirements. This presentation explores whether Generative AI (GenAI) allows the methodology to finally "prevail" as "CBLP 2.0." GenAI democratizes access to linguistic data, allowing learners to analyze collocations without the steep learning curve of traditional concordancers.
Yet unbounded large language models (LLMs) introduce new risks: linguistic "hallucinations" and inappropriate registers inherited from generalized training data. To mitigate these risks, we propose shifting from generic "Prompt Engineering" to building "Hyper-Local Corpora" with Retrieval-Augmented Generation (RAG). Using accessible tools such as Google’s NotebookLM, educators and students can upload bespoke, genre-specific texts to create a "walled garden" corpus. This session demonstrates how grounding AI in curated datasets retains the instant feedback of GenAI while preserving the precise, evidence-based inquiry of traditional corpus linguistics. Ultimately, sustainable CALL success depends on bounding AI so that it guides students from passive consumption to active, ethical research.
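The "walled garden" idea can be illustrated with a minimal sketch, assuming a tiny in-memory corpus. This is not how NotebookLM or any RAG system works internally; it only shows the principle that every answer is bounded to, and traceable back to, the uploaded texts. The file names, sample sentences, and the helper functions `tokenize`, `kwic`, and `right_collocates` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical hyper-local corpus: source name -> genre-specific text.
# In practice these would be the bespoke texts an educator uploads.
CORPUS = {
    "lab_report_1.txt": "We conducted an experiment to measure the reaction rate. "
                        "The results strongly suggest a linear trend.",
    "lab_report_2.txt": "The team conducted a survey of prior work. "
                        "These results strongly support the hypothesis.",
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def kwic(corpus, keyword, window=3):
    """Return (source, left context, keyword, right context) for each
    occurrence of `keyword` that is actually attested in the corpus."""
    keyword = keyword.lower()
    hits = []
    for source, text in corpus.items():
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                hits.append((source, left, tok, right))
    return hits


def right_collocates(corpus, keyword):
    """Count the word immediately following `keyword` across the corpus."""
    counts = Counter()
    for _, _, _, right in kwic(corpus, keyword, window=1):
        if right:
            counts[right] += 1
    return counts


if __name__ == "__main__":
    for source, left, word, right in kwic(CORPUS, "conducted"):
        print(f"{source}: {left} [{word}] {right}")
    print(right_collocates(CORPUS, "strongly"))
```

Because each hit carries its source name, a learner can check every collocation against the original text, and a word that never occurs in the uploaded texts simply returns no evidence rather than an invented example.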