Speaker: Prof. Clarence GREEN

Assistant Professor, Faculty of Education, University of Hong Kong

Date: 18 March 2024

Time: 14:00-15:30

Venue: Room E32 – G020, Faculty of Law

Language: English

Title: Modelling Early Print Language Environments using Corpora: A Lexical Research Project into Children’s Picture Books


This presentation describes work on modelling L1/L2 language learning environments by extending traditional corpus linguistics with broader data science methods from natural language processing. It describes the development of a novel corpus of children’s picture books and models of vocabulary input in the print and oral language environments. Children’s language development is enhanced through picture books, so it is important to better characterize this print environment in terms of how it adds to the opportunities for vocabulary learning over and above spoken language input. Previous research has been restricted by methodological limitations that precluded the development of large corpora of children’s print environment. The study applies data mining methods to build a larger corpus model than previously possible and investigates over 2000 narrative and information picture books. This method gives researchers access to larger pools of data than earlier approaches allowed. Models are developed to estimate the additional word-type exposure in L1 and EAL language environments that include (or lack) English-language picture books. The models indicate that picture book exposure changes children’s language environment in a way that supports EL reading development, by providing exposure to semantic environments that are more varied than, and different from, those of child-directed speech. Additional findings include that picture books provide exposure to EL academic vocabulary. Computational models indicate that book reading once every day or every second day over a year substantially boosts unique-word exposure for most models of language environments.