Academic articles

Primary outputs

Korochkina, M., Marelli, M., Brysbaert, M., & Rastle, K. (2024). The Children and Young People’s Books Lexicon (CYP-LEX): A large-scale lexical database of books read by children and young people in the United Kingdom. Quarterly Journal of Experimental Psychology, 77(12), 2418–2438. https://doi.org/10.1177/17470218241229694. Code, supplementary information, and data files that constitute the database: https://doi.org/10.17605/OSF.IO/SQU49.

Introduces CYP-LEX — a large-scale database containing over 100,000 words derived from a 70-million-word corpus of books popular with British children aged 7–16. The article provides a detailed description of the lexical statistics available in the database and discusses key patterns observed in this vocabulary.

Korochkina, M., & Rastle, K. (2025). The vocabulary barrier in the General Certificate of Secondary Education (GCSE) in English Literature. The Use of English, 76(2), 12–26. https://englishassociation.ac.uk/the-use-of-english/. Pre-print, data, and analysis code: https://osf.io/hqvd3/.

Compares the vocabulary found in books popular with children aged 13–16 (based on the CYP-LEX corpus) with that found in books on the GCSE English Literature specifications from AQA and Edexcel, and considers how differences in vocabulary characteristics may relate to poor attainment in this course.

Korochkina, M., & Rastle, K. (2025). Morphology in children’s books, and what it means for learning. npj Science of Learning, 10: 22. https://doi.org/10.1038/s41539-025-00313-6. Pre-print, data, and analysis code: https://osf.io/vab95/.

Discusses what morphology is and the purposes of morphological knowledge, then presents the first concrete and comprehensive description of morphological information found in books popular with children aged 7–16. Introduces a new theory of how morpheme knowledge may be acquired through reading experience.

Korochkina*, M., Cooper*, H., Brysbaert, M., & Rastle, K. (In press). Morpheme knowledge is shaped by information available through orthography. Psychonomic Bulletin & Review. Pre-print, data, and analysis code: https://osf.io/yq9h7/. *Joint first authorship.

Introduces concrete metrics to quantify both the quality and quantity of exposure to several English prefixes and suffixes, based on the theory developed in Korochkina & Rastle (2025). These metrics are then tested using data from 120 adults in a lexical processing task. The article confirms the psychological validity of the Korochkina & Rastle (2025) theory and demonstrates that skilled readers’ morpheme knowledge is shaped by morphological information that is clearly accessible through orthography alone.

Korochkina, M., Marelli, M., & Rastle, K. (Under review). Morphemes in the wild: Modelling affix learning from the noisy landscape of natural text. Pre-print at https://doi.org/10.31234/osf.io/yzcqm_v1.

Translates the Korochkina & Rastle (2025) theory into a computational model using the compositional distributional semantics approach. This is the first application of such a model with a training regime that simulates natural reading, as well as the first application to prefixation. The model’s knowledge is validated against human data. The article demonstrates that, despite high levels of noise, reading experience contains enough structured information to enable the extraction of core affix semantics, and that affix knowledge is shaped primarily by information accessible through orthography.

Cooper, H., Korochkina, M., Brysbaert, M., & Rastle, K. (In progress). TRT/ART (Title & author order TBC). Forthcoming.

Introduces two validated tests of reading experience — author recognition and title recognition — designed for British primary school children, along with a novel spelling assessment that can be administered online.

Other outputs

Lombard, A., Ulicheva, A., Korochkina, M., & Rastle, K. (2024). The regularity of polysemy patterns in the mind: Computational and experimental data. Glossa Psycholinguistics, 3(1): 3, 1–24. https://doi.org/10.5070/G60111327. Data and analysis code: https://osf.io/uhy75/.

Combines corpus linguistics and empirical experimentation to explore the nature of polysemous words (words with multiple meanings) and how readers process them. Introduces a new measure of the regularity of polysemy patterns that explains variance in human behaviour and advances theories on how polysemous words are processed and represented in the mind.

Crawford, M., Raheel, N., Korochkina, M., & Rastle, K. (2024). Inadequate foundational decoding skills constrain global literacy goals for pupils in low- and middle-income countries. Nature Human Behaviour, 9, 74–83. https://doi.org/10.1038/s41562-024-02028-x. Pre-print: https://psyarxiv.com/2qxm9/, data and analysis code: https://osf.io/6s23f/.

Analyses reading assessment data from half a million pupils across 48 low- and middle-income countries, revealing that the absolute majority fail to acquire foundational reading skills despite being in school. Moreover, their performance increasingly falls short of expected benchmarks with each additional year of instruction. The study highlights the urgent need for rigorous phonics instruction in these countries and languages, emphasising that this is the only way to improve global literacy rates.

Rastle, K. (Under review). Literacy. Invited submission to the Open Encyclopaedia of Cognitive Science.

An accessible overview of the most important findings in the cognitive science of reading and reading acquisition.