English Language Learner Corpora
EF-Cambridge Open Language Database (EFCAMDAT)
The EF-Cambridge Open Language Database (EFCAMDAT) is a publicly available resource to facilitate second language research and teaching. It contains written samples from thousands of adult learners of English as a second language, worldwide.
EFCAMDAT contains over 83 million words from 1 million assignments written by 174,000 learners, across a wide range of levels (CEFR stages A1-C2). This text corpus includes information on learner errors, parts of speech, and grammatical relationships. Researchers can search for language patterns using various criteria, including learner nationality and level.
Wildcat Corpus
The Wildcat Corpus of Native- and Foreign-Accented English is a corpus of both scripted and unscripted speech between native and non-native speakers of English. A key feature of this corpus is that, for the unscripted part, the talkers are recorded in pairs (all possible pairings of native and non-native English speakers) as they work together on an interactive, goal-oriented task (the Diapix task). The corpus can be used to answer basic phonetic questions about speech communication in situations where one or both of the conversation partners may be non-native speakers of the target language.
Publications
Language Learner Corpora
Yan Huang, Jeroen Geertzen, Rachel Baker, Anna Korhonen, and Theodora Alexopoulou. (2017). The EF Cambridge Open Language Database (EFCAMDAT): Information for Users.
Kristin J. Van Engen, Melissa Baese-Berk, Rachel E. Baker, Arim Choi, Midam Kim, and Ann R. Bradlow. (2010). The Wildcat Corpus of Native- and Foreign-Accented English: Communicative Efficiency across Conversational Dyads with Varying Language Alignment Profiles. Language and Speech 53(4), 510-540.
LSA Summer Meeting 2008 Poster: Addressing Challenges Posed by Speech Corpora Including Non-Native Speakers
Phonetics, Prosody, and Language Learners
Rachel E. Baker. (2010). The Acquisition of English Focus Marking by Non-Native Speakers. Doctoral Dissertation, Northwestern University.
Rachel E. Baker, Melissa Baese-Berk, Laurent Bonnasse-Gahot, Midam Kim, Kristin J. Van Engen, Ann R. Bradlow. (2011). Word Durations in Non-Native English. Journal of Phonetics 39(1), 1-17.
Rachel E. Baker. (2010). Non-Native Perception of Native English Prominence. Proceedings of the 5th International Conference on Speech Prosody, Chicago, IL, May 2010.
Rachel E. Baker and Ann Bradlow. (2009). Variability in Word Duration as a Function of Probability, Speech Style, and Prosody. Language and Speech 52(4), 391-413.
Cross Language Speech Perception Workshop at ASA Spring 2009 Meeting Poster: Word-level Rhythm in Nonnative English
ASA Fall 2007 Meeting Poster: Second Mention Reduction in Indian English and Korean
Pragmatics and Discourse
Ryan Doran, Gregory Ward, Meredith Larson, Yaron McNabb, and Rachel E. Baker. (2012). A novel experimental paradigm for distinguishing between what is said and what is implicated. Language 88(1), 124-154.
Ryan Doran, Rachel E. Baker, Yaron McNabb, Meredith Larson, and Gregory Ward. (2009). On the Non-Unified Nature of Scalar Implicature: An Empirical Investigation. International Review of Pragmatics 1(2), 211-248.
Gregory Ward, Ryan Doran, Meredith Larson, Rachel Baker, and Yaron McNabb. (2009). Caught in the Semantic-Pragmatic Crossfire: Literal Lucy to the Rescue. Proceedings of the 2008 Invitational Symposium on Approaches to Variation and Change in English, Bamberg, Germany, July 2008.
Meredith Larson, Ryan Doran, Yaron McNabb, Rachel Baker, Matthew Berends, Alex Djalali, and Gregory Ward. (2009). Distinguishing the SAID from the IMPLICATED Using a Novel Experimental Paradigm. In Semantics and Pragmatics: From Experiment to Theory, edited by Uli Sauerland and Kazuko Yatsushiro. Berlin: Palgrave Macmillan. 74-93.
Rachel E. Baker, Alastair J. Gill, and Justine Cassell. (2008). Reactive Redundancy and Listener Comprehension in Direction-Giving. Proceedings of the SIGDIAL Workshop, Columbus, OH, June 2008.
Speech Synthesis
Rachel E. Baker, Robert A. J. Clark, and Michael White. (2004). Synthesising Contextually Appropriate Intonation in Limited Domains. Proceedings of the 5th ISCA Speech Synthesis Workshop, Pittsburgh, PA, June 2004.
Rachel Baker. (2003). Using Unit Selection to Synthesise Contextually Appropriate Intonation in Limited Domain Synthesis. Master's Dissertation, University of Edinburgh.