B) TFM (Corpus Linguistics)



Corpus Linguistics is one of the newest areas of linguistic study. The first books and manuals did not appear until the late 1980s. Although there is no general consensus among authors about its origins or its derivation from a particular field, some linguists include it among the CMC sciences (“Computer-Mediated Communication”), which are defined as “synchronous or asynchronous electronic communication or computer conferencing, by which senders encode in text messages that are relayed from senders’ computers to receivers” (Walther: 1992: 52). Another definition of CMC is “any communication patterns mediated through the computer” (Metz: 1992: 3).

It is an area of study based on the simultaneous analysis of texts. The aim of corpus linguistics is to search for linguistic patterns or language units such as key words, lexemes, phraseological patterns, grammatical associations, etc. The data should be collected in their natural context (“real language”) so that the user can analyse the selected terms and the conditions under which that specific source was produced. The user has direct access to the data (texts) through a concordance tool (specialized software). One of the best-known and most useful concordance tools is the free program AntConc: http://www.antlab.sci.waseda.ac.jp/software.html
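
To make the idea of a concordance more concrete, the following is a minimal sketch in Python of a keyword-in-context (KWIC) search. It is not how AntConc itself works internally; the function name `concordance`, the `width` parameter and the sample sentences are assumptions invented for this illustration.

```python
import re


def concordance(text, keyword, width=30):
    """Return keyword-in-context (KWIC) lines for each whole-word match.

    Hypothetical helper for illustration; `width` is the number of
    characters of context kept on each side of the keyword.
    """
    # \b restricts matches to whole words, so "facts" does not match "face".
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Pad both sides so the keyword is aligned in a central column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines


# Invented sample text, standing in for a real corpus file.
sample = (
    "She turned to face the crowd. His face was pale. "
    "We must face the facts, whatever they are."
)

kwic_lines = concordance(sample, "face", width=20)
for line in kwic_lines:
    print(line)
```

Each printed line shows one occurrence of the search word with its immediate context on both sides, which is the same kind of view a concordance tool offers over a much larger collection of texts.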

The results of the process are very useful. They allow the user to confirm a grammatical or linguistic rule (“corpus-based approach” or “top-down technique”) or, working with a non-marked corpus (a corpus without annotations), to make new generalizations about the language observed (“corpus-driven approach” or “bottom-up technique”).
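
The bottom-up idea can be illustrated with a small Python sketch that counts which words appear near a search term in a non-annotated text, so that patterns emerge from the data rather than from a rule fixed in advance. The function name `collocates`, the `window` parameter and the sample sentences are assumptions made for this example, and the counting is deliberately simplistic (it ignores sentence boundaries).

```python
import re
from collections import Counter


def collocates(text, keyword, window=2):
    """Count the words found within `window` tokens of `keyword`.

    Hypothetical helper for illustration only: tokenisation is a plain
    regular expression and context windows may cross sentence boundaries.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == keyword:
            start = max(0, i - window)
            # Words before and after the keyword, excluding the keyword itself.
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts


# Invented sample text, standing in for a real corpus.
sample = (
    "She turned to face the crowd. His face was pale. "
    "We must face the facts, whatever they are."
)

neighbour_counts = collocates(sample, "face", window=2)
for word, count in neighbour_counts.most_common(5):
    print(word, count)
```

With a real corpus of thousands or millions of words, a frequency list like this is the raw material from which the user formulates new generalizations.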


Concordance results of the word «face» in a corpus

There are different corpora available online. They are most commonly divided into “general corpora” and “specific corpora”. General corpora contain millions of words and are designed to check common linguistic patterns or general characteristics of a language. There are several examples of this kind of corpus: American National Corpus (ANC; 22 million words), Corpus of Contemporary American English (COCA; 450 million words), Cambridge International Corpus (CIC; one billion words). However, one of the best known and most widely used by authors to this day is the BNC (British National Corpus; 100 million words), because it was one of the first corpora to incorporate terms from oral conversations (http://www.natcorp.ox.ac.uk/).

The specific corpora are divided into thematic areas (engineering, medicine, cooking, history, academic texts, students’ corpora, etc.) and are usually smaller in number of words. Although they are more practical and easier to work with, the number of terms can be too small to support a generalization about the use of a word. Some of the best-known specific corpora are: International Corpus of Learner English (ICLE; 3.7 million written words), Michigan Corpus of Academic Spoken English (MICASE; 2 million words), Spoken English Corpus (SEC; 50,000 words).

Corpus Linguistics was initially a technique applied by researchers and language investigators (it has produced new language theories). However, there have been many attempts to bring it into the classroom. Data-Driven Learning (DDL), promoted by Tim Johns in 1991, is a successful technique that has been adopted by different educational centres around the world. With this system the students themselves become detectives of the language and learn the grammatical rules through an inductive approach. They study some words in a corpus and observe the nature of the term and the other units associated with it; afterwards they formulate their own hypotheses, making generalizations about the use of that specific term; finally, the teacher corrects the answers. In 2008 Nihon University applied the system, and lecturers affirm that the results obtained in exams were better than those obtained with traditional grammar lessons: http://bit.ly/1gkW8oL

Corpus Linguistics is a new language science full of possibilities for researchers and students, and little by little it is being used by more institutions. Although it has many problems and some of its applications raise many doubts, it can be a good alternative to the traditional grammatical method. As it is simple to use, it should not be rejected only because it involves technology. It is an opportunity to increase the autonomy of the students and to improve their skills: the teacher can create corpora specially designed for the classroom, so that the most doubtful areas can be resolved, and the students can work in teams using the same application. Although it is a new approach and some risks are involved, the academic results support its use. There are good alternatives for teaching English, and Corpus Linguistics is one of the possibilities that can benefit students.



– Metz, J. M. Computer Mediated-Communication: Perceptions of a new context. Paper presented at the Speech Communication Association annual conference. Chicago: 1992.

– Walther, J. B. Relational Communication in Computer-Mediated Interaction. Human Communication Research 19. Dover: University of Delaware, 1992.