Corpora have been regarded as useful sources for English language teaching. While various studies are devoted to the use of corpora in teaching grammar and vocabulary, more research is needed to ascertain how learners could benefit from corpus-based data in improving their writing skills, and their reflective writing in particular. In the present study, conducted at one of the leading international universities in Uzbekistan, Westminster International University in Tashkent (WIUT), data on 43 students' experiences, opinions, and expectations, as well as 100 original reflective essays, were analyzed using different corpora to examine whether corpus-based data could serve as a basis for material design and whether it can indeed function as a source of hypotheses regarding language teaching. The accumulated data, along with the designed materials, outline the scope for follow-up research into the effectiveness of the corpus-based approach in teaching reflective writing. Judging by students' responses and readiness to attend workshops on corpus-based learning, the implementation of the designed materials seems plausible and promises fruitful results.
While the purpose of any corpus is to support linguistic analysis, my intention in designing and analyzing this one is exclusively pedagogical. I have been teaching the Academic English module for over 12 years, and teaching students to write a reflective essay has been viewed as something formulaic, with little thought given to how the representativeness of reflective writing could inform teachers' approaches to designing tasks with greater insight and efficacy. The development of computer technologies offers teachers ample opportunities to take their teaching to a newer and more rewarding, although thorny, level. This paper aims to analyze 100 WIUT CIFS students' reflective essays through a specialized corpus and to elaborate on three types of two-word collocations (adjective-noun, adverb-adjective and verb-noun) in order to design teaching activities. A purposeful and resourceful approach to teaching collocations in the classroom is often neglected (Farghal and Obidedate, 1995, p318); the current study therefore sets out to address this neglect and to examine whether corpus-based data could serve as a basis for material design and whether it can indeed function as a source of hypotheses regarding language teaching.
Review of Literature
Collocations in Academic Writing
Collocations are defined as words that go together, often in unexpected ways; this lexical togetherness is not a random cohesion but a sign of belonging (Walter and Woodford, 2010, p7). Webb and Kagimoto define collocation from a statistical point of view as two lexical items co-occurring at a particular frequency within a determined span (2009, p59). For teachers and lexicographers, collocations are mostly associated with habitual combinations, e.g. do (not make) a case or do (not make) the laundry, with greater emphasis placed on collocability or restrictions (Sultana and Yoko, 2021).
Academic Writing and Collocations
The most obvious and valuable advantage of using collocations is that they help students to express their written ideas in a more natural and concise manner, thereby bringing their writing closer to that of native speakers (McCarthy and O'Dell, 2017). Wood (2002) emphasizes that, irrespective of learners' age, knowledge of collocations expands vocabulary and contributes to fluency of expression. In language assessment, students with lower language levels tend to use high-frequency collocations, while students with higher performance are inclined to apply more low-frequency collocations (Granger and Bestgen, 2014). The latter group is more typical of candidates awarded an IELTS score of 7+ on the writing component (IELTS Tutors, 2022).
Errors in the Use of Collocations and Problems in EFL Learners' Academic Writing
Various research findings illustrate that studying a large number of language examples is indispensable for EFL learners to detect their writing issues and determine the effectiveness of their own learning. Many internal and external factors may influence students' application of collocations in writing. The most serious language hurdles diagnosed by teachers are incorrect word choice and incorrect sentence structures, explained by students' reliance on their native language (interlingual or intralingual transfer) despite its apparent incongruence (Yang et al., 2019, p115). Apart from native language transfer and incorrect use of collocations, Phoocharoensil (2011) found a preponderance of mistakes in lexical collocations over grammatical ones, attributing them to overgeneralization and synonymy (pp111-115). On top of that, EFL students are predisposed to make mistakes in collocation use when writing due to a dearth of collocational knowledge, lack of practice in paraphrasing and, on the whole, lack of a conceptual understanding of collocations (Hashemi et al., 2012). In Table 1, the authors illustrate six different types of collocations that language learners tend to misuse, together with their characteristics and examples (Hashemi et al., 2012, pp556-557). Various other issues may contribute to collocation misuse in writing, the most prominent being ineffective learning strategies and learners' low language proficiency and memory capacity (Liu, 2010, p28; Mohammad et al., 2023).
Table 1: Types of collocations.
Sketch Engine Corpus-operated Tools for Collocations
Used by major publishers, such as Macmillan or CUP, to produce grammar books and compile dictionaries, the web-based program “Sketch Engine” (its name originates from the word “sketch”) presents a word's collocational and grammatical behavior in a one-page summary (Kilgarriff et al., 2014, p9). This summary is often referred to as a draft dictionary entry. Sketch Engine functions as both software and a web service, which together contain a considerable number of ready-to-use pre-loaded corpora as well as diverse tools that help to create, install and manage one's own corpora (Kilgarriff et al., 2014). Those tools are word sketch, thesaurus, keywords, word lists, n-grams, concordance and trends. Using the tools integrated within the corpus, teachers can not only develop and easily adapt learning materials, but also guide students on how to integrate academic vocabulary effectively in writing.
Teaching and Learning Collocations
Honing students' awareness and confident use of collocations in L2 writing is, beyond any doubt, essential. Nevertheless, while teachers welcome the importance of aiding students' learning of collocation use, they are reported to encounter challenges in applying apposite techniques and strategies to achieve this (Wray, 2000). The author maintains that practical implementation is not as customary as the mere presentation of semantic or grammatical features of collocations. Cognitive analysis allows for effective teaching of collocations, for it helps to identify the collocation needs that must be met in a language classroom, and this can be achieved in various ways. Liu (2010) suggests that teachers utilize dictionaries and corpora, arrange collocations by their meaning, and include comparison and contrast as well as pattern explanations of different L2/L1-L2 collocations (pp8-9). Fig. 1 illustrates the cycle in which corpus linguistics, driven by students' needs that arise during teaching, supplies an instructor with updated teaching resources, methods and insights (Huyen, 2019, p280).
Fig. 1: Application of corpora in language teaching.
Thus, the relationship between corpus linguistics and language teaching has lately become inextricable and deserves more attention from language teachers as well as researchers. Language teachers should pay attention to the application of corpus linguistics in L2 teaching as “corpus data can provide language teachers and learners with illuminating guidance as to frequent collocations” and it “supports the use of examples of real language class” (Reppen, 2010).
Corpus Description
While “corpus” is crudely defined as a large collection of authentic texts, any form of linguistic inquiry based on data extracted from such a corpus is known as “corpus linguistics” (Stefanowitsch, 2020). As Hunston (2002) states, a corpus in itself offers no new information about language, but the integration of software sheds an entirely new light on the familiar. Driven by the idea of revealing a new perspective on the language, I built a corpus of 33,030 words. It consists of 100 texts, each ranging between 300 and 350 words. The texts represent students' original work submitted as Entry 3 for the Portfolio assessment at the Certificate of International Foundation Studies (CIFS) at Westminster International University in Tashkent (WIUT). Initially, the sample, comprising about 8% of the total number of texts (1,323) submitted through Turnitin, was randomly selected and then manually extracted from the Portfolio to be zipped for more convenient upload to Sketch Engine. The corpus I created is of the most common type, monolingual. Such a corpus gives a user an easier way to study various formal non-translated texts for intralingual analysis of patterns and word forms in order to create highly practical tasks (Johansson, 2007, p57).
Since I used texts created by learners of a language, my corpus is a learner corpus. Using it on Sketch Engine, I was able to identify the most pervasive mistakes and challenges that students face when learning how to write reflectively. Besides, my corpus is specialized because the reflective texts of which it is composed relate to one particular subject area, namely Academic English, and using this corpus I was able to see how the language is used. In terms of time, my corpus is synchronic since all assembled texts refer to the same point in time: the Semester 1 coursework submission of December 2021. According to Meyer (2004), when a synchronic corpus is created, the compiler's main objective is to keep the time-frame narrow so as to view the language accurately, undisturbed by its dynamics (p45). In regard to topicality, like most corpora, the designed corpus is static, since its content is complete and will undergo no further truncations or additions.
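The random selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually followed: the function name `sample_submissions`, the seed, and the integer indices standing in for the 1,323 Turnitin submissions are all hypothetical.

```python
import random


def sample_submissions(total=1323, sample_size=100, seed=2021):
    """Draw a simple random sample of submission indices without replacement.

    The indices 0..total-1 are hypothetical stand-ins for the submitted
    essays; the seed is arbitrary and only makes the draw reproducible.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(total), sample_size))


selected = sample_submissions()
# Share of all submissions that ends up in the corpus, in percent
share = len(selected) / 1323 * 100
print(f"{len(selected)} essays selected ({share:.1f}% of 1,323)")
```

Sampling without replacement guarantees that no essay enters the corpus twice, and fixing the seed lets another researcher reproduce the same selection.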
Questionnaire and Corpus Analysis
In this section, I will present the results of an online survey (designed to identify and evaluate students' needs, perceptions and expectations regarding classroom activities) as well as a detailed analysis of selected data extracted from the corpus.
Needs Analysis Questionnaire
To collect information about the students' general background in terms of language proficiency, experiences and perception of various corpora-related aspects, an online survey was conducted among 4 CIFS groups, in which roughly three-fifths (60.5%) male and almost two-fifths (39.5%) female volunteers shared their responses. It was found that only a quarter of the students had studied English for longer than 5 years, while the majority had done so for 2 or 3 years (34.9% and 25.6% respectively). Although none of the subjects achieved 7.5+, high-scorers (7.0) accounted for 9.3%, a figure similar to that of low-scorers (5.5) at 11.6%; most students, however, scored between 6.0 and 6.5 on the IELTS writing component (79%). Although a little over a quarter (25.6%) of the students felt that IELTS had a small or insignificant impact on realizing their potential in the Academic English module, the remaining ones were of the opposite opinion, with some 56% selecting 5-6 on a 10-point Likert scale. The usefulness of collocations in academic writing is acknowledged by over a half of those questioned (51.2%); however, nearly four-tenths are doubtful and one in ten students does not see practical value in them. What is more, fewer than one in ten students (9.3%) always checked in a dictionary whether the collocations they used were formal; the same percentage never considered this important. The rest (81.4%) did so occasionally. Surprisingly, the figures grow considerably when students need to find collocations similar to those of their mother tongue, with only 2.3% of those surveyed never doing so and the others checking them sometimes (46.5%) or on a constant basis (51.2%). Furthermore, almost four out of five survey participants (79.1%) realize they should work more on enhancing their vocabulary to write more skillfully when completing academic assignments.
Focusing more on paraphrasing, summarizing and sentence structures is crucial in 65.1%, 55.8% and 51.2% of the cases respectively. All other areas for improvement, namely grammar, overgeneralization, learning strategies and word-for-word synonymy, are the main focus for 30-44% of the students in the survey. Students' awareness of the most popular corpora is relative. The corpus that the majority of the students (53.5%) knew was the Wikipedia Corpus, followed by the American English Corpus (25.6%). Approximately three-tenths (27.9%) of the respondents had never heard of the corpora listed. When asked what online services, tools and websites they used to facilitate their writing process, students mentioned various ones, many of which were unrelated, with QuillBot, Context Reverso, Grammarly and several referencing tools heading the list. Finally, over two-thirds (67.4%) of the CIFS students expressed their readiness to attend extracurricular workshops to learn how to conduct text-based analysis to improve their grammar and use of collocations. None of them rejected the idea, and the remaining third might consider the opportunity if given the chance.
Corpus-based Analysis and Interpretation
The current corpus was studied using the Sketch Engine Wordlist tool so as to scrutinize how effectively Adjective + Noun, Adverb + Adjective and Verb + Noun collocations were generally used in students' writing and whether there were any ways to improve their use. Out of 425 adjectives with a total frequency of 2,592 (the number of times the items were found in the corpus), 12 adjective lemmas with absolute frequencies ranging between 130 (academic) and 33 (next) were selected. As for adverbs, found in 1,797 instances, the 2-gram adverb combinations amount to 222 items. Of those, 12 adverb lemmas were selected, with frequencies ranging between 188 (not) and 21 (really). Verbs, comprising roughly one-fifth of the corpus, clearly dominated the other collocation groups, with 6,247 instances and, correspondingly, 499 different 2-gram collocations. The absolute frequencies of the twelve most frequent verbs used in Verb + Noun collocations varied extensively, between 1,155 (be) and 68 (get). While absolute frequency shows how many times the item was found in the corpus, another statistic was also considered, namely LogDice, which is used in Sketch Engine to identify collocations as well as their strength. With this parameter, the higher the score, the stronger (more typical) the collocation; a low score, in turn, means that the words in the collocation also frequently combine with many other words. According to LogDice, the selected collocations, on the whole, ranged from 7.5 to 13.9, e.g. Adjective + Noun (difficult module: 7.5; reliable source: 13.4), Adverb + Adjective (very grateful: 9.6; most importantly: 13.9), and Verb + Noun collocations (make portfolio: 8.5; find source: 13.0), out of an aggregate of 304 collocations, i.e. 110, 74 and 120 respectively. From these association scores, a median LogDice score was computed for each type of collocation (Fig. 2).
Thus, the three collocations with the highest LogDice strength in the entire corpus of over 33,000 words (37,023 tokens) are find source, reliable source, and most importantly, with the last of these heading the list.
Fig. 2: Median LogDice collocations strength.
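For readers unfamiliar with the statistic, LogDice is defined as 14 + log2(2 · f_xy / (f_x + f_y)), where f_xy is the number of co-occurrences of the two words and f_x and f_y are their individual frequencies; its theoretical maximum is 14. The sketch below simply implements this formula. The function name `log_dice` is hypothetical and the frequencies in the usage line are invented for illustration; they are not taken from the Reflection Corpus.

```python
import math


def log_dice(f_xy, f_x, f_y):
    """Return the LogDice association score for a word pair.

    f_xy: co-occurrence frequency of the pair
    f_x, f_y: individual frequencies of the two words
    """
    if f_xy <= 0 or f_x <= 0 or f_y <= 0:
        raise ValueError("all frequencies must be positive")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Invented frequencies, for illustration only
example_score = log_dice(f_xy=13, f_x=40, f_y=25)
print(round(example_score, 2))
```

Because the score depends only on the ratio of the co-occurrence frequency to the frequencies of the two words, it is not affected by corpus size, which makes scores from a small learner corpus and a large reference corpus easier to set side by side.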
Frankenberg-Garcia et al. (2019, p32) distinguish between free associations and collocations and point out that the minimum LogDice score for a collocation is 5; any combinations falling below this threshold should be deemed “free combinations” (Table 1 above). Young and Sun-Young (2020, p448) went even further and categorized collocations into five sub-groups depending on their association level: very high strength (over 11), high (9.5~11), upper-mid (8~9.5), mid (6.5~8) and lower-mid (5~6.5). None of the selected (over 300) collocations represents the fifth group. Judging by the median LogDice score (Fig. 2), Adverb + Adjective collocations are representative of very high strength with 11.37703, while Adjective + Noun and Verb + Noun ones show a high LogDice median (10.21765 and 10.14471 respectively). As can be seen, there is not a single lower-mid (LogDice = 5~6.5) collocation in the data collected from the corpus, but there are two examples of mid (6.5~8) strength associated with gradable adjectives: difficult module (7.5) and easy essay (7.7). By comparing these data with those obtained from the students' survey, it can be inferred that IELTS scores of 6.0+ could more or less be used as framework indicators of the approximate expected level of expertise in the use of collocations.
It is interesting to note that the frequency (how many times an item was found in the corpus) is lower than 11 in 90% of the cases (274 items) (Table 2).
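The five strength bands can be expressed as a small lookup function, shown here as a hedged sketch. The function name `strength_band` is hypothetical, and two details are assumptions not stated in the source: how a score lying exactly on a boundary is assigned (here to the higher band, except that "very high" requires a score strictly over 11), and the separate label for scores under 5, which assumes that 5 is the lower bound of the lowest band.

```python
def strength_band(log_dice_score):
    """Map a LogDice score to one of the five strength bands.

    Boundary handling is an assumption: a score exactly on a boundary
    goes to the higher band, except that 'very high' means strictly over 11.
    """
    if log_dice_score > 11:
        return "very high"
    if log_dice_score >= 9.5:
        return "high"
    if log_dice_score >= 8:
        return "upper-mid"
    if log_dice_score >= 6.5:
        return "mid"
    if log_dice_score >= 5:
        return "lower-mid"
    # Assumed label for scores under the lowest band
    return "below collocation threshold"


# Median scores per collocation type and two individual examples from the text
reported = {
    "Adverb + Adjective (median)": 11.37703,
    "Adjective + Noun (median)": 10.21765,
    "Verb + Noun (median)": 10.14471,
    "difficult module": 7.5,
    "easy essay": 7.7,
}
for label, score in reported.items():
    print(f"{label}: {score} -> {strength_band(score)}")
```

Applied to the figures reported in this section, the function places the Adverb + Adjective median in the very high band, the other two medians in the high band, and difficult module and easy essay in the mid band.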
Table 2: Reflection Corpus Collocation Frequency.
Of the remaining 10%, over a half (17 items) are in the 11-20 frequency group. Besides, 10 of them are distributed throughout all groups, except for the 61-80 one, which contains two Verb + Noun collocations, write draft and get feedback. This supports Granger and Bestgen's (2014) idea that low-scorers tend to use high-frequency collocations, while high-scorers prefer to apply more low-frequency collocations. Another surprising finding is that none of the Adverb + Adjective lemma collocations entered the list, with only very useful scoring as much as 9.0. This may stem from the lack of attention paid to this problem in class. Since the reflective essay is a type of academic writing assignment, I compared the 304 collocations with the Academic Word List (AWL) suggested by Victoria University of Wellington (n.d.). The list is divided into 10 sub-lists, each representing 60 word families, except for the last one, which represents 30. The sub-lists present the most common words first, and the words become less common as the list progresses (Table 3).
Table 3: The sub-lists of most common collocations compared with Academic Word List (AWL).
Based on the obtained data and meticulous analysis, it was ascertained that Verb + Noun collocations comprise the biggest number, 29 (49%), followed by Adjective + Noun, 25 (43%), and Adverb + Adjective, 5 (less than one-tenth). Further analysis shows that the prevalence of verbal and adjectival collocations is explained not by their exceptional diversity, but by the repetition of the head words, with some (draft and task) recorded in six different collocations and the others ranging mainly between two and three, as shown for draft, task, source, job, style, topic, aspect, summary, resource, evidence and method:
• have draft, write draft, next draft, complete draft, give draft, improve draft;
• difficult task, easy task, be task, complete task, do task, give task;
• reliable source, different source, find source, use source, make source;
• give job, do job, easy job, future job;
• academic style, new style, make style, use style;
• future topic, different topic, new topic, give topic;
• next aspect, difficult aspect, useful aspect;
• be summary, write summary;
• reliable resource, find resource;
• reliable evidence, find evidence;
• use method, make method;
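The grouping above can be reproduced programmatically, which may help when the same check is repeated on a larger export. The sketch below is a minimal illustration: the collocations are copied from the list above, while the function name `group_by_head` and the assumption that every collocation is a two-word string with the head noun in second position are mine.

```python
from collections import defaultdict

# Collocations copied from the list above (modifier or verb + head noun)
COLLOCATIONS = [
    "have draft", "write draft", "next draft", "complete draft",
    "give draft", "improve draft",
    "difficult task", "easy task", "be task", "complete task",
    "do task", "give task",
    "reliable source", "different source", "find source",
    "use source", "make source",
    "give job", "do job", "easy job", "future job",
    "academic style", "new style", "make style", "use style",
    "future topic", "different topic", "new topic", "give topic",
    "next aspect", "difficult aspect", "useful aspect",
    "be summary", "write summary",
    "reliable resource", "find resource",
    "reliable evidence", "find evidence",
    "use method", "make method",
]


def group_by_head(collocations):
    """Group two-word collocations by their second word (the head noun)."""
    groups = defaultdict(list)
    for collocation in collocations:
        first_word, head = collocation.split()
        groups[head].append(first_word)
    return dict(groups)


groups = group_by_head(COLLOCATIONS)
# Print head nouns from the most to the least frequently reused
for head, partners in sorted(groups.items(), key=lambda item: -len(item[1])):
    print(f"{head}: {len(partners)} collocations ({', '.join(partners)})")
```

Sorting by the number of partners makes the repetition of head words such as draft and task immediately visible.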
The comparison also revealed that 16 collocations represent AWL sub-list 1, which is over a quarter of the total, and that the next largest groups come from AWL sub-lists 5 and 2, with 11 and 10 collocations respectively. Sub-lists 6-8 are rather scarce, accounting for only 8 collocations, most of which are listed in sub-list 7. Finally, no words from AWL sub-lists 9 and 10 appear among the collocations, although words from these sub-lists, such as anticipate, attain, devote, format, team, assemble, compile, convince or undergo, could well be used in reflective writing to bring greater diversity and avoid repetition.
Creating Corpora-based Activities
In this section, I will explain the rationale behind each classroom activity chosen. All 11 Tasks integrated within 3 Activities (with answer keys at the end) were created using various tools available for the Reflection Corpus, the Academic English Corpus and the British National Corpus (BNC). The created activities are estimated to last approximately 110 minutes and can be planned using various modes of teaching (e.g. individual, pair-work, small group-work, plenary, etc.). For each task, clear instructions, aims, objectives and timing were considered. When planning them, I considered Bloom's Taxonomy verbs (affective and cognitive domains) and intended to contribute to the development of both higher-order and lower-order thinking skills (HOTs and LOTs) in students' learning. The objectives were built on the basis of SMART goals.
Activities 1-3 on Adjective + Noun, Adverb + Adjective and Verb + Noun Collocations
Relying on the data, I selected the most frequent Adjective + Noun collocations from the corpus and listed the nouns in rows of five so that the students could match them with one of the suggested adjectives from the bank of words. The practical value of this task lies not in prescriptivism, but in its non-judgmental, descriptivist approach, which makes it possible to work with the language that L2 WIUT CIFS students normally employ when reflecting on their academic experience. Doing this task, the students should pay heed to those words that may be out of regular use because most of the adjectives can match virtually any string of nouns (Activity 1, Task 1). In reflective writing, the interpretation paragraph should consider the reasons or causes why a student succeeded or failed to make a particular achievement, in other words provide justification. A Word Sketch Difference query in the Reflection Corpus shows that the nouns “reason” and “cause” were used 13 and 2 times respectively (Fig. 3) and are ranked 113th and 351st in the list of 749 nouns. The same query on the BNC shows a similar preference for the modified noun “reason”, but the gap is less than half as large (a ratio of 2.85, against 6.5 in the Reflection Corpus) (Fig. 4) (Activity 1, Task 3).
Fig. 3: Word Sketch Difference: Reflection Corpus.
Fig. 4: Word Sketch Difference: BNC Corpus.
Besides, an analysis of these nouns with the Concordance tool (Fig. 5) makes it apparent that students, despite the abundant variation available, limit their use to the adjectives “same” and “main” only; thus the teacher's help in diversifying adjective + reason/cause collocations is essential.
Fig. 5: The use of “reason” and “cause” in Reflection Corpus.
Word Sketch Difference provides 100 different adjectives as well as their usage and frequency with reason/cause (Table 4). Using these data, I created a “Tick Box” activity. I selected 20 collocations from among the most frequent and more academic options, choosing cases where a frequency stood against a zero, e.g. simple 156 - 0, or, for the “Both” option, cases such as possible 129 - 109 (Activity 1, Task 3). When designing Tasks for Activity 2, I aimed to encourage students to generalize the meaning of collocation sets by recognizing familiar ones, e.g. highly successful or strongly negative (Activity 2, Task 1).
Table 4: Word Sketch Difference.
The BNC “adjective Prep” combination made it possible to generate a long list of interesting and useful collocations for the task. In the next task (Activity 2, Task 2), I aimed to experiment with the Academic English Corpus so that students could detect appropriate collocations for the prospective accomplishment of academic tasks. Survey results showed their awareness of and interest in exploring corpora for learning. The task involves search techniques and the use of technology, e.g. working with smartphones, laptops and QR codes. Next, Task 3 (Activity 2) was created to expand students' essential vocabulary for collocations by changing word forms. To do so, I used concordances from the BNC. Lastly, the screenshot below (Fig. 6) shows how limited the students' use of the adverb “very” is; expanding it is vital and should preferably be directed toward single-word academic equivalents. Synonyms available through the Thesaurus on Sketch Engine were of great help when designing the task for recalling and recognizing short-form academic words (Fig. 7).
Fig. 6: Reflection Corpus Concordance of the Adverb “very”.
Fig. 7: Visualizing the Comparison of BNC and Reflection Corpora.
Finally, classroom activities for Verb + Noun collocations focused on the most serious mistakes that students make in the use of verbal collocations (Activity 3, Task 1). I identified these by studying Word Lists and Concordances from my corpus and created a reflective piece of my own, in which I purposefully integrated 12 of those mistakes throughout 27 lines. The students are expected to evaluate the quality of the passage and learn how to edit their written assignments through a focused search for those verbal collocations.
Table 5: Word Sketch from Reflection Corpora.
Another essential focus was verb + feedback collocations, since those (Table 5) were of extremely limited use (with receive and get heading the list) and often were the result of the interlingual or intralingual transfer discussed above (Yang, Harn and Hwang, 2019, p115), e.g. “surrender drafts”, as well as of overgeneralization and synonymy (Phoocharoensil, 2011, pp111-115). The last task aims to synthesize the most frequent collocations with “feedback” and resolve the problem of repetition and informality. Besides, it supports the use of the various prepositions that may follow any of the 33 “verb + feedback” collocations.
This paper has attempted to illustrate how scrupulously analysed data from a specialized monolingual corpus and other corpora, in conjunction with students' needs identified in advance, could potentially contribute to designing L2 classroom activities to expedite students' learning. The entire work is devoted to three different types of collocation, and the activities created for them pursue the goal of improving students' written expression, namely their reflective writing. The classroom activities have not been piloted yet.
However, the accumulated data, along with the designed materials, outline the scope for follow-up research into the effectiveness of the corpus-based approach in teaching reflective writing. Judging by students' responses and readiness to attend workshops on corpus-based learning, the implementation of the designed materials seems plausible and promises fruitful results. Nonetheless, while constraints are inevitable, I can confidently state that corpora grant immense opportunities for both teachers and students and should become an integral part of lesson planning.
The author is thankful to all students who made a significant contribution to the study by providing their consent to use data for research.
The author declares that there are no potential conflicts of interest concerning the research, authorship and/or publication of this article.
Academic Editor
Dr. Antonio Russo, Professor, Dept. of Moral Philosophy, Faculty of Humanities, University of Trieste, Friuli-Venezia Giulia, Italy.
Department of Global Education, Westminster International University in Tashkent, Uzbekistan.
Asanov A. (2023). The use of specialized corpora as supply reference of collocations for teaching reflective writing in academic English, Asian J. Soc. Sci. Leg. Stud., 5(4), 108-117. https://doi.org/10.34104/ajssls.021.01080117