English for Specific Purposes World (ESP World)
Online peer-reviewed Journal for Teachers
Business English, Professional English, Legal English, Medical English, Academic English, etc.
ISSN 1682-3257


Joseba M. Gonzalez Ardeo

University of the Basque Country



The term testing is used in various ways depending on the author. For the purposes of this short paper it will be a synonym of formal assessment (Harris & McCann, 1994). However, the interpretations of the term formal assessment in the literature on the topic do not always coincide. While for some authors it refers to examinations, others include all types of language tests under this heading. For the sake of clarity, then, the terms formal assessment and examination will also be considered equivalent in this paper. They will have the following characteristics: they are administered to many students, assess a broad range of language, are both backward- and forward-looking, are marked objectively, and are administered at the end of the course.

Bachman (1990) gives two main roles for tests: the role that tests play in making educational decisions within the context of an educational programme, and the role they play as indicators of abilities or attributes that are of interest in research on language, language acquisition and language teaching. If a test is regarded as important, its preparation will dominate all teaching and learning activities, and a beneficial backwash effect (the effect of testing on teaching and learning) will take place (Hughes, 1990). However, when tests are imposed, and this is our case, they may have the effect of compromising the naturalness of behaviour which the communicative approach aims to promote (Robinson, 1980).

Testing within a communicative framework should include tasks that involve realistic discourse processing and cover a range of enabling skills that have been previously identified as appropriate (Davies, 1986; Weir, 1993). Moreover, as Alderson (1988) states, when an ESP course is offered as a service course to other areas of study at university level (this is, at least partially, our case), the test serves as a strong motivating force and is quite likely to influence teaching. It is therefore important that the test measures abilities and knowledge relevant to the students' current content subjects or their future employment.


The particular circumstances in which teaching and learning operate will give us useful clues about appropriate testing options, although it is necessary to keep in mind that there is no single right way to assess the learning of a language.

The main purpose of this paper is to analyze critically the type of examination administered to the student engineers of the Industrial Technical Engineering College in Bilbao (ITEC-BI). These students take English for Specific Purposes (ESP) courses. This type of course can focus on one or both of the two main branches of ESP, that is, English for Academic Purposes (EAP) or English for Occupational Purposes (EOP). According to McDonough (1984) the main offshoot of the former is English for Science and Technology (EST), but according to other authors (Hutchinson & Waters, 1987), EST is one of the branches of ESP, and it can be divided into EAP and EOP. Nevertheless, more recently, Dudley-Evans and St John (1998) consider that such taxonomies create problems since they cannot capture the essentially fluid nature of the various types of ESP teaching. Their proposal consists of presenting the teaching of English as a continuum that flows from basic courses of English for General Purposes to highly specific ESP courses. Our own perception of the matter is closer to the idea of EAP and EOP somehow overlapping. In other words, and focusing our attention on the student engineers mentioned above, our courses are designed to meet both the current and, to a certain extent, the future linguistic needs of our students.

In science and technology, a barrier to full access by European citizens is that English has become de facto the international language of science and technology (Laver & Roukens, 1996), so there is an obvious and pressing need for English at any technical level, and our students are aware of this situation. They have to face this fact while they are students, since lecturers of subjects other than English include in their reading lists books, papers, handbooks, journals, etc. written in English. Obviously, once they leave the College and enter the labour market, one of their most valuable resources will be English. The overlap between current and future needs therefore seems irrefutable. Current needs can be evaluated and gathered rather easily, and consequently taught, but future needs are less predictable if the wide range of potential posts an engineer can take up is considered.

Achievement tests are based on a syllabus, and this can be arranged after carrying out a thorough needs analysis. Proficiency tests do not look back to learning; rather, they look forward to a future language-using situation. In the case of an engineering student, they should be related to specific professional situations where English is needed. The key word here is situations, since this means covering all the relevant language skills in as authentic a way as possible and therefore reflecting the requirements of the target situation. Moreover, important academic and career decisions are based on proficiency tests.

Communicative Language Teaching (CLT) strives to emulate real language use, and communicative tests aim to do the same, consisting of test items of real language use. Communicative language testing tasks should be structured in such a way that they show the capacity of a student to implement knowledge of the language in communicative language use. Examples of such tasks could be: an authentic reading with some transfer of information; writing a note with instructions about some aspect of a mechanism's operation; listening to a welder talking about different welding methods to find the most appropriate one for a situation previously stated; giving someone spoken instructions for how to carry out an experiment in a chemistry laboratory.

If these tasks become tests, they are called third-generation tests, since both the texts used and the tasks set aim to be authentic, and the tasks are contextualized by their very nature as authentic. The tasks and the tests have a clear reference in reality, and assess language integratively. Nevertheless, communicative testing of the productive skills gives rise to language that has to be assessed subjectively. Moreover, the time span needed for testing large groups communicatively is enormous, which sometimes means that more than one assessor is necessary.

Our global idea of testing fits in with what Markee (1984) calls pragmatic language testing: any task in which the linguistic skills are integrated, and which challenges the learner's developing grammatical system in real time.


Amongst the many different types of tests administered to the student engineers of the ITEC-BI, one frequently included in examinations is a cloze test. The one presented here has already been utilized.

Fill in the gaps below using the words you consider most appropriate from the list provided (one word per gap).

Ferric oxide, Fe2O3, is mixed with coke and (1) and introduced into a blast furnace. The blast furnace is a tall structure about 100 ft. high and 20 ft. in diameter at the widest part. It contains a (2) lining inside a steel shell, and a blast of hot air can be introduced low down in the furnace through several pipes known as (3).

A (4) at the bottom of the furnace serves to hold the molten iron and (5) until these can be run off. The mixture of ore, coke and limestone is fed (6) continuously from the top, and a blast furnace, once (7), is kept going for months at a time, until repairs are necessary or work lacking.

Steel is hard, tough and strong. If cooled gradually, steel can subsequently be hammered into (8) or drilled, because it is fairly soft. By heating it and suddenly cooling it, the steel becomes very hard indeed, of very high (9) strength, and elastic. By reheating the steel to carefully regulated temperatures, steel of different (10) of hardness and brittleness can be obtained. This is called (11).

Rayon was made by dissolving cellulose in a solution of sodium hydroxide, or (12) soda, as it is usually called. The cellulose was obtained from (13) wood pulp. The dissolved cellulose was formed into threads by forcing it through a (14) (a metal plate with holes in it) in a setting bath of (15) sulphuric acid. The threads were drawn from the setting bath, washed, then dried on a heated roller and finally (16) on to a bobbin.

Keeping in (17) the marvellous technical progress which has been made during 100 years of car history, it is simply incomprehensible why petrol is still used as a fuel, since it is a highly (18) substance, which contains carcinogenic (19) and aromatics, whose fumes constitute a health (20) even before they are burnt in an engine, for example at a (21) station. They also contain sulphur and heavy metals and are so highly flammable that it has cost thousands of lives.

There are certain safety problems with nuclear waste. Nuclear waste means any unusable radioactive material. Waste from a nuclear plant can be used (22), usually water. Wastewater is normally discharged, or (23), into the sea. Sometimes it is discharged into a large river or lake. Such water contains low levels of radioactivity.

Spent fuel, or used fuel, contains a small amount of fission products. These (24) products are highly radioactive. Highly radioactive waste must be contained and (25). It must be carefully stored. It must not (26) or escape into the environment.

The task set is to restore the missing words in order to recreate the whole. There are many variations on the basic cloze test, but the text is always expected to make sense and to be expressed grammatically.

Cloze procedure in testing involves deleting words from a text at intervals. Words may be deleted at regular intervals, or particular classes of words may be deleted, or the learner may be offered multiple-choice answers to the gaps. Our text is a particular version of the latter. The words deleted, which have not been deleted at random but according to attainment targets, are included in a list together with terms used in other units of the syllabus; the multiple-choice answers are thus expanded to every gap in the text and to every term in the list.
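
By way of illustration, the regular-interval variant of the procedure can be sketched in a few lines of code. The function and its parameters below are our own invention for the illustration; the actual test discussed here was built by hand around attainment targets, not by fixed-ratio deletion:

```python
def make_cloze(text, interval=7, lead_in=6):
    """Build a fixed-ratio cloze: delete every `interval`-th word,
    leaving the first `lead_in` words intact as context.

    Returns the gapped text and the list of deleted words (the key)."""
    words = text.split()
    gapped, key = [], []
    for i, word in enumerate(words, start=1):
        if i > lead_in and i % interval == 0:
            key.append(word)                  # record the deleted word
            gapped.append(f"({len(key)})")    # numbered gap marker
        else:
            gapped.append(word)
    return " ".join(gapped), key

sample = ("Ferric oxide is mixed with coke and limestone and introduced "
          "into a blast furnace through several pipes")
gapped, key = make_cloze(sample, interval=5)
```

In the multiple-choice variant used in our examinations, the key would additionally be shuffled together with distractor terms from other units of the syllabus before being handed to the students.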

Once the test has been presented, our next step consists of analyzing whether this type of test follows the fundamental principles of testing or not. West (1990) provides a good summary of these principles. This author describes them in pairs, so that the opposition between members of a pair indicates some sort of tension in language testing: the more a test conforms to one member of a pair, the less likely it is to exhibit characteristics of the other member of the pair.


Competence vs. performance

It was Chomsky (1965) who drew this distinction. When we focus only on competence (the speaker-listener's knowledge of the target language) we compare performance with that of an idealized speaker-listener in a homogeneous speech community, unaffected by aspects of performance such as memory limitations, distractions, errors, false starts, etc.

In the 1970s the concept of communicative competence arose and, as a consequence, the need to give learners practice in performance (Shohamy, 1996). Despite the fact that our engineering students are immersed in a communicative approach, this particular part of their examination, the cloze test, rests on competence more than on performance. In fact, the students can check their answers and make changes within the time allotted. This possibility of rethinking one's sentences establishes a more or less diffuse barrier. It can therefore be stated that our test concentrates more on competence.


Use vs. usage

This distinction is due to Widdowson (1978). He said that in the normal circumstances of daily life we are generally required to use our knowledge of the language system in order to achieve some kind of communicative purpose, whereas the type of output one may expect from a student who has been subjected to a particular kind of instruction, and who will therefore be asked to produce sentences to illustrate his/her level of target language acquisition, is a clear example of usage.

Widdowson argues that performance teaching, and therefore performance testing, requires examples of language use, not usage. Thus, despite our efforts to improve the communicative competence of our students, this type of test seems to contradict the spirit of our teaching actions. However, a certain equilibrium between use and usage is sought in the examinations, which is why the different parts the examinations consist of complement each other.


Direct vs. indirect testing

Testing which assesses competence without eliciting performance is known as indirect testing. Our cloze test (+ multiple choices) fits this description, since language is assessed without any production of language use from the learner.

Direct tests use testing tasks of the same type as language tasks in the real world. Some of the tasks included in our examinations try to check communicative competence and performance by simulating real communicative cases, but this cloze test cannot be included within this group.


Discrete-point vs. integrative testing

In general, indirect assessment tests only one small part of the language at a time. Each item is known as a discrete-point item. In theory, if there are enough items, the test gives a good indication of the learner's underlying competence. This contrasts sharply with the move towards communicative assessment, which sees language use as basically indivisible.

When the items require ability to combine knowledge of different parts of the language, they are known as integrative or global.

At first sight, our cloze test seems to be full of discrete-point items, but closer scrutiny reveals that the student has to sift through a long list of words until only one (the most appropriate from a linguistic and/or technical point of view) is left. This activity requires deep linguistic skill to discriminate words on the grounds of their function, meaning, etc.


Objective vs. subjective assessment

Objective assessment refers to test items that can be marked clearly as right or wrong. A clear example is a multiple choice item.

Subjective assessment requires that an assessor make a judgement according to certain criteria and experience. The difficulty arises in trying to achieve agreement over marks, both between different markers and with the same marker at different times. Most integrative test elements require subjective assessment, but our cloze test could be considered an exception since, in our opinion, it is highly integrative and can simultaneously be assessed very objectively.
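
Because each gap has exactly one keyed word, marking can be reduced to a mechanical comparison. A minimal sketch follows; the answer key and the student responses below are invented for the example, not the real key of the test above:

```python
def mark_cloze(responses, answer_key):
    """Award one point per gap whose response exactly matches the keyed word
    (case-insensitive); no judgement by the marker is involved."""
    return sum(1 for gap, keyed in answer_key.items()
               if responses.get(gap, "").strip().lower() == keyed)

answer_key = {1: "limestone", 2: "refractory", 3: "tuyeres"}   # invented key
responses  = {1: "limestone", 2: "Refractory", 3: "nozzles"}   # one wrong
score = mark_cloze(responses, answer_key)  # 2 out of 3
```

Two markers running this procedure on the same script necessarily produce the same score, which is precisely what objective assessment means here.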


Receptive vs. productive skills

Receptive skills (reading and listening) lend themselves to objective marking, while productive skills (speaking and writing) are generally resistant to objective marking.

How to mark as objectively as possible while at the same time including the productive skills in tests seems to be the main difficulty of the system.

Our cloze test has apparently overcome these difficulties, since it is objective and one productive skill is included to a certain extent.


Contextualized vs. disembodied language

Disembodied language has little or no context. Integrative items need full context in order to function. In our cloze test the context includes information about the topic (a topic the students are familiar with from their content courses) and a list of words they have been presented with during the ESP course or which are (or should be) part of their linguistic background. We could therefore describe our test as a contextualized one.


Norm-referenced vs. criterion-referenced assessment

Norm-referenced tests compare students with an average mark or a passing score, in order to make some type of pass/fail judgement on them.

Criterion-referenced assessment compares students with success in performing a task. The result of a criterion-referenced test could be expressed as follows: "S/he is able to …". The ability may refer to some small or large integrative language task.

We do not express the result of our cloze test by means of a statement such as the one mentioned above; rather, we express results in numbers, these representing degrees of pass or fail. This is due to the fact that in our institution, the ITEC-BI, results must be presented using figures. However, this does not mean that we could not express those results using a sentence.
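
The contrast between the two styles of reporting can be made concrete. In the sketch below the cut-off mark and the can-do statements are invented for the example; they are not the marking scheme actually used at the ITEC-BI:

```python
def norm_referenced(score, pass_mark=50):
    """Compare the numeric score with a passing mark: a pass/fail judgement."""
    return "pass" if score >= pass_mark else "fail"

def criterion_referenced(score, bands):
    """Map the score to a can-do statement describing task performance."""
    for threshold, statement in sorted(bands.items(), reverse=True):
        if score >= threshold:
            return statement
    return "S/he is not yet able to complete the task"

bands = {
    80: "S/he is able to select the technically appropriate term consistently",
    50: "S/he is able to restore most missing words from context",
}
result_norm = norm_referenced(65)              # "pass"
result_crit = criterion_referenced(65, bands)  # the 50-band statement
```

The same underlying mark thus supports either form of report, which is why presenting results as figures does not preclude expressing them as a sentence.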


Reliability vs. validity

Reliability refers to the consistency of the scoring of the test, both between different raters and for the same rater on different occasions. In theory, then, objective testing should give perfect reliability.

The subjective assessment inevitably associated with testing the productive skills reduces reliability. We could therefore choose not to assess all the skills, but this would obviously reduce the validity of the test. In other words, it would not seem to be a good, fair or adequate test of the language.
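
Inter-rater consistency can be given a simple numeric expression: the proportion of items on which two raters award the same mark. The marks in the sketch below are invented for the example:

```python
def agreement(rater_a, rater_b):
    """Proportion of items on which two raters give the same mark."""
    if len(rater_a) != len(rater_b):
        raise ValueError("raters must mark the same items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

# Objective items (right/wrong against a key): both raters necessarily agree.
objective = agreement([1, 0, 1, 1], [1, 0, 1, 1])   # 1.0
# Subjectively marked productive tasks: agreement typically drops.
subjective = agreement([7, 5, 8, 6], [6, 5, 9, 6])  # 0.5
```

More refined indices (e.g. chance-corrected agreement) exist, but even this crude proportion illustrates why subjective marking of the productive skills reduces reliability.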

Amongst the different types of validity mentioned in the literature on the topic, our cloze test reflects the language of the syllabus as well as the students' language needs; therefore the so-called content validity is reached to an important extent. Construct validity, or the extent to which our cloze test reflects current or valid theories of language learning/testing, can also be described as high. Predictive validity, or the extent to which the test assesses future language performance accurately, is more difficult to express, and will somehow be connected to the needs analysis carried out prior to the ESP courses. Finally, with respect to concurrent validity, or the extent to which the test produces results similar to those produced by an established test, we can state that although we consider this a valid type of test, we do not know of any established test with which to compare it.


Conclusions

We are conscious that this kind of cloze test is mentally demanding, since a lot of skill, patience and effort is needed to complete it successfully, but we are also sure that our students' effort once in the labour market will not differ very much from this one, when their jobs require them to use all their skills to the maximum.

Whether one or the other of the nine pairs of characteristics presented in this paper is perceived as good or desirable for a given test will depend on the testing purpose and the actual context of the test. As far as the first pair is concerned, our test focuses more on competence because, among other reasons, our students are thus judged on linguistic as well as technical accuracy. Usage is doubtless the characteristic chosen from the second pair, although other elements of the examinations try to balance usage against use. In the third pair, indirect testing is clearly the main focus of attention. The items in pair number four achieve a certain equilibrium. Objective assessment is an important issue at university: one of the reasons why our marking should be as objective as possible is the repercussions our verdict may have on the careers of our students, since passing or failing a single examination may mean graduating or not graduating. Pair number six also reaches a certain equilibrium between receptive and productive skills. In pair number seven, it is obvious that the language is contextualized in our cloze tests, an asset in the communicative approach. Again, in pair number eight we are closer to one of the characteristics, norm-referenced assessment, although the other would also be possible. Finally, with respect to the last pair, the reliability of the cloze test is high and its validity, as a whole, can be considered moderately high.

In short, this type of cloze test represents an objective testing technique which, apart from being integrative (a large number of items are evaluated, and a wide stylistic, linguistic and semantic context is provided for each item), works beyond the mere level of the sentence, and the learner has to use a wide range of sub-skills to be able to complete the task satisfactorily.


References

[1] Alderson, J. C. (1988). Testing and its Administration in ESP in Chamberlain, D. & R. J. Baumgardner (eds.). ESP in the Classroom: Practice and Evaluation. London: Modern English Publications & The British Council.

[2] Bachman, L. (1990). Fundamental Considerations in Language Testing. Oxford: OUP.

[3] Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge: MIT.

[4] Davies, A. (1986). Indirect ESP Testing: Old Innovations in Portal, M. (ed.). Innovation in Language Testing. London: NFER-Nelson.

[5] Dudley-Evans, T. & M. J. St John (1998). Developments in English for Specific Purposes: A multi-disciplinary Approach. Cambridge: CUP.

[6] Hutchinson, T. & A. Waters (1987). English for Specific Purposes: A Learning-centred Approach. Cambridge: CUP.

[7] Harris, M. & P. McCann (1994). Assessment. Oxford: Heinemann.

[8] Hughes, A. (1990). Testing for Language Teachers. Cambridge: CUP.

[9] Laver, J. & J. Roukens (1996). The Global Information Society and Europe's Linguistic and Cultural Heritage in Hoffmann, C. (ed.). Language, Culture and Communication in Contemporary Europe. Clevedon: Multilingual Matters, 1-27.

[10] Markee, N. (1984). The Methodological Component in ESP Operations. The ESP Journal 3, 3-16.

[11] McDonough, J. (1984). ESP in Perspective: A Practical Guide. London: Collins ELT.

[12] Robinson, P. C. (1980). ESP (English for Specific Purposes). Oxford: Pergamon.

[13] Shohamy, E. (1996). Competence and Performance in Language Testing in Brown, G. et al. (eds.). Performance and Competence in Second Language Acquisition. Cambridge: CUP.

[14] Weir, C. (1993). Understanding and Developing Language Tests. Hemel Hempstead: Prentice Hall International.

[15] West, R. (1990). Introduction and Principles of Language Testing. Manchester: University of Manchester SEDE.

[16] Widdowson, H. (1978). Teaching Language as Communication. Oxford: OUP.




