3.1.1 Test and Testing
Simply put, a test is a measuring tool or instrument in education. More specifically, a test is a kind or class of measurement device typically used to find out something about a person. Most of the time, when you finish a lesson or lessons in a week, your teacher gives you a test. This test is an instrument given to you by the teacher in order to obtain data on which you are judged. It is a common educational type of device which an individual completes himself or herself; the intent is to determine changes or gains by means of such instruments as the inventory, questionnaire, opinionnaire, scale, etc. Testing, on the other hand, is the process of administering the test to the pupils. In other words, the process of making you or letting you take the test in order to obtain a quantitative representation of the cognitive or non-cognitive traits you possess is called testing. So the instrument or tool is the test, and the process of administering the test is testing.
3.1.2 Assessment
Now that you have learnt the difference between test and testing, let us move on to the next concept, which is assessment. As a teacher, you will inevitably be involved in assessing learners; therefore you should have a clear knowledge of the meaning of assessment. The term assess is derived from the Latin word “assidere”, meaning “to sit by” in judgment. There are many definitions and explanations of assessment in education. Let us look at a few of them.
i. Freeman and Lewis (1998): to assess is to judge the extent of students’ learning.
ii. Rowntree (1977): assessment in education can be thought of as occurring whenever one person, in some kind of interaction, direct or indirect, with another, is conscious of obtaining and interpreting information about the knowledge and understanding, abilities and attitudes of that other person. To some extent or other, it is an attempt to know the person.
iii. Erwin (in Brown and Knight, 1994): assessment is a systematic basis for making inferences about the learning and development of students… the process of defining, selecting, designing, collecting, analyzing, interpreting and using information to increase students’ learning and development.
You will note from these definitions that:
• Assessment is a human activity.
• Assessment involves interaction, which aims at understanding what the learners have achieved.
• Assessment can be formal or informal.
• Assessment may be descriptive rather than judgmental in nature.
• Its role is to increase students’ learning and development.
• It helps learners to diagnose their problems and to improve the quality of their subsequent learning.
3.1.3 Measurement
This is a broad term that refers to the systematic determination of outcomes or characteristics by means of some sort of assessment device. It is a systematic process of obtaining the quantified degree to which a trait or an attribute is present in an individual or object. In other words, it is the systematic assignment of numerical values or figures to a trait or an attribute in a person or object. For instance: what is the height of Uche? What is the weight of the meat? What is the length of the classroom? In education, the numerical value of scholastic ability, aptitude, achievement, etc. can be measured and obtained using instruments such as paper-and-pencil tests. It means that the values of the attribute are translated into numbers by measurement.
3.1.4 Evaluation
According to Tuckman (1975), evaluation is a process wherein the parts, processes, or outcomes of a programme are examined to see whether they are satisfactory, particularly with reference to the stated objectives of the programme, our own expectations, or our own standards of excellence. According to Cronbach et al. (1980), evaluation means the systematic examination of events occurring in and consequent on a contemporary programme. It is an examination conducted to assist in improving this programme and other programmes having the same general purpose. For Thorpe (1993), evaluation is the collection, analysis and interpretation of information about training as part of a recognized process of judging its effectiveness, its efficiency and any other outcomes it may have. If you study these definitions carefully, you will note that evaluation, as an integral part of the instructional process, involves three steps. These are:
i. Identifying and defining the intended outcomes;
ii. Constructing or selecting tests and other evaluation tools relevant to the specified outcomes; and
iii. Using the evaluation results to improve learning and teaching.
You will also note that evaluation is a continuous process. It is essential in all fields of teaching and learning activity where judgments need to be made.
3.2 Types of Evaluation
The different types of evaluation are: placement, formative, diagnostic and summative evaluation.

3.2.1 Placement Evaluation
This is a type of evaluation carried out in order to place students in the appropriate group or class. In some schools, for instance, students are assigned to classes according to their subject combinations, such as Science, Technical, Arts, Commercial, etc. Before this is done, an examination is carried out, in the form of a pretest or aptitude test. It can also be a type of evaluation made by the teacher to find out the entry behaviour of his students before he starts teaching. This may help the teacher to adjust his lesson plan. Tests such as readiness tests, ability tests, aptitude tests and achievement tests can be used.
3.2.2 Formative Evaluation
This is a type of evaluation designed to help both the student and the teacher to pinpoint areas where the student has failed to learn, so that this failure may be rectified. It provides feedback to the teacher and the student and thus estimates teaching success, e.g. through weekly tests, terminal examinations, etc.
3.2.3 Diagnostic Evaluation
This type of evaluation is most often carried out as a follow-up to formative evaluation. Suppose that, as a teacher, you have used formative evaluation to identify some weaknesses in your students, and have applied some corrective measures which have not shown success. What you will now do is design a diagnostic test, which is applied during instruction to find out the underlying cause of students’ persistent learning difficulties. These diagnostic tests can be in the form of achievement tests, performance tests, self-ratings, interviews, observations, etc.
3.2.4 Summative Evaluation
This is the type of evaluation carried out at the end of a course of instruction to determine the extent to which the objectives have been achieved. It is called a summarizing evaluation because it looks at the entire course of instruction or programme and can pass judgment on the teacher and students, the curriculum and the entire system. It is used for certification. Think of the educational certificates you have acquired from examination bodies such as WAEC, NECO, etc. These were awarded to you after you had gone through some type of examination. This is an example of summative evaluation.
The Purposes of Measurement and Evaluation
The main purposes of measurement and evaluation are:
i. Placement of students, which involves placing students appropriately in the learning sequence and classifying or streaming students according to ability or subjects.
ii. Selecting students for courses: general, professional, technical, commercial, etc.
iii.
Certification: This helps to certify that a student has achieved a particular
level of performance.
iv.
Stimulating learning: this can be motivation of the student or teacher,
providing feedback, suggesting suitable practice etc.
v.
Improving teaching: by helping to review the effectiveness of teaching
arrangements.
vi.
For research purposes.
vii.
For guidance and counseling services.
viii.
For modification of the curriculum purposes.
ix. For the purpose of selecting students for employment.
x. For modification of teaching methods.
xi. For the promotion of students.
xii. For reporting students’ progress to their parents.
xiii. For the award of scholarships and merit awards.
xiv. For the admission of students into educational institutions.
xv. For the maintenance of standards.
4.0 CONCLUSION
Now that you have gone through the descriptions of the major terms used in measurement and evaluation, can give the purposes of measurement and evaluation, and can explain the types of evaluation, you have placed yourself on a good footing for the study of this all-important course, which you cannot do without as a teacher.
5.0 SUMMARY
In general, practitioners in the educational system are usually interested in ascertaining the outputs of the educational programme. Output is counted in terms of test results, which are naturally expressed in quantitative indices such as scores or marks. A test, which is a device, an instrument or a tool consisting of a set of tasks or questions, is used to obtain the results. A test can take the form of a pen-and-paper examination, assignments, practicals, etc. The process of administering this test is called testing. An act of measurement is done when we award marks to an answer paper or assignment; measurement thus gives the individual’s ability in numerical indices or scores, i.e. measurement is quantitative. Assessment can be seen as the engine that drives and shapes learning, rather than simply an end-of-term examination that grades and reports performance. Evaluation is expressed in qualitative indices such as good, excellent, pass or fail. A value judgment is therefore attached to the measurement. Evaluation can be placement, formative, diagnostic or summative.
Evaluation, Measurement and Testing
Bachman (1990), quoting Weiss (1972), defines evaluation as “the systematic gathering of information for the purpose of making decisions”. Lynch (2001) adds that this decision or judgment is to be about individuals. In this conceptualization, both authors agree that evaluation is the superordinate term in relation to both measurement and testing. Assessment is sometimes used interchangeably with evaluation. The systematic information can take many forms, but these forms are either quantitative or qualitative. This is what distinguishes measures from qualitative descriptions. Measurement is thus concerned with quantification. Language proficiency, like many other constructs and characteristics of persons in the social sciences, needs to be quantified before any judgments can be made about it. This process of quantifying is called operationalization in research, by which we mean assigning numbers, according to observable operations and explicit procedures or rules, to measure a construct (Bachman, 1990; Ary et al., 1996). The third component in this model is testing, which consists of the use of actual tests to elicit the desired behavior. Carroll (1968) defines a test as:
“A psychological or educational test is a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual.”
Bachman (1990) observes that a test is
one type of measurement instrument, and thus necessarily quantifies
characteristics of individuals according to explicit procedures. Bachman
(1990), then, concludes that there are other types of measurement than tests,
and the difference is that a test is designed to obtain a specific sample of
behavior. For the purpose of schematic representation, the three concepts of
evaluation, measurement and testing have traditionally been demonstrated in
three concentric circles of varying sizes. This is what Lynch (2001) has
followed in depicting the relationship among these concepts.
Bachman (1990) has
represented the relationship in a somewhat different way. The goal has been to
extend the model to include not only language testing but also language
teaching, language learning and language research domains. Figure 2 depicts
this extended view of the relationship among evaluation, measurement and
testing. The areas numbered from 1 to 5 show the various forms of this
relationship.
Area 1- Evaluation not
involving either tests or measures; for example, the use of qualitative
descriptions of student performance for diagnosing learning problems.
Area 2- A non-test measure
for evaluation; for example, teacher ranking used for assigning grades.
Area 3- A test used for
purposes of evaluation; for example, the use of an achievement test to
determine student progress.
Area 4- Non-evaluative use
of tests and measures for research purposes; for example, the use of a
proficiency test as a criterion in second language acquisition research.
Area 5- Non-evaluative
non-test; for example, assigning code numbers to subjects in second language
research according to native language.
The word 'evaluation' is often confused with testing and measurement. Therefore, many a time, teachers who give a test to their students think that they are evaluating the achievement of the students. Testing is only a technique to collect evidence regarding pupil behaviour. Measurement, on the other hand, is limited to quantitative description of pupil behaviour. Evaluation is a more comprehensive term which includes testing and measurement and also qualitative description of pupil behaviour. It also includes value judgment regarding the worth or desirability of the behaviour measured or assessed. Therefore, Gronlund (1981) has indicated this relationship in the following equations:
Evaluation = quantitative description of pupils (measurement) + value judgment
Evaluation = qualitative description of pupils (non-measurement) + value judgment
WHAT ARE MCQs?
Multiple choice is a form of assessment in which respondents are asked to select the best possible answer (or answers) from the choices in a list. The multiple choice format is most frequently used in educational testing, in market research, and in elections, when a person chooses between multiple candidates, parties, or policies. Multiple choice testing is particularly popular in the United States.
Although E. L. Thorndike developed an early
multiple choice test, Frederick J. Kelly was the first to use such items as
part of a large scale assessment.
STRUCTURE
Multiple choice items consist of a stem and a set of options. The stem is the beginning part of the item that presents the item as a problem to be solved, a question asked of the respondent, or an incomplete statement to be completed, as well as any other relevant information. The options are the possible answers from which the examinee can choose, with the correct answer called the key and the incorrect answers called distractors.
Only one answer can be keyed as correct. This
contrasts with multiple response items in which more than one answer may be
keyed as correct.
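The anatomy just described (stem, options, key, and distractors) can be sketched as a small data structure. This is only an illustration; the class and field names are my own, not from any testing standard or library:

```python
from dataclasses import dataclass

@dataclass
class MCQItem:
    """One multiple-choice item: a stem plus a set of labeled options."""
    stem: str      # the problem, question, or incomplete statement
    options: dict  # option label -> option text
    key: str       # label of the single correct answer

    def distractors(self):
        """The incorrect options: every option that is not the key."""
        return {label: text for label, text in self.options.items()
                if label != self.key}

    def is_correct(self, response):
        return response == self.key

item = MCQItem(
    stem="Which device gives a quantitative index of a trait?",
    options={"A": "An interview", "B": "A test",
             "C": "An observation log", "D": "A journal"},
    key="B",
)
print(item.is_correct("B"))        # True
print(sorted(item.distractors()))  # ['A', 'C', 'D']
```

Because exactly one label is stored as the key, the single-key constraint of standard multiple choice (as opposed to multiple-response items) falls out of the representation itself.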
Usually, a correct answer earns a set number
of points toward the total mark, and an incorrect answer earns nothing.
However, tests may also award partial credit for unanswered questions or
penalize students for incorrect answers, to discourage guessing. For example,
the SAT removes a quarter point from the test taker's score for an incorrect
answer.
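The quarter-point rule mentioned above is an instance of the classical correction-for-guessing formula, in which each wrong answer costs 1/(k - 1) of a point on a k-option item. A minimal sketch (the function name is my own):

```python
def formula_score(num_correct, num_incorrect, num_options=5):
    """Classical correction-for-guessing: each wrong answer costs
    1/(k - 1) of a point, where k is the number of options per item.
    With k = 5 this is the SAT's quarter-point penalty; blank
    (unanswered) items neither earn nor cost anything."""
    penalty = 1 / (num_options - 1)
    return num_correct - penalty * num_incorrect

# 40 right, 8 wrong, rest blank, on a five-option test:
print(formula_score(40, 8))  # 38.0
```

The penalty is chosen so that a student who guesses blindly on every item expects, on average, the same score as one who leaves them all blank.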
WHY MULTIPLE-CHOICE ITEMS ARE GOOD
1. Multiple-choice items can be used to measure learning outcomes at almost any level. This is the big one, and we have mentioned it before. This makes multiple-choice items very flexible and useful anytime you are sure that test takers can adequately read and understand the content of the question.
2.
They are clear and straightforward. Well-written
multiple-choice items are very clear, and what is expected of the test taker is
clear as well. There’s usually no ambiguity (how many pages should I write, can
I use personal experiences, etc.) about answering the test questions.
3.
No writing needed. Well, not
very much anyway, and that has two distinct advantages. First, it eliminates
any differences between test takers based on their writing skills. And, second,
it allows for responses to be completed fairly quickly, leaving more time for
more questions. You should allot about 60 seconds per multiple-choice question
when designing your test.
4.
The effects of guessing are minimized,
especially when compared to true-false items. With four or five options,
the likelihood of getting a well-written item correct by chance alone (and
that’s exactly what guessing is) is anywhere between 20% and 25%.
5.
Multiple-choice items are easy to
score, and the scoring is reliable as well. If this is the case, and you
have a choice of what kind of items to use, why not use these? Being able to
bring 200 bubble scoring sheets to your office’s scoring machine and having the
results back in 5 minutes sure makes life a lot easier. And, when the scoring
system is more reliable and more accurate, the reliability of the entire test
increases.
6.
Multiple-choice items lend themselves
to item analysis. We’ll talk shortly about item analysis, including how
to do it and what it does. For now, it’s enough to understand that this
technique allows you to further refine multiple-choice items so that they
perform better and give you a clearer picture of how this or that item performed
and if it did what it was supposed to do. For this reason, multiple-choice
items can be diagnostic tools to tell you what test takers understand and what
they do not.
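Two of the points above, the reduced chance of guessing (point 4) and item analysis (point 6), lend themselves to simple arithmetic. The sketch below (function names are my own, not from any testing library) computes the probability of blind guessing and the item-difficulty index p commonly used in item analysis:

```python
def chance_of_guessing(num_options):
    """Probability of getting an item right by blind guessing alone."""
    return 1 / num_options

def difficulty_index(responses_correct):
    """Item-analysis difficulty index p: the proportion of test takers
    who answered the item correctly (one boolean per test taker)."""
    return sum(responses_correct) / len(responses_correct)

print(chance_of_guessing(4))  # 0.25 -- the 25% figure quoted above
print(chance_of_guessing(5))  # 0.2
print(difficulty_index([True, True, False, True]))  # 0.75
```

An item whose p value is close to the chance level is doing little more than measuring luck, which is one way item analysis flags weak questions.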
Advantages of Multiple-Choice Items:
• They can be used to measure learning outcomes at almost any level.
• They are easy to understand (if well written, that is).
• They deemphasize writing skills.
• They minimize guessing.
• They are easy to score.
• They can be easily analyzed for their effectiveness.

Disadvantages of Multiple-Choice Items:
• They take a long time to write.
• Good ones are difficult to write.
• They limit creativity.
• They may have more than one correct answer.
Why Multiple-Choice Items Are
Not So Good
1. Multiple-choice items take a long time to write. You can figure on anywhere between 10 and 20 minutes to write a decent first draft of a multiple-choice item. Now, you may be able to use this same item in many different settings, and perhaps for many different years, but nonetheless it’s a lot of work. And, once these new items are administered and their performance analyzed, count on a few more minutes for revision.
2. Good multiple-choice items are not easy to write. Not only do they take a long time, but unless you have very good distractors (well written, focused, etc.) and exactly one correct answer, you will get test takers who can argue for any of the alternatives as being correct (even though you think they are not), and they can sometimes do this pretty persuasively.
3. Multiple-choice items do not allow for
creative or unique responses. Test takers have no choice as to how to
respond (A or B or C or D). So, if there is anything more they would like to
add or show what they know beyond what is present in the individual item, they
are out of luck!
4. The best test takers may know more than you! Multiple-choice items operate on the assumption that there is only one correct alternative. Although the person who designs the test might believe this is true, the brightest students and test takers may indeed find something flawed about every alternative, including the correct one.
TYPES OF MULTIPLE CHOICE QUESTIONS: More Than Just “Which One Is Correct”
There are many types of multiple choice questions, some of which are discussed below:
1. Best-answer multiple-choice items. These are multiple-choice items where there may be more than one correct answer, but only one of them is the best of all the correct ones.
2. Rearrangement multiple-choice items. Here the test taker arranges a set of items in sequential order, be it steps in a process or the temporal sequence in which something might have occurred or should occur.
3. Interpretive multiple-choice items. Here, the test taker reads through a passage and then selects a response where the alternatives (and the correct answer) are all based on the same passage. Keep in mind that although this appears to be an attractive format, it does place a premium on reading and comprehension skills.
4. Substitution multiple-choice items. This is something like a short answer or completion item, but there are alternatives from which to select. The test taker selects those responses from a set of responses that he or she thinks answer the question correctly.
The
Role of Assessment in Teaching
Assessing student
learning is something that every teacher has to do, usually quite frequently.
Written tests, book reports, research papers, homework exercises, oral
presentations, question-and-answer sessions, science projects, and artwork of
various sorts are just some of the ways in which teachers measure student
learning, with written tests accounting for about 45 percent of a typical
student's course grade (Green & Stager, 1986/1987). It is no surprise,
then, that the typical teacher can spend between one-third and one-half of her
class time engaged in one or another type of measurement activity (Stiggins,
1994). Yet despite the amount of time teachers spend assessing student learning,
it is a task that most of them dislike and that few do well. One reason is that
many teachers have little or no in-depth knowledge of assessment principles
(Crooks, 1988; Hills, 1991; Stiggins, Griswold, & Wikelund, 1989). Another
reason is that the role of assessor is seen as being inconsistent with the role
of teacher (or helper). Since teachers with more training in assessment use
more appropriate assessment practices than do teachers with less training
(Green & Stager, 1986/1987), a basic goal of this chapter is to help you
understand how such knowledge can be used to reinforce, rather than work
against, your role as teacher. Toward that end, we will begin by defining what
we mean by the term assessment and by two key elements of this process,
measurement and evaluation.
What is Assessment?
Broadly conceived,
classroom assessment involves two major types of activities: collecting
information about how much knowledge and skill students have learned
(measurement) and making judgments about the adequacy or acceptability of each
student's level of learning (evaluation). Both the measurement and evaluation
aspects of classroom assessment can be accomplished in a number of ways. To
determine how much learning has occurred, teachers can, for example, have students
take exams, respond to oral questions, do homework exercises, write papers,
solve problems, and make oral presentations. Teachers can then evaluate the
scores from those activities by comparing them either to one another or to an
absolute standard (such as an A equals 90 percent correct). Throughout much of
this chapter we will explain and illustrate the various ways in which you can
measure and evaluate student learning.
WHAT ARE SOME TYPES OF ASSESSMENT?
There are many alternatives to traditional standardized tests that offer a variety of ways to measure student understanding.
In the early theories of learning, it was
believed that complex higher-order thinking skills were acquired in small
pieces, breaking down learning into a series of prerequisite skills. After
these pieces were memorized, the learner would be able to assemble them into
complex understanding and insight -- the puzzle could be arranged to form a
coherent picture.
Today, we know learning requires that the
learner engage in problem-solving to actively build mental models. Knowledge is
attained not just by receiving information, but also by interpreting the
information and relating it to the learner's knowledge base. What is important,
and therefore should be assessed, is the learner's ability to organize,
structure, and use information in context to solve complex problems.
STANDARDIZED ASSESSMENT
Almost every school district now administers
state-mandated standardized tests. Every student at a particular grade level is
required to take the same test. Everything about the test is standard -- from
the questions themselves, to the length of time students have to complete it
(although some exceptions may be made for students with learning or physical
disabilities), to the time of year in which the test is taken. Throughout the
country, and with the passage of the Elementary and Secondary Education Act,
commonly known as the No Child Left Behind Act (which requires research-based
assessment), student performance on these tests has become the basis for such
critical decisions as student promotion from one grade to the next, and
compensation for teachers and administrators.
Standardized tests should not be confused
with the standards movement, which advocates specific grade-level content and performance
standards in key subject areas. Often, in fact, standardized tests are not
aligned with state and district content standards, causing considerable
disconnect between what is being taught and what is being tested.
In the spring of 2009, an initiative was
created to develop a set of standards for all states in the United States
to adhere to. The Common Core State Standards Initiative (CCSS), as it has
become known, is still an evolving movement. The vast majority of states have
pledged to adopt the standards and implement them by 2015. Standards for
English language arts and mathematics were published in 2010, while standards
for science and social studies are still in development. Visit Edutopia's
Common Core State Standards Resource page for more information about the
standards.
The questions then become: What is
evidence-based assessment? Is it standardized tests? Is it portfolios? If
portfolios are a part of evidence-based assessment, what else is necessary?
Reflections? Work samples? Best work?
FORMAL ASSESSMENT
Some formal assessments provide teachers with a systematic way to evaluate how well students are progressing in a particular instructional program. For example, after completing a four- to six-week theme, teachers will want to know how well students have learned the theme skills and concepts. They may give all the students a theme test in which students read, answer questions, and write about a similar theme concept. This type of assessment allows the teacher to evaluate all the students systematically on the important skills and concepts in the theme by using real reading and writing experiences that fit with the instruction. In other situations, or for certain students, teachers might use a skills test to examine specific skills or strategies taught in a theme.
INFORMAL ASSESSMENT
Other forms of authentic assessment are more informal, including special activities such as group or individual projects, experiments, oral presentations, demonstrations, or performances. Some informal assessments may be drawn from typical classroom activities such as assignments, journals, essays, reports, literature discussion groups, or reading logs. Other times, it will be difficult to show student progress using actual work, so teachers will need to keep notes or checklists to record their observations from student-teacher conferences or informal classroom interactions. Sometimes informal assessment is as simple as stopping during instruction to observe or to discuss with the students how learning is progressing. Any of these types of assessment can be made more formal by specifying guidelines for what and how to do them, or they can be quite informal, letting students and teachers adjust to individual needs. In some situations, the teacher will want all students to complete the same assessments; in others, assessments will be tailored to individual needs.
ALTERNATIVE ASSESSMENT
Alternative assessment, often called
authentic, comprehensive, or performance assessment, is usually designed by the
teacher to gauge students' understanding of material. Examples of these
measurements are open-ended questions, written compositions, oral
presentations, projects, experiments, and portfolios of student work.
Alternative assessments are designed so that the content of the assessment
matches the content of the instruction.
Effective assessments give students feedback on how well they understand the information and on what they need to improve, while helping teachers better design instruction. Assessment becomes even more relevant when students become involved in their own assessment. Students who take an active role in developing the scoring criteria, self-evaluation, and goal setting more readily accept that the assessment is adequately measuring their learning.
Authentic assessment can include many of the
following:
1. Observation
2. Essays
3. Interviews
4. Performance tasks
5. Exhibitions and demonstrations
6. Portfolios
7. Journals
8. Teacher-created tests
9. Rubrics
10. Self- and peer-evaluation
Measurement
Measurement is the assignment
of numbers to certain attributes of objects, events, or people according to a
rule-governed system. For our purposes, we will limit the discussion to
attributes of people. For example, we can measure someone's level of typing
proficiency by counting the number of words the person accurately types per
minute or someone's level of mathematical reasoning by counting the number of
problems correctly solved. In a classroom or other group situation, the rules
that are used to assign the numbers will ordinarily create a ranking that
reflects how much of the attribute different people possess (Linn &
Gronlund, 1995).
Evaluation
Evaluation involves
using a rule-governed system to make judgments about the value or worth of a
set of measures (Linn & Gronlund, 1995). What does it mean, for example, to
say that a student answered eighty out of one hundred earth science questions
correctly? Depending on the rules that are used, it could mean that the student
has learned that body of knowledge exceedingly well and is ready to progress to
the next unit of instruction or, conversely, that the student has significant
knowledge gaps and requires additional instruction.
ALTERNATIVE RESPONSE QUESTIONS
Alternative response questions are a special form of multiple choice question, where the learner has to choose between just two items. This gives the learner a 50% chance of guessing the correct answer, and so their learning value as single questions is limited. There are instances where they are valid, such as where there really are only two possibilities: in or out, or up or down.
Another form of alternative response question is the true/false or yes/no type. You present the learner with a statement that they must judge to be true or false. These still suffer from the 50% chance problem, so you need to design the question carefully so that it does actually help learning.
Ways to Evaluate Student Learning
Once you have collected all the measures you intend to collect -- for
example, test scores, quiz scores, homework assignments, special projects, and
laboratory experiments -- you will have to give the numbers some sort of value
(the essence of evaluation). As you probably know, this is most often done by
using an A to F grading scale. Typically, a grade of A indicates superior
performance; a B, above-average performance; a C, average performance; a D,
below-average performance; and an F, failure. There are two general ways to
approach this task. One approach involves comparisons among students. Such forms
of evaluation are called norm-referenced since students are identified
as average (or normal), above average, or below average. An alternative
approach is called criterion-referenced because performance is
interpreted in terms of defined criteria. Although both approaches can be used,
we favor criterion-referenced grading for reasons we will mention shortly.
NORM-REFERENCED GRADING
A norm-referenced grading system assumes that classroom achievement will
naturally vary among a group of heterogeneous students because of differences
in such characteristics as prior knowledge, learning skills, motivation, and
aptitude. Under ideal circumstances (hundreds of scores from a diverse group of
students), this variation produces a bell-shaped, or "normal,"
distribution of scores that ranges from low to high, has few tied scores, and
has only a very few low scores and only a very few high scores. For this
reason, norm-referenced grading procedures are also referred to as
"grading on the curve."
CRITERION-REFERENCED GRADING
A criterion-referenced grading system permits students to benefit from
mistakes and to improve their level of understanding and performance.
Furthermore, it establishes an individual (and sometimes cooperative) reward
structure, which fosters motivation to learn to a greater extent than other
systems.
Under a criterion-referenced system, grades are determined through
comparison of the extent to which each student has attained a defined standard
(or criterion) of achievement or performance. Whether the rest of the students
in the class are successful or unsuccessful in meeting that criterion is
irrelevant. Thus, any distribution of grades is possible. Every student may get
an A or an F, or no student may receive these grades. For reasons we will discuss
shortly, very low or failing grades tend to occur less frequently under a
criterion-referenced system.
A common version of criterion-referenced grading assigns letter grades on
the basis of the percentage of test items answered correctly. For example, you
may decide to award an A to anyone who correctly answers at least 85 percent of
a set of test questions, a B to anyone who correctly answers 75 to 84 percent,
and so on down to the lowest grade. To use this type of grading system fairly,
which means specifying realistic criterion levels, you would need to have some
prior knowledge of the levels at which students typically perform. You would
thus be using normative information to establish absolute or fixed standards of
performance. However, although norm-referenced and criterion-referenced grading
systems both spring from a normative database (that is, from comparisons among
students), only the former system uses those comparisons to directly determine
grades.
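The percentage-based criterion scheme described above is straightforward to express in code. A minimal sketch, using the A and B cutoffs given in the text (85 and 75 percent); the C and D cutoffs are illustrative assumptions:

```python
def letter_grade(percent_correct: float) -> str:
    """Map a percentage score to a letter grade using fixed
    (criterion-referenced) cutoffs. The A and B cutoffs follow the
    text; the C and D cutoffs are illustrative assumptions."""
    cutoffs = [(85, "A"), (75, "B"), (65, "C"), (55, "D")]
    for cutoff, grade in cutoffs:
        if percent_correct >= cutoff:
            return grade
    return "F"

# Every student is judged against the same standard, so any
# distribution of grades is possible:
print([letter_grade(p) for p in (92, 84, 70, 40)])  # ['A', 'B', 'C', 'F']
```

Note that the grade depends only on the student's own score, never on how classmates performed, which is the defining property of criterion-referenced grading.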
Criterion-referenced grading systems (and criterion-referenced tests)
have become increasingly popular in recent years primarily because of three
factors. First, educators and parents complained that norm-referenced tests and
grading systems provided too little specific information about student strengths
and weaknesses. Second, educators have come to believe that clearly stated,
specific objectives constitute performance standards, or criteria, that are
best assessed with criterion-referenced measures. Third, and perhaps most
important, contemporary theories of school learning claim that most, if not
all, students can master most school objectives under the right circumstances.
If this assertion is even close to being true, then norm-referenced testing and
grading procedures, which depend on variability in performance, will lose much
of their appeal.
The Concept of Measurement
What does it mean to measure something?
According to the National Council of Teachers of Mathematics (2000),
"Measurement is the assignment of a numerical value to an attribute of an
object, such as the length of a pencil. At more-sophisticated levels,
measurement involves assigning a number to a characteristic of a situation, as
is done by the consumer price index." An early understanding of
measurement begins when children simply compare one object to another. Which
object is longer? Which one is shorter? At the other extreme, researchers
struggle to find ways to quantify their most elusive variables. The example of
the consumer price index illustrates that abstract variables are, in fact,
human constructions. A major part of scientific and social progress is the
invention of new tools to measure newly constructed variables.
Assessment and Evaluation
Types of Assessment and Evaluation
Assessment and evaluation studies may take place
at the subject, department, or Institute level, and range in size and scope
from a pilot study to a complex project that addresses a number of different
topics, involves hundreds of students, and includes a variety of
methodologies. Typically, assessment efforts are divided into two types:
formative and summative. Below, each is described briefly, along with a third,
less frequently seen type called process assessment. Included as well is a
grid that classifies different assessment methodologies.
Formative Assessment implies that the results will be used in the formation and revision process of an educational effort. Formative assessments are used in the improvement of educational programs. This type of assessment is the most common form of assessment in higher education, and it constitutes a large proportion of TLL’s assessment work. Since educators are continuously looking for ways to strengthen their educational efforts, this type of constructive feedback is valuable.
Summative Assessment is used for the purpose of documenting outcomes and judging value. It is used for providing feedback to instructors about the quality of a subject or program, reporting to stakeholders and granting agencies, producing reports for accreditation, and marketing the attributes of a subject or program. Studies of this type are rarely exclusively summative in practice; they usually contain some aspects of formative assessment.
Process Assessment begins with the identification of project milestones to be reached, activities to be undertaken, products to be delivered, and/or projected costs likely to be incurred in the course of attaining a project’s final goals. The process assessment determines whether markers have been reached on schedule, deliverables produced, and cost estimates met. The degree of difference from the expected plan is used to evaluate success.
Methods of Measuring Learning Outcomes Grid
How colleges and universities can measure and
report on the knowledge and abilities their students have acquired during their
college years is an issue of growing interest. The Methods of Measuring
Learning Outcomes Grid provides a way to categorize the range of methodologies
that can be used to assess the value added by a college education.
ASSESSMENT SYSTEM IN PAKISTAN
Reliable and accurate education statistics are a precondition for sound
educational planning and management. The first ever Pakistan National Education Census
(NEC), 2005-06, was conducted by the Federal Ministry of Education and the
Statistics Division, Federal Bureau of Statistics. It covered 245,682
institutions, including public and private schools, colleges and universities,
professional institutions, vocational and technical centres, mosque schools,
deeni madaris, and non-formal education centres. A number of statistical tables
for the national and provincial levels were published. However, analysis of the
data could go further in order to generate education indicators describing the education
situation in Pakistan,
and develop analyses underpinned by findings and technical explanations.
Executive Summary
The National Education Census (NEC) of 2005/06 was the first
education census conducted in the history of Pakistan that was specifically
designed to collect information on all types of schools. It thus generated a
complete and comprehensive picture of the current education system in the
country, and provides a robust information baseline from which to measure
future progress. Through ensuring a complete listing of schools, it also
assists other education data collection activities in the field. Pakistan also
has a National Education Management Information System (NEMIS) which collects
education data annually. The system covers the public education sector but to
date has not comprehensively covered private-sector educational provision. Since
some 31% of basic education students attend private schools, it is therefore
important that up-to-date information be made available on this sub-sector, to
ensure that policy development is based on knowledge of the entire education
system, not just the public sector alone. A combination of the NEC and the NEMIS
shows that over 36 million students were attending an educational institution
in 2005/06. Just under 50% of those students (17.8 million) were studying at
the primary level, 20.9% (7.5 million) in pre-primary, 15.4% (5.6 million) in
middle elementary, 6.9% (2.5 million) in secondary, 2.5% (.9 million) in higher
secondary and 4.9% (1.8 million) at the postsecondary level. Pakistan has a Gross Enrolment Rate
(GER) at the primary level of almost 80% (when all primary enrolment is measured
against the population 5 to 9 years of age). The difference of 18 percentage
points between the Net Enrolment Rate (NER) of 62% and the GER is due to the
number of primary students who are over 9 years of age or under 5 years of age.
Given the number of repeaters in primary grades and the incidence of students
beginning primary school after age 5, it is likely that most of the difference
is due to overage students. Numerically, this means that over 2.5 million
students in primary school are over 9 years of age. Any reduction in this
number, possibly by decreasing the repetition rate, may open up places in the
primary system for some of the children not currently in school.
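The GER/NER arithmetic above can be checked with a short calculation. The population and rate figures come from the text; the split between over-age and under-age students is not specified there:

```python
# Figures from the text: population aged 5-9 ≈ 19.5 million, GER ≈ 80%, NER ≈ 62%.
population_5_to_9 = 19_500_000

ger = 0.80   # all primary enrolment / population aged 5-9
ner = 0.62   # primary enrolment aged 5-9 / population aged 5-9

total_primary = ger * population_5_to_9            # all primary students
in_age_primary = ner * population_5_to_9           # enrolled students aged 5-9
out_of_age = total_primary - in_age_primary        # over- or under-age students

print(f"{out_of_age / 1e6:.1f} million primary students outside ages 5-9")
# → about 3.5 million, consistent with the text's "over 2.5 million" overage
```

Because GER counts all primary students while NER counts only those of official primary age, the gap between the two rates measures exactly the out-of-age enrolment.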
Teacher-Made Test Construction
The teacher-made
test is the major basis for evaluating the progress or performance of
students in the classroom. The teacher, therefore, has an obligation to provide
students with the best evaluation. After studying this section, you should be able to:
1.
identify the types of teacher-made test;
2. draw general rules/guidelines in
constructing test that is applicable to all types of test;
3. explain how to score essay test in such a
way that subjectivity can be eliminated;
4. discuss and summarize the advantages and
disadvantages of essay and objective type of test;
5. enumerate and discuss other evaluative
instruments used to measure students’ performance; and
6. construct different types of test.
Steps in
Constructing Teacher-Made Test
1.
Planning the Test. In planning the test, the following should be observed:
the objectives of the subject, the purpose for which the test is administered,
the availability of facilities and equipment, the nature of the testees, the
provision for review, and the length of the test.
2. Preparing
the Test. The process of writing good test items is not simple – it
requires time and effort. It also requires certain skills and proficiencies on
the part of the writer. Therefore, a test writer must master the subject matter
he/she teaches, must understand his/her testees, must be skillful in verbal
expression and, most of all, must be familiar with the various types of tests.
3. Reproducing
the Test. In reproducing the test, consider the availability of a duplicating
machine and who will handle the typing and mimeographing.
4. Administering
the Test. The test should be administered in an environment familiar to the
students: seating arrangements are observed, corrections are made before the
start of the test, distribution and collection of papers are planned, and the
time should be written on the board. One more important thing to remember: do
not allow any testee to leave the room except out of personal necessity.
5. Scoring
the Test. The best procedure in scoring an objective test is to give one point
of credit for each correct answer. In the case of a test with only two or three
options per item, the correction formula should be applied. Example: for two
options, the score equals right minus wrong (S = R - W). For three options, the
score equals right minus one-half wrong (S = R - 1/2 W, or S = R - W/2). The
correction formula is not applied to items with four or more options. If the
correction formula is employed, students should be informed beforehand.
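The two correction formulas above are instances of the general correction-for-guessing rule S = R - W/(n - 1), where n is the number of options. A minimal sketch:

```python
def corrected_score(right: int, wrong: int, options: int) -> float:
    """Correction-for-guessing formula from the text:
    two options:   S = R - W
    three options: S = R - W/2
    (general form S = R - W/(options - 1); per the text, no
    correction is applied for four or more options)."""
    if options >= 4:
        return float(right)
    return right - wrong / (options - 1)

print(corrected_score(30, 10, options=2))  # 30 - 10 = 20.0
print(corrected_score(30, 10, options=3))  # 30 - 5  = 25.0
```

Omitted (blank) items count as neither right nor wrong, which is why the formula penalizes wrong answers rather than rewarding blanks directly.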
6. Evaluating
the Test. The test is evaluated as to the quality of the students’
responses and the quality of the test itself. The difficulty index and
discrimination index of each test item are considered. A difficulty level of
about fifty (50) per cent is preferable; items answered correctly by 100 per
cent or by zero (0) per cent of students are valueless in a test of general
achievement.
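The difficulty and discrimination indices mentioned in step 6 can be computed directly from item responses. A sketch using common definitions (proportion correct for difficulty; upper-group minus lower-group proportion correct for discrimination), which the text itself does not spell out:

```python
def difficulty_index(responses: list) -> float:
    """Proportion of students answering the item correctly.
    Around 0.5 is preferred; 0.0 or 1.0 items are valueless."""
    return sum(responses) / len(responses)

def discrimination_index(upper: list, lower: list) -> float:
    """Difference between the proportion correct in the upper-scoring
    group and in the lower-scoring group (ranges from -1 to +1)."""
    return sum(upper) / len(upper) - sum(lower) / len(lower)

# True = answered correctly, False = answered incorrectly.
item = [True, True, False, True, False, False, True, False]
print(difficulty_index(item))  # 0.5, the preferred difficulty level

# All 4 top scorers correct, 1 of 4 bottom scorers correct:
print(discrimination_index([True] * 4, [True, False, False, False]))  # 0.75
```

A positive discrimination index means stronger students outperform weaker ones on the item; an index near zero marks an item that does not separate the groups.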
7. Interpreting
Test Results. Standardized achievement tests are interpreted based on norm
tables. Tables of norms are not applicable to teacher-made tests.
Types of
Informal Teacher Made Test
I. Essay
Examination
An essay examination consists of questions to which students respond in one or
more sentences. It is a test to evaluate knowledge of the subject matter or to
measure skill in writing. It also tests students’ ability to express their
ideas accurately and to think critically within a certain period of time. Essay
examinations may be evaluated in terms of content and form. A good essay test
must be planned and constructed in advance. The questions must cover the major
aspects of the lesson and form a representative sample. Avoid optional
questions, and use a large number of questions requiring short answers rather
than a few questions requiring very long answers.
Classroom Activities That Relate to Piaget's Theory of Cognitive Development
Jean Piaget, the
psychologist and philosopher, said, "The principal goal of education in the
schools should be creating men and women who are capable of doing new things,
not simply repeating what other generations have done." Piaget developed a
theory of cognitive development that corresponds to his hope for the
educational process. The first of its four stages is the sensorimotor stage, in
which children 2 and under learn using their senses and primitive
understanding. The second stage is preoperational, in which children from 2 to 7
begin to use symbols and language. The third stage is concrete operational, in
which children 7 to 11 reverse operations, order items, and understand
cause-and-effect processes. The final stage is formal operations, in which
children 12 and up think abstractly. Use Piaget's theory to design your
classroom activities.
Social interaction
shapes personality development, according to German-American psychoanalyst Erik
Erikson's theory of psychosocial development. From birth, a child creates an
emotional repertoire tied to her perceptions of her world's safety. Fear of new
experiences battles with exploratory instincts, and the winner depends on
whether a child feels safe. Teachers who know how to apply psychosocial
development in the classroom create a safe environment where each child feels
appreciated and comfortable exploring new knowledge and relationships rather
than letting fear inhibit learning.
Formative and Summative Assessments in the Classroom
Successful middle
schools engage students in all aspects of their learning. There are many
strategies for accomplishing this. One such strategy is student-led
conferences. As a classroom teacher or administrator, how do you ensure that
the information shared in a student-led conference provides a balanced picture
of the student's strengths and weaknesses? The answer to this is to balance
both summative and formative classroom assessment practices and information
gathering about student learning.
Assessment is a huge
topic that encompasses everything from statewide accountability tests to
district benchmark or interim tests to everyday classroom tests. In order to
grapple with what seems to be an overuse of testing, educators should view
testing as assessment, and assessment as information. The more information we
have about students, the clearer the picture we have of achievement and of
where gaps may occur.
NEEDS
FOR DEVELOPMENT OF NEAS
It is clear that Pakistan
is still a long way from achieving universal primary enrolment. As
indicated by the primary Net Enrolment
Rate (NER) estimate of 62%, over 35% of the population 5 to 9 years of age
is not in school. Given a population of 5- to 9-year-olds of some 19.5 million,
this means that about 7 million children aged 5 to 9 are out of the education
system. Furthermore, under current conditions, the education system does not
provide for a substantial percentage of students to move beyond the primary
level. At present, the average enrolment per grade at the middle elementary
level is less than one-half the average enrolment per grade at the primary
level. This is considerably less than that of most other countries, and it is
clear that the delivery system needs to significantly increase the proportion
of students capable of studying beyond the primary level.
PURPOSE
The National Education Assessment System (NEAS) was established to undertake
systematic evaluations of student learning achievement across Pakistan and
share the analytical results with both policy makers and practitioners to
inform the education quality reform process. With data that is comparable
across regions and over time, NEAS can identify gaps and bring about
improvements in the curriculum, teaching and classroom support practices, as
well as in the development of learning aids. For NEAS to be established as a
student assessment system on par with international standards, several key
steps towards institutional strengthening, capacity building and improvement in
technical quality and processes should be undertaken. Further investment in the
technical proficiency of key staff is required, in both specialized skills
(item writing, sampling, test procedures) and core expertise (report writing,
comparative analysis); this will facilitate improvements in test and instrument
design and support robust research and analysis. Extending the dissemination of
results and findings to primary stakeholders, particularly teacher trainers,
textbook developers, and policy makers, is also important. A deeper
understanding of the assessment process and stronger linkages between
assessment systems and other education sub-departments (such as teacher
professional development centers, examination units, the curriculum wing, and
textbook development) will aid better-informed and strategic use of assessment
information for improvements in student learning. The longer-term
sustainability of NEAS will depend not only on its establishment as an
autonomous body but also on the degree of integration between the federal and
regional assessment centers, so that cross-learning and implementation of best
practice are facilitated. With continuous improvements in
test instruments and key technical skills, NEAS will be able to track overall
system efficiency as well as individual student performance, and identify key
areas for intervention that will lead to improvement of the quality and
effectiveness of the education system. The National Education Assessment System
for Pakistan
aims to design and administer assessment mechanisms to establish administrative
infrastructure and capacity for assessment administration, analysis and report
writing, and to increase stakeholder knowledge and acceptance of assessment.
There are three components to the project: 1) Capacity building would be the
main component, since the execution of an assessment is highly technical in
nature. Any one of a number of small mistakes can cause serious delays in
implementation and, in the worst case, lead to meaningless findings. Therefore,
high-level technical assistance, including the services of a senior
Technical Advisor, would be required to monitor and assist in all aspects of
both central and provincial operations. 2) Pilot experiments will be required
to determine what will produce the desired, valid results and which process is
most easily implemented. 3) Information dissemination. Through this component, the
project will facilitate information dissemination about assessment to
stakeholders in advance of the actual assessment to explain its purpose and to
provide insight and reassurance about its intended uses. (EXTRACT FROM WORLD BANK REPORT)
INTERPRETATION OF TEST
SCORES
Test scores are
norm-referenced or criterion-referenced. The results of norm-referenced tests
tell us how a person compares against others. Results from criterion-referenced
tests tell what a person has achieved against a set of learning goals.
Norm-referenced tests usually use a set of standardized norms against which to
measure the test taker. Criterion-referenced tests usually employ analysis by
content cluster or content and performance standards. Educational tests are
somewhat hard to interpret because they do not have a true zero point. We can
talk about length of an object having a zero starting point but it is difficult
to talk about true zero learning. The interpretation of test results is also
handicapped by the inequality of units of measurement. While we know that there
is exactly one inch between one inch and two inches, we cannot assume that
there is an exactly similar distance between a grade of B and a C, or between
an A and a B. There are a variety of ways of interpreting test scores. For
criterion-referenced tests these include raw scores and percentages. For
norm-referenced tests, choices include raw scores and derived scores such as
percentiles and grade equivalents. Grade norms have been widely used with
standardized achievement tests especially at the elementary school level. The
grade equivalent that corresponds to a particular raw score identifies the
grade level at which the typical student obtains that raw score. Grade
equivalents are based on the performance of students in the norm group in each
of two or more grades. One of the most widely used and easily understood
methods of describing test performance is percentile rank. A percentile rank
(or percentile score) indicates a student's relative position in a group in
terms of the percentage of students scoring lower. It should be remembered that
percentiles and percentages are not the same. Another type of norm-referenced
score is the standard score which indicates how much above or below the mean
that the individual test taker fell. Standard scores depend on the statistics
of the mean and the standard deviation. The normal curve is a symmetrical
bell-shaped curve that has many useful mathematical properties. One of the most
useful from the viewpoint of test interpretation is that when it is divided
into standard deviation units, each portion under the curve contains a fixed
percentage of cases. The normal curve is divided up into equal standard
deviation units. Types of standard scores include z-scores, T-scores,
normalized standard scores, stanines, normal curve equivalents, and standard
age scores. One advantage of converting raw scores to derived scores is that a
student's performance on different tests can be compared directly. This is
usually done by means of a test profile. Some test publishers provide profiles
that include reports for skill objectives as well as for full subtests. It is
the responsibility of the test user to be knowledgeable about the adequacy of
the norms for the test being used. Norms need to be relevant,
representative, and up to date. It is the responsibility of the test author and
publisher to adequately describe the test norms in the test manual so that the
test user may make these decisions. While most published tests use national
norms, some tests may use local norms. Local norms are typically prepared using
either percentile ranks or stanines. Most test publishers will provide local
norms if requested, but they also can be prepared locally. The test consumer
should always practice caution in interpreting test scores. It should always be
remembered that like all educational measurement, test scores always possess
some degree of error.
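The derived scores described above are simple transformations of a raw score. A minimal sketch using the standard definitions of the z-score, the T-score (mean 50, standard deviation 10), and one common definition of percentile rank (the percentage of scores strictly below the raw score); the sample group is invented for illustration:

```python
from statistics import mean, stdev

def z_score(raw: float, scores: list) -> float:
    """Standard score: distance from the group mean in standard-deviation units."""
    return (raw - mean(scores)) / stdev(scores)

def t_score(raw: float, scores: list) -> float:
    """T-score: a z-score rescaled to mean 50, standard deviation 10."""
    return 50 + 10 * z_score(raw, scores)

def percentile_rank(raw: float, scores: list) -> float:
    """Percentage of students in the group scoring below the raw score."""
    return 100 * sum(s < raw for s in scores) / len(scores)

group = [55, 60, 65, 70, 75, 80, 85]          # illustrative class scores
print(z_score(70, group))                      # 0.0: 70 is exactly the mean
print(t_score(70, group))                      # 50.0
print(round(percentile_rank(70, group), 1))    # 42.9: 3 of 7 scores are lower
```

Some tables instead credit half of the tied scores when computing percentile rank; whichever convention a test manual uses, remember that percentiles are ranks, not percentages of items answered correctly.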
·
Norm-referenced test interpretation. In norm-referenced
test interpretation, the scores that the applicant receives are compared with
the test performance of a particular reference group. In this case the
reference group is the norm group. The norm group generally consists of large
representative samples of individuals from specific populations, such as high
school students, clerical workers, or electricians. It is their average test
performance and the distribution of their scores that set the standard and
become the test norms of the group.
The test manual will
usually provide detailed descriptions of the norm groups and the test norms. To
ensure valid scores and meaningful interpretation of norm-referenced tests,
make sure that your target group is similar to the norm group. Compare the
educational level, the occupational, language and cultural backgrounds, and other
demographic characteristics of the individuals making up the two groups to
determine their similarity.
For example, consider
an accounting knowledge test that was standardized on the scores obtained by
employed accountants with at least 5 years of experience. This would be an
appropriate test if you are interested in hiring experienced accountants.
However, this test would be inappropriate if you are looking for an accounting
clerk. You should look for a test normed on accounting clerks or a closely related
occupation.
·
Criterion-referenced test
interpretation.
In criterion-referenced tests, the test score indicates the amount of skill or
knowledge the test taker possesses in a particular subject or content area. The
test score is not used to indicate how well the person does compared to others;
it relates solely to the test taker's degree of competence in the specific area
assessed. Criterion-referenced assessment is generally associated with
educational and achievement testing, licensing, and certification.
A particular test
score is generally chosen as the minimum acceptable level of competence. How is
a level of competence chosen? The test publisher may develop a mechanism that
converts test scores into proficiency standards, or the company may use its own
experience to relate test scores to competence standards.
For example, suppose
your company needs clerical staff with word processing proficiency. The test
publisher may provide you with a conversion table relating word processing
skill to various levels of proficiency, or your own experience with current
clerical employees can help you to determine the passing score. You may decide
that a minimum of 35 words per minute with no more than two errors per 100
words is sufficient for a job with occasional word processing duties. If you
have a job with high production demands, you may wish to set the minimum at 75
words per minute with no more than 1 error per 100 words.
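The word-processing cutoffs above amount to a fixed decision rule, which is easy to express in code. The 35/75 words-per-minute figures and error limits come from the text:

```python
def meets_criterion(wpm: float, errors_per_100_words: float,
                    min_wpm: float, max_errors: float) -> bool:
    """True if the candidate meets a fixed (criterion-referenced) standard,
    regardless of how other candidates performed."""
    return wpm >= min_wpm and errors_per_100_words <= max_errors

# Occasional word-processing duties: 35 wpm, at most 2 errors per 100 words.
print(meets_criterion(40, 1.5, min_wpm=35, max_errors=2))   # True
# High-production job: 75 wpm, at most 1 error per 100 words.
print(meets_criterion(40, 1.5, min_wpm=75, max_errors=1))   # False
```

The same candidate passes one standard and fails the other, which illustrates that in criterion-referenced interpretation the meaning of a score lies entirely in the standard chosen, not in the candidate pool.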
It
is important to ensure that all inferences you make on the basis of test
results are well founded. Only use tests for which sufficient information is
available to guide and support score interpretation. Read the test manual for
instructions on how to properly interpret the test results. This leads to the
next principle of assessment.
The table below
presents both pros and cons for various test item types. Your selection of item
types should be based on the types of outcomes you are trying to assess (see
the analysis of your learning situation). Certain item types, such as true/false,
supplied response, and matching, work well for assessing lower-order outcomes
(i.e., knowledge or comprehension goals), while other item types, such as
essays, performance assessments, and some multiple choice questions, are better
for assessing higher-order outcomes (i.e., analysis, synthesis, or evaluation
goals). The bullets below will help you determine the types of
outcomes the various items assess.
With your objectives
in hand, it may be useful to create a test blueprint that specifies your
outcomes and the types of items you plan to use to assess those outcomes.
Further, test items are often weighted by difficulty. On your test blueprint,
you may wish to assign lower point values to items that assess lower-order
skills (knowledge, comprehension) and higher point values to items that assess
higher-order skills (synthesis, evaluation).
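A test blueprint of the kind described can be sketched as a mapping from each objective (and its cognitive level) to a weight, from which item counts and point values follow. The objectives, weights, and point values below are illustrative assumptions:

```python
# Illustrative blueprint: objective -> (cognitive level, weight as a fraction).
blueprint = {
    "define key terms":      ("knowledge",     0.20),
    "explain core concepts": ("comprehension", 0.30),
    "apply procedures":      ("application",   0.30),
    "evaluate arguments":    ("evaluation",    0.20),
}

TOTAL_ITEMS = 40
# Per the text's advice, higher-order skills earn more points per item
# (the specific values are assumptions):
POINTS = {"knowledge": 1, "comprehension": 1, "application": 2, "evaluation": 3}

for objective, (level, weight) in blueprint.items():
    n_items = round(weight * TOTAL_ITEMS)
    print(f"{objective}: {n_items} items at {POINTS[level]} point(s) each")
```

Checking that the weights sum to 1 before writing items is a cheap way to catch a blueprint that over- or under-covers the content.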
Item Type: Multiple Choice (see tips for writing multiple choice questions below)
Pros:
· more answer options (4-5) reduce the chance of guessing an item correctly
· many items can aid in student comparison and reduce ambiguity
· greatest flexibility in the type of outcome assessed: knowledge goals, application goals, analysis goals, etc.
Cons:
· reading time increases with more answers
· reduces the number of questions that can be presented
· difficult to write four or five reasonable choices
· takes more time to write questions

Item Type: True/False (see tips for writing true/false questions below)
Pros:
· can present many items at once
· easy to score
· used to assess popular misconceptions and cause-effect relations
Cons:
· the most difficult question type to write objectively
· ambiguous terms can confuse many students
· few answer options (2) increase the chance of guessing an item correctly; many items are needed to overcome this effect

Item Type: Matching
Pros:
· efficient
· used to assess student understanding of associations, relationships, and definitions
Cons:
· difficult to assess higher-order outcomes (i.e., analysis, synthesis, evaluation goals)

Item Type: Interpretive Exercise (the above three item types are often criticized for assessing only lower-order skills; the interpretive exercise is a way to assess higher-order skills with multiple choice, T/F, and matching items)
Pros:
· a variation on multiple choice, true/false, or matching: the interpretive exercise presents a new map, short reading, or other introductory material that the student must analyze
· tests student ability to apply and transfer prior knowledge to new material
· useful for assessing higher-order skills such as application, analysis, synthesis, and evaluation
Cons:
· hard to design; appropriate introductory material must be located
· students with good reading skills are often at an advantage

Item Type: Supplied Response
Pros:
· chances of guessing reduced
· measures knowledge and fact outcomes well: terminology, formulas
Cons:
· scoring is not objective
· can cause difficulty for computer scoring

Item Type: Essay
Pros:
· less construction time; easier to write
· encourages more appropriate study habits
· measures higher-order outcomes (i.e., analysis, synthesis, or evaluation goals), creative thinking, and writing ability
Cons:
· more grading time; hard to score
· can yield a great variety of responses
· not efficient for testing large bodies of content
· if you give the student the choice of three or four essay options, you can find out what they know, but not what they don't know

Item Type: Performance Assessments (includes the essays above, along with speeches, demonstrations, presentations, etc.)
Pros:
· measures higher-order outcomes (i.e., analysis, synthesis, or evaluation goals)
Cons:
· labor- and time-intensive
· inter-rater reliability must be obtained when using more than one rater
The table below presents tips for designing
two popular item types: multiple choice questions and true/false questions.
Tips for Writing Multiple Choice Questions
· Avoid responses that are interrelated. One answer should not be similar to others.
· Avoid negatively stated items: "Which of the following is not a method of food irradiation?" It is easy to miss the negative word "not." If you use negatives, bold-face the negative qualifier to ensure people see it.
· Avoid making your correct response different from the other responses grammatically, in length, or otherwise.
· Avoid the use of "none of the above." When a student guesses "none of the above," you still do not know whether they know the correct answer.
· Avoid repeating words from the question stem in your responses. For example, if you use the word "purpose" in the question stem, do not use that same word in only one of the answers, as it will lead people to select that specific response.
· Use plausible, realistic responses.
· Create grammatically parallel items to avoid giving away the correct response. For example, if you have four responses, do not start three of them with verbs and one of them with a noun.
· Always place the "term" in your question stem and the "definition" as one of the response options.

Tips for Writing True/False Questions
· Do not use definitive words such as "only," "none," and "always," which lead people to choose false, or uncertain words such as "might," "can," or "may," which lead people to choose true.
· Do not write negatively stated items, as they are confusing to interpret: "Thomas Jefferson did not write the Declaration of Independence." True or false?
· People have a tendency to choose "true," so design at least 60% of your T/F items to be "false" to further minimize guessing effects.
· Use precise words (100, 20%, half) rather than vague or qualitative language (young, small, many).
· Avoid making the correct answer longer than the incorrect answer (a give-away).
Developing the Test Blueprint
The
first step in test development is to set the test specifications based on the
relative importance of the content to be tested. The usual procedure is to
develop a test blueprint which includes the test objectives and the cognitive
level of the items. The test objectives are weighted by assigning a percentage
of the test items to each objective. Thus, a test that covers five areas
equally would have twenty percent of the items assigned to each objective. Some
objectives may emphasize factual knowledge while others stress understanding or
application of knowledge. Therefore, it is useful to place the objectives on
one axis of the blueprint and the cognitive level on the other axis. In this
way the test can be balanced by content and cognitive requirements. At this
point, the instructor should review the length of the planned examination to be
certain students can complete it in the time allowed. While speed in taking
examinations could be relevant in some subject areas, speeded tests
discriminate against the high ability but more methodical student. As a rule of
thumb, students can respond to one relatively complex multiple choice item
every 50 seconds. Items requiring calculation may take longer. Time for writing
responses to an essay question also depends on the complexity of the task. A useful rule is to allow the class about double the time it takes the instructor to write an acceptable response.
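The blueprint arithmetic above (percentage weighting per objective plus the 50-seconds-per-item rule of thumb) can be sketched in a few lines of Python. The objective names, item count, and weights below are hypothetical, chosen only to mirror the five-equal-areas example in the text:

```python
# Sketch of a test blueprint: weight objectives by percentage and
# estimate total testing time at 50 seconds per multiple choice item,
# as suggested above. All names and numbers are illustrative.

blueprint = {
    "terminology":   0.20,  # five areas covered equally: 20% each
    "comprehension": 0.20,
    "application":   0.20,
    "analysis":      0.20,
    "evaluation":    0.20,
}

TOTAL_ITEMS = 50
SECONDS_PER_ITEM = 50  # rule of thumb for one relatively complex MC item

# Number of items allotted to each objective.
items_per_objective = {
    objective: round(TOTAL_ITEMS * weight)
    for objective, weight in blueprint.items()
}

# Estimated time students need to complete the whole test.
total_minutes = TOTAL_ITEMS * SECONDS_PER_ITEM / 60

print(items_per_objective)                          # 10 items per objective
print(f"Estimated time: {total_minutes:.0f} min")   # about 42 minutes
```

Items requiring calculation or essay responses would need a larger per-item allowance than the 50-second figure used here.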
Intellectual Tasks and Examples
· Reiteration: Recite verbatim, e.g., "A stitch in time saves nine."
· Summarization: Restate it in different words, such as "Make necessary repairs immediately to avoid having to spend a great deal more time making even more repairs later."
· Illustration: Provide or identify examples of the rule in use, such as "Changing the oil in your car."
· Prediction: Use the rule or principle to anticipate the consequences of certain acts, such as "Failure to change the oil in your car now will result in costly engine repairs later."
· Evaluation: Employ the principle to make a judgment, such as "Is it better to change the oil now?"
A GOOD TEST AND ITS CHARACTERISTICS:
Characteristics
of A Good Test
1-
Validity:
A test is considered valid when it measures what it is supposed to measure.
• There are different types of
validity:
–
Operational validity
–
Predictive validity
–
Content validity
–
Construct validity
•
Operational Validity
–
A test has operational validity if the tasks it requires are sufficient to evaluate the intended activities or qualities.
• Predictive Validity
–
A test has predictive validity if scores on it predict future performance
• Content Validity
–
If the items in the test constitute a representative sample of the total course
content to be tested, the test can be said to have content validity.
•
Construct
Validity
–
Construct validity involves interpreting test scores psychologically: the test is interpreted in terms of the underlying trait or construct it is assumed to measure, drawing on numerous research findings.
2-
Reliability:
A test is considered reliable if, when it is taken again by the same students under the same circumstances, the average score remains almost constant, provided that the interval between the test and the retest is of reasonable length.
•
Reliability of a test refers to the degree of consistency with which it measures what it is intended to measure.
•
A test may be reliable but need not be valid. This is because it may yield
consistent scores, but these scores need not be representing what exactly we
want to measure.
•
A test with high validity has to be reliable also. (the scores will be
consistent in both cases)
•
A valid test is also a reliable test, but a reliable test need not be a valid one.
Different methods for determining Reliability
•
Test-retest method
–
The test is administered to the same group after a short interval. The scores are tabulated and the correlation is calculated. The higher the correlation, the greater the reliability.
• Split-half method
–
The scores on the odd and even items are taken and the correlation between the two sets of scores is determined.
• Parallel form method
–
Reliability is determined using two equivalent forms of the same test content.
–
The prepared tests are administered to the same group one after the other.
–
The test forms should be identical with respect to the number of items, content, difficulty level, etc.
–
The correlation between the two sets of scores obtained by the group on the two tests is then determined.
–
The higher the correlation, the greater the reliability.
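As an illustration of the split-half method above, the sketch below correlates hypothetical odd-item and even-item scores for a small group of students. The Spearman-Brown correction, a standard companion to the split-half method (not named in the text), steps the half-test correlation up to an estimate for the full-length test:

```python
# Split-half reliability sketch: correlate each student's odd-item
# total with their even-item total, then apply the Spearman-Brown
# correction. All score data below are made up for illustration.

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-student totals on the odd- and even-numbered items.
odd_scores  = [8, 6, 9, 4, 7, 5, 9, 3]
even_scores = [7, 6, 8, 5, 7, 4, 9, 4]

r_half = pearson(odd_scores, even_scores)

# Spearman-Brown: estimated reliability of the full-length test.
r_full = 2 * r_half / (1 + r_half)

print(f"half-test r = {r_half:.2f}, full-test reliability = {r_full:.2f}")
```

The same `pearson` helper would serve for the test-retest and parallel-form methods; only the two score lists being correlated change.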
3-
Objectivity:
Objectivity means that if the test is marked by different people, the score will be the same. In other words, the marking process should not be affected by the marker's personality.
4-
Comprehensiveness:
A good test should include items from the different areas of the material assigned for the test, e.g. dialogue, composition, comprehension, grammar, vocabulary, orthography, dictation, and handwriting.
5-
Simplicity:
Simplicity means that the test should be written in clear, correct, and simple language. It is important to keep the method of testing as simple as possible while still testing the skill you intend to test. (Avoid ambiguous questions and ambiguous instructions.)
6-
Scorability :
Scorability means that each item in the test has its own mark, related to the distribution of marks given by the Ministry of Education.
7-Discriminating
Power
•
Discriminating power of the test is its power to discriminate between the upper
and lower groups who took the test.
•
The test should contain questions of different difficulty levels.
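One common way to quantify the discriminating power described above is the discrimination index D = (Ru − Rl) / n: the difference between the numbers of correct answers on an item in the upper and lower scoring groups, divided by the group size. The index is a standard item-analysis statistic rather than something defined in the text, and the counts below are hypothetical:

```python
# Discrimination index sketch: compare how the upper and lower scoring
# groups performed on a single item. D near +1 means the item separates
# strong from weak students well; D near 0 (or negative) means it does
# not. The counts below are made up for illustration.

def discrimination_index(upper_correct, lower_correct, group_size):
    """D = (Ru - Rl) / n, where Ru and Rl are the numbers of correct
    answers in the upper and lower groups and n is the size of each group."""
    return (upper_correct - lower_correct) / group_size

# An item answered correctly by 18 of the top 20 students
# but only 6 of the bottom 20:
d = discrimination_index(upper_correct=18, lower_correct=6, group_size=20)
print(f"D = {d:.2f}")  # D = 0.60 -- a strongly discriminating item
```

An item answered correctly (or missed) by both groups equally yields D = 0 and tells the examiner nothing about who the stronger students are.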
8-Practicability
•
Practicability of the test depends upon:
–
Administrative ease
–
Scoring ease
–
Interpretative ease
–
Economy
9-Comparability
•
A test possesses comparability when scores resulting from its use can be interpreted in terms of a common base that has a natural or accepted meaning.
•
There are two methods for establishing comparability:
–
Availability of an equivalent (parallel) form of the test
–
Availability of adequate norms
10-Utility
•
A test has utility if it provides the test conditions that facilitate realization of the purpose for which it is meant.