B.Ed Notes: Define and differentiate between test and testing

Test and testing
Simply put, a test is a measuring tool or instrument in education. More specifically, a test is considered to be a kind or class of measurement device typically used to find out something about a person. Most of the time, when you finish a lesson or lessons in a week, your teacher gives you a test. This test is an instrument given to you by the teacher in order to obtain data on which you are judged. It is a common type of educational device which an individual completes himself or herself, and whose intent is to determine changes or gains; other such instruments include the inventory, questionnaire, opinionnaire, scale, etc. Testing, on the other hand, is the process of administering the test to the pupils. In other words, the process of making you or letting you take the test in order to obtain a quantitative representation of the cognitive or non-cognitive traits you possess is called testing. So the instrument or tool is the test, and the process of administering the test is testing.
3.1.2 Assessment
Now that you have learnt the difference between test and testing, let us move on to the next concept, which is assessment. As a teacher, you will inevitably be involved in assessing learners; therefore you should have a clear knowledge of the meaning of assessment. The term assess is derived from the Latin word "assidere", meaning "to sit by" in judgment. There are many definitions and explanations of assessment in education. Let us look at a few of them.
i. Freeman and Lewis (1998): to assess is to judge the extent of students’ learning.
ii. Rowntree (1977): Assessment in education can be thought of as occurring whenever one person, in some kind of interaction, direct or indirect, with another, is conscious of obtaining and interpreting information about the knowledge and understanding, of abilities and attitudes of that other person. To some extent or other, it is an attempt to know the person.
iii. Erwin, in Brown and Knight (1994): Assessment is a systematic basis for making inferences about the learning and development of students… the process of defining, selecting, designing, collecting, analyzing, interpreting and using information to increase students’ learning and development. You should note from these definitions that:
Assessment is a human activity.
Assessment involves interaction, which aims at seeking to understand what the learners have achieved.
Assessment can be formal or informal.
Assessment may be descriptive rather than judgmental in nature.
Its role is to increase students’ learning and development.
It helps learners to diagnose their problems and to improve the quality of their subsequent learning.
3.1.3 Measurement
This is a broad term that refers to the systematic determination of outcomes or characteristics by means of some sort of assessment device. It is a systematic process of obtaining the quantified degree to which a trait or an attribute is present in an individual or object. In other words, it is the systematic assignment of numerical values or figures to a trait or an attribute of a person or object. For instance: What is the height of Uche? What is the weight of the meat? What is the length of the classroom? In education, the numerical value of scholastic ability, aptitude, achievement, etc. can be measured and obtained using instruments such as paper-and-pencil tests. This means that the values of the attribute are translated into numbers by measurement.
3.1.4 Evaluation
According to Tuckman (1975), evaluation is a process wherein the parts, processes, or outcomes of a programme are examined to see whether they are satisfactory, particularly with reference to the stated objectives of the programme, our own expectations, or our own standards of excellence. According to Cronbach et al. (1980), evaluation means the systematic examination of events occurring in and consequent on a contemporary programme. It is an examination conducted to assist in improving this programme and other programmes having the same general purpose. For Thorpe (1993), evaluation is the collection, analysis and interpretation of information about training as part of a recognized process of judging its effectiveness, its efficiency and any other outcomes it may have. If you study these definitions carefully, you will note that evaluation, as an integral part of the instructional process, involves three steps. These are:
i. Identifying and defining the intended outcomes.
ii. Constructing or selecting tests and other evaluation tools relevant to the specified outcomes, and
iii. Using the evaluation results to improve learning and teaching. You will also note that evaluation is a continuous process. It is essential in all fields of teaching and learning activity where judgments need to be made.
3.2 Types of Evaluation
The different types of evaluation are: placement, formative, diagnostic and summative evaluations.
3.2.1 Placement Evaluation
This is a type of evaluation carried out in order to place students in the appropriate group or class. In some schools, for instance, students are assigned to classes according to their subject combinations, such as Science, Technical, Arts, Commercial, etc. Before this is done, an examination, in the form of a pretest or aptitude test, is carried out. Placement evaluation can also be an evaluation made by the teacher to find out the entry behaviour of his students before he starts teaching. This may help the teacher to adjust his lesson plan. Tests like readiness tests, ability tests, aptitude tests and achievement tests can be used.
3.2.2 Formative Evaluation
This is a type of evaluation designed to help both the student and the teacher pinpoint areas where the student has failed to learn, so that this failure may be rectified. It provides feedback to the teacher and the student and thus an estimate of teaching success, e.g. weekly tests, terminal examinations, etc.
3.2.3 Diagnostic Evaluation
This type of evaluation is carried out most of the time as a follow-up to formative evaluation. As a teacher, you have used formative evaluation to identify some weaknesses in your students. You have also applied some corrective measures which have not been successful. What you will now do is design a diagnostic test, which is applied during instruction to find out the underlying causes of students’ persistent learning difficulties. These diagnostic tests can take the form of achievement tests, performance tests, self-ratings, interviews, observations, etc.
3.2.4 Summative Evaluation
This is the type of evaluation carried out at the end of a course of instruction to determine the extent to which the objectives have been achieved. It is called a summarizing evaluation because it looks at the entire course of instruction or programme and can pass judgment on the teacher and the students, the curriculum and the entire system. It is used for certification. Think of the educational certificates you have acquired from examination bodies such as WAEC, NECO, etc. These were awarded to you after you had gone through some type of examination. This is an example of summative evaluation.
The Purpose of Measurement and Evaluation
The main purposes of measurement and evaluation are:
i. Placement of students, which involves placing students appropriately in the learning sequence and classifying or streaming them according to ability or subjects.
ii. Selecting the students for courses – general, professional, technical, commercial etc.
iii. Certification: This helps to certify that a student has achieved a particular level of performance.
iv. Stimulating learning: this can be motivation of the student or teacher, providing feedback, suggesting suitable practice etc.
v. Improving teaching: by helping to review the effectiveness of teaching arrangements.
vi. For research purposes.
vii. For guidance and counseling services.
viii. For purposes of curriculum modification.
ix. For the purpose of selecting students for employment
x. For modification of teaching methods.
xi. For the purpose of promoting students.
xii. For reporting students’ progress to their parents.
xiii. For the award of scholarships and merit awards.
xiv. For the admission of students into educational institutions.
xv. For the maintenance of students.
4.0 CONCLUSION
Now that you have gone through the descriptions of the major terms used in measurement and evaluation, can give the purposes of measurement and evaluation, and can explain the types of evaluation, you have placed yourself on a good footing for the study of this all-important course, which you cannot do without as a teacher.
5.0 SUMMARY
In general, practitioners in the educational system are most of the time interested in ascertaining the outputs of the educational programme. Output is counted in terms of test results, which are naturally expressed in quantitative indices such as scores or marks. A test, which is a device, an instrument or a tool consisting of a set of tasks or questions, is used to obtain the results. A test can take the form of a pen-and-paper examination, assignments, practicals, etc. The process of administering this test is called testing. An act of measurement is done when we award marks to an answer paper or assignment. So measurement gives the individual’s ability in numerical indices or scores, i.e. measurement is quantitative. Assessment can be seen as the engine that drives and shapes learning, rather than simply an end-of-term examination that grades and reports performance. Evaluation is expressed in qualitative indices such as good, excellent, pass or fail. Value judgment is therefore attached to the measurement. Evaluation can be placement, formative, diagnostic or summative.
Evaluation, Measurement and Testing
Bachman (1990), quoting Weiss (1972), defines evaluation as “the systematic gathering of information for the purpose of making decisions”. Lynch (2001) adds that this decision or judgment is to be about individuals. In this conceptualization, both authors agree that evaluation is the superordinate term in relation to both measurement and testing. Assessment is sometimes used interchangeably with evaluation. The systematic information can take many forms, but these forms are either quantitative or qualitative; this is what distinguishes measures from qualitative descriptions. Measurement is thus concerned with quantification. Language proficiency, like many other constructs and characteristics of persons in the social sciences, needs to be quantified before any judgments can be made about it. This process of quantifying is called operationalization in research, by which we mean assigning numbers, according to observable operations and explicit procedures or rules, to measure a construct (Bachman, 1990; Ary et al., 1996). The third component in this model is testing, which consists of the use of actual tests to elicit the desired behavior. Carroll (1968) defines a test as:
“A psychological or educational test is a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual.”
Bachman (1990) observes that a test is one type of measurement instrument, and thus necessarily quantifies characteristics of individuals according to explicit procedures. Bachman (1990), then, concludes that there are other types of measurement than tests, and the difference is that a test is designed to obtain a specific sample of behavior. For the purpose of schematic representation, the three concepts of evaluation, measurement and testing have traditionally been demonstrated in three concentric circles of varying sizes. This is what Lynch (2001) has followed in depicting the relationship among these concepts.
Bachman (1990) has represented the relationship in a somewhat different way. The goal has been to extend the model to include not only language testing but also language teaching, language learning and language research domains. Figure 2 depicts this extended view of the relationship among evaluation, measurement and testing. The areas numbered from 1 to 5 show the various forms of this relationship.
Area 1- Evaluation not involving either tests or measures; for example, the use of qualitative descriptions of student performance for diagnosing learning problems.
Area 2- A non-test measure for evaluation; for example, teacher ranking used for assigning grades.
Area 3- A test used for purposes of evaluation; for example, the use of an achievement test to determine student progress.
Area 4- Non-evaluative use of tests and measures for research purposes; for example, the use of a proficiency test as a criterion in second language acquisition research.
Area 5- Non-evaluative non-test; for example, assigning code numbers to subjects in second language research according to native language.
The word 'evaluation' is often confused with testing and measurement. Therefore, many a time, teachers who give a test to the students think that they are evaluating the achievement of the students. Testing is only a technique to collect evidence regarding pupil behaviour. Measurement, on the other hand, is limited to quantitative description of pupil behaviour. Evaluation is a more comprehensive term which includes testing and measurement and also qualitative description of pupil behaviour. It also includes value judgments regarding the worth or desirability of the behaviour measured or assessed. Gronlund (1981) has indicated this relationship in the following equations:
Evaluation = quantitative description of pupils (measurement) + value judgment
Evaluation = qualitative description of pupils (non-measurement) + value judgment
WHAT ARE MCQs?
Multiple choice is a form of assessment in which respondents are asked to select the best possible answer (or answers) from the choices in a list. The multiple choice format is most frequently used in educational testing, in market research, and in elections, when a person chooses between multiple candidates, parties, or policies. Multiple choice testing is particularly popular in the United States.
Although E. L. Thorndike developed an early multiple choice test, Frederick J. Kelly was the first to use such items as part of a large scale assessment.

STRUCTURE
Multiple choice items consist of a stem and a set of options. The stem is the beginning part of the item that presents the item as a problem to be solved, a question asked of the respondent, or an incomplete statement to be completed, as well as any other relevant information. The options are the possible answers that the examinee can choose from, with the correct answer called the key and the incorrect answers called distractors.

Only one answer can be keyed as correct. This contrasts with multiple response items in which more than one answer may be keyed as correct.
Usually, a correct answer earns a set number of points toward the total mark, and an incorrect answer earns nothing. However, tests may also award partial credit for unanswered questions or penalize students for incorrect answers, to discourage guessing. For example, the SAT removes a quarter point from the test taker's score for an incorrect answer.
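To make the structure and scoring rules above concrete, here is a minimal Python sketch; the item content, the names, and the quarter-point penalty (modeled on the SAT example) are illustrative assumptions, not a standard implementation:

```python
# Minimal sketch: one way to represent a multiple-choice item (stem,
# options, key, distractors) and to score a response sheet with an
# SAT-style quarter-point penalty for wrong answers.

from dataclasses import dataclass

@dataclass
class Item:
    stem: str       # the question or incomplete statement
    options: list   # every possible answer shown to the examinee
    key: int        # index of the correct option; all others are distractors

def score(items, responses, penalty=0.25):
    """+1 per correct answer, 0 for a blank (None), -penalty per wrong answer."""
    total = 0.0
    for item, response in zip(items, responses):
        if response is None:            # unanswered: earns nothing, costs nothing
            continue
        total += 1.0 if response == item.key else -penalty
    return total

items = [Item("2 + 2 = ?", ["3", "4", "5", "6"], key=1)]
print(score(items, [1]))     # 1.0   (correct)
print(score(items, [0]))     # -0.25 (wrong: quarter-point penalty)
print(score(items, [None]))  # 0.0   (blank)
```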
WHY MULTIPLE-CHOICE ITEMS ARE GOOD
1. Multiple-choice items can be used to measure learning outcomes at almost any level. This is the big one, and we have mentioned it before. It makes multiple-choice items very flexible and useful anytime you are sure that test takers can adequately read and understand the content of the question.
2. They are clear and straightforward. Well-written multiple-choice items are very clear, and what is expected of the test taker is clear as well. There’s usually no ambiguity (how many pages should I write? can I use personal experiences?) about answering the test questions.
3. No writing needed. Well, not very much anyway, and that has two distinct advantages. First, it eliminates any differences between test takers based on their writing skills. And, second, it allows for responses to be completed fairly quickly, leaving more time for more questions. You should allot about 60 seconds per multiple-choice question when designing your test.
4. The effects of guessing are minimized, especially when compared to true-false items. With four or five options, the likelihood of getting a well-written item correct by chance alone (and that’s exactly what guessing is) is anywhere between 20% and 25%.
5. Multiple-choice items are easy to score, and the scoring is reliable as well. If this is the case, and you have a choice of what kind of items to use, why not use these? Being able to bring 200 bubble scoring sheets to your office’s scoring machine and having the results back in 5 minutes sure makes life a lot easier. And, when the scoring system is more reliable and more accurate, the reliability of the entire test increases.
6. Multiple-choice items lend themselves to item analysis. We’ll talk shortly about item analysis, including how to do it and what it does. For now, it’s enough to understand that this technique allows you to further refine multiple-choice items so that they perform better, and to get a clearer picture of how this or that item performed and whether it did what it was supposed to do. For this reason, multiple-choice items can be diagnostic tools to tell you what test takers understand and what they do not.
Advantages of Multiple-Choice Items
- They can be used to measure learning outcomes at almost any level.
- They are easy to understand (if well written, that is).
- They deemphasize writing skills.
- They minimize guessing.
- They are easy to score.
- They can be easily analyzed for their effectiveness.

Disadvantages of Multiple-Choice Items
- They take a long time to write.
- Good ones are difficult to write.
- They limit creativity.
- They may have more than one correct answer.
Why Multiple-Choice Items Are Not So Good
1. Multiple-choice items take a long time to write. You can figure on anywhere between 10 and 20 minutes to write a decent first draft of a multiple-choice item. Now, you may be able to use this same item in many different settings, and perhaps for many different years, but nonetheless it’s a lot of work. And, once these new items are administered and their performance analyzed, count on a few more minutes for revision.
2. Good multiple-choice items are not easy to write. Not only do they take a long time, but unless you have very good distractors (well written, focused, etc.) and exactly one correct answer, you will get test takers who can argue for any of the alternatives as being correct (even though you think they are not), and they can sometimes do this pretty persuasively.
3. Multiple-choice items do not allow for creative or unique responses. Test takers have no choice as to how to respond (A or B or C or D). So, if there is anything more they would like to add or show what they know beyond what is present in the individual item, they are out of luck!
4. The best test takers may know more than you! Multiple-choice items operate on the assumption that there is only one correct alternative. Although the person who designs the test might believe this is true, the brightest (student and) test taker may indeed find something about every alternative, including the correct one, that is flawed.
TYPES OF MULTIPLE CHOICE QUESTIONS
Multiple-Choice Items: More Than Just “Which One Is Correct”
There are many types of multiple choice questions; some of them are discussed below:
1. Best-answer multiple-choice items. These are multiple-choice items where there may be more than one correct answer, but only one of them is the best of all the correct ones.
2. Rearrangement multiple-choice items. Here’s where the test taker arranges a set of items in sequential order, be it steps in a process or the temporal sequence in which something might have occurred or should occur.
3. Interpretive multiple-choice items. Here, the test taker reads through a passage and then selects a response where the alternatives (and the correct answer) all are based on the same passage. Keep in mind that although this appears to be an attractive format, it does place a premium on reading and comprehension skills.
4. Substitution multiple-choice items. This is something like a short answer or completion item, but there are alternatives from which to select. The test taker selects, from a set of responses, those that he or she thinks answer the question correctly.
The Role of Assessment in Teaching
Assessing student learning is something that every teacher has to do, usually quite frequently. Written tests, book reports, research papers, homework exercises, oral presentations, question-and-answer sessions, science projects, and artwork of various sorts are just some of the ways in which teachers measure student learning, with written tests accounting for about 45 percent of a typical student's course grade (Green & Stager, 1986/1987). It is no surprise, then, that the typical teacher can spend between one-third and one-half of her class time engaged in one or another type of measurement activity (Stiggins, 1994). Yet despite the amount of time teachers spend assessing student learning, it is a task that most of them dislike and that few do well. One reason is that many teachers have little or no in-depth knowledge of assessment principles (Crooks, 1988; Hills, 1991; Stiggins, Griswold, & Wikelund, 1989). Another reason is that the role of assessor is seen as being inconsistent with the role of teacher (or helper). Since teachers with more training in assessment use more appropriate assessment practices than do teachers with less training (Green & Stager, 1986/1987), a basic goal of this chapter is to help you understand how such knowledge can be used to reinforce, rather than work against, your role as teacher. Toward that end, we will begin by defining what we mean by the term assessment and by two key elements of this process, measurement and evaluation.
What is Assessment?
Broadly conceived, classroom assessment involves two major types of activities: collecting information about how much knowledge and skill students have learned (measurement) and making judgments about the adequacy or acceptability of each student's level of learning (evaluation). Both the measurement and evaluation aspects of classroom assessment can be accomplished in a number of ways. To determine how much learning has occurred, teachers can, for example, have students take exams, respond to oral questions, do homework exercises, write papers, solve problems, and make oral presentations. Teachers can then evaluate the scores from those activities by comparing them either to one another or to an absolute standard (such as an A equals 90 percent correct). Throughout much of this chapter we will explain and illustrate the various ways in which you can measure and evaluate student learning.

WHAT ARE SOME TYPES OF ASSESSMENT?

There are many alternatives to traditional standardized tests that offer a variety of ways to measure student understanding.

In early theories of learning, it was believed that complex higher-order thinking skills were acquired in small pieces, with learning broken down into a series of prerequisite skills. After these pieces were memorized, the learner would be able to assemble them into complex understanding and insight -- the puzzle could be arranged to form a coherent picture.
Today, we know learning requires that the learner engage in problem-solving to actively build mental models. Knowledge is attained not just by receiving information, but also by interpreting the information and relating it to the learner's knowledge base. What is important, and therefore should be assessed, is the learner's ability to organize, structure, and use information in context to solve complex problems.

STANDARDIZED ASSESSMENT

Almost every school district now administers state-mandated standardized tests. Every student at a particular grade level is required to take the same test. Everything about the test is standard -- from the questions themselves, to the length of time students have to complete it (although some exceptions may be made for students with learning or physical disabilities), to the time of year in which the test is taken. Throughout the country, and with the passage of the Elementary and Secondary Education Act, commonly known as the No Child Left Behind Act (which requires research-based assessment), student performance on these tests has become the basis for such critical decisions as student promotion from one grade to the next, and compensation for teachers and administrators.
Standardized tests should not be confused with the standards movement, which advocates specific grade-level content and performance standards in key subject areas. Often, in fact, standardized tests are not aligned with state and district content standards, causing considerable disconnect between what is being taught and what is being tested.
In the spring of 2009, an initiative was created to develop a set of standards for all states in the United States to adhere to. The Common Core State Standards Initiative (CCSS), as it has become known, is still an evolving movement. The vast majority of states have pledged to adopt the standards and implement them by 2015. Standards for English language arts and mathematics were published in 2010, while standards for science and social studies are still in development. Visit Edutopia's Common Core State Standards Resource page for more information about the standards.
The questions then become: What is evidence-based assessment? Is it standardized tests? Is it portfolios? If portfolios are a part of evidence-based assessment, what else is necessary? Reflections? Work samples? Best work?

FORMAL ASSESSMENT

Some formal assessments provide teachers with a systematic way to evaluate how well students are progressing in a particular instructional program. For example, after completing a four- to six-week theme, teachers will want to know how well students have learned the theme skills and concepts. They may give all the students a theme test in which students read, answer questions, and write about a similar theme concept. This type of assessment allows the teacher to evaluate all the students systematically on the important skills and concepts in the theme by using real reading and writing experiences that fit with the instruction. In other situations, or for certain students, teachers might use a skills test to examine specific skills or strategies taught in a theme.

INFORMAL ASSESSMENT

Other forms of authentic assessment are more informal, including special activities such as group or individual projects, experiments, oral presentations, demonstrations, or performances. Some informal assessments may be drawn from typical classroom activities such as assignments, journals, essays, reports, literature discussion groups, or reading logs. Other times, it will be difficult to show student progress using actual work, so teachers will need to keep notes or checklists to record their observations from student-teacher conferences or informal classroom interactions. Sometimes informal assessment is as simple as stopping during instruction to observe or to discuss with the students how learning is progressing. Any of these types of assessment can be made more formal by specifying guidelines for what and how to do them, or they can be quite informal, letting students and teachers adjust to individual needs. In some situations, the teacher will want all students to complete the same assessments; in others, assessments will be tailored to individual needs.

ALTERNATIVE ASSESSMENT

Alternative assessment, often called authentic, comprehensive, or performance assessment, is usually designed by the teacher to gauge students' understanding of material. Examples of these measurements are open-ended questions, written compositions, oral presentations, projects, experiments, and portfolios of student work. Alternative assessments are designed so that the content of the assessment matches the content of the instruction.
Effective assessments give students feedback on how well they understand the information and on what they need to improve, while helping teachers better design instruction. Assessment becomes even more relevant when students become involved in their own assessment. Students who take an active role in developing the scoring criteria, self-evaluation, and goal setting more readily accept that the assessment is adequately measuring their learning.
Authentic assessment can include many of the following:
1. Observation
2. Essays
3. Interviews
4. Performance tasks
5. Exhibitions and demonstrations
6. Portfolios
7. Journals
8. Teacher-created tests
9. Rubrics
10. Self- and peer-evaluation
Measurement
Measurement is the assignment of numbers to certain attributes of objects, events, or people according to a rule-governed system. For our purposes, we will limit the discussion to attributes of people. For example, we can measure someone's level of typing proficiency by counting the number of words the person accurately types per minute or someone's level of mathematical reasoning by counting the number of problems correctly solved. In a classroom or other group situation, the rules that are used to assign the numbers will ordinarily create a ranking that reflects how much of the attribute different people possess (Linn & Gronlund, 1995).
Evaluation
Evaluation involves using a rule-governed system to make judgments about the value or worth of a set of measures (Linn & Gronlund, 1995). What does it mean, for example, to say that a student answered eighty out of one hundred earth science questions correctly? Depending on the rules that are used, it could mean that the student has learned that body of knowledge exceedingly well and is ready to progress to the next unit of instruction or, conversely, that the student has significant knowledge gaps and requires additional instruction.
ALTERNATIVE RESPONSE QUESTIONS
Alternative response questions are a special form of multiple choice question, where the learner has to choose between just two items. This does give the learner a 50% chance of guessing the correct answer, and so their learning value as single questions is limited. There are instances where they are valid, such as where there really are only two possibilities: in or out, up or down.
Another form of alternative response question is the true/false or yes/no type. You present the learner with a statement that they must judge to be true or false. These still suffer from the 50% chance problem, so you need to design the question carefully so that it does actually help learning.
Ways to Evaluate Student Learning
Once you have collected all the measures you intend to collect -- for example, test scores, quiz scores, homework assignments, special projects, and laboratory experiments -- you will have to give the numbers some sort of value (the essence of evaluation). As you probably know, this is most often done by using an A to F grading scale. Typically, a grade of A indicates superior performance; a B, above-average performance; a C, average performance; a D, below-average performance; and an F, failure. There are two general ways to approach this task. One approach involves comparisons among students. Such forms of evaluation are called norm-referenced since students are identified as average (or normal), above average, or below average. An alternative approach is called criterion-referenced because performance is interpreted in terms of defined criteria. Although both approaches can be used, we favor criterion-referenced grading for reasons we will mention shortly.
NORM-REFERENCED GRADING
A norm-referenced grading system assumes that classroom achievement will naturally vary among a group of heterogeneous students because of differences in such characteristics as prior knowledge, learning skills, motivation, and aptitude. Under ideal circumstances (hundreds of scores from a diverse group of students), this variation produces a bell-shaped, or "normal," distribution of scores that ranges from low to high, has few tied scores, and has only a very few low scores and only a very few high scores. For this reason, norm-referenced grading procedures are also referred to as "grading on the curve."
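One common way of grading on the curve is to convert each raw score to a standard (z) score and carve the distribution into grade bands. The Python sketch below is one illustrative scheme; the z-score cutoffs are assumptions, not a fixed rule:

```python
# Minimal sketch of norm-referenced ("on the curve") grading: each grade
# depends on a student's standing relative to classmates, via the z-score.
# The z-score cutoffs are illustrative assumptions.

from statistics import mean, pstdev

def curve_grades(scores):
    mu, sigma = mean(scores), pstdev(scores)
    def grade(s):
        z = (s - mu) / sigma           # standard deviations above the mean
        if z >= 1.5:  return "A"
        if z >= 0.5:  return "B"
        if z >= -0.5: return "C"
        if z >= -1.5: return "D"
        return "F"
    return [grade(s) for s in scores]

print(curve_grades([55, 60, 65, 70, 75, 80, 85, 90]))
# ['F', 'D', 'D', 'C', 'C', 'B', 'B', 'A']
```

Notice that the same raw score could earn a different grade in a different class, because the cutoffs move with the group's mean and spread.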
CRITERION-REFERENCED GRADING
A criterion-referenced grading system permits students to benefit from mistakes and to improve their level of understanding and performance. Furthermore, it establishes an individual (and sometimes cooperative) reward structure, which fosters motivation to learn to a greater extent than other systems.
Under a criterion-referenced system, grades are determined through comparison of the extent to which each student has attained a defined standard (or criterion) of achievement or performance. Whether the rest of the students in the class are successful or unsuccessful in meeting that criterion is irrelevant. Thus, any distribution of grades is possible. Every student may get an A or an F, or no student may receive these grades. For reasons we will discuss shortly, very low or failing grades tend to occur less frequently under a criterion-referenced system.
A common version of criterion-referenced grading assigns letter grades on the basis of the percentage of test items answered correctly. For example, you may decide to award an A to anyone who correctly answers at least 85 percent of a set of test questions, a B to anyone who correctly answers 75 to 84 percent, and so on down to the lowest grade. To use this type of grading system fairly, which means specifying realistic criterion levels, you would need to have some prior knowledge of the levels at which students typically perform. You would thus be using normative information to establish absolute or fixed standards of performance. However, although norm-referenced and criterion-referenced grading systems both spring from a normative database (that is, from comparisons among students), only the former system uses those comparisons to directly determine grades.
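As a concrete illustration, the sketch below maps percentage scores to letter grades against fixed cutoffs. Only the A (85 percent and above) and B (75 to 84 percent) bands come from the example above; the lower bands are assumptions:

```python
# Minimal sketch of criterion-referenced grading: the grade depends only
# on the student's own percentage against fixed cutoffs, never on how
# classmates performed. Only the A and B cutoffs come from the text;
# the C and D bands are illustrative assumptions.

def letter_grade(percent_correct):
    for minimum, grade in [(85, "A"), (75, "B"), (65, "C"), (55, "D")]:
        if percent_correct >= minimum:
            return grade
    return "F"

print([letter_grade(p) for p in [92, 78, 60, 40]])  # ['A', 'B', 'D', 'F']
```

Unlike the curved scheme sketched earlier, these cutoffs never move, so any distribution of grades is possible, including a class where everyone earns an A.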
Criterion-referenced grading systems (and criterion-referenced tests) have become increasingly popular in recent years primarily because of three factors. First, educators and parents complained that norm-referenced tests and grading systems provided too little specific information about student strengths and weaknesses. Second, educators have come to believe that clearly stated, specific objectives constitute performance standards, or criteria, that are best assessed with criterion-referenced measures. Third, and perhaps most important, contemporary theories of school learning claim that most, if not all, students can master most school objectives under the right circumstances. If this assertion is even close to being true, then norm-referenced testing and grading procedures, which depend on variability in performance, will lose much of their appeal.

The Concept of Measurement

What does it mean to measure something? According to the National Council of Teachers of Mathematics (2000), "Measurement is the assignment of a numerical value to an attribute of an object, such as the length of a pencil. At more-sophisticated levels, measurement involves assigning a number to a characteristic of a situation, as is done by the consumer price index." An early understanding of measurement begins when children simply compare one object to another. Which object is longer? Which one is shorter? At the other extreme, researchers struggle to find ways to quantify their most elusive variables. The example of the consumer price index illustrates that abstract variables are, in fact, human constructions. A major part of scientific and social progress is the invention of new tools to measure newly constructed variables.

Assessment and Evaluation

Types of Assessment and Evaluation

Assessment and evaluation studies may take place at the subject, department, or Institute level, and range in size and scope from a pilot study to a complex project that addresses a number of different topics, involves hundreds of students, and includes a variety of methodologies.  Typically, assessment efforts are divided into two types, formative or summative. Below, each is described briefly along with a third less frequently seen type called process assessment. Included, as well, is a grid that classifies different assessment methodologies.

Formative Assessment implies that the results will be used in the formation and revision process of an educational effort.  Formative assessments are used in the improvement of educational programs. This type of assessment is the most common form of assessment in higher education, and it constitutes a large proportion of TLL’s assessment work. Since educators are continuously looking for ways to strengthen their educational efforts, this type of constructive feedback is valuable.

Summative Assessment is used for the purpose of documenting outcomes and judging value. It is used for providing feedback to instructors about the quality of a subject or program, reporting to stakeholders and granting agencies, producing reports for accreditation, and marketing the attributes of a subject or program. In practice, studies of this type are rarely exclusively summative; they usually contain some aspects of formative assessment.

Process Assessment begins with the identification of project milestones to be reached, activities to be undertaken, products to be delivered, and/or projected costs likely to be incurred in the course of attaining a project’s final goals. The process assessment determines whether markers have been reached on schedule, deliverables produced, and cost estimates met. The degree of difference from the expected plan is used to evaluate success.

Methods of Measuring Learning Outcomes Grid

How colleges and universities can measure and report on the knowledge and abilities their students have acquired during their college years is an issue of growing interest. The Methods of Measuring Learning Outcomes Grid provides a way to categorize the range of methodologies that can be used to assess the value added by a college education.
ASSESSMENT SYSTEM IN PAKISTAN
Reliable and accurate education statistics are a condition for sound educational planning and management. The first ever Pakistan National Education Census (NEC), 2005-06, was conducted by the Federal Ministry of Education and the Statistics Division, Federal Bureau of Statistics. It covered 245,682 institutions, including public and private schools, colleges and universities, professional institutions, vocational and technical centres, mosque schools, deeni madaris, and non-formal education centres. A number of statistical tables for the national and provincial levels were published. However, analysis of the data could go further, in order to generate education indicators describing the education situation in Pakistan and to develop analyses underpinned by findings and technical explanations.
Executive Summary
The National Education Census (NEC) of 2005/06 was the first education census conducted in the history of Pakistan that was specifically designed to collect information on all types of schools. It thus generated a complete and comprehensive picture of the current education system in the country, and provides a robust information baseline from which to measure future progress. Through ensuring a complete listing of schools, it also assists other education data collection activities in the field. Pakistan also has a National Education Management Information System (NEMIS), which collects education data annually. The system covers the public education sector, but to date has not comprehensively covered private sector educational provision. Since some 31% of basic education students attend private schools, it is important that up-to-date information be made available on this sub-sector, to ensure that policy development is based on knowledge of the entire education system, not just the public sector alone.

A combination of the NEC and the NEMIS shows that over 36 million students were attending an educational institution in 2005/06. Just under 50% of those students (17.8 million) were studying at the primary level, 20.9% (7.5 million) in pre-primary, 15.4% (5.6 million) in middle elementary, 6.9% (2.5 million) in secondary, 2.5% (0.9 million) in higher secondary and 4.9% (1.8 million) at the post-secondary level. Pakistan has a Gross Enrolment Rate (GER) at the primary level of almost 80% (when all primary enrolment is measured against the population 5 to 9 years of age). The difference between the GER of 80% and the Net Enrolment Rate (NER) of 62% is due to the number of primary students who are over 9 years of age or under 5 years of age. Given the number of repeaters in primary grades and the incidence of students beginning primary school after age 5, it is likely that most of the difference is due to overage students. Numerically, this means that over 2.5 million students in primary school are over 9 years of age. Any reduction in this number, possibly by decreasing the repetition rate, may open up places in the primary system for some of the children not currently in school.
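For clarity, the two rates quoted above can be written out (a restatement of the definitions implied by the text, with 5 to 9 as the official primary age band):

```latex
\mathrm{GER} = \frac{\text{total primary enrolment, regardless of age}}{\text{population aged 5--9}} \times 100,
\qquad
\mathrm{NER} = \frac{\text{primary enrolment of children aged 5--9}}{\text{population aged 5--9}} \times 100.
```

Because the two numerators differ only by pupils outside the 5-to-9 band, the 18-percentage-point gap between the GER (80%) and the NER (62%) measures over- and under-age primary enrolment relative to the 5-to-9 population.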

Teacher-Made Test Construction

A teacher-made test is the major basis for evaluating the progress or performance of students in the classroom. The teacher therefore has an obligation to provide students with the best evaluation. After working through this section, you should be able to:
1. identify the types of teacher-made test;
2. draw general rules/guidelines in constructing test that is applicable to all types of test;
3. explain how to score essay test in such a way that subjectivity can be eliminated;
4. discuss and summarize the advantages and disadvantages of essay and objective type of test;
5. enumerate and discuss other evaluative instruments use to measure students’ performance; and
6. construct different types of test.
Steps in Constructing Teacher-Made Test
1. Planning the Test. In planning the test, the following should be observed: the objectives of the subject, the purpose for which the test is administered, the availability of facilities and equipment, the nature of the testees, the provision for review, and the length of the test.
2. Preparing the Test. The process of writing good test items is not simple – it requires time and effort. It also requires certain skills and proficiencies on the part of the writer. Therefore, a test writer must master the subject matter he/she teaches, must understand his/her testees, must be skillful in verbal expression and, most of all, must be familiar with the various types of tests.
3. Reproducing the Test. In reproducing the test, the duplicating machine and the person who will facilitate the typing and mimeographing should be considered.
4. Administering the Test. The test should be administered in an environment familiar to the students; sitting arrangements should be observed, corrections made before the start of the test, distribution and collection of papers planned, and the time allowed written on the board. One more important thing to remember: do not allow any testee to leave the room except for personal necessity.
5. Scoring the Test. The best procedure in scoring an objective test is to give one point of credit for each correct answer. In the case of a test with only two or three options per item, the correction formula should be applied. For example: for two options, the score equals right minus wrong (S = R - W); for three options, the score equals right minus one-half wrong (S = R - W/2). The correction formula is not applied to four or more options. If the correction formula is employed, students should be informed beforehand (a worked sketch of this formula appears after this list).
6. Evaluating the Test. The test is evaluated as to the quality of the students’ responses and the quality of the test itself. The difficulty index and the discrimination index of each test item are considered (the difficulty index is also included in the sketch after this list). A difficulty of fifty (50) per cent is best; items answered correctly by 100 per cent or by zero (0) per cent of students are valueless in a test of general achievement.
7. Interpreting Test Results. Standardized achievement tests are interpreted based on norm tables. Tables of norms are not applicable to teacher-made tests.
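As noted under steps 5 and 6, here is a minimal Python sketch of the correction-for-guessing formula and the difficulty index. The function and variable names are illustrative assumptions; the general form S = R - W/(k - 1) reduces to the two cases given above:

```python
# Minimal sketch of two computations from steps 5 and 6: the
# correction-for-guessing formula S = R - W/(k - 1), and the item
# difficulty index (proportion of examinees answering correctly).

def corrected_score(right, wrong, options):
    """S = R - W/(k-1): for 2 options S = R - W, for 3 options S = R - W/2.
    By convention the correction is not applied to 4 or more options."""
    if options >= 4:
        return right
    return right - wrong / (options - 1)

def difficulty_index(correct_count, examinees):
    """Proportion answering the item correctly; about 0.50 is ideal, while
    items everyone (1.0) or no one (0.0) gets right are valueless."""
    return correct_count / examinees

print(corrected_score(right=30, wrong=10, options=2))  # 20.0
print(corrected_score(right=30, wrong=10, options=3))  # 25.0
print(difficulty_index(20, 40))                        # 0.5
```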
Types of Informal Teacher Made Test
I. Essay Examination
An essay examination consists of questions to which students respond in one or more sentences to a specific question or problem. It is a test used to evaluate knowledge of the subject matter or to measure skill in writing. It also tests a student’s ability to express his ideas accurately and to think critically within a certain period of time. Essay examinations may be evaluated in terms of content and form. In order to write a good essay test, it must be planned and constructed in advance. The questions must cover the major aspects of the lesson and form a representative sample. Avoid optional questions, and use a large number of questions with short answers rather than a few questions with very long answers.

Classroom Activities That Relate to Piaget's Theory of Cognitive Development

Jean Piaget, the psychologist and philosopher, said, "The principal goal of education in the schools should be creating men and women who are capable of doing new things, not simply repeating what other generations have done." Piaget developed a theory of cognitive development that corresponds to his hope for the educational process. The four stages of development include the sensorimotor stage, in which children 2 and under learn using their senses and primitive understanding. The second stage is preoperational, in which children from 2 to 7 understand abstract symbols and language. The third is concrete operational, where children 7 to 11 reverse operations, order items, and maturely understand cause-and-effect processes. The final stage is formal operations, in which children 12 and up think abstractly. Use Piaget's theory to design your classroom activities.
Social interaction shapes personality development, according to German-born psychoanalyst Erik Erikson's theory of psychosocial development. From birth, a child creates an emotional repertoire tied to her perceptions of her world's safety. Fear of new experiences battles with exploratory instincts, and the winner depends on whether a child feels safe. Teachers who know how to apply psychosocial development in the classroom create a safe environment where each child feels appreciated and comfortable exploring new knowledge and relationships rather than letting fear inhibit learning.

Formative and Summative Assessments in the Classroom

Successful middle schools engage students in all aspects of their learning. There are many strategies for accomplishing this. One such strategy is student-led conferences. As a classroom teacher or administrator, how do you ensure that the information shared in a student-led conference provides a balanced picture of the student's strengths and weaknesses? The answer to this is to balance both summative and formative classroom assessment practices and information gathering about student learning.
Assessment is a huge topic that encompasses everything from statewide accountability tests to district benchmark or interim tests to everyday classroom tests. In order to grapple with what seems to be an overuse of testing, educators should frame testing as assessment, and assessment as information: the more information we have about students, the clearer the picture we have about achievement or where gaps may occur.
NEEDS FOR DEVELOPMENT OF NEAS
It is clear that Pakistan is still a long way from achieving universal primary enrolment. As indicated by the primary Net Enrolment Rate (NER) estimate of 62%, over 35% of the population 5 to 9 years of age is not in school. Given a 5-to-9-year-old population of some 19.5 million, this means that about 7 million children aged 5 to 9 are out of the education system. Furthermore, under current conditions, the education system does not provide for a substantial percentage of students to move beyond the primary level. At present, the average enrolment per grade at the middle elementary level is less than one-half the average enrolment per grade at the primary level. This is considerably less than in most other countries, and it is clear that the delivery system needs to significantly increase the proportion of students capable of studying beyond the primary level.
PURPOSE
The National Education Assessment System (NEAS) was established to undertake systematic evaluations of student learning achievement across Pakistan and to share the analytical results with both policy makers and practitioners to inform the education quality reform process. With data that are comparable across regions and over time, NEAS can identify gaps and bring about improvements in the curriculum, teaching and classroom support practices, as well as in the development of learning aids. For NEAS to be established as a student assessment system on a par with international standards, several key steps towards institutional strengthening, capacity building and improvement in technical quality and processes should be undertaken. Further investment in the technical proficiency of key staff is required, in both specialized skills (item writing, sampling, test procedures) and core expertise (report writing, comparative analysis); this will facilitate improvements in test and instrument design, and will support robust research and analysis. Extending the dissemination of results and findings to primary stakeholders, particularly teacher trainers, textbook developers and policy makers, is important. Deeper understanding of the assessment process and stronger linkages between assessment systems and other education sub-departments (such as teacher professional development centers, examination units, the curriculum wing, and textbook development) will aid better informed and strategic use of assessment information for improvements in student learning. The longer-term sustainability of NEAS will depend not only on its establishment as an autonomous body but also on the degree of integration between the federal and regional assessment centers, so that cross-learning and implementation of best practice are facilitated. With continuous improvements in test instruments and key technical skills, NEAS will be able to track overall system efficiency as well as individual student performance, and identify key areas for intervention that will lead to improvement of the quality and effectiveness of the education system.

The National Education Assessment System for Pakistan aims to design and administer assessment mechanisms, to establish administrative infrastructure and capacity for assessment administration, analysis and report writing, and to increase stakeholder knowledge and acceptance of assessment. There are three components to the project: 1) Capacity building is the main component, since the execution of an assessment is unusually technical in nature. Any one of a number of small mistakes can cause serious delays in implementation and, in the worst case, lead to meaningless findings. Therefore, high-level technical assistance, including the services of a senior Technical Advisor, is required to monitor and assist in all aspects of both central and provincial operations. 2) Pilot experiments are required to determine what will produce the desired, valid results, and which process is most easily implemented. 3) Information dissemination: through this component, the project will facilitate the dissemination of information about assessment to stakeholders in advance of the actual assessment, to explain its purpose and to provide insight and reassurance about its intended uses. (Extract from a World Bank report.)
INTERPRETATION OF TEST SCORES
Test scores are norm-referenced or criterion-referenced. The results of norm-referenced tests tell us how a person compares with others. Results from criterion-referenced tests tell what a person has achieved against a set of learning goals. Norm-referenced tests usually use a set of standardized norms against which to measure the test taker. Criterion-referenced tests usually employ analysis by content cluster or by content and performance standards.

Educational tests are somewhat hard to interpret because they do not have a true zero point. We can talk about the length of an object having a zero starting point, but it is difficult to talk about true zero learning. The interpretation of test results is also handicapped by the inequality of units of measurement. While we know that there is exactly one inch between one inch and two inches, we cannot assume that there is an exactly similar distance between grades of B and C, or of A and B.

There are a variety of ways of interpreting test scores. For criterion-referenced tests these include raw scores and percentages. For norm-referenced tests, choices include raw scores and derived scores such as percentiles and grade equivalents. Grade norms have been widely used with standardized achievement tests, especially at the elementary school level. The grade equivalent that corresponds to a particular raw score identifies the grade level at which the typical student obtains that raw score. Grade equivalents are based on the performance of students in the norm group in each of two or more grades.

One of the most widely used and easily understood methods of describing test performance is the percentile rank. A percentile rank (or percentile score) indicates a student's relative position in a group in terms of the percentage of students scoring lower. It should be remembered that percentiles and percentages are not the same. Another type of norm-referenced score is the standard score, which indicates how far above or below the mean the individual test taker fell. Standard scores depend on the statistics of the mean and the standard deviation. The normal curve is a symmetrical bell-shaped curve that has many useful mathematical properties. One of the most useful, from the viewpoint of test interpretation, is that when the curve is divided into equal standard deviation units, each portion under the curve contains a fixed percentage of cases. Types of standard scores include z-scores, T-scores, normalized standard scores, stanines, normal curve equivalents, and standard age scores.

One advantage of converting raw scores to derived scores is that a student's performance on different tests can be compared directly. This is usually done by means of a test profile. Some test publishers provide profiles that include reports for skill objectives as well as for full subtests. It is the responsibility of the test user to be knowledgeable about the adequacy of the norms for the test being used. The norms need to be relevant, representative, and up to date. It is the responsibility of the test author and publisher to describe the test norms adequately in the test manual so that the test user may make these decisions. While most published tests use national norms, some tests may use local norms. Local norms are typically prepared using either percentile ranks or stanines. Most test publishers will provide local norms if requested, but they can also be prepared locally.
The test consumer should always exercise caution in interpreting test scores. It should always be remembered that, like all educational measurements, test scores possess some degree of error.
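To make these derived scores concrete, the following minimal Python sketch computes a z-score, a T-score (rescaled to mean 50 and standard deviation 10), and a percentile rank as defined above; the ten raw scores and the function names are invented for illustration:

```python
# Minimal sketch of the derived scores described above: a z-score,
# a T-score, and a percentile rank. Sample data are illustrative.

from statistics import mean, pstdev

def z_score(raw, scores):
    """How many standard deviations the raw score lies above the group mean."""
    return (raw - mean(scores)) / pstdev(scores)

def t_score(raw, scores):
    """Standard score rescaled to mean 50, standard deviation 10."""
    return 50 + 10 * z_score(raw, scores)

def percentile_rank(raw, scores):
    """Percentage of students in the group scoring lower than `raw`,
    matching the definition used in the text above."""
    return 100 * sum(s < raw for s in scores) / len(scores)

scores = [45, 52, 58, 60, 63, 67, 70, 74, 78, 83]
print(round(z_score(70, scores), 2))   # 0.45
print(round(t_score(70, scores), 1))   # 54.5
print(percentile_rank(70, scores))     # 60.0
```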
• Norm-referenced test interpretation: In norm-referenced test interpretation, the scores the applicant receives are compared with the test performance of a particular reference group, called the norm group. The norm group generally consists of large representative samples of individuals from specific populations, such as high school students, clerical workers, or electricians. It is their average test performance and the distribution of their scores that set the standard and become the test norms of the group.
The test manual will usually provide detailed descriptions of the norm groups and the test norms. To ensure valid scores and meaningful interpretation of norm-referenced tests, make sure that your target group is similar to the norm group. Compare the educational level, the occupational, language and cultural backgrounds, and other demographic characteristics of the individuals making up the two groups to determine their similarity.
For example, consider an accounting knowledge test that was standardized on the scores obtained by employed accountants with at least 5 years of experience. This would be an appropriate test if you are interested in hiring experienced accountants. However, this test would be inappropriate if you are looking for an accounting clerk. You should look for a test normed on accounting clerks or a closely related occupation.
• Criterion-referenced test interpretation: In criterion-referenced tests, the test score indicates the amount of skill or knowledge the test taker possesses in a particular subject or content area. The test score is not used to indicate how well the person does compared to others; it relates solely to the test taker's degree of competence in the specific area assessed. Criterion-referenced assessment is generally associated with educational and achievement testing, licensing, and certification.
A particular test score is generally chosen as the minimum acceptable level of competence. How is a level of competence chosen? The test publisher may develop a mechanism that converts test scores into proficiency standards, or the company may use its own experience to relate test scores to competence standards.
For example, suppose your company needs clerical staff with word processing proficiency. The test publisher may provide you with a conversion table relating word processing skill to various levels of proficiency, or your own experience with current clerical employees can help you to determine the passing score. You may decide that a minimum of 35 words per minute with no more than two errors per 100 words is sufficient for a job with occasional word processing duties. If you have a job with high production demands, you may wish to set the minimum at 75 words per minute with no more than 1 error per 100 words.
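As a minimal sketch of how such a cut score might be applied, the hypothetical function below encodes the word-processing standards described above. The function name and thresholds are illustrative only, not part of any published test.

```python
# Minimal sketch of applying a criterion-referenced cut score, using the
# hypothetical word-processing standards from the example above.
def meets_standard(wpm, errors_per_100_words, high_demand=False):
    """Return True if a typing-test result meets the chosen competence level."""
    if high_demand:
        # high production demands: 75 wpm, at most 1 error per 100 words
        return wpm >= 75 and errors_per_100_words <= 1
    # occasional word processing duties: 35 wpm, at most 2 errors per 100 words
    return wpm >= 35 and errors_per_100_words <= 2

print(meets_standard(40, 2))                    # True: passes the occasional-duty standard
print(meets_standard(40, 2, high_demand=True))  # False: below the high-demand cutoff
```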
It is important to ensure that all inferences you make on the basis of test results are well founded. Only use tests for which sufficient information is available to guide and support score interpretation. Read the test manual for instructions on how to properly interpret the test results. This leads to the next principle of assessment.
The table below presents both pros and cons for various test item types. Your selection of item types should be based on the types of outcomes you are trying to assess (see the analysis of your learning situation). Certain item types, such as true/false, supplied response, and matching, work well for assessing lower-order outcomes (i.e., knowledge or comprehension goals), while other item types, such as essays, performance assessments, and some multiple choice questions, are better for assessing higher-order outcomes (i.e., analysis, synthesis, or evaluation goals). The bullets below will help you determine the types of outcomes the various items assess.
With your objectives in hand, it may be useful to create a test blueprint that specifies your outcomes and the types of items you plan to use to assess those outcomes. Further, test items are often weighted by difficulty. On your test blueprint, you may wish to assign lower point values to items that assess lower-order skills (knowledge, comprehension) and higher point values to items that assess higher-order skills (synthesis, evaluation).
Multiple Choice (see tips for writing multiple choice questions below)
Pros:
• more answer options (4-5) reduce the chance of guessing an item correctly
• many items can aid in student comparison and reduce ambiguity
• greatest flexibility in the type of outcome assessed: knowledge goals, application goals, analysis goals, etc.
Cons:
• reading time increases with more answer options
• reduces the number of questions that can be presented
• difficult to write four or five reasonable choices
• takes more time to write questions

True/False (see tips for writing true/false questions below)
Pros:
• can present many items at once
• easy to score
• useful for assessing popular misconceptions and cause-effect relationships
Cons:
• the most difficult type of question to write objectively
• ambiguous terms can confuse many students
• few answer options (2) increase the chance of guessing an item correctly; many items are needed to overcome this effect

Matching
Pros:
• efficient
• useful for assessing student understanding of associations, relationships, and definitions
Cons:
• difficult to assess higher-order outcomes (i.e., analysis, synthesis, evaluation goals)

Interpretive Exercise
(the above three item types are often criticized for assessing only lower-order skills; the interpretive exercise is a way to assess higher-order skills with multiple choice, true/false, and matching items)
Pros:
• a variation on multiple choice, true/false, or matching in which a new map, short reading, or other introductory material is presented for the student to analyze
• tests the student's ability to apply and transfer prior knowledge to new material
• useful for assessing higher-order skills such as application, analysis, synthesis, and evaluation
Cons:
• hard to design; appropriate introductory material must be located
• students with good reading skills are often at an advantage

Supplied Response
Pros:
• chances of guessing are reduced
• measures knowledge and fact outcomes well, e.g. terminology and formulas
Cons:
• scoring is not objective
• can cause difficulty for computer scoring

Essay
Pros:
• less construction time; easier to write
• encourages more appropriate study habits
• measures higher-order outcomes (i.e., analysis, synthesis, or evaluation goals), creative thinking, and writing ability
Cons:
• more grading time; hard to score
• can yield a great variety of responses
• not efficient for testing large bodies of content
• if you give students a choice of three or four essay options, you can find out what they know, but not what they don't know

Performance Assessments (includes essays above, along with speeches, demonstrations, presentations, etc.)
Pros:
• measure higher-order outcomes (i.e., analysis, synthesis, or evaluation goals)
Cons:
• labor- and time-intensive
• inter-rater reliability must be established when using more than one rater
Below are tips for designing two popular item types: multiple choice questions and true/false questions.
Tips for Writing Multiple Choice Questions:
• Avoid responses that are interrelated. One answer should not be similar to others.
• Avoid negatively stated items: "Which of the following is not a method of food irradiation?" It is easy to miss the negative word "not." If you use negatives, bold-face the negative qualifier to ensure people see it.
• Avoid making your correct response different from the other responses grammatically, in length, or otherwise.
• Avoid the use of "none of the above." When a student guesses "none of the above," you still do not know whether they know the correct answer.
• Avoid repeating words from the question stem in your responses. For example, if you use the word "purpose" in the question stem, do not use that same word in only one of the answers, as it will lead people to select that specific response.
• Use plausible, realistic responses.
• Create grammatically parallel items to avoid giving away the correct response. For example, if you have four responses, do not start three of them with verbs and one of them with a noun.
• Always place the "term" in your question stem and the "definition" as one of the response options.

Tips for Writing True/False Questions:
• Do not use definitive words such as "only," "none," and "always," which lead people to choose false, or uncertain words such as "might," "can," or "may," which lead people to choose true.
• Do not write negatively stated items, as they are confusing to interpret: "Thomas Jefferson did not write the Declaration of Independence." True or False?
• People have a tendency to choose "true," so design at least 60% of your T/F items to be "false" to further minimize guessing effects.
• Use precise words (100, 20%, half) rather than vague or qualitative language (young, small, many).
• Avoid making the correct answer longer than the incorrect answer (a give-away).
Developing the Test Blueprint
The first step in test development is to set the test specifications based on the relative importance of the content to be tested. The usual procedure is to develop a test blueprint which includes the test objectives and the cognitive level of the items. The test objectives are weighted by assigning a percentage of the test items to each objective; thus, a test that covers five areas equally would have twenty percent of the items assigned to each objective. Some objectives may emphasize factual knowledge while others stress understanding or application of knowledge. It is therefore useful to place the objectives on one axis of the blueprint and the cognitive level on the other axis, so that the test can be balanced by both content and cognitive requirements.
At this point, the instructor should review the length of the planned examination to be certain students can complete it in the time allowed. While speed in taking examinations may be relevant in some subject areas, speeded tests discriminate against the high-ability but more methodical student. As a rule of thumb, students can respond to one relatively complex multiple choice item every 50 seconds; items requiring calculation may take longer. The time needed for writing responses to an essay question also depends on the complexity of the task: an instructor might allow the class double the time it takes the instructor to write an acceptable response.
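The arithmetic of a blueprint is simple enough to sketch. The Python snippet below converts objective weights into item counts and estimates completion time using the 50-seconds-per-item rule of thumb mentioned above; the objectives, weights, and test length are hypothetical.

```python
# Minimal sketch of test-blueprint arithmetic. All labels, weights, and the
# total item count are invented for illustration.
TOTAL_ITEMS = 40
SECONDS_PER_ITEM = 50  # rule of thumb for a relatively complex MC item

blueprint = {                      # objective -> share of the test
    "Objective 1 (knowledge)":     0.20,
    "Objective 2 (comprehension)": 0.20,
    "Objective 3 (application)":   0.20,
    "Objective 4 (analysis)":      0.20,
    "Objective 5 (evaluation)":    0.20,
}

for objective, weight in blueprint.items():
    print(f"{objective}: {round(weight * TOTAL_ITEMS)} items")

minutes = TOTAL_ITEMS * SECONDS_PER_ITEM / 60
print(f"Estimated completion time: {minutes:.0f} minutes")
```

With five equally weighted objectives, each receives 8 of the 40 items, and the estimated completion time is about 33 minutes.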
Intellectual Tasks and Examples
• Reiteration: Recite verbatim "A stitch in time saves nine."
• Summarization: Restate it in different words, such as "Make necessary repairs immediately to avoid having to spend a great deal more time making even more repairs later."
• Illustration: Provide or identify examples of the rule in use, such as "Changing the oil in your car."
• Prediction: Use the rule or principle to anticipate the consequences of certain acts, such as "Failure to change the oil in your car now will result in costly engine repairs later."
• Evaluation: Employ the principle to make a judgment, such as "Is it better to change the oil now?"
CHARACTERISTICS OF A GOOD TEST
1- Validity:
A test is considered valid when it measures what it is supposed to measure.
• There are different types of validity:
– Operational validity
– Predictive validity
– Content validity
– Construct validity
Operational Validity
– A test has operational validity if the tasks required by the test are sufficient to evaluate the specific activities or qualities in question.
Predictive Validity
– A test has predictive validity if scores on it predict future performance.
Content Validity
– If the items in the test constitute a representative sample of the total course content to be tested, the test can be said to have content validity.
Construct Validity
– Construct validity involves explaining test scores psychologically: the test is interpreted in terms of the trait it claims to measure, in the light of numerous research findings.
2- Reliability:
A test is considered reliable if, when it is taken again by the same students under the same circumstances, the average score remains almost constant, provided that the interval between the test and the retest is of reasonable length.
• Reliability of a test refers to the degree of consistency with which it measures what it is intended to measure.
• A test may be reliable but need not be valid, because it may yield consistent scores that do not represent what we actually want to measure.
• A test with high validity has to be reliable as well (its scores will be consistent in both cases).
• In short, a valid test is also a reliable test, but a reliable test may not be a valid one.
Different methods for determining reliability (a computational sketch follows this list):
Test-retest method
– The test is administered to the same group twice, with a short interval between sittings. The scores are tabulated and the correlation between them is calculated. The higher the correlation, the greater the reliability.
• Split-half method
– The scores on the odd-numbered and even-numbered items are taken separately, and the correlation between the two sets of scores is determined.
• Parallel form method
– Reliability is determined using two equivalent forms of the same test content.
– These prepared tests are administered to the same group one after the other.
– The test forms should be identical with respect to the number of items, content, difficulty level, etc.
– The correlation between the two sets of scores obtained by the group on the two tests is then determined.
– The higher the correlation, the greater the reliability.
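As a rough illustration of the test-retest and split-half methods, the sketch below computes Pearson's correlation by hand on invented scores. For the split-half estimate it applies the standard Spearman-Brown step-up formula, r_full = 2r / (1 + r), to estimate full-test reliability from the half-test correlation. All data are hypothetical.

```python
# Minimal sketch of two reliability estimates, using invented score data.
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Test-retest: the same group sits the same test twice after a short interval.
test1  = [55, 62, 47, 70, 58, 66]
retest = [57, 60, 50, 68, 59, 64]
print("Test-retest reliability:", round(pearson_r(test1, retest), 3))

# Split-half: correlate odd-item and even-item subscores, then step up the
# half-test correlation with the Spearman-Brown formula.
odd_half  = [28, 31, 24, 36, 29, 33]
even_half = [27, 31, 23, 34, 29, 33]
r_half = pearson_r(odd_half, even_half)
print("Split-half (Spearman-Brown):", round(2 * r_half / (1 + r_half), 3))
```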
3- Objectivity:
Objectivity means that if the test is marked by different people, the scores will be the same. In other words, the marking process should not be affected by the marker's personality.
4- Comprehensiveness:
A good test should include items from the different areas of material assigned for the test, e.g. dialogue, composition, comprehension, grammar, vocabulary, orthography, dictation and handwriting.
5- Simplicity:
Simplicity means that the test should be written in clear, correct and simple language. It is important to keep the method of testing as simple as possible while still testing the skill you intend to test; avoid ambiguous questions and ambiguous instructions.
6- Scorability:
Scorability means that each item in the test has its own mark, related to the distribution of marks given by the Ministry of Education.
7- Discriminating Power:
• The discriminating power of a test is its ability to distinguish between the upper and lower groups of those who took the test (see the sketch below).
• The test should contain questions of different difficulty levels.
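One common way to quantify discriminating power is the discrimination index for an item: the proportion of the upper group answering it correctly minus the proportion of the lower group doing so. Values near +1 discriminate well; values near 0 do not. A minimal sketch, with invented figures:

```python
# Minimal sketch of an item discrimination index; the counts are hypothetical.
def discrimination_index(upper_correct, lower_correct, group_size):
    """(upper-group correct - lower-group correct) / size of each group."""
    return (upper_correct - lower_correct) / group_size

# Hypothetical item: 18 of the top 20 scorers answered correctly, 7 of the bottom 20.
print(discrimination_index(18, 7, 20))  # 0.55: the item discriminates well
```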
8- Practicability:
• The practicability of a test depends upon:
– Administrative ease
– Scoring ease
– Interpretative ease
– Economy
9- Comparability:
• A test possesses comparability when scores resulting from its use can be interpreted in terms of a common base that has a natural or accepted meaning.
• There are two methods of establishing comparability:
– Availability of an equivalent (parallel) form of the test
– Availability of adequate norms
10- Utility:
• A test has utility if it provides test conditions that facilitate realization of the purpose for which it is meant.
