They represent a continuum in how much freedom of response is allowed, ranging from restricted-response essays on one end to extended-response essays on the other.
- Restricted-response essay
- limits content and response to be given
- can limit via how narrowly question is phrased (e.g., as specific as a short-answer question)
- can limit via scope of the problem posed (e.g., with introduction like that of an interpretive exercise)
- therefore, can approach the objectivity of short-answer and interpretive exercises
- Extended-response essay
- gives great freedom, which allows for problem formulation, organization, and originality
- therefore, shares similar scoring difficulties with performance-based tasks
They also represent a continuum in the complexity and breadth of learning outcomes assessed, with interpretive exercises at the left end, restricted-response essays in the middle, and extended-response essays at the right end.
- Restricted-response essays
- For learning outcomes not readily assessed objectively
- Compared to extended-response questions, they target narrower learning outcomes, such as more specific mental processes (e.g., draws valid conclusions)
- Extended-response essays
- For learning outcomes not readily assessed objectively or with restricted-response essays
- Compared to restricted-response questions, they assess broader learning outcomes, such as integrating a set of mental processes (e.g., integrates evidence to evaluate a scientific theory)
- Compared to interpretive exercises, both kinds of essays can assess more complex learning outcomes
- See Table 10.1 on page 240
- Measure complex learning outcomes not measured by other means
- Restricted-response essays: (i) require students to supply, not just identify, the answer and (ii) can target specific mental skills
- Extended-response essays: emphasize integration and application of high-level skills
- Can measure writing skills in addition to (or instead of) knowledge and understanding
- Easy to construct—but only if you don’t care what you actually measure and how reliably you do so!
- Contribute to student learning, directly and indirectly
- Unreliability of scoring (unless clear learning outcomes, good scoring rubrics, practice in scoring)
- Time-consuming to score, especially if you follow the guidelines. Can be impossible if you score conscientiously, give good feedback, and have many students
- Limited sampling of content domain
- To call forth the intended student responses
Suggestions for writing essay questions
- Restrict use to learning outcomes that cannot be measured well by objective means (e.g., organization, originality)
- Write questions that can call forth the intended mental processes
- Easiest to do with restricted-response
- See sample stems on pp. 243-244
- For extended-response items, helps to state evaluation criteria in the question
- Make sure they do not target what has not been taught
- Phrase the question so that the student’s task is clear and comparable for all
- Easiest with restricted response
- For extended-response, don’t define the task so tightly that its purpose is spoiled
- Rather, give explicit instructions on type of answer desired (e.g., "Your answer should be confined to 100-150 words. It will be evaluated in terms of the appropriateness of the facts and examples presented and the skill with which it is written.")
- Indicate approximate time limit for each question
- Give plenty of time (should be a power test not a speed test)
- Do not create overconcern about time
- Avoid optional questions
- Giving choices means students take different tests
- They will not study the entire domain
- Review checklist on p. 248
The Scoring Problem
- There is no single correct or best answer to an essay question, so you need guidelines ("rubrics") for rating the quality of answers.
- Rubric = a set of guidelines for the application of performance criteria to the responses and performances of students
Tackle it early, before you give the test
- Carefully specify your scoring criteria before you finalize the exam
- May cause you to rethink or modify the question and its accompanying performance criteria
- That, in turn, enhances likelihood of calling forth the intended responses
- Do an initial review of answers to a question to find exemplars or anchors for your scoring levels
- Make sure you can describe the kinds of performance (e.g., "lists two of the four key points") that qualify for each scoring level ("satisfactory," 2 points, etc.)
You need rubrics!
- Rubric = guidelines for the application of performance criteria to the responses and performances of students
Rubrics for restricted-response questions
- Write exemplar answer(s)
- Decide how to give points for each part expected for a full answer
- Decide the level of explanation necessary for full vs. partial credit
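The three steps above (expected parts, points per part, and full vs. partial credit) can be sketched as a small lookup table. This is only an illustration: the part names, the credit labels, and the point values in `POINTS` are hypothetical and not taken from the text.

```python
# Hypothetical point scheme for one restricted-response question.
# Part names and point values are illustrative assumptions only.
POINTS = {
    "states conclusion": {"full": 2, "partial": 1, "none": 0},
    "cites evidence": {"full": 2, "partial": 1, "none": 0},
}


def score_answer(judgments):
    """Sum points from a rater's judgment ('full', 'partial', 'none') per part."""
    total = 0
    for part, levels in POINTS.items():
        level = judgments.get(part, "none")  # an omitted part earns no credit
        total += levels[level]
    return total


# Maximum possible score is full credit on every expected part.
max_points = sum(levels["full"] for levels in POINTS.values())
earned = score_answer({"states conclusion": "full", "cites evidence": "partial"})
print(f"{earned}/{max_points}")
```

Writing the scheme down before grading makes the full-credit and partial-credit decisions explicit, so the same standard is applied to every paper.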
Analytical rubrics for extended-response questions
- Specify the separate characteristics or dimensions you want to score (focus, elaboration, mechanics, etc. for an expository essay)
- Assign a series of levels to each characteristic (1-7, poor to excellent, etc.).
- Summarize the performance corresponding to each level ("main idea present but may not maintain consistent focus" for "adequate achievement" or 4 points for "focus"—example from table on p. 251)
- Result is a matrix against which to judge the elements of each essay
- Good for giving feedback to students
- See websites for examples (e.g., www.nwrel.org/eval/toolkit/traits/index.html)
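One way to picture the resulting matrix is as a set of dimensions, each rated on the same series of levels. The sketch below assumes the three dimensions named above and a 1-7 scale. The function name `score_essay` is hypothetical, and summing the ratings into a total is an added assumption; the per-dimension profile is the part that serves as feedback.

```python
# Hypothetical analytic rubric: dimensions follow the example above,
# each rated on a 1 (poor) to 7 (excellent) scale.
DIMENSIONS = ("focus", "elaboration", "mechanics")
LEVELS = range(1, 8)


def score_essay(ratings):
    """Validate one essay's ratings; return (profile, total)."""
    missing = [d for d in DIMENSIONS if d not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    for dim in DIMENSIONS:
        if ratings[dim] not in LEVELS:
            raise ValueError(f"{dim} rating must be an integer from 1 to 7")
    profile = {dim: ratings[dim] for dim in DIMENSIONS}
    # Assumption: the total is a plain sum with equal weight per dimension.
    return profile, sum(profile.values())


profile, total = score_essay({"focus": 4, "elaboration": 5, "mechanics": 3})
print(profile, total)
```

Returning the profile alongside the total keeps the strengths and weaknesses on each dimension visible to the student rather than collapsing them into one number.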
Holistic rubrics for extended-response questions
- Provides single overall score, no separate dimensions
- Decide how many levels.
- Summarize the performance corresponding to each level
- Easier to construct and apply than analytical rubrics
- May correspond better to grading needs
- But provides less feedback to students about strengths and weaknesses
- Ease your burden by writing good questions in the first place (OH 4)
- Prepare outlines of expected answers in advance
- Can redesign poor questions
- Provides common basis for judging all students
- Standards less likely to shift during grading
- Analytical, if focus is on multiple dimensions of performance and giving feedback
- Holistic, if focus is on overall understanding rather than writing skill
- Irrelevant skills (legibility, spelling, etc.)
- Irrelevant or inaccurate factual information (risky to ignore, so consider warning students in advance that you will penalize it)
- Maintains more uniform standards for a question
- Distributes effects of following a good or bad paper
- Minimizes halo effects
- Good questions and scoring rubrics can reduce its impact
- See bluffing strategies on p. 256
- Reconcile any big discrepancies
- Average the scores
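The last two steps (reconcile big discrepancies, then average) can be sketched for two independent ratings of the same answer. The function name `combine_ratings` and the `max_gap` threshold are hypothetical. The notes do not say how large a discrepancy counts as "big", so the default of 1 point is purely an assumption.

```python
def combine_ratings(score_a, score_b, max_gap=1):
    """Return (average, needs_reconciliation) for two independent ratings.

    max_gap is an assumed threshold: a difference larger than this is
    treated as a big discrepancy to be discussed before averaging.
    """
    gap = abs(score_a - score_b)
    if gap > max_gap:
        return None, True  # raters should reconcile before a score is recorded
    return (score_a + score_b) / 2, False


print(combine_ratings(3, 4))
print(combine_ratings(1, 4))
```

Flagging the large gap instead of silently averaging it keeps a possible misreading of the rubric from being hidden inside the final score.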
B. Essay Questions (Short and Extended Response)
Essay questions are a more complex version of constructed-response assessments. With essay questions, there is one general question or proposition, and the student is asked to respond in writing. This type of assessment is very powerful -- it allows students to express themselves and demonstrate their reasoning related to a topic. Essay questions often demand the use of higher-level thinking skills, such as analysis, synthesis, and evaluation.
Essay questions may appear to be easier to write than multiple choice and other question types, but writing effective essay questions requires a great deal of thought and planning. If an essay question is vague, it will be much more difficult for the students to answer and much more difficult for the instructor to score. Well-written essay questions have the following features:
Essay questions are used both as formative assessments (in classrooms) and summative assessments (on standardized tests). There are two major categories of essay questions -- short response (also referred to as restricted or brief) and extended response.
Short response questions are more focused and constrained than extended response questions. For example, a short response might ask a student to "write an example," "list three reasons," or "compare and contrast two techniques." The short response items on the Florida assessment (FCAT) are designed to take about 5 minutes to complete and the student is allowed up to 8 lines for each answer. The short responses are scored using a 2-point scoring rubric. A complete and correct answer is worth 2 points. A partial answer is worth 1 point.
Sample Short Response Question
How are the scrub jay and the mockingbird different? Support your answer with details and information from the article.
Extended responses can be much longer and more complex than short responses, but students should be encouraged to remain focused and organized. On the FCAT, students have 14 lines for each answer to an extended response item, and they are advised to allow approximately 10-15 minutes to complete each item. The FCAT extended responses are scored using a 4-point scoring rubric. A complete and correct answer is worth 4 points. A partial answer is worth 1, 2, or 3 points.
Sample Extended Response Question
Robert is designing a demonstration to display at his school’s science fair. He will show how changing the position of a fulcrum on a lever changes the amount of force needed to lift an object. To do this, Robert will use a piece of wood for a lever and a block of wood to act as a fulcrum. He plans to move the fulcrum to different places on the lever to see how its placement affects the force needed to lift an object.
Part A Identify at least two other actions that would make Robert’s demonstration better.
Part B Explain why each action would improve the demonstration.