Grading the 2010 AP English Language Exam: Prepare to be Assimilated
This is the third of a five-part series on the mysteries and realities of the AP English Language Exam and its grading process. For more on the marathon that is AP exam grading, see Part 1, Part 2, Part 4, and Part 5 (coming soon!).
While the grading standards set forth in the official grading rubric for each essay question might seem to be straightforward, you’ll find that most graders disagree strongly as to what makes for an “adequate” essay versus an “inadequate” essay, and that those disagreements grow even sharper when you’re discussing minor variations: What distinguishes an inadequate 3 from an inadequate 4? An adequate 6 from an adequate 7?
The natural disagreements that any two individuals might have over these sorts of questions are complicated by grader demographics. I would estimate that approximately 50% of the 2010 English Language exam graders were high school teachers; another 35% or so were teachers at the community college level, and the remaining 15% were either graduate students or faculty members at major universities. Think, for a moment, about the implications of that spread: a university professor almost certainly grades papers of a higher quality than a community college professor does, and a community college professor almost certainly grades papers of a higher quality than most high school teachers do. This means that graders come into the process with widely divergent expectations, which must be reconciled so that scores will be standardized and no student’s scores will be skewed by a single grader’s prejudice against writers who regularly split their infinitives.
In this sense, the “Reading,” which is how ETS refers to the week-long grading process, is a lot like the Borg: if you’re not prepared to be assimilated into a greater collective, you’re in for a rude awakening. The “Chief Reader” presides over the grading process; for the 2010 English Language exam, this was a BYU professor named Gary Hatch who, tragically, died just a month before the Reading was to convene and was replaced by a University of professor named David Joliffe. Joliffe oversees the entire process of grading the three essays, but three “Question Leaders” are also designated, one to oversee the grading of each question. Each question leader oversees between 300 and 400 graders, who are grouped into tables of ten, and each table is presided over by a “Table Leader.” When more than 1,100 graders descended on the 2010 AP English Language exam on June 11, 2010, they were greeted by table leaders who had already been onsite for two days deriving a consensus as to which essays merited a score of 3, which a score of 4, etc.
These table leaders, in conjunction with the question leader, had copied sample essays that reflected the entire range of scoring possibilities to help graders develop standardized scoring criteria, but graders had just four hours to fall in line with the standards that table leaders had developed over two days. Naturally, this produced heated disagreements at each table as to why one sample essay deserved a 4 when a university professor saw it as a 2 and a high school teacher saw it as a 6. For four hours we haggled over sample essays while the question leader periodically polled the room to determine whether we were arriving at consensus. When I raised concerns over the last sample essay before graders would switch over to “live books” of ungraded exams, my (wonderful) table leader stared at me with exasperation: “Monk, we have no more time for disagreement. This is a 7. See it as a 7. Be assimilated.”
So I abandoned my individual will and became part of the Borg.
Of course, readers were still adjusting to the grading standards at this point, so table leaders periodically spot-checked every reader at their table during the first two days, re-grading every fifth essay or so. When table leaders felt that their charges were straying too far from the established standards (a scoring difference of more than one point), they pulled that reader aside and explained why the essay he or she had given an 8 was really a 4. My weak, fleshy brain was repeatedly disciplined for not adopting the mechanical correctness of the Borg. I resolved to do better.
The following are excerpts from actual exams; each excerpt is in italics, with my commentary in normal typeface.
Two problems in grading the exam were particularly difficult for me. The first arose when students made statements that were clever (or at least required thought) but I wasn’t sure whether the subtleties of their prose were intentional. For instance:
• Humorists are a big joke. How is one to interpret this?
• Humorists are like Santa Claus on Christmas Eve. He may not be real or the truth but he brings smiles to all. In an essay defending the claim that humorists are important players in society, how much credence can I give to the ironic undertones here?
• Harassment charges would be brought down on sexually explicit comics like Thor’s hammer. Maybe—but more to the point, is it wrong to invoke the god of thunder in an academic essay?
• Humorists are why we aren’t a communist nation. They keep us divided. This might be true . . . but does the student actually understand this argument?
• Humorists don’t wear the condom of censorship while breeding out the beautiful baby known as the naked truth. Well, when you put it that way, I guess they don’t. But do I reward you for a sophisticated metaphor or punish you for using informal language?
The second problem was a student tendency to describe works that I considered “serious” as “humorous” because they did political work—and the students understood that de Botton wanted them to talk about the political function of humor.
• As a result, I got students who cited the following works as “humorous” literature: Stowe’s Uncle Tom’s Cabin (nothing like slavery for a good laugh!); Shakespeare’s Macbeth, Hamlet, and King Lear; Machiavelli’s The Prince (ah, the humor of despotism!); Orwell’s 1984; Miller’s The Crucible (and repression!); Stephenie Meyer’s Twilight series (and ineptitude! Okay, that was a bit harsh); Conrad’s Heart of Darkness (did he even think about the title?); Thoreau’s Walden; Sinclair’s The Jungle (nothing funnier than drowning in a vat of boiling fat); Tolkien’s Lord of the Rings series (I admit that hobbits are funny); Ellison’s Invisible Man; Golding’s Lord of the Flies (whose macabre depictions of adolescent cruelty are NOT funny); Melville’s Bartleby (nothing like depression and suicide for a good laugh!); Ayn Rand’s The Fountainhead; and Dante’s Divine Comedy. At least this last one had “comedy” in the title, but every one of these books is far more dark than comic, more tragic than titillating.
• We also had students who suggested that films were funny, including The Dark Knight and The Godfather. Yup—a barrel of laughs, those two. "Why so serious?"
• Perhaps most inexplicable was the list of political figures that students described as “humorists.” These included Gandhi (!), Martin Luther King Jr., John Locke, and Thomas Hobbes. Nothing funnier than Leviathan, let me tell you.
As a grader I wanted to reward students for what were, occasionally, intelligent analyses of challenging texts—but I also had to consider the fact that these students failed to understand the basic point of de Botton’s argument that humor makes political statements possible in circumstances when serious works such as the ones above would have been repressed or censored. That was a tough balancing act. Similarly, should I reward students whose arguments were sound but whose facts were faulty?
• Kurt Vonnegut, in his novel Animal Farm, satirizes communism. Well, no. Orwell satirizes communism—so does the student get credit or not?
• Dickens satirizes the French government in Les Miserables. Well, no. Victor Hugo does—credit or not?
• Satirical writers have been around since we came to North America. In Praise of Folly is one of the greats. The writer shows that no changes are ever occurring and we are a corrupt nation. First of all, In Praise of Folly was written by a Dutchman in 1509, just a few years after Columbus died, so there was no “nation.” But the rest of the argument was sound . . .
• Like a woman trying to cover up her blemish, society attempts to cover up its mistakes using a little puff of powder. Back to the ways in which comedians are pimples . . .
• Mark Twain wants to have someone institute an emancipation policy on slavery in Huckleberry Finn. Well—no, he doesn’t, because Abraham Lincoln emancipated the slaves 20 years before Huck Finn was ever published! But he does criticize slavery as an historical institution . . .
It was hard to know which students deserved the benefit of the doubt, and that question often made a significant difference in score.
Coming soon--Part 4: Eyesore
Comments
I assume those essayists invoking such humorous books were awarded a 9?