KISS and tell grading
This is a long post in which I work out some new ideas I have about incorporating pass/fail formative grading in my courses. (See here for subsequent thoughts)
Experimenting with undergraduate assessment
For the last few years I’ve been experimenting with different types of assessment and assignments.
I’m an English professor, Digital Humanist, and medievalist, and so my overall goal is to create an environment in which students are self-motivated to define, think, research, and communicate concretely and clearly about problems in the humanities. This work has involved thinking about what it is we are trying to do when we teach “the essay” (see my and my students’ work on the unessay). It has also involved a heavy use of blogging and the introduction of other elements (most recently poster sessions) designed to help establish the class as a model scholarly eco-system.
Experiments with grading
And it has involved grading.
Starting about five years ago, I began experimenting with making a formal distinction between formative and summative grades (formative grades are grades that help a student and instructor monitor ongoing progress; summative grades are grades that assess and report, retrospectively, how well a student has progressed). After I introduced blogging and the unessay, I began to experiment with pass/fail marking. Since an important element of both my approach to blogging and the unessay is that students are given complete freedom to explore their own interests and skills, it seemed counterproductive (and indeed disingenuous) to then assign summative percentage grades on the basis of some exterior rubric. I still do try to assess student work qualitatively when it makes sense pedagogically: most of my assignments have a component that requires students to establish the standard they are trying to meet, for example, and that is something one can have partial success at defining and/or meeting. But given my belief that the biggest problem facing our students is fear of grades (a fear that makes them err on the side of conservatism and underperformance to a degree I consider to be almost an academic offense), it makes very little sense to me to encourage this behaviour by making them perform against a comparative grading system.
Grades higher, but in response to greater effort.
The results of this approach have been very encouraging. On the whole, I find, my students write better, participate more enthusiastically, certainly show better attendance, and, I believe, take greater responsibility for their learning. And on the whole they also, I think, write better (i.e. more positive) course evaluations (my score on ratemyprofessors.com has risen from a pre-2009 average of 2.5 to a post-2009 average of 3.1, and indeed 3.6 since I introduced the unessay in 2012).1
The grades my students earn have on the whole been higher for my classes, something that is known to affect student evaluations positively. But this also correlates to far greater effort: my average attendance is now about 90%, with my worst students clocking in at about 60%, and students write on average a couple of hundred words a week in their blogs (the equivalent of an extra essay or two per semester); you’d expect students putting in that kind of effort to score higher. Moreover, although the grades were higher on average, I did not find them to be out of line with individual performances: there were more As and Bs because there were more students doing what I consider to be A- and B-level work; and there were fewer Ds and Fs because the minimum amount of effort my students were putting in was also greater than in my earlier offerings.
In other words, grading more work on a pass-fail basis didn’t seem to inflate my grades all that much. Instead of earning Cs through my qualitative grading of their work, my C students instead tended to hand in about a C-level’s worth of pass-fail work. On the whole, I found, students still tended to perform at a level equivalent to the qualitative grade they received on their final essays or exams.
Rewarding efficiency rather than accomplishment.
This changed this past semester. For the first time since I started this new approach to grading, I found that my final grades were both exceptionally high and out of line with my sense of the actual quality of work I was receiving. In part this was because this group also worked hard (the average student, for example, exceeded the minimum number of required blog entries by almost 25%). But I also had the impression that it was because they were becoming better at gaming the assessment system I had established: because I’d not had any problem with grade inflation, I had not been all that careful about how I used bonus marks, for example, and it was entirely possible to get very significant boosts to your GPA by correctly playing for bonus marks. This year, for the first time, I had a serious problem with students whose grades were a better reflection of their efficiency in gaining bonus marks than of their actual learning as demonstrated in their year-end summative exercises.
What needs to change.
So it is now time to change this. In part what I needed to do was simply sit down and review my standard distribution of marks: when you make as many changes as I have in the last few years, it is necessary to sit down every so often and do a systematic review.
But it is also the case that I need to come up with a better approach to the use of pass-fail grades. While my experience agrees with studies that show that pass/fail can improve student outcomes (via Wayback Machine), I also work within a system that uses grades to distinguish between different levels of accomplishment: I was not happy with my results this semester precisely because I felt that they inflated the grades of some students above their actual level of performance. What I need is to adjust my system so that student performance on pass-fail grades reflects an appropriate level of performance and subject mastery and helps me distinguish among (and identify) students who are having a more or less difficult time with the material.
Two interesting approaches to pass-fail assessment.
A bit of research has pointed me at two interesting approaches to this problem: “Standards Based Grading” (SBG) and “Specifications Grading” (SG).
These systems are variants on each other. The basic idea behind them is that “pass/fail” does not have to mean “submitted/not-submitted.” In the past, I’ve essentially treated the equivalent of a D (i.e. poor) as being a “pass” in my pass-fail assessment: if the work was handed in and it looked like a reasonably good faith effort, then it received a grade of pass, regardless of whether it showed actual mastery of the exercise or topic.
The idea behind both SBG and SG, instead, is that work “passes” when the learning goals for the exercise or unit have been met: i.e. that a “pass” is actually roughly the equivalent of a badge rather than a numeric grade and that passing a course means demonstrating actual mastery of specific skills and learning outcomes rather than simply handing in a set amount of better or worse work.
SG takes this one step further, by establishing different workloads for different grades. In this system, students are told what the minimum amount of work expected for a specific grade is—and what additional work they can do if they want to get more than that grade. So, for example, students might be told that a “C” grade requires the successful completion of one exercise from each unit to a minimum standard, while an “A” would require completion of three such exercises per unit. This allows students to decide how much work they want to put into a course while requiring them to complete the work they do do to an appropriately high standard.
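To make the workload idea concrete, here is a minimal Python sketch of how such a scheme might map passed exercises to a letter grade. It uses the two thresholds given above (one passed exercise per unit for a “C”, three for an “A”); the “B” threshold of two and the “below C” label are my own assumptions for illustration, and the function name is hypothetical rather than anything taken from the SG literature.

```python
def specifications_grade(passed_per_unit):
    """Return a letter grade from the number of exercises passed in each unit.

    passed_per_unit: list of counts, one per unit, of exercises that met
    the pass standard. The grade is set by the weakest unit, because the
    requirement applies to every unit.
    """
    weakest = min(passed_per_unit)
    if weakest >= 3:
        return "A"        # three passed exercises in every unit (from the text)
    if weakest >= 2:
        return "B"        # assumption: the text does not say what a B requires
    if weakest >= 1:
        return "C"        # one passed exercise in every unit (from the text)
    return "below C"      # assumption: at least one unit has no passed work


print(specifications_grade([3, 3, 3]))
print(specifications_grade([3, 1, 2]))
```

Note that nothing in the function looks at how well an exercise was passed, only at how many were: the grade depends entirely on quantity.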
What works and doesn’t for me.
As much as I like the core of this approach (i.e. treating pass/fail like a type of badging and requiring mastery rather than just submission in order to “pass”), there are some things about it that, if I’m understanding things correctly at least, seem less attractive to me:
They seem to require detailed rubrics
As described by many practitioners, both SBG and SG require detailed rubrics. I.e. students are assessed on whether or not they have carried out the details of the assignment they were given: if they do (SBG), or if they do it to a pre-determined level of accomplishment (SG), they pass; if they do not, then they don’t.
This is antithetical to the approach to fostering student responsibility and judgement that I have taken in designing the Unessay and using blogs in my class. I believe that a major problem among our undergraduates has to do with what I have described as the exercise as compulsory figure problem. I.e. a sense that equates coursework with makework and fails to appreciate the extent to which it is actually a trial run for work/research/learning in the real world.
My response to this has been to reduce the amount of detail I put in my rubrics rather than increase it. I now tell students why they are being given an assignment, what I hope they’ll learn from it, and then leave them free to work out for themselves what technical parameters might be necessary for excellence in relation to these goals. I don’t want to go back to encouraging them to see assignments as lists of things they need to check off.
They have poor mechanisms for identifying and rewarding qualitative excellence.
Neither system seems that good at recognising qualitative excellence in any single performance. In the case of SBG, this is perhaps a feature rather than a bug: the whole point of adopting SBG is to recognise better-than-minimal performance instead of better-than-average performance; one of the reasons pass/fail systems reduce student anxiety and speed up grading is that they avoid attempting to find small gradations among adequate performances. Presumably people who feel that it is important to distinguish among different levels of adequate performance will prefer a system (such as the traditional four point ABCD grading system) that is designed to do this.
In the case of SG, however, the flaw is a little more serious, because SG has been designed to allow students to choose different levels of achievement: a student who is happy with the minimal grade will do the minimum number of exercises to the level required to pass the course; but SG also offers students an opportunity to gain higher-than-minimum grades by completing additional pass/fail work.
What is wrong with this is that it treats excellence as a function of quantity rather than quality. An average student who is not able to clear the minimum standard for any one assignment by very much can nevertheless earn a greater-than-average grade simply by completing more exercises to this minimum standard. If we assume that an instructor sets “pass” at “B,” then it would be entirely possible under this system for a student to earn an “A” by simply doing lots of B-level work.
This is actually the problem that is causing me to reevaluate my grading in the first place. The problem I had with my grade distribution this past semester was not that students weren’t completing lots of work (they were, in fact, working extremely hard, even on average). It was that students were receiving a grade of “excellent” for what was, in essence, just more than expected amounts of average work. What really separates excellence from average is not the amount of work you do, but the quality of that same work. It is entirely possible to shine from your first assignment.
My adaptation of Standards-Based and Specifications Grading
So what I need is a system that does the following:
- Contributes to reducing student anxiety by marking formative work on a pass-fail basis
- Maintains standards by setting the bar for “pass” at “meets learning goals” rather than “submitted a good faith effort.”
- Provides a system for recognising and rewarding qualitatively exceptional work
- Does not systematically result in students receiving higher term grades than the qualitative level of their accomplishment reflected in their semester-end summative scores would suggest. I.e. on the whole, I want C students to have C grades, and while I am prepared to accept that a particularly hard working C student might earn a somewhat higher grade for exceptional effort, I do not want average students consistently earning very much higher than “average” grades (compared to their peers in other classes) for work that is not also qualitatively higher than average.
The way to do this, it seems to me, is the following:
- Maintain a distinction (and rough balance) between formative and summative grades and assignments
- Grade formative assignments on a 3 point scale: Appropriate, Below expectations, and Fail
  - “Appropriate” means “shows achievement of learning goal” (i.e. at least a “satisfactory”) rather than simply “submitted and complete” or “more right than wrong.”
  - “Below expectations” means that a submission was submitted and represents a good faith effort, but does not show the student has achieved the learning goal. In this case a student may revise and resubmit the work, with an accompanying explanation of what they revised (i.e. similar to that required of researchers by journals). Each resubmission carries with it a 3% penalty on the student’s final grade.
  - “Fail” means that the work was not submitted or does not represent a good faith effort. “Fail” cannot be done over.
- Ensure that the value of all formative assignments is less than or equal to the average expected grade for the course (i.e. in a third year course at the U of L, this would mean that the total value of all formative assignments should be 80% or less, since the average for a third year course is “B”).
- Allow students to collect tokens for every formative assignment they complete that exceeds expectations. When these tokens are added to the formative exercises, the total should be 100% of the formative grade.
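The arithmetic behind the last two points can be sketched in a few lines of Python. This is only one reading of the rule, chosen because it matches the example later in this post: the formative component of the course is split so that the ordinary formative assignments are worth the expected average (80% for a “B”) and tokens make up the rest. The function name and the idea of passing whole-number percentages are my own illustration.

```python
def split_formative(formative_component, expected_average):
    """Split the formative part of a course between assignments and tokens.

    formative_component: percent of the final grade that is formative
    expected_average: expected course average as a percent (80 for a "B")
    Returns (assignment_weight, token_weight), both as percent of the
    final grade.
    """
    assignments = formative_component * expected_average / 100
    tokens = formative_component - assignments
    return assignments, tokens


# A course in which half the final grade is formative and the expected
# average is a B (80%).
print(split_formative(50, 80))  # (40.0, 10.0)
```

On this reading, a student who meets the learning goals on every formative assignment but never exceeds them earns 80% of the formative component; the remaining 20% is only available through tokens.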
An example.
This is a bit difficult to explain (right now, at any rate, for me) in abstract, so here’s an example from my upcoming medieval literature class to show how the system is supposed to work:
- Participation and Attendance 5% (pass/fail)
  - Attendance
  - Prospectus
  - Poster presentation (i.e. “Slam”)
  - Quizzes
- Formative 40% (Appropriate/Below expectations/Fail)
  - Weekly Blog
  - First essay
  - Seminar Leadership
  - Poster
- Badges 10% (Awarded/Not awarded)
  - Weekly Blog
  - First essay
  - Seminar Leadership
  - Poster
- Summative 45% (A/B/C/D/F)
  - Research Essay
  - Final Exam
Note: All assignments are worth equal weight within their category
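For anyone who wants to see how these weights would combine, here is a rough Python sketch. The category weights come from the list above, and every assignment counts equally within its category. Several things are assumptions of mine rather than settled policy: I read the 3% resubmission penalty as three percentage points off the final grade, I take summative marks as percentages (the conversion from letter grades is left out), and the function and variable names are hypothetical.

```python
# Category weights from the example course, as percent of the final grade.
WEIGHTS = {"participation": 5, "formative": 40, "badges": 10, "summative": 45}

# Assumption: each resubmission costs three percentage points of the final grade.
RESUBMISSION_PENALTY = 3


def final_grade(participation, formative, badges, summative_percents,
                resubmissions=0):
    """Combine the four categories into a final percentage.

    participation, formative, badges: (earned, possible) counts, e.g. the
    number of formative assignments marked "Appropriate" out of the total.
    summative_percents: percentage marks on the summative assignments.
    """
    def fraction(pair):
        earned, possible = pair
        return earned / possible

    summative_mean = sum(summative_percents) / len(summative_percents)
    total = (
        WEIGHTS["participation"] * fraction(participation)
        + WEIGHTS["formative"] * fraction(formative)
        + WEIGHTS["badges"] * fraction(badges)
        + WEIGHTS["summative"] * summative_mean / 100
    )
    # The grade cannot drop below zero, however many resubmissions there are.
    return max(0.0, total - RESUBMISSION_PENALTY * resubmissions)


# A hypothetical student: full participation, all four formative assignments
# eventually "Appropriate" (one after a resubmission), one badge out of four,
# and 72% and 68% on the two summative assignments.
print(final_grade((4, 4), (4, 4), (1, 4), [72, 68], resubmissions=1))
```

This hypothetical student finishes on 76%: all the formative work was completed to standard, but with only one badge and C-range summative marks the final grade stays below the 80% course average.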
1 I should say that I don’t consider RateMyProfessor.com (or any other single metric) to be a measure of good teaching per se, since there are different kinds of good teaching including types that are missed by the RateMyProfessor approach. However, RateMyProfessor.com is a metric of one aspect of teaching (I suspect it primarily amplifies the far end of student satisfaction). In this case, the swing from those who were dissatisfied to those who were satisfied suggests that my new methods are pleasing the pleased much more than they are alienating the alienated, which is a change.