Competitiveness and the Problem of Grading

Back in the 1990s, I used to do a noncompetitive martial art. It was so much fun that I almost dropped out of grad school to become an instructor. It also had its unfun aspects, which is why I’m not doing it anymore. What I want to talk about here is the noncompetitive aspect. What do you think was the effect of this egalitarian, non-selfish aspect? At our school, at least among the serious practitioners, everything became a competition. Not only did everyone compete on the mat, in every practice (“Ha, bastard, let me show you how much better my technique is!” “No, loser, it’s not about technique but about whether I can throw you. Watch!”), but off the mat, too (“Jack is not as loyal to the chief instructor as Jill.”).

After I quit the martial art, I started running competitively. I loooved racing. It turned out I was relatively good, so I would place pretty well in smaller races or in the age groups of bigger ones. That’s a great discovery when you grew up as a non-athletic klutz. I also liked competition. But the most interesting fact was that, among my training buddies, there was none of that creepy competitiveness my martial arts school had had. Except for the occasional macho jackass who didn’t know the difference between races and training, we trained hard but totally noncompetitively. After all, we had the actual races to see who could do what.

The author in Hopkinton with some friends, before his fifth Boston Marathon.

It was also interesting to notice that the jackasses for whom every practice was a race actually didn’t do so well.

Being competitive is good in some instances and not so good in others. This is familiar to all athletes who have reflected on their practice. You can also draw a broader theoretical observation: the rules with which an institution operates might foster the behaviors it values and wants to promote, or do the very opposite. A competitive sport helped foster a collaborative training practice for me and my friends, which improved everyone’s performance. A vague ideology of noncompetition in the martial art helped foster insidious competition in which sniping, griping, and back-biting flourished and progress was random.

* * *

The first reading in my Introduction to Political Theory is a cool essay by Louis Menand on the purpose of college. Although it’s not formally a piece of political theory, it works like one: it takes a step back from a familiar institution — American higher education in this case — and explores theoretically the goals and values that institution tries to foster. One of Menand’s points is that, since 1945, American higher education has embraced two very different theories, meritocracy (use college as a sorting mechanism to identify the talented, the mediocre, and the untalented) and democracy (make sure all graduates have the skills and talents democratic citizenship requires).

Menand doesn’t spend very much time on what kinds of behaviors higher education fosters internally. This is not a complaint; it’s not a central part of his argument. But it is related, so it’s worth asking questions about one of the most important incentive mechanisms of education: grading.

Educational institutions don’t, for the most part, explicitly aim at competition between students. But depending on their assessment systems, some implicitly do. The most obvious case is ranking, which is still frequently done in law schools. It’s not surprising. Law schools are a kind of a pedagogical North Korea (totally backward, but in a cheerful denial about it): their use of one-time, high-stakes instruments for assessment and the Socratic Method already prove that. The ranking approach is the most insidious one, as it creates incentives not only to compete with your peers, but to actively hurt them. Come to Law School — We’ll turn you into pedagogical Tonya Hardings!

But law schools aren’t the only offenders. Even something like grading on a curve can foster perversely competitive tendencies — while simultaneously demotivating effort at learning. Now “grading on a curve” can mean a couple of different things. It can simply mean setting the median grade in an assessment instrument, but not really caring about what the distribution should be. Or it can mean making sure the shape of the resulting distribution is that familiar bell-shape.

Is this the distribution of your students? In every class? Every section? Regardless of where you teach? Amazing!

Or it can mean other related things. The problem with each of those is that it either (a) assumes students in a class are always, for the most part, just like students in another iteration of the class or (b) insists that, regardless of what the students in a course are like, their final outcome should make them look like students in other iterations of the course. In other words, (a) you’ll always think you have some really weak students, some really strong students, and lots of middle-of-the-road students or (b) you think that no matter what the distribution of talent in your course, they should look like a random sample of a few weak, a few strong, and lots of middling types. I’ll admit, sure, it’s possible students are similar, but you’ll need data independent of your assessment instrument to show it. Good luck with that! It can be done, but I’ll bet dollars to donuts most of the folks who grade on a curve haven’t done their homework on this. So, on either option, it seems ill-motivated.

And meanwhile, students have incentives to compete, in the bad way I experienced in my martial arts school. If the median is set beforehand, making sure others do worse than you will help you. And if the course insists on a normal distribution, it sends a totally perverse signal to everyone that everyone aspiring to do their best and actually doing well is impossible. Should it be?
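To make the median case concrete, here is a minimal Python sketch. It assumes one simple reading of "setting the median": every raw score is shifted by the same amount so that the class median lands on a target. The function name, the target of 80, and the five students and their scores are all invented for illustration; none of it comes from a real course.

```python
from statistics import median


def curve_to_median(scores, target_median=80.0):
    """Shift every raw score by the same amount so the class median hits the target."""
    shift = target_median - median(scores.values())
    return {name: raw + shift for name, raw in scores.items()}


# Hypothetical raw scores, made up for this example.
before = {"Ann": 70, "Ben": 75, "Cat": 80, "Dev": 85, "Eve": 90}

# Same class, except that Cat has a bad day and scores 70 instead of 80.
after = dict(before, Cat=70)

# Eve's raw score is 90 in both cases; only a classmate's score changed.
print(curve_to_median(before)["Eve"])  # 90.0
print(curve_to_median(after)["Eve"])   # 95.0
```

Eve did nothing differently, yet her curved grade went up because a classmate did worse. That is the incentive in a nutshell.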

The Missing Chili Pepper, Part 2

Back in June, I reported on the observation David Cottrell and I made that on Rate My Professors (RMP), having a chili pepper makes a difference. The chili pepper, recall, is RMP’s way of indicating that at least a third of your raters think you are “hot,” whatever that means.

Here’s an updated version of that earlier result:

The relationship between easiness and professor quality conditional on hotness.

What this updated visual tells us better than our earlier one is that the effect is not linear. For instructors rated not “hot,” the quality increase as their easiness increases is significantly greater than for “hot” professors, especially at the hardest end.

I wondered at the end of the earlier post whether we might see a difference conditional on instructor “hotness” in the official University of Michigan evaluations, which of course don’t ask anything about hotness. We now have the answer: weeeell…. sort of, with grains of salt and lots of qualifications.

Here’s what we did. We took our dataset of all UM evaluations for the College of Engineering and the College of Literature, Science, and the Arts from Fall 2008 to Winter 2013. That’s about 10,000 instructors, including professors, lecturers, and graduate student instructors. Our RMP data has 3,100 instructors, again of all varieties. We were only able to match 715 instructors across these two sets, largely because instructor names are in different formats in the two sources and, on RMP, depend on students’ spelling skills. (I admit I have a particularly hard name, but I’ve yet to see all my students get it right, and this does seem like a common problem.)
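For readers curious about why matching is hard, here is a rough Python sketch of one way to line up names written in different formats. To be clear, this is not the procedure we used; it is a simplified illustration, and the functions `name_key` and `match_names` and all the sample names are hypothetical. It assumes names come either as "Last, First" or as "First Last", and it reduces each to a lowercase, accent-free key of last name plus first initial.

```python
import unicodedata


def name_key(raw):
    """Reduce a name to (last name, first initial), ignoring case and accents."""
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower().strip()
    if "," in text:
        # "Last, First" format
        last, _, first = text.partition(",")
    else:
        # "First [Middle] Last" format; multi-word last names are not handled
        parts = text.split()
        if len(parts) < 2:
            return (text, "")
        first, last = parts[0], parts[-1]
    return (last.strip(), first.strip()[:1])


def match_names(official, rmp):
    """Pair names whose keys agree; keys that are ambiguous on either side are skipped."""
    def index(names):
        table = {}
        for name in names:
            table.setdefault(name_key(name), []).append(name)
        return table

    a, b = index(official), index(rmp)
    return [
        (a[key][0], b[key][0])
        for key in a
        if key in b and len(a[key]) == 1 and len(b[key]) == 1
    ]


# Invented names: one clean match, one lost to a misspelled last name.
pairs = match_names(["Doe, Jane", "Roe, Richard"], ["jane doe", "Rick Row"])
print(pairs)
```

Even this toy version shows the problem: a format difference is easy to normalize away, but a misspelled last name silently drops the instructor from the matched set.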

So, with 715 observations, there’s not much we can say, and the data are not conclusive. Here’s the best thing to show:

The relationship between instructor quality on RMP and "excellence" on UM evaluations.

On the x axis is an instructor’s quality rating on RMP; on the y axis is his or her median response to the statement “Overall, this is an excellent instructor” (with 0 as “strongly disagree” and 5 as “strongly agree”). The red circles represent instructors who have chili peppers on RMP, the black ones those who don’t. Instructors who are not on RMP do not appear in this data.

There is a small positive correlation between RMP and Michigan’s own evaluations, which is good news for RMP (and to be expected). “Hot” instructors cluster toward the high ends of both scales. But there are also plenty of “not hot” instructors with high ratings.

This is what we feel comfortable concluding: if you tell me that you have a chili pepper on RMP, I can tell you it’s more likely than not that you are highly rated on both RMP and in the official evals. The opposite is not true: if you say you don’t have a chili pepper, I can’t tell you anything about your other ratings. And, of course, most University of Michigan instructors are not on RMP at all.
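The asymmetry in that conclusion is really a statement about conditional shares, and it can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the function name, the cutoff of 4.0 for "highly rated" on each scale, and above all the seven rows, which are made up and are not our data.

```python
def share_high_on_both(rows, hot, rmp_cut=4.0, um_cut=4.0):
    """Among instructors with the given chili-pepper status, the share rated high on both scales."""
    group = [r for r in rows if r["hot"] == hot]
    if not group:
        return None  # nobody in this group, so no share to report
    high = [
        r for r in group
        if r["rmp_quality"] >= rmp_cut and r["um_median"] >= um_cut
    ]
    return len(high) / len(group)


# Invented rows, for illustration only.
rows = [
    {"hot": True, "rmp_quality": 4.5, "um_median": 4.8},
    {"hot": True, "rmp_quality": 4.2, "um_median": 4.5},
    {"hot": True, "rmp_quality": 3.1, "um_median": 4.0},
    {"hot": False, "rmp_quality": 4.6, "um_median": 4.7},
    {"hot": False, "rmp_quality": 2.0, "um_median": 3.0},
    {"hot": False, "rmp_quality": 4.1, "um_median": 4.9},
    {"hot": False, "rmp_quality": 3.0, "um_median": 4.2},
]

print(share_high_on_both(rows, hot=True))
print(share_high_on_both(rows, hot=False))
```

A share well above one half in the first group, alongside a share near one half in the second, is the pattern described above: the chili pepper tells you something, its absence tells you very little.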

Still, seeing the “chili pepper” difference in our data takes us back to the question of what it might be measuring. I won’t repeat the speculations of the earlier post, but offer a few more. First, maybe it is about looks, after all, as Daniel Hamermesh and Amy Parker’s fantastically titled 2005 paper, “Beauty in the Classroom: Instructors’ Pulchritude and Putative Pedagogical Productivity,” suggests. Looks make a difference for professionals’ earnings, so why not for instructors’ ratings? Another, less depressing and creepy conclusion is that the chili pepper is measuring what psychologist Joseph Lowman has called instructors’ “interpersonal rapport”: positive attitude toward students, democratic leadership style, and predictability.

Of course, those two don’t have to be mutually exclusive: for a few students, the chili pepper may just be a report on how attractive they perceive the instructor to be, while for others, as our anecdotal evidence suggests, it may be a measure of positive rapport. Either way, it’s too bad that RMP has to frame the issue like a horny eighteen-year-old frat guy.

The Missing Chili Pepper, Part 1

In a conversation about Rate My Professors, a colleague once yelled (after a few drinks), “I don’t care about the rating. All I want is the chili pepper!” Well, it turns out that if you get the chili pepper, you are more likely to have a good rating, too.

Rate My Professors, or RMP, is a crowdsourced college instructor rating system that our colleagues generally hate, ignore, or know nothing about. There are reasons to be critical of it: It’s anonymous (so people can vent and be mean, or not even be students in your course). Its response rates are spotty and generally bimodal (most instructors have no rating, and mainly it’s only those who like you or hate you who bother to rate you). And some of the variables RMP cares about are not conducive to good learning: you can be rated on “easiness,” and, most problematically, students can decide whether you are “hot” by giving you a chili pepper. Why should anyone care about that? Why should they be paying attention to the way you look, for cryin’ out loud?

OK, so it’s not great. But there are things to say in favor of RMP, too. It doesn’t only track problematic things such as easiness or the professor’s perceived hotness. It asks about the professor’s clarity and helpfulness, too. A professor’s score on those items in fact constitutes his or her overall “quality” score. Professor quality simply is the average of his or her clarity and helpfulness scores.

Also, for better or for worse, in many places, such as at the University of Michigan, RMP is the only thing students have to evaluate instructor quality. At Michigan, we do not make our own, official teaching evaluation results available to the students. (We are hoping to change that. Stay tuned.) And despite the things some faculty say about students, they are not stupid. Talking to students about their use of RMP tells me that students are pretty good at understanding the ways in which the tool is imperfect.

But let’s return to that chili pepper. I am heading a project that looks at how we evaluate teaching at the University of Michigan. I have the luxury of working with a talented grad student, David Cottrell, who is doing wonderful things with data analysis. Among other things, he scraped all UM instructors’ ratings from RMP (all 31,000 of them, for more than 3,000 instructors). Below is an interesting chart:


In case you don’t want to look at it carefully, let me summarize the details. On the x-axis is an instructor’s easiness score; on the y-axis the “Professor quality.” The red line represents those professors rated “hot” (which means that at least one third of their raters gave them the chili pepper). The blue line represents those instructors who didn’t receive a chili pepper.
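For the record, the two derived quantities on the chart are simple to compute. The sketch below restates them in Python: quality as the average of clarity and helpfulness, and "hot" as at least one third of raters giving the chili pepper. The function names are mine, and treating an instructor with no raters as not hot is an assumption of this sketch, not something RMP documents.

```python
from fractions import Fraction


def professor_quality(clarity, helpfulness):
    """RMP 'quality' as described in this post: the mean of clarity and helpfulness."""
    return (clarity + helpfulness) / 2


def is_hot(chili_votes, raters):
    """True when at least one third of raters gave the chili pepper."""
    if raters == 0:
        return False  # assumption: no raters means no chili pepper
    # Fractions keep the one-third comparison exact.
    return Fraction(chili_votes, raters) >= Fraction(1, 3)


print(professor_quality(4.0, 3.0))  # 3.5
print(is_hot(1, 3))                 # True
print(is_hot(3, 10))                # False
```

Note how blunt the second measure is: a whole distribution of student opinion gets collapsed into a single yes or no at the one-third line.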

Some observations:

  • There is some correlation between an instructor’s perceived easiness and his or her overall quality, but it is not strict. In other words, quality doesn’t just track easiness. RMP isn’t just a tracker for an easy A.
  • There is a stronger correlation between easiness and quality for instructors who don’t have the chili pepper.
  • So, most disturbingly, if you are not seen as hot, you have to be almost as “easy” as the “hardest” “hot” professor to get the same quality rating as that hardass!

In other words: there is a very significant rating penalty for instructors who do not receive a chili pepper.

A bunch of interesting — and troubling — issues arise. What does the chili pepper actually track? Whereas the other RMP measures are scales, the chili pepper is just a yes-no variable. What leads a student to give an instructor a chili pepper? Let’s assume, first, it is all about “hotness,” that is, some kind of sexual/sexualized desirability. Does that mean that only those students for whom the instructor is in the possible realm of sexual objects are even considering it: women for hetero men, men for hetero women, women for lesbian women, men for gay men, and so on? (My hunch is no — we aren’t all metrosexuals, but lots of people are able to talk about attractiveness beyond personal preferences.)

But I have a hunch that the chili pepper tracks something beyond a purely creepy sexual attraction. In fact, I think it might be another measure of the student liking the instructor. It’s not perfectly correlated, but as the chart shows, there is a correlation. It’s still very disturbing — and interesting — if students sexualize or objectify their appreciation for an instructor, at least when invited to do so in such terms.

Please do not suggest that the easy solution to these questions is for me and David to go through all those 3,000 instructors’ websites and see if they are actually hot. Whatever that might mean. But do suggest ways of thinking about the data. We are interested, really.

And in case you wonder why this post is called part 1: we will be able to see whether the chili pepper effect gets replicated in the evaluation data that the University of Michigan collects — and which certainly asks no questions about the instructor’s hotness.