The Missing Chili Pepper, Part 1

In a conversation about Rate My Professors, a colleague once yelled (after a few drinks), “I don’t care about the rating. All I want is the chili pepper!” Well, it turns out that if you get the chili pepper, you are more likely to have a good rating, too.

Rate My Professors, or RMP, is a crowdsourced college instructor rating system that our colleagues generally hate, ignore, or know nothing about. There are reasons to be critical of it: It’s anonymous (so people can vent and be mean, or not even be students in your course). Its response rates are spotty and generally bimodal (most instructors have no rating, and it’s mainly only those who like you or hate you who bother to rate you). And some of the variables RMP cares about are not conducive to good learning: you can be rated on “easiness,” and, most problematically, students can decide whether you are “hot” by giving you a chili pepper. Why should anyone care about that? Why should they be paying attention to the way you look, for cryin’ out loud?

OK, so it’s not great. But there are things to say in favor of RMP, too. It doesn’t only track problematic things such as easiness or the professor’s perceived hotness. It asks about the professor’s clarity and helpfulness, too. A professor’s scores on those items in fact constitute his or her overall “quality” score: professor quality is simply the average of the clarity and helpfulness scores.

Also, for better or for worse, in many places, such as at the University of Michigan, RMP is the only thing students have to evaluate instructor quality. At Michigan, we do not make our own, official teaching evaluation results available to the students. (We are hoping to change that. Stay tuned.) And despite the things some faculty say about students, students are not stupid. Talking to them about their use of RMP tells me that they are pretty good at understanding the ways in which the tool is imperfect.

But let’s return to that chili pepper. I am heading a project that looks at how we evaluate teaching at the University of Michigan. I have the luxury of working with a talented grad student, David Cottrell, who is doing wonderful things with data analysis. Among other things, he scraped all UM instructors’ ratings from RMP (all 31,000 of them, for more than 3,000 instructors). Below is an interesting chart:

[Chart: professor quality (y-axis) plotted against easiness (x-axis), with a red line for “hot” instructors and a blue line for instructors without a chili pepper]

In case you don’t want to look at it carefully, let me summarize the details. On the x-axis is an instructor’s easiness score; on the y-axis the “Professor quality.” The red line represents those professors rated “hot” (which means that at least one third of their raters gave them the chili pepper). The blue line represents those instructors who didn’t receive a chili pepper.
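
For the data-minded, here is a minimal Python sketch of how individual ratings could be rolled up into the per-instructor numbers the chart uses. The field names and the sample ratings are invented for illustration; they are not RMP's actual schema, and this is not necessarily how David's code works. The only rules taken from this post are that quality is the average of clarity and helpfulness, and that “hot” means at least one third of raters gave the chili pepper.

```python
from collections import defaultdict

# Hypothetical per-rating records: field names and values are invented
# for illustration and do not reflect RMP's real data format.
ratings = [
    {"instructor": "A", "easiness": 2.0, "clarity": 5.0, "helpfulness": 4.0, "chili": True},
    {"instructor": "A", "easiness": 3.0, "clarity": 4.0, "helpfulness": 4.0, "chili": False},
    {"instructor": "A", "easiness": 2.0, "clarity": 5.0, "helpfulness": 5.0, "chili": False},
    {"instructor": "B", "easiness": 4.0, "clarity": 3.0, "helpfulness": 2.0, "chili": False},
    {"instructor": "B", "easiness": 4.0, "clarity": 2.0, "helpfulness": 3.0, "chili": False},
]


def summarize(rows):
    """Aggregate individual ratings into one summary per instructor."""
    by_instructor = defaultdict(list)
    for row in rows:
        by_instructor[row["instructor"]].append(row)

    result = {}
    for name, group in by_instructor.items():
        n = len(group)
        chilis = sum(1 for r in group if r["chili"])
        result[name] = {
            "easiness": sum(r["easiness"] for r in group) / n,
            # Quality is the average of clarity and helpfulness.
            "quality": sum((r["clarity"] + r["helpfulness"]) / 2 for r in group) / n,
            # "Hot" if at least one third of raters gave a chili pepper.
            # Integer comparison avoids floating-point edge cases at exactly 1/3.
            "hot": 3 * chilis >= n,
        }
    return result


summary = summarize(ratings)
for name, stats in sorted(summary.items()):
    print(name, stats)
```

Each instructor then becomes one point on the chart: easiness on the x-axis, quality on the y-axis, colored by the `hot` flag.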

Some observations:

  • There is some correlation between an instructor’s perceived easiness and his or her overall quality, but it is not strict. In other words, quality doesn’t just track easiness. RMP isn’t just a tracker for an easy A.
  • There is a stronger correlation between easiness and quality for instructors who don’t have the chili pepper.
  • So, most disturbingly, if you are not seen as hot, you have to be almost as “easy” as the “hardest” “hot” professor to get the same quality rating as that hardass!

In other words: there is a very significant rating penalty for instructors who do not receive a chili pepper.
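
One way to put a number on that penalty is to fit a separate straight line to each group and compare the two. The sketch below does this in plain Python with ordinary least squares. The data points are made up to mimic the pattern described above (a steeper line for the group without the chili pepper), so the results say nothing about the real UM data, and the helper names are hypothetical.

```python
def fit_line(points):
    """Ordinary least squares fit of quality = intercept + slope * easiness."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented (easiness, quality) pairs, one per instructor. Not real data.
hot = [(1.0, 3.5), (2.0, 3.75), (3.0, 4.0), (4.0, 4.25), (5.0, 4.5)]
not_hot = [(1.0, 2.0), (2.0, 2.5), (3.0, 3.0), (4.0, 3.5), (5.0, 4.0)]

hot_line = fit_line(hot)
not_hot_line = fit_line(not_hot)


def predict(line, easiness):
    """Predicted quality on a fitted line at a given easiness."""
    intercept, slope = line
    return intercept + slope * easiness


def penalty_at(easiness):
    """Predicted quality gap (hot minus not hot) at a given easiness."""
    return predict(hot_line, easiness) - predict(not_hot_line, easiness)


def easiness_needed(target_quality):
    """Easiness at which the non-chili line reaches a target quality."""
    intercept, slope = not_hot_line
    return (target_quality - intercept) / slope


# Quality predicted for the "hardest" hot instructor (easiness 1.0),
# and how easy a non-chili instructor would need to be to match it.
hardest_hot_quality = predict(hot_line, 1.0)
print("gap at easiness 3.0:", penalty_at(3.0))
print("easiness needed to match:", easiness_needed(hardest_hot_quality))
```

A straight line is only the simplest choice here; if the real relationship curves, a smoother would be a better fit, and none of this addresses what the chili pepper is actually measuring.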

A bunch of interesting — and troubling — issues arise. What does the chili pepper actually track? Whereas the other RMP measures are scales, the chili pepper is just a yes-no variable. What leads a student to give an instructor a chili pepper? Let’s assume, first, it is all about “hotness,” that is, some kind of sexual/sexualized desirability. Does that mean that only those students for whom the instructor is in the possible realm of sexual objects are even considering it: women for hetero men, men for hetero women, women for lesbian women, men for gay men, and so on? (My hunch is no — we aren’t all metrosexuals, but lots of people are able to talk about attractiveness beyond personal preferences.)

But I have a hunch that the chili pepper tracks something beyond a purely creepy sexual attraction. In fact, I think it might be another measure of the student liking the instructor. It’s not perfectly correlated, but as the chart shows, there is a correlation. It’s still very disturbing — and interesting — if students sexualize or objectify their appreciation for an instructor, at least when invited to do so in such terms.

Please do not suggest that the easy solution to these questions is for me and David to go through all those 3,000 instructors’ websites and see if they are actually hot. Whatever that might mean. But do suggest ways of thinking about the data. We are interested, really.

And in case you wonder why this post is called part 1: we will be able to see whether the chili pepper effect gets replicated in the evaluation data that the University of Michigan collects — and which certainly asks no questions about the instructor’s hotness.

11 thoughts on “The Missing Chili Pepper, Part 1”

  1. I wonder if you’d also find some interesting trends on hotness/easiness if you compared based on gender of professor.

  2. What if professor quality and easiness are highly correlated with friendliness (a category that RMP doesn’t account for), which is in turn correlated with hotness. This was certainly true during my time as an undergraduate student. And, yes, I gave both attractive male and female professors the chili pepper.

    With that being said, it appears that you are on to something when you suggest that the chili pepper is a sign of the “level of the student liking the professor.” It makes sense. I never rated a professor because of their looks, but I did take the time to fill out the form, because they were friendly.

    In short: friendlier people are more likely to get good reviews, and good looking people in academia are usually pretty darn friendly.

  3. I would want to see similar charts for helpfulness and clarity. Easiness is such a dodgy category, capable of reflecting either great teaching (which makes hard work and critical thinking seem easy) or terrible teaching (where teachers pander to students and don’t push them at all). The other categories are less ambivalent and might give indications about whether the hotness is a result of good teaching or the good rating is a result of perceived hotness.

    I’d also want to see the ratings of individual groups based on whether they thought the teacher was hot. Do students who give their teachers chili peppers always have a favorable view of their easiness, helpfulness, and clarity? And in what relation? Are they more likely to give high “easiness” ratings than helpfulness or clarity ratings?

    And if there were a way of evaluating the comments as well (maybe word clouds compiled from similarly-voting students?) you could see whether the comments are substantive or frivolous.

    Also, I imagine that one chili begets more chilis, as students feel more comfortable or more inclined to weigh in on that once someone else has. So perhaps the instructors to look at are the ones who have only a few chilis–do the students who give those professors chilis vote with their head or their hormones?

  4. As a UM alum who remembers the ’80s version of “Advice” I’m surprised to read, “At Michigan, we do not make our own, official teaching evaluation results available to the students. (We are hoping to change that. Stay tuned.)” A quick Google search shows that student government does have an online version of it, though it also looks like the site is being revamped and is currently unavailable. Is that a technical glitch or has there been some policy reversal (or series thereof) restricting student access to the numbers?

    • John, the Central Student Government (formerly Michigan Student Assembly) used to make them available, but hasn’t for a couple of years. It’s a technical issue at their end. Everybody thinks it would be ideal for the university not to have to rely on student volunteers to do this, which is why the Registrar’s Office is interested in providing the service. Nobody I know is opposed to it as a policy matter, but it does require some high-level policy decisions.

  5. Fun stuff to kick around, but the endogeneity and measurement concerns here are huge. I suggest:

    1) In your data, examine the correlation between “helpfulness” and chili pepper.

    2) I suspect it will be far from zero. Almost certainly quite positive. This could be because people perceive helpful people as being attractive. Or because they perceive attractive people as being more helpful. Or because helpful people really *are* more attractive. Or because both of the measures are mostly just getting at general affect toward a person (my hunch). Probably the same story for the correlation of “clarity” and attractiveness.

    3) Whatever the reason, as long as the correlation is nonzero, I think it really undermines confidence in your interpretation of the hotness variable. I can’t think of any a priori reason why genuinely helpful people would be more attractive than genuinely unhelpful people. Nor why genuinely clear people should be more attractive than genuinely unclear people. If anything, I think the psych literature would expect the opposite. (Attractive people tend to get a free pass, so have less incentive to be helpful.) So if a positive correlation shows up, it’s a good hint that “hotness” is measuring… something, but not genuine physical attractiveness. To the extent it, helpfulness, and clarity are all just measures of general affect, the figure becomes very difficult to interpret, because something that you’re presenting as an independent variable is really a dependent variable.

    • Tim, thanks for those observations. I agree about the endogeneity issues, which is why I don’t have anything other than speculations. I’d note, though, that “helpfulness” is almost perfectly correlated with professor quality, because the latter is simply the average of “clarity” and “helpfulness,” which are almost perfectly correlated with one another.

      • Right! Which is consistent with my guess that respondents here aren’t making much of a distinction between some of the questions. Maybe a few are, but I would guess that these questions (helpfulness and clarity, and perhaps hotness to an extent) are basically interpreted as, “Do you like this person?”

        I have another conjecture, which is that if a respondent hated the professor, he/she will almost never bestow a chili pepper. But surely there must be some attractive, but bad, professors? If we think there are, but they never appear in the data, it raises questions about whether we have valid measures.

  6. Here are two hypotheses that jumped out at me:
    1) Hotness affects everything.
    2) Perceived hotness might reflect the attitude of the professor.

    Unfortunately, they are indistinguishable from a data perspective. Check what kind of correlation you get for clarity, helpfulness, etc. against an average hotness score (fraction of chili peppers might be a good metric on the x-axis). Perhaps, like in advertising, the students are misled by their own psychology and just rate a professor more favorably if they think the professor attractive. How does the p-value look if you see how every available variable is split by the yes-no chili pepper?

    If the correlation is as strong everywhere as that first graph suggests, then the original shouting professor is right; like the DOW reflecting the economy, the presence of a chili pepper summarizes the prof’s overall score.

    • Ack. I meant to say “just rate a professor more favorably *in every single category* if they think the professor attractive.”

  7. Pingback: The Missing Chili Pepper, Part 2 | Meaningful Competitions
