Category Archives: Feedback

#TeachingTuesday: Student feedback and how to interpret it in order to improve teaching

Student feedback has become a fixture in higher education. But even though it is important to hear student voices when evaluating teaching and thinking of ways to improve it, students aren’t perfect judges of what type of teaching leads to the most learning, so their feedback should not be taken on board without critical reflection. In fact, there are many studies that investigate specific biases that show up in student evaluations of teaching. So in order to use student feedback to improve teaching (both at the individual level, when we consider changing aspects of our classes based on student feedback, and at the institutional level, when evaluating teachers for personnel decisions), we need to be aware of the biases that student evaluations of teaching come with.

While student satisfaction may contribute to teaching effectiveness, it is not itself teaching effectiveness. Students may be satisfied or dissatisfied with courses for reasons unrelated to learning outcomes – and not in the instructor’s control (e.g., the instructor’s gender).
Boring et al. (2016)

What student evaluations of teaching tell us

In the following, I am not presenting a coherent theory (and if you know of one, please point me to it!). These are snippets of current literature on student evaluations of teaching, many of which I found referenced in this annotated literature review on student evaluations of teaching by Eva (2018). The aim of my blogpost is not to provide a comprehensive literature review, but rather to point out that there is a huge body of literature that teachers and higher ed administrators should know exists, and that they can draw upon when in doubt (and ideally even when not in doubt ;-)).

6-second videos are enough to predict teacher evaluations

This is quite scary, so I thought it made sense to start out with this study. Ambady and Rosenthal (1993) found that silent videos shorter than 30 seconds, in some cases as short as 6 seconds, significantly predicted global end-of-semester student evaluations of teachers. These are videos that do not even include a sound track. Let this sink in…

Student responses to questions of “effectiveness” do not measure teaching effectiveness

And let’s get this out of the way right away: When students are asked to judge teaching effectiveness, that answer does not measure actual teaching effectiveness.

Stark and Freishtat (2014) give “an evaluation of course evaluations”. They conclude that student evaluations of teaching, though providing valuable information about students’ experiences, do not measure teaching effectiveness. Instead, ratings are even negatively associated with direct measures of teaching effectiveness and are influenced by gender, ethnicity and attractiveness of the instructor.

Uttl et al. (2017) conducted a meta-analysis of faculty’s teaching effectiveness and found that “student evaluation of teaching ratings and student learning are not related”. They state that “institutions focused on student learning and career success may want to abandon [student evaluation of teaching] ratings as a measure of faculty’s teaching effectiveness”.

Students have their own ideas of what constitutes good teaching

Nasser-Abu Alhija (2017) showed that out of five dimensions of teaching (goals to be achieved, long-term student development, teaching methods and characteristics, relationships with students, and assessment), students viewed the assessment dimension as most important and the long-term student development dimension as least important. To students, the grades that instructors assigned and the methods they used to do this were the main aspects in judging good teaching and good instructors. Which is fair enough — after all, good grades help students in the short term — but that’s also not what we usually think of when we think of “good teaching”.

Students learn less from teachers they rate highly

Kornell and Hausman (2016) review recent studies and report that when learning is measured at the end of the respective course, the “best” teachers got the highest ratings, i.e. the ones where the students felt that they had learned the most (which is congruent with Nasser-Abu Alhija (2017)’s findings of what students value in teaching). But when learning was measured during later courses, i.e. when meaningful deep learning was considered, other teachers seem to have been more effective. Introducing desirable difficulties is thus good for learning, but bad for student ratings.

Appearances can be deceiving

Carpenter et al. (2013) compared a fluent video (instructor standing upright, maintaining eye contact, speaking fluidly without notes) and a disfluent video (instructor slumping, looking away, speaking haltingly with notes). They found that even though the amount of learning that took place when students watched either of the videos wasn’t influenced by the lecturer’s fluency or lack thereof, the disfluent lecturer was rated lower than the fluent lecturer.

The authors note that “Although fluency did not significantly affect test performance in the present study, it is possible that fluent presentations usually accompany high-quality content. Furthermore, disfluent presentations might indirectly impair learning by encouraging mind wandering, reduced class attendance, and a decrease in the perceived importance of the topic.”

Students expect more support from their female professors

When students rate teachers’ effectiveness, they do that based on their assumption of how effective a teacher should be, and it turns out that they have different expectations depending on the gender of their teachers. El-Alayi et al. (2018) found that “female professors experience more work demands and special favour requests, particularly from academically entitled students”. This was true both when male and female faculty reported on their experiences and when students were asked what their expectations of fictional male and female teachers were.

Student teaching evaluations punish female teachers

Boring (2017) found that even when learning outcomes were the same for students in courses taught by male and female teachers, female teachers received worse ratings than male teachers. This got even worse when teachers didn’t act in accordance with the stereotypes associated with their gender.

MacNell et al. (2015) found that believing that an instructor was female (in a study of online teaching where male and female names were sometimes assigned according to the actual gender of the teacher and sometimes not) was sufficient to rate that person lower than an instructor that was believed (correctly or not) to be male.

White male students challenge women of color’s authority, teaching competency, and scholarly expertise, as well as offering subtle and not so subtle threats to their persons and their careers

This title was drawn from the abstract of Pittman (2010)’s article that I unfortunately didn’t have access to, but thought an important enough point to include anyway.

There are very many more studies on race, and especially women of color, in teaching contexts, which all show that they are facing a really unfair uphill battle.

Students will punish a perceived accent

Rubin and Smith (1990) investigated “effects of accent, ethnicity, and lecture topic on undergraduates’ perceptions of nonnative English-speaking teaching assistants” in North America and found that 40% of undergraduates avoid classes instructed by nonnative English-speaking teaching assistants, even though the accentedness of teaching assistants did not actually influence student learning outcomes. Nevertheless, students judged teaching assistants they perceived as speaking with a strong accent as poorer teachers.

Similarly, Sanchez and Khan (2016) found that “presence of an instructor accent […] does not impact learning, but does cause learners to rate the instructor as less effective”.

Students will rate minorities differently

Ewing et al. (2003) report that lecturers who were identified as gay or lesbian received lower teaching ratings than other lecturers with undisclosed sexual orientation when they, according to other measures, were performing very well. Poor teaching performance was, however, rated more positively, possibly to avoid discriminating against openly gay or lesbian lecturers.

Students will punish age

Stonebraker and Stone (2015) find that “age does affect teaching effectiveness, at least as perceived by students. Age has a negative impact on student ratings of faculty members that is robust across genders, groups of academic disciplines and types of institutions”. Apparently, when it comes to students, from your mid-40s on, you aren’t an effective teacher any more (unless you are still “hot” and “easy”).

Student evaluations are sensitive to students’ gender and grade expectations

Boring et al. (2016) find that “[student evaluation of teaching] are more sensitive to students’ gender bias and grade expectations than they are to teaching effectiveness”.

What can we learn from student evaluations then?

Pay attention to student comments but understand their limitations. Students typically are not well situated to evaluate pedagogy.
Stark and Freishtat (2014)

Does all of the above mean that student evaluations are biased in so many ways that we can’t actually learn anything from them? I do think that there are things that should not be done on the basis of student evaluations (e.g. rank teacher performance), and I do think that most times, student evaluations of teaching should be taken with a pinch of salt. But there are still ways in which the information gathered is useful.

Even though student satisfaction is not the same as teaching effectiveness, it might still be desirable to know how satisfied students are with specific aspects of a course. And especially open formats like for example the “continue, start, stop” method are great for gaining a new perspective on the classes we teach and potentially gaining fresh ideas of how to change things up.

Tracking one’s own evaluations over time is also helpful since (apart from aging) other changes are hopefully intentional and can thus tell us something about our own development, at least assuming that different student cohorts evaluate teaching performance in a similar way. Getting student feedback at a later date might be helpful, too: sometimes students only realize later which teachers they learnt from the most or what methods were actually helpful rather than just annoying.

A measure that doesn’t come directly from student evaluations of teaching but that I find very important to track is student success in later courses, especially when that isn’t measured in a single grade but when instructors come together and discuss how students are doing in tasks that build on previous courses. Having a well-designed curriculum and a very good idea of what ideas translate from one class to the next is obviously very important.

It is also important to keep in mind that, as Stark and Freishtat (2014) point out, statistical methods are only valid if there are enough responses to actually do statistics on them. So don’t take very few horrible comments to heart and ignore the whole bunch of people who are gushing about how awesome your teaching is!
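To get a feeling for what “enough responses” can mean, here is a rough sketch in Python. It is my own illustration, not a method from Stark and Freishtat (2014): the function names are made up, and it assumes that respondents are a random sample of the class (which they usually are not) and that we are looking at a simple proportion, such as the share of students who say they are satisfied.

```python
import math


def response_rate(responses: int, enrolled: int) -> float:
    """Fraction of enrolled students who filled in the evaluation."""
    if enrolled <= 0:
        raise ValueError("enrolled must be positive")
    return responses / enrolled


def rough_margin_of_error(responses: int, proportion: float = 0.5) -> float:
    """Approximate 95% margin of error for a proportion.

    Uses the textbook normal approximation; proportion=0.5 is the worst case.
    The approximation is itself shaky for very small samples, so treat the
    result as an order of magnitude, not as a precise number.
    """
    if responses <= 0:
        raise ValueError("need at least one response")
    return 1.96 * math.sqrt(proportion * (1 - proportion) / responses)


# Hypothetical class sizes, only to show how slowly the uncertainty shrinks.
for n in (5, 20, 100):
    print(n, round(rough_margin_of_error(n), 2))
```

With only a handful of responses, the uncertainty in such a proportion is so large that a few very negative (or very positive) comments say little about the class as a whole.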

P.S.: If you are an administrator or on an evaluation committee and would like to use student evaluations of teaching, the article by Linse (2017) might be helpful. They give specific advice on how to use student evaluations both in decision making as well as when talking to the teachers whose evaluations ended up on your desk.


Ambady, N., & Rosenthal, R. (1993). Half a minute: Predicting teacher evaluations from thin slices of nonverbal behavior and physical attractiveness. Journal of Personality and Social Psychology, 64(3), 431–441.

Boring, A. (2017). Gender biases in student evaluations of teachers. Journal of Public Economics, 145(13), 27–41.

Boring, A., Dial, U. M. R., Ottoboni, K., & Stark, P. B. (2016). Student evaluations of teaching (mostly) do not measure teaching effectiveness. ScienceOpen Research, (January), 1–36.

Carpenter, S. K., Wilford, M. M., Kornell, N., & Mullaney, K. M. (2013). Appearances can be deceiving: Instructor fluency increases perceptions of learning without increasing actual learning. Psychonomic Bulletin & Review, 20(6), 1350–1356.

El-Alayi, A., Hansen-Brown, A. A., & Ceynar, M. (2018). Dancing backward in high heels: Female professors experience more work demands and special favour requests, particularly from academically entitled students. Sex Roles.

Eva, N. (2018). Annotated literature review: Student evaluations of teaching (SET).

Ewing, V. L., Stukas, A. A. J., & Sheehan, E. P. (2003). Student prejudice against gay male and lesbian lecturers. Journal of Social Psychology, 143(5), 569–579.

Kornell, N., & Hausman, H. (2016). Do the best teachers get the best ratings? Frontiers in Psychology, 7, 570.

Linse, A. R. (2017). Interpreting and using student ratings data: Guidance for faculty serving as administrators and on evaluation committees. Studies in Educational Evaluation, 54, 94–106.

MacNell, L., Driscoll, A., & Hunt, A. N. (2015). What’s in a name: Exposing gender bias in student ratings of teaching. Innovative Higher Education, 40(4), 291–303.

Nasser-Abu Alhija, F. (2017). Teaching in higher education: Good teaching through students’ lens. Studies in Educational Evaluation, 54, 4–12.

Pittman, C. T. (2010). Race and Gender Oppression in the Classroom: The Experiences of Women Faculty of Color with White Male Students. Teaching Sociology, 38(3), 183–196.

Rubin, D. L., & Smith, K. A. (1990). Effects of accent, ethnicity, and lecture topic on undergraduates’ perceptions of nonnative English-speaking teaching assistants. International Journal of Intercultural Relations, 14, 337–353.

Sanchez, C. A., & Khan, S. (2016). Instructor accents in online education and their effect on learning and attitudes. Journal of Computer Assisted Learning, 32, 494–502.

Stark, P. B., & Freishtat, R. (2014). An Evaluation of Course Evaluations. ScienceOpen, 1–26.

Stonebraker, R. J., & Stone, G. S. (2015). Too old to teach? The effect of age on college and university professors. Research in Higher Education, 56(8), 793–812.

Uttl, B., White, C. A., & Gonzalez, D. W. (2017). Meta-analysis of faculty’s teaching effectiveness: Student evaluation of teaching ratings and student learning are not related. Studies in Educational Evaluation, 54, 22–42.

Assessing participation

One example of how to give grades for participation.

One of the most difficult tasks as a teacher is to actually assess how much people have learned, and to give them a grade – a single number or letter (depending on where you are) that supposedly tells you all about how much they have learnt.

Ultimately, what assessment makes sense depends on your learning goals. But still it is sometimes useful to have a couple of methods at hand for when you might need them.

Today I want to talk about a pet peeve of mine: Assessing participation. I don’t think this is necessarily a useful measure at all, but I’ve taught courses where it was a required part of the final grade.

I’ve been through all the classical ways of assessing participation. Giving a grade for participation from memory (even if you take notes right after class) opens you up to all kinds of problems. Your memory might not be as good as you thought it was. Some people say more memorable stuff than others, or in a more memorable way. Some people are just louder and more forward than others. No matter how objective you are (or attempt to be) – you always end up with complaints and there is just no way to convince people (including yourself) that the grades you end up giving are fair.

An alternative approach.

So what could you do instead? One method I have read about somewhere (I cannot find the original paper any more, but similar ideas are described in Maryellen Weimer’s article “Is it time to rethink how we grade participation”) is to set a number of “good” comments or questions that students should contribute per day or week. Say, if a student asks 3 good questions or makes 3 good comments, this translates to a very good grade (or a maximum number of bonus points, depending on your system). 2 comments or questions still give a good grade (or some bonus points), 1 or fewer are worth less. But here is the deal: Students keep track of what they say and write it down after they’ve said it. At the end of the lesson, the day, the week or whatever period you chose, they hand you a list of their three very best questions or comments. So people who said more than three things are required to limit themselves to what they think were their three best remarks.
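The counting rule can be sketched in a few lines of Python. This is only an illustration under assumptions: the point values, the `Contribution` class and the `participation_points` function are hypothetical and not taken from Weimer’s article, and whether a remark counts as “good” remains the instructor’s judgement.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Contribution:
    text: str      # what the student wrote down after saying it
    is_good: bool  # the instructor's judgement when reading the list


# Hypothetical mapping from number of good contributions to bonus points.
POINTS_BY_GOOD_COUNT = {0: 0, 1: 1, 2: 2, 3: 3}
MAX_SUBMITTED = 3  # students hand in their three best remarks


def participation_points(submitted: Sequence[Contribution]) -> int:
    """Score one student's list for one period (lesson, day or week)."""
    # Only the first MAX_SUBMITTED entries count, so a longer list gains nothing.
    considered = submitted[:MAX_SUBMITTED]
    good = sum(1 for c in considered if c.is_good)
    return POINTS_BY_GOOD_COUNT[good]
```

Because the score is capped at three entries, talking more does not earn more points; only the quality of the three submitted remarks matters.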

The very clear advantages are that

  • you are now looking for quality over quantity (depending on the class size, you will need to adjust the number of comments / questions you ideally want per person). This means people who always talk but don’t really say anything might not stop, but at least they aren’t encouraged to talk even more since they will have to find a certain number of substantial contributions to write down in the end rather than make sure they have the most air time.
  • you don’t have to rely on your memory alone. Sure, when you read the comments and questions you will still need to recall whether that was actually said during class or made up afterwards, but at least you have a written document to jog your memory.
  • you have written documentation of what they contributed, so if someone wants to argue about the quality of their remarks, you can do that based on what they wrote down rather than what they think they might have meant when they said something that they recall differently from you.
  • you can choose (and then, of course, announce!) to let people also include other contributions on their lists, like very good questions they asked you in private or emailed you about, or extra projects they did on the side.

I guess in the end we need to remember that the main motive for grading participation is to enhance student engagement with the course content. And the more different ways we give them to engage – and receive credit for it – the more they are actually going to do it. Plus maybe they are already doing it and we just never knew?

Giving feedback on student writing

When feedback is more confusing than helpful.

The other day I came across a blog post on Teaching & Learning in Higher Ed. on responding to student writing/writers by P. T. Corrigan. And one point of that post struck home, and that point is on contradictory teacher feedback.

When I am asked to provide feedback on my peers’ writing, I always ask them what stage in the writing process they are in and what kind of feedback they want. Are they in the copy-editing stage and want me to check for spelling and commas, or is this a first draft and they are still open for input on the way their thoughts are organized, or even on the arguments they are making? If a thesis is to be printed that same evening, I am not going to suggest major restructuring of the document. If we are talking about a first draft, I might mark a typo that catches my eye, but I won’t focus on finding every single typo in the document.

But when we give feedback to students, we often give them all the different kinds of feedback at once, leaving them to sort through the feedback and likely sending contradictory messages in the process. Marking all the tiny details that could, and maybe should, be modified suggests that changes to the text are on a polishing level. When we suggest a completely different structure at the same time, chances are that rather than re-writing, students will just move existing blocks of text, assuming that since we provided feedback on a typo-level, those blocks of text are in their final, polished form already when that might not be how we perceive the text.

Thinking about this now, I realize that the feedback I give on student writing does not only need to be tailored to the specific purpose much better, it also needs to come with more meta information about what aspect of the writing my focus is on at that point in time. Only giving feedback on the structure without pointing out grammatical mistakes only sends the right message when it is made clear that the focus, right now, is only on the structure of the document. Similarly, students need to understand that copy-editing will usually not improve the bigger framing of the document and only focus on layout and typo-type corrections.

We’ve intuitively been doing a lot of this pretty well already. But go read Corrigan’s blog post and the literature he links to – it’s certainly worth a read!

Five finger feedback

At my new job the quality management team regularly offers workshops that the whole team attends. One detail has repeatedly come up and I want to present it here, too. It is a new-to-me method to ask for specific feedback: The five finger method.

For each finger of the hand, a specific question needs to be addressed. Many of the fingers are easy to remember if you imagine gestures that would include that finger, and/or the meaning that that finger carries in our culture.

1) The thumb. What went well?
2) The index finger. What could be improved?
3) The middle finger. What went wrong? Negative feedback.
4) The ring finger. What would we like to keep?
5) The pinkie finger. What did not get enough attention?

This method is certainly not suited for groups a lot larger than a dozen or so participants, especially not if everybody were asked to say something for every single finger (which we didn’t have to). But for a small group, I found it really helpful to have the visual reminder of the kind of feedback we were being asked to give, and to go through it in the order that was presented by just counting down the fingers on your hand.

Continue. Stop. Start.

Quick feedback tool for your teaching, giving you concrete examples of what students would like you to continue, start or stop.

This is another great tool to get feedback on your classes. In contrast to the “fun” vs “learning” graph which gives you a cloud of “generally people seem to be happy and to have learned something”, this tool gives you much more concrete ideas of what you should continue, stop and start doing. Basically what you do is this: You hand out sheets of paper with the three columns and ask students to give you as many details as possible for each.

“Continue” is where students list everything that you do during your lectures that helps them learn and understand and that they think you should continue doing. Here students (of classes I teach! Obviously all these examples are highly dependent on the course) typically list things like that you are giving good presentations, ask whether they have questions, are available for questions outside of the lecture, are approachable, do fun experiments, let them discuss in class, that kind of thing.

“Stop” are things that hinder students’ learning (or sometimes things that they find annoying, like homework or being asked to present something in class, but usually students are pretty good about realizing that, even though annoying, those things might actually be helpful). Here students might list if you have an annoying habit, or if you always say things like “as everybody knows, …” when they don’t actually know but are now too shy to say so. Students will also give you feedback on techniques that you like using but they don’t think are appropriate for their level/group, or anything else they think is counterproductive.

“Start” are suggestions for what you might want to add to your repertoire. I have recently been asked to give a quick overview of next lesson’s topics at the end of the lecture, which makes perfect sense! But again, depending on what you do in your course already, you might be asked to start very different things.

In addition to helping you teach better, this feedback is also really important for students, because it makes them reflect on how they learn as an individual and how their learning might be improved. And if they realize that they aren’t getting what they need from the instructor, at least they know now what they need and can go find it somewhere else if the instructor doesn’t change his/her teaching to meet that need.

When designing the questionnaire for this, you could also make very broad suggestions of topics that might be mentioned if you feel like that might spark students’ ideas (like for example, presentations, textbooks, assignments, activities, social interactions, methods, discussions, quizzes, …) but be aware that giving these examples means that you are more likely to get feedback on the suggested topics and less likely that students will bring up topics that you yourself had not considered.

On “fun” vs “learning”

Quick feedback tool, giving you an impression of the students’ perception of fun vs learning of a specific part of your course.

Getting feedback on your teaching and their learning from a group of students is very hard. There are tons of elaborate methods out there, but there is one very simple tool that I find gives me a quick overview: The “fun” vs “learning” graph.

This particular example is from last year’s GEOF130 “introduction to oceanography”, when we did the first in-class experiment (which I will do with this year’s class next week, so stay tuned!). Since the group was quite big for an oceanography class at my university (36 students) and I wanted to get a better feel for how each of them perceived their learning through experiments than what I would have gotten by just observing and asking a couple of questions, I asked them to anonymously put a cross on the graph where they felt they were located in the “fun” vs “learning” space after this experiment. And this is the result:


A “fun” vs “learning” graph filled in by students of the GEOF130 course in 2012 in response to an experiment that they conducted in pairs during a lecture.

Of course this is not a sufficient tool to evaluate a whole semester or course, but I can really recommend it for a quick overview!