Using student evaluations of teaching to actually improve teaching (based on Roxå et al., 2021)

There are a lot of problems with student evaluations of teaching, especially when they are used as a tool without reflecting on what they can and cannot be used for. Heffernan (2021) finds them to be sexist, racist, prejudiced and biased (my summary of Heffernan (2021) here). Many more factors influence whether or not students “like” courses, for example whether they have prior interest in the topic — Uttl et al. (2013) compare interest in a quantitative vs a non-quantitative course at a psychology department and find a difference in interest of nearly six standard deviations! Even the weather on the day a questionnaire is submitted (Braga et al., 2014), or the “availability of cookies during course sessions” (Hessler et al., 2018), can influence student assessment of teaching. So it is not surprising that in a meta-analysis, Uttl et al. (2017) find “no significant correlations between the [student evaluations of teaching] ratings and learning”, and they conclude that “institutions focused on student learning and career success may want to abandon [student evaluation of teaching] ratings as a measure of faculty’s teaching effectiveness”.

But just because student evaluations of teaching might not be a good tool for summative assessment of quality, especially when used out of context, that does not mean they can’t be a useful tool for formative purposes. Roxå et al. (2021) argue that the problem is not the data itself, but the way it is used, and suggest treating it — as academics do every day with all kinds of data — as the basis for a critical discourse, as a tool to drive improvement of teaching. They also suggest changing the terminology from “student rating of teaching” to “course evaluations”, to move the focus away from pretending to be able to measure the quality of teaching, towards focussing on improving teaching.

In that 2021 article, Roxå et al. present a different way to think about course evaluations, supported by a case study from the Faculty of Engineering at Lund University (LTH; which is where I work now! :-)). At LTH, the credo is that “more and better conversations” will lead to better results — in the context of the Roxå et al. (2021) article, this means that more and better conversations between students and teachers will lead to better learning. “Better” conversations are deliberate, evidence-based, and informed by the literature.

At LTH, the backbone of those more and better conversations is a standardised course evaluation run at the end of every course. The evaluations are done using a standard tool, the “course experience questionnaire”, which focusses on the elements of teaching and learning that students can actually evaluate: their own experiences, for example whether they perceived goals as clearly defined, or whether help was provided. It is LTH policy that the results of those surveys cannot influence career progression; however, a critical reflection on the results is expected, and a structured discussion format has been established to support this:

The results from those surveys are compiled into a working report that includes the statistics and any free-text comments that an independent student deems appropriate. This report is discussed in a 30-45 minute lunch meeting between the teacher, two students, and the program coordinator. The students are recruited and trained specifically for their role in those meetings by the student union.

After the meeting, and informed by it, each of the three parties independently writes a response to the student ratings, including which next steps should be taken. These three responses, together with the statistics, then form the official report that is shared with all students from the class.

The discourse and reflection that is kick-started by the course evaluations, structured discussions, and reporting is taken further through pedagogical training. At LTH, 200 hours of training are required either for employment or within the first two years, and all courses include creating a written artefact (which often needs to be discussed with critical friends from the participants’ departments before submission), with the purpose of making arguments about teaching and learning public in a scholarly report, thus contributing to institutional learning. LTH also rewards excellence in teaching, which is measured not by evaluation results, but by the developments that can be documented based on scholarly engagement with teaching, as evidenced for example by critical reflection on evaluation results.

At LTH, the combination of carefully choosing an instrument to measure student experiences, and then applying it, and using the data, in a deliberate manner has led to a consistent increase in student evaluation results over the last decades. Of course, formative feedback happening throughout the courses pretty much all the time will also have contributed. This is something I am wondering about right now, actually: What is the influence of, say, consistently run “continue, start, stop” feedback as compared to the formalised surveys and the discussions around them? My gut feeling is that those tiny, incremental changes will add up over time, and I am curious whether there is a way to separate the two influences to understand their respective impacts. But that won’t happen in this blog post, and it also doesn’t matter very much: it shouldn’t be an “either, or”, but an “and”!

What do you think? How are you using course evaluations and formative feedback?


Braga, M., Paccagnella, M., & Pellizzari, M. (2014). Evaluating students’ evaluations of professors. Economics of Education Review, 41, 71-88.

Heffernan, T. (2021). Sexism, racism, prejudice, and bias: a literature review and synthesis of research surrounding student evaluations of courses and teaching. Assessment & Evaluation in Higher Education, 1-11.

Hessler, M., Pöpping, D. M., Hollstein, H., Ohlenburg, H., Arnemann, P. H., Massoth, C., … & Wenk, M. (2018). Availability of cookies during an academic course session affects evaluation of teaching. Medical Education, 52(10), 1064-1072.

Roxå, T., Ahmad, A., Barrington, J., Van Maaren, J., & Cassidy, R. (2021). Reconceptualizing student ratings of teaching to support quality discourse on student learning: a systems perspective. Higher Education, 83(1), 35-55.

Uttl, B., White, C. A., & Morin, A. (2013). The numbers tell it all: Students don’t like numbers! PLoS ONE, 8(12), e83443.

Uttl, B., White, C. A., & Gonzalez, D. W. (2017). Meta-analysis of faculty’s teaching effectiveness: Student evaluation of teaching ratings and student learning are not related. Studies in Educational Evaluation, 54, 22-42.
