Mirjam Sophia Glessmer

Currently reading Nazaretsky et al. (2025): “Can students judge like experts? A large-scale study on the pedagogical quality of AI and human personalized formative feedback”

Ignoring for now that an important aspect of a teacher giving feedback to students is what it signals about the relationship (that the teacher cares enough about the student and their learning to invest time and energy into understanding the student’s thoughts and figuring out how to help them improve), how good a job can AI do at replacing the teacher on the other aspects of feedback?


In their study “Can students judge like experts? A large-scale study on the pedagogical quality of AI and human personalized formative feedback”, Nazaretsky et al. (2025) generated AI feedback on authentic tasks with a simple prompt: “You are an excellent instructor teaching a course called [COURSE NAME]. You gave the students the following assignment: [ASSIGNMENT]. The student submission was [STUDENT ANSWER]. The correct solution is [SOLUTION]. Please evaluate the student’s answer and provide elaborated formative feedback. Please follow the following instructions: The feedback should be addressed directly to the student as is. It should be no more than [NUMBER] lines. Please provide one sentence of the overall evaluation at the end” (which they describe as “we intentionally used a rather generic prompt to mimic the real situation of students and TAs, who usually have no background in machine learning and pedagogy, interacting with GenAI tools”).
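
For concreteness, here is a minimal sketch of what generating feedback with that template might look like in code. The paper only gives the prompt, not the tooling, so the OpenAI Python client and the model name below are my assumptions; the template text is copied from the quote above.

```python
# A minimal sketch (my assumptions, not the authors' code) of generating
# feedback with the paper's generic prompt template. The OpenAI Python
# client and model name are illustrative; the study only says "GenAI tools".
from openai import OpenAI

PROMPT_TEMPLATE = (
    "You are an excellent instructor teaching a course called {course}. "
    "You gave the students the following assignment: {assignment}. "
    "The student submission was {answer}. The correct solution is {solution}. "
    "Please evaluate the student's answer and provide elaborated formative "
    "feedback. Please follow the following instructions: The feedback should "
    "be addressed directly to the student as is. It should be no more than "
    "{max_lines} lines. Please provide one sentence of the overall "
    "evaluation at the end."
)

def generate_feedback(course, assignment, answer, solution, max_lines=10):
    """Fill the template and ask a chat model for formative feedback."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    prompt = PROMPT_TEMPLATE.format(
        course=course,
        assignment=assignment,
        answer=answer,
        solution=solution,
        max_lines=max_lines,
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # hypothetical model; the paper does not name one
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```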

They asked almost 500 STEM students to use a rubric to evaluate human feedback and AI feedback generated with that prompt. The rubric assesses cognitive aspects in the categories “Task Current State” and “Task Next Steps”, metacognitive aspects in “Strategy Next Steps” and “Self-regulated Next Steps”, motivational aspects in “Praise”, and lastly red flags in “Incorrectness” and “Ambiguity”.

They found that AI and human feedback were of comparable pedagogical quality (and both often lacked metacognitive feedback on what the next steps should be, in terms of both strategy and self-regulation). Students could recognize low-quality feedback given by humans, but they were less critical when the feedback came from AI: they judged its quality based on how credible they perceived AI to be, not on actual measures of the quality of that specific feedback.

Acknowledging that providing good feedback takes a lot of time and requires both subject-matter and pedagogical expertise, Nazaretsky et al. (2025) recommend an approach where a TA drafts the formative feedback, gets AI feedback on that draft, revises it after reflecting on the AI’s suggestions, and only then delivers it to the students. They stress that “Significantly, this iterative process not only enhances the quality of feedback received by students but also supports the professional development of TAs by helping them reflect on and improve their future pedagogical practices through interaction with AI-generated suggestions” (but then TAs are also just human, and in this case human learners: why should they feel differently about AI feedback, and react to it differently, than students do? Aren’t we just shifting the problem?). Nazaretsky et al. (2025) also have a recommendation “to mitigate the source-credibility biases in practice”, and that is to work on AI literacy and on how to work with AI in general, so that eventually feedback is judged based on its pedagogical quality and usefulness rather than on whether it was generated by a human or an AI.
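
To make that loop a bit more concrete, the AI-critique step might look something like the sketch below; the critique prompt, function name, and model are all assumptions on my part, since the paper describes the workflow rather than an implementation.

```python
# A hypothetical sketch of the recommended TA-in-the-loop workflow: the TA
# drafts feedback, AI critiques the draft, and the TA revises before anything
# reaches the student. The critique prompt and function name are my own
# illustration; the paper describes the process, not an implementation.
from openai import OpenAI

def critique_draft(client: OpenAI, draft: str, solution: str) -> str:
    """Ask a chat model to comment on a TA's draft feedback (assumed prompt)."""
    prompt = (
        "A teaching assistant drafted the following formative feedback:\n"
        f"{draft}\n\nThe correct solution to the task is:\n{solution}\n\n"
        "Comment on whether the feedback addresses the task's current state, "
        "concrete next steps, learning strategies, and self-regulation, and "
        "suggest improvements."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # hypothetical model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# The TA reads the critique, reflects, and rewrites the draft themselves;
# the AI-generated critique is never sent to the student directly.
```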

If, as mentioned in the beginning, we can ignore the relational function of a teacher providing feedback (and I think that is a big IF), then these suggestions make sense. And of course teachers can explain to students how they work with GenAI and why, and I think transparency can go a long way in making sure that the relationship is maintained while the teacher uses GenAI to produce better feedback in less time. At the same time, I think it’s a pretty slippery slope where a teacher might start out with the best intentions but then slowly hand over more and more responsibility to GenAI with less and less oversight. And it might take a while before they notice that themselves and can course-correct, and that might be where relationships get hurt. But I guess all we can do is try to be transparent about what we are doing and super careful on the slippery slope…


Nazaretsky, T., Gabbay, H., & Käser, T. (2025). Can students judge like experts? A large-scale study on the pedagogical quality of AI and human personalized formative feedback. Computers and Education: Artificial Intelligence, 100533.
