Mirjam Sophia Glessmer

Currently reading Cheng et al. (2025) on “Sycophantic AI decreases prosocial intentions and promotes dependence” and Corbin et al. (2025) on “The wicked problem of AI and assessment”

Another fun GenAI article, today Cheng et al. (2025) on AITA stories and sycophantic models!

Sycophancy is the behaviour of excessively flattering or agreeing with people even when they are wrong, and it is something that GenAI models do. Compared to YTA/NTA verdicts in reddit’s “Am I the Asshole” subreddit (I love studies that draw on existing data sets in creative ways!!), in Cheng et al. (2025)’s study, 11 of the currently available cutting-edge GenAI models respond with NTA 50% more often than humans do, even in cases where the story explicitly mentions deceit, manipulation, or similar behaviours that should clearly not be supported.

But what is problematic is that apparently a lot of people already turn to GenAI instead of to other people for difficult conversations, and that these interactions with sycophantic GenAI influence them: “When a user believes they are receiving objective counsel but instead receives uncritical affirmation, this function is subverted, potentially making them worse off than if they had not sought advice at all.” People become less willing to make amends to repair relationships and become more confident that they were in the right all along. They also trust GenAI more and more: “people are drawn to AI that unquestioningly validate, even as that validation risks eroding their judgment and reducing their inclination toward prosocial behavior. These preferences create perverse incentives both for people to increasingly rely on sycophantic AI models and for AI model training to favor sycophancy.”

The authors have three main conclusions:

  • “our findings serve as a call to action for AI developers to rethink model training and evaluation”. Yes, nice, let’s not give up hope that that might happen, but let’s also not hold our breath waiting for it to happen (ha! I managed to connect the post to the featured image! At the competition I judged yesterday, David held his breath for 8:04 minutes without a wetsuit, socks, gloves, or hood. I would not believe it had I not seen it with my own eyes… Congrats again!!!)
  • The authors also make recommendations for future evaluation of GenAI models: “assessments also need to consider the contexts in which AI systems are deployed”. Very sensible, too! And maybe we need to rely less on developers’ assessments, assess more ourselves, and restrict model use in areas where it is not suitable.
  • Lastly, “User-facing interventions may also help break the cycle. Once sycophancy is made visible, preferences may shift, similar to how one loses trust in a confidant whose affirmations are revealed to be insincere”. I don’t know, habits are hard to break, and sycophancy is probably addictive?

But I think this article has huge implications, not only for what happens when people get advice on their relationships and behaviour from GenAI, but also when they build relationships with GenAI in which they feel comfortable having the difficult conversations that they cannot have with real people in their lives. Once they have that relationship with GenAI, why would the trust they place in it not also transfer to GenAI in other roles in their lives, for example as an academic tutor? During the pandemic, there were a lot of reports that people found it difficult to distinguish between watching something on YouTube or Netflix as background entertainment and, at other times, watching a live stream of a lecture on the same device and in the same space. The same thing might happen here: it becomes difficult to distinguish between private and academic activities. How can we make sure that relationships between people are stronger than relationships with GenAI?

Ok, and since this was such a fun GenAI article, here is another one that I think is super helpful: Corbin et al. (2025) framing assessment in times of GenAI as a wicked problem. In a nutshell (the last two sentences of their conclusions): “while it is true that wicked problems do not have correct solutions, they do have better and worse responses. Removing the spectre of ‘finding the perfect solution’ just might help teachers navigate AI-related challenges in more sustainable, healthy, and effective ways”. What they do in that article is take the 10 characteristics of wicked problems and show, from interview data with teachers dealing with assessment, that all 10 come up:

  1. There is no definite formulation of the problem: For some teachers, it is about preparation for the workforce, for others about academic integrity, for others about workload
  2. There is no stopping rule for when it is solved: Assessment can always be optimized and nobody knows when it is good enough
  3. Solutions are not true or false, only better or worse: It’s hard to find a balance between measuring creativity and compliance, between preventing cheating and trusting students, etc
  4. You cannot test the solution: If the goal is to prevent cheating, you can never be sure that that has been achieved
  5. “Every trial counts”: every “experiment” has real consequences for students, for reputations, for many things
  6. There is an infinite number of possible approaches: It is impossible to consider all possible options
  7. All wicked problems are unique: What works in one context does not necessarily work in any other
  8. Wicked problems can be described as symptoms of other problems: Could be the university business model, could be workload pressures, could be a crisis of engagement…
  9. The framing of the problem determines the possible solution: workload issues could be fixed with resources, integrity issues through more control, etc
  10. We cannot afford to be wrong about solutions: we have to take responsibility for consequences but cannot actually control outcomes

So what then? Corbin et al. (2025) give readers three “permissions”:

  1. Permission to compromise: “When we stop seeking perfect solutions, we can start having honest conversations about which trade-offs serve our students best, which failures taught us most, and how to be thoughtfully imperfect rather than accidentally inadequate”
  2. Permission to diverge: “accepting that successful practices in one educational context need not – and often should not – be replicated elsewhere”
  3. Permission to iterate: “recognizes that wicked problems evolve continuously, making fixed solutions obsolete”

I found reading this article somehow strangely liberating. Acknowledging that we are dealing with a wicked problem means acknowledging that the problem is really big and complicated, but it also lets us approach it in a very different way, and I think the three permissions are really good advice there. Also the conclusion that “Universities that continue to chase the elusive ‘right answer’ to AI in assessment will exhaust their educators while failing their students. Those that embrace the wicked nature of this problem can build cultures that support thoughtful professional judgment rather than punish imperfect solutions” is really good advice. I sincerely hope we’ll be on the side of embracing the wicked nature and of building the supportive cultures…


Cheng, M., Lee, C., Khadpe, P., Yu, S., Han, D., & Jurafsky, D. (2025). Sycophantic AI decreases prosocial intentions and promotes dependence. arXiv preprint arXiv:2510.01395.

Corbin, T., Bearman, M., Boud, D., & Dawson, P. (2025). The wicked problem of AI and assessment. Assessment & Evaluation in Higher Education, 1–17.
