Reading more AI stuff

May 28th, 2026
Posted in literature
Tagged bias, GenAI

I urgently need to close some tabs in my browser for my mental health, but I cannot close them without looking at each one individually, because surely there is a reason why they have been open for so long. So here are some quick summaries of AI stuff.

Waiting for Artificial General Intelligence (AGI) is a commentary by Letellier & de Macedo Schäfer (2026) on how AGI has been promised as being aaaalmost here for a long time, but the “aaaalmost here” hasn’t really come any closer. AGI is omnipresent as a goal that isn’t clearly defined, yet is presented as highly desirable. Attempts to regulate this future technology are dismissed because we don’t really know yet what it will be like, so “decisions with lasting social consequences are attached to a future moment when the technology is assumed to have revealed its true form, rather than addressed in the present in which it already operates. Waiting thus functions as a practical argument in regulatory debates, allowing deployment to proceed while responsibility is displaced, circulating between actors and relocated to a hypothetical future point at which it is always said to become clearer.” And that might be by design: “Whether AGI will arrive matters less than what waiting for it already organizes. Waiting coordinates belief by sustaining investment and economic expectation; it structures time by orienting research and policy toward a future that remains permanently forthcoming; it organizes knowledge by keeping disagreement open and thresholds unsettled; it defers responsibility by attaching accountability to a moment that never quite materializes; and it reproduces social roles whose authority depends on continued anticipation rather than present judgment. These effects describe a stable arrangement in which action continues while decision is postponed.” So the call to action is: “the time for accountability is not after AGI arrives, but while we are still waiting”. Fully agree!

In a preprint titled “AI assistance reduces persistence and hurts independent performance”, Liu et al. (2026) describe exactly what the title says. They consider this unsurprising since, from experience with other humans, we know how easy it is to fall into a pattern of learned helplessness if we can avoid doing tasks ourselves. They explain that in human learning situations there is a longer-term goal (for the learner to actually be able to do things themselves), and therefore teachers insist that students practice doing the cognitive work themselves, possibly scaffolding the learning, but don’t just provide answers. But just providing answers is, according to the authors, exactly what AI does.

Liu et al. (2026) base these interpretations and conclusions on three large-scale trials in which participants were assigned to either an AI group or a control group. All three experiments (two on fractions, one on reading comprehension) went roughly like this: the AI group had access to AI and was encouraged to use it. After about a dozen questions, the AI support was suddenly removed and participants had to answer the last three problems without assistance. The control group did all the same questions without AI support. In both groups, there was no penalty for wrong answers and participants could choose to skip questions.

They then find that “although AI assistance improves performance in the short-term, people perform significantly worse without AI and are more likely to give up”. They conclude that “[t]hese findings raise urgent questions about the cumulative effects of daily AI use on human persistence and reasoning. We caution that if such effects accumulate with sustained AI use, current AI systems – optimized only for short-term helpfulness – risk eroding the very human capabilities they are meant to support.”

So far, so scary, but I am wondering how I would have reacted in this specific case. Participants were paid the same amount (around USD 3, depending on the study) no matter how much time they spent on it, which is a strong incentive to be as fast as possible and not a very strong one to think independently. If I hadn’t done any thinking for a dozen questions and suddenly someone wanted me to think, would I want to spend time on that if I had the option to skip? These participants were not in a learning setting; they were probably taking lots of different tests back-to-back to make a living, if the approximately USD 12 per hour was incentive enough for them to take part. The interpretation that “Since participants were explicitly told there was no penalty for wrong answers, choosing to skip reflects a deliberate decision not to engage, making it a clear measure of motivation and persistence, independent of ability” might be technically correct, but why should there have been any motivation in the first place, especially after being encouraged to use AI and then having it taken away suddenly and without warning? I would not have been motivated, either, and I usually even think doing fractions in my head is quite fun! We also have no information at all on whether the effect persists outside of this testing situation (I am pretty sure it doesn’t), so I think this is another study that is waaaay oversold on social media. (And for a study on a similar topic that I like, check out this chess one by Poulidis et al. (2025).)

One anecdotal data point, though: the post “I Was an Enthusiastic Early Adopter of AI Scribes. Here’s Why I Stopped” by Dr. Benn Gooch, a GP in England who used an AI scribe to take notes during consultations in their clinical practice. They describe the experience: “The administrative relief was real and immediate. The downstream cognitive and relational consequences took months to accumulate, and by then I had already attributed my increasing sense of disconnection to patient complexity, to the NHS, to everything except the tool.”

Later, they argue that “the consultation is not simply a conversation from which a document is generated. It is a complex cognitive and relational act, and the document we produce is not a by-product of that act but part of it. Any tool that changes how that document is produced will change the act itself, and we should expect those changes to be significant, not trivial.” And I think this is so highly relevant for my own work! I see lots of AI-generated meeting summaries these days, which sometimes make me wonder whether I was even in the same meeting the summary was produced from. And I imagine it would be even worse for my personal meeting notes (and really terrible if I got into the habit of having lectures summarized for me): as it is, I remember what I wrote and why, that I drew that doodle while someone was talking for too long about something I didn’t care about, or the walk on which I took the wave picture that I then used as the featured image for a summary post about a podcast I had listened to. And we know that reification is an important step in collective meaning making, too: do we all agree with a summary of a meeting, or are there points that need to be added because someone realizes they should be highlighted?

Another preprint: Sharma et al. (2026) on “Who’s in charge? Disempowerment patterns in real-world LLM usage”. In this study, a human is defined as situationally disempowered when

“their beliefs about reality are inaccurate;
their value judgments are inauthentic to their values;
their actions are misaligned with their values.”

They analyze 1.5 million real conversations with Claude (“using a privacy-preserving approach”, which means that an LLM produces summaries without any human oversight. Not a valid method in my book!) and “uncover several concerning patterns: AI assistants generating complete scripts for value-laden personal decisions that users appear to implement verbatim, users positioning the AI as an authority figure, and an increase in the prevalence of disempowerment potential over time in user feedback data, though the drivers of this increase remain uncertain”. They conclude: “That interactions with greater disempowerment potential receive higher user approval ratings further creates a troubling incentive structure. AI systems optimized against short-term user satisfaction may be inadvertently optimized toward behaviors that undermine long-term empowerment. Moreover, and similar to social media, gradual habituation could obscure accumulating costs until users find themselves dependent on technology they experience ambivalently”.

So. These findings are of course concerning: even though the percentages of severe forms are very low, they still amount to a lot of people in absolute numbers. And their findings sound plausible, confirming a general gut feeling. But at the same time, LLMs are really not a valid tool for qualitative analysis or anything like it, so we should be very careful with these findings even though they confirm all the prejudices we already had…

But I guess the rest of the tabs will have to stay open a little longer, because I need to move on to other tasks!

P.S.: Speaking of AI and data analysis: here is a very nice Bluesky post by Sasha Gusev with probably the shortest write-up of a study in the history of AI: “I assigned random gender/ethnicity labels to scientific abstracts from the literature and then asked Claude to do a thematic analysis. Claude identified a clinical versus computational split for female/male authors and a DEI focus for Black/URM authors. All in completely random data.” The full prompt and output are available on GitHub. Thanks for sharing, Rachel!

P.P.S.: Another nice example of AI and data analysis, very similar to the one above: “Real signals or artificial stereotypes? Adventures with a cultural Copilot” by Adam Kucharski on Substack. Given an artificial dataset (200 responses labeled “UK”, plus the same 200 responses labeled “US”, in the same file), the AI “finds” lots of cultural differences in expressiveness, language style, emotional framing and cultural tone that are clearly not in the dataset. With another dataset about career aspirations in 5 countries (again, with identical data for each country), the output again describes lots of cultural differences. So if people haven’t gotten the message yet: Do not use LLMs for analysis of qualitative data!
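Why this control is so damning can be shown in a few lines. Below is a minimal Python sketch of the idea, assuming a hypothetical handful of placeholder responses (made up for illustration, not Kucharski’s actual data or code): every response appears once under each label, so the content per label is identical by construction, and any “difference” an analysis reports between the labels cannot come from the data.

```python
from collections import Counter

# Hypothetical placeholder responses (NOT the real dataset from the post).
responses = [
    "I like working with people.",
    "I want a stable job.",
    "I would love to travel for work.",
]

# Build the control dataset: the same responses, once labeled "UK" and once "US".
rows = [("UK", text) for text in responses] + [("US", text) for text in responses]


def texts_by_label(rows):
    """Group response texts by label as multisets (order is ignored)."""
    groups = {}
    for label, text in rows:
        groups.setdefault(label, Counter())[text] += 1
    return groups


groups = texts_by_label(rows)

# By construction both labels carry exactly the same content, so any
# "cultural difference" reported between them is an artifact of the analysis.
identical = groups["UK"] == groups["US"]
print(identical)  # True
```

The same check extends to the five-country version: as long as every label maps to the same multiset of texts, there is nothing in the data for an analysis to find.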

Gooch, B. (2026). “I Was an Enthusiastic Early Adopter of AI Scribes. Here’s Why I Stopped”. Post on Substack. https://benngooch.substack.com/p/i-was-an-enthusiastic-early-adopter [accessed 28.5.2026]

Gusev, S. (2026). Post on Bluesky. https://bsky.app/profile/sashagusevposts.bsky.social/post/3mmjqyhdh5k2y [accessed 28.5.2026]

Kucharski, A. (2026). “Real signals or artificial stereotypes? Adventures with a cultural Copilot”. Post on Substack. https://kucharski.substack.com/p/real-signals-or-artificial-stereotypes [accessed 28.5.2026]

Letellier, A., & de Macedo Schäfer, N. A. (2026). Waiting for AGI. AI & SOCIETY, 41(4), 4205-4206.

Liu, G., Christian, B., Dumbalska, T., Bakker, M. A., & Dubey, R. (2026). AI Assistance Reduces Persistence and Hurts Independent Performance. arXiv preprint arXiv:2604.04721.

Sharma, M., McCain, M., Douglas, R., & Duvenaud, D. (2026). Who’s in Charge? Disempowerment Patterns in Real-World LLM Usage. arXiv preprint arXiv:2601.19062.

On the way to the dip… two weeks ago already — note the tiiiiny leaves on the tree!

This is the best perspective!

Sometimes even with the Turning Torso on the horizon…

Dip over!
