Mirjam Sophia Glessmer

Guest post by Kirsty Dunnett: “Odd feedback at year’s end and some New Year’s learning: what will peer review in 2025 and beyond look like?”

Welcome to a new guest post by my most loyal guest blogger, Kirsty Dunnett, on odd experiences with peer review and generative AI which, in combination, provoke a lot of questions. Enter Kirsty:

I sometimes think I have had my share of peer reviews that really should not have made their way past the editor. Examples range from a review whose first half was written as ‘they have…’ while the second half became ‘he has…’, to being ‘Joe Bloggsed’, as my supervisor described receiving what appears to have been a review characteristic of a particular individual in the research field (seriously, for someone facing moderate hostility in their immediate work environment, quitting would not have been an entirely unreasonable reaction). I have also had several cases where pragmatism seemed to be completely lacking: demands were made for the work to conform to current standards without consideration of the conditions under which it was performed, sometimes supposedly in the name of ‘best practices’ and ‘maintaining standards’.

But my latest experience is quite different. Some colleagues and I have a manuscript under review that has been taking rather longer than we had hoped. The last round of reviews (the third) brought ‘accept’ (with a few minor corrections) from the two reviewers who had been evaluating the article since the first round, and more critical comments from a new reviewer. Considering that two reviewers had recommended acceptance, and considering the particular journal, the new reviewer was clearly being ‘pernickety’, but most of the suggestions were reasonable, so we acted on what we could and explained where it was impractical. Six weeks later, we received another email from the editor: ‘reviewer A is still not satisfied’. This time I am less inclined to take the review at face value: two reviewers have recommended acceptance, the third has no significant objections, and the ones they had have been addressed. The manuscript is clearly ‘good enough’, especially given that most of the co-authors on the paper are BSc and MSc students.

So I did what I usually do with reviews: break the review down into sentences and find the particular things to address. At this point, the unusual verbosity of the review was unhelpful, and finding the relevant information was hard. In doing so, it became apparent that there were no fundamental flaws in the manuscript and the reviewer was being unnecessarily nitpicky. Anyway, I wrote my reactions and replies and made the corresponding changes in the manuscript. But now things started to niggle. There was something off about the review. It was not the type of ‘difficult reviewer’ I had experienced before, and the more I read it and made changes to the manuscript, the more the review simply did not add up. It made no reference to the points raised in the previous round; there were technical errors (SoTL -> reflective practice, yes; SoTL = collaborative learning, inquiry-based learning, no); non-existent (but standard) headers were referred to; and the idea that ‘the referencing style follow[ing] the journal’s specific requirements will be important for the acceptance of your work’ was beyond strange. Moreover, beyond its verbosity, it was written too nicely, and certain statements (valuable insights, the work is insightful, the case study is compelling, potential to make a significant contribution) had already made me frown as being completely inconsistent with the demand for further changes, not to mention my own insider evaluation of the manuscript (we’re not writing for Nature or Science). Then I thought, what if…

What if this far too nice, inconsistent, pernickety reviewer was actually generative AI?

It would make sense: the inconsistency and generality of the comments, the overstatements and technical inaccuracies, the verbosity and over-politeness.

So I went to Google Scholar: ‘using generative AI for academic article review -literature -systematic’. Three of the first four results were letters to editors in medical research journals [1-3] about the need for transparency around the use of AI in peer review processes. From these I arrived at a Nature news article [4] on two arXiv preprints [5, 6] (such a long time since I’ve read one of those!) on the topic. It turns out that three of my four niggle words — valuable, insightful, compelling — are among those used most disproportionately by ‘generative AI reviewers’ [5]. So I think I’ve ‘identified’ the review writer, though not quite to the extent of providing a name.
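Purely as an illustration (nothing like this appears in the articles or preprints cited below), here is a minimal Python sketch of the kind of crude check this suggests: counting how often suspiciously ‘AI-flavoured’ adjectives turn up in a review. The word list is a small, hand-picked subset — my own niggle words plus a few often-cited examples — not the actual list from Liang et al. [5].

```python
# Crude, illustrative check: count adjectives reported as disproportionately
# frequent in LLM-generated peer reviews. The word list is a small hand-picked
# subset (NOT the full list from Liang et al. [5]), and a high count is only a
# hint that a review may be machine-written, never proof.

import re
from collections import Counter

MARKER_WORDS = {
    "valuable", "insightful", "compelling",                # niggle words from this post
    "commendable", "meticulous", "intricate", "notable",   # often-cited examples (assumption)
}

def marker_word_counts(review_text: str) -> Counter:
    """Count (case-insensitively) how often each marker word appears in a review."""
    words = re.findall(r"[a-z]+", review_text.lower())
    return Counter(w for w in words if w in MARKER_WORDS)

if __name__ == "__main__":
    review = ("The manuscript offers valuable insights, the case study is "
              "compelling, and the discussion is insightful and meticulous.")
    counts = marker_word_counts(review)
    print(counts)  # e.g. Counter({'valuable': 1, 'compelling': 1, ...})
    print(f"{sum(counts.values())} marker words out of {len(review.split())} words")
```

Of course, an enthusiastic human reviewer can also write ‘compelling’ and ‘insightful’, so counting words like this only says something meaningful across large numbers of reviews, as in [5]; it cannot deliver a verdict on a single review.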

The question is what to do next. Do we resubmit with changes and say nothing about our suspicions, even though it is almost sure to come back with another long response in which demands for unrealistic perfectionism are hidden under polite waffle? Do we refuse to make changes on the basis that we are pretty much certain that the review was not written by a human, and that generative AI cannot take responsibility for anything? Do we do both: make the small changes (a sentence or so of clarification) and refuse the large ones that would add content, while remarking to the editor that we have very good cause to believe that the comments from reviewer A were written by generative AI? If we do the last, how much of the above discussion of what is problematic and indicative of generative AI do we include?

And barely more than two years ago, one thought that the major areas of ethical consideration in academic research were informed consent, anonymity and authorship!


Featured image: From colleagues working with supervised machine learning, I (Kirsty) learned that one of the classic machine learning fails involves sheep: one thinks one’s trained the computer to recognise sheep, but what it’s actually learned to do is recognise fields (green backgrounds). Since I didn’t take any photos while out in the snow yesterday, I’ve dug out a photo of a Herdwick ewe grazing on the east side of Wasdale (Cumbria, UK) on a fine August evening in 2011.


[1] Cheng, K., Sun, Z., Liu, X., Wu, H. and Li, C., Generative artificial intelligence is infiltrating peer review process. Critical Care 28, 149 (2024). https://doi.org/10.1186/s13054-024-04933-z

[2] Wu, H., Sun, Z., Guo, Q., and Li, C., Applications of generative AI in peer review process: friend or foe? International Journal of Surgery 110(9), 5921-5922 (2024). https://doi.org/10.1097/JS9.0000000000001689

[3] Wu, H., Li, W., Chen, X., and Li, C., Not just disclosure of generative artificial intelligence like ChatGPT in scientific writing: peer-review process also needs. International Journal of Surgery 110(9), 5845-5846 (2024). https://doi.org/10.1097/JS9.0000000000001619

[4] Chawla, D. S., Is ChatGPT corrupting peer review? Telltale words hint at AI use. Nature 628, 483-484 (2024). https://doi.org/10.1038/d41586-024-01051-2

[5] Liang, W., Izzo, Z., Zhang, Y., Lepp, H., Cao, H., Zhao, X., Chen, L., Ye, H., Liu, S., Huang, Z., McFarland, D. A. and Zou, J. Y., Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews. arXiv preprint: 2403.07183 (2024). https://arxiv.org/abs/2403.07183

[6] Gray, A., ChatGPT “contamination”: estimating the prevalence of LLMs in the scholarly literature. arXiv preprint: 2403.16887 (2024). https://arxiv.org/abs/2403.16887

 
