History in Organizations

AI and peer review

Rather than moral panic, a bit of a reflection on how we might use AI to improve academic work

Stephanie Decker FAcSS FBAM
May 22, 2026
∙ Paid

Also, a cute picture of our dog. But first things first — a service announcement on free Friday posts.

Free posts:

  • The Guide to Historical Methods in Management, of course

  • The Vibecoding Historian

  • Is AI taking over Social Science Part 1 (Part 2 is mostly free as well)

  • Rohin’s great talk about his Organization Science article

  • Paula’s excellent post on research dissemination

  • Preparing the literature review

  • What is Social Entrepreneurship by Kerryn Krige

  • Historical Research for Management Studies

  • Marking in the Age of AI, Part 1


I have been meaning to write this post for some time. I’ve seen so many AI-related discussions (and tirades) in the last few weeks and months that I felt it was time to talk not just about the whole crisis narrative, but also how you can use AI in ways to make your work better — or at least avoid using it badly.

Because I think this is missing.

Because I think it is drowned out by quite a bit of misinformation and misdirection.

Because we are increasingly operating in a two-track research environment in which quantitative scholars are engaging with the opportunities for AI-enhanced research practices, while many, including leading qualitative scholars, are focused on stigmatising AI use in a black-and-white manner that precludes any conversation about how and where it might be beneficial.

I’ll be honest, I have started to get quite frustrated with the nature and direction of the debate.

Discussing AI at my institution

What really brought this home to me was a recent meeting about running an AI research day at my home institution. Each department nominated one person, and I was nominated by our research lead, which is interesting, given that our department is mixed quant-qual, leaning towards quant. The reps from two wholly quant departments (it’s a business school) were commended for their cutting-edge involvement with AI. I’ll be honest, I am only vaguely aware of what they do. That is no reflection on their work; it is simply unlikely to be relevant to my research practice.

But it made me reflect that I cannot think of a single qualitative scholar that a research director would sing such praises about… perhaps Stine Grodal and Henri Schildt, but beyond that?

That’s not a coincidence. I reflected that I would be concerned if anyone were to single me out or one of my co-authors for our cutting-edge use of AI. Because that would feel one step away from suspecting us of research misconduct…

Because the terms of engagement with AI are fundamentally different for qualitative and quantitative scholars, partly of our own making.

A while ago, I posted about the AI doomers and boomers, but since then, the debate has moved on, and not in a positive direction IMHO.

Most of this post will be free. The pointers and suggestions on how to best leverage AI in responsible and acceptable ways for research, especially its qualitative aspects, are for full subscribers only. Right now we are in the unproductive finger-pointing stage of technology adoption (I see a process model coming on), so frankly, get lost. You want to point fingers at me, you’ll have to pay me first.

The great peer review debate … and crisis

It was at the EGOS (European Group for Organizational Studies) conference in Cagliari in 2023 that I attended a panel on the crisis of peer review. Nobody talked about AI at all at this panel. ChatGPT was still a glint in the eyes of undergraduate and postgraduate students, who hadn’t quite started to deliver the surprisingly smooth prose that forces their academic assessors to actually engage their critical faculties and notice how conceptually empty, and at times outright confused, the underlying meaning is.

So, let’s recap:

Peer review before 2022/3 — crisis? Oh yes.

And yes, I definitely fall into the “AI has poured oil on the fire” quadrant that the Organization Science team has devised.

So Here's the Idea: The Organization Science Substack
More Versus Better, Part III
This is the third and final installment in a series of three essays from the Organization Science AI Task Force. Part I examined the rise in AI-generated submissions and Part II assessed the prevalence and content of AI-generated reviews. We now turn to the institutional incentives driving the trends we observed and to our thoughts on the peer review pr…
10 days ago · 10 likes · Claudine Gartenberg, Sharique Hasan, Alex Murray, Lamar Pierce, and Organization Science

The great peer review crisis of 2026 has been pointedly developed in a Substack post by Scott Cunningham, based on economics journals (now behind a paywall). The Organization Science team, using an unreliable AI detector to measure their construct of academic AI slop (construct clarity, anyone? No? How odd…), comes to a similar conclusion: increased submissions of poor quality. Of course, they know these detectors are unreliable, so they only use scores high enough to argue that the text must be AI-generated, with little to no human editing.

So Here's the Idea: The Organization Science Substack
More Versus Better, Part I
This is the first in a series of three essays based on findings from the Organization Science AI Task Force. The full paper, “More versus Better: Artificial Intelligence, Incentives, and the Emerging Crisis in Peer Review,” is available to download at the Organization Science website. Over the next two days, we will examine what is happening on the revi…
25 days ago · 43 likes · Lamar Pierce, Claudine Gartenberg, Alex Murray, and Sharique Hasan

Sensible on the face of it, but what does this measure? AI — I think not. Slop — for sure.

Why?

Because with a little setup, my AI tool of choice produces text that Pangram assesses as 100% human. (With medium confidence. No confidence intervals reported.)

What do AI detectors detect? And how to fool them

I like Pangram. Unlike other tools, it pulls back the curtain and shows how the sausage is made.

With other tools like Grammarly, I have run AI detection on fully generated text (flagged as anywhere between 20% and 70% AI) and then edited it substantially. A lot of the time, I might rewrite an entire paragraph, only for it still to show as AI-generated when it really wasn’t anymore. (Also, it identifies reference lists as AI-generated...)

So what are the tells? Probably everyone knows about the preponderance of “delve” and em dashes. Pangram allows you to upload text, and it identifies what elements are typically overused by AI:

  • The by now well-known “it is not this, it is that” construction. Occasionally impactful, but Hollis Robbins writes on her Substack why this is quite rude towards a knowledgeable reader.

  • Listing items in threes: now there’s a bit of an aha moment, given that all good things are supposed to come in threes. So if any of your lists come in threes, make them twos or fours, and watch an AI detector switch from 100% AI to 100% human.

Why am I telling you this?

Because a halfway competent, reasonably curious human can game this with little effort. Indeed, you can use your auxiliary rent-a-brain to do it in a second pass over generated text as part of general or project-based instructions. What else are you paying a subscription fee for?

So, what is the construct of majorly AI-generated journal submissions actually measuring? People who do not know what they are doing and who are using AI to generate journal submissions.

Just to emphasise here, the construct does not measure AI use (or overuse) or, indeed, wholly AI-generated submissions. Personally, I do not think that current LLMs can generate an Org Sci-level paper without significant intervention. Not least because LLMs can’t jump* — generating novel theoretical insights requires abduction, and LLMs are bad at that.

Then again, many humans are bad at generating novel theoretical insights (something reviewers frequently throw at me, and everyone else, I suspect, as it is the equivalent of the Monopoly “Go to Jail” card).

But Scott Cunningham’s point here is that model capabilities are constantly improving, and that the top of the distribution will move closer to human-generated articles, making it more difficult to distinguish between them.

So here’s my idea…

Why try to distinguish at all? Why not accept that good scholarship, AI-assisted, is still good scholarship, and bad scholarship, with or without AI, is just that? Because then we are back in the familiar territory of the peer-review crisis and of papers overloading the system. (And the thorny issue of inconsistent human judgment, academic politics, etc., which are very much not an AI thing.)

Just faster.

Which is irrelevant, as we have known this for years and have manifestly failed to do anything about it. So this might actually be a good thing (a la Kustov). At the very least, it is not a fundamental change — just one you can finally no longer patch up by upping the ante on free academic labour.

Don’t get me wrong — I recently reviewed a piece in a top journal and was pretty convinced it was mostly AI. There was potentially one hallucinated reference in there, but I felt it could be defended (there seemed to be some formatting issues), and it would not have caused me to point fingers. Rather, it was the fundamental underdevelopment of ideas and the conceptual emptiness of a competently written piece.

The problem isn’t AI. It’s the intersection of poorly developed research with the glossy competence of AI prose.

I did not upload the piece to an AI detector, which would have been inappropriate.

I did not upload the piece to my AI tool to produce a review, which would have been inappropriate.

And neither should anyone else. Two wrongs don’t make a right.

(Also, AI tools are notoriously too nice to crappy academic articles.)

What about “review by AI”

Apparently, students generating their assignments with AI are unwilling to be assessed by AI. Can’t say I am surprised. Because both decisions are unwise.

Let me tell you an anecdote.

Some years ago, say, in 2023, we submitted a paper to a major management journal for a special issue. We received two very short reviews, along with a non-existent editorial letter, from an editor with impressive shared editorial and research experience. I’ve had significantly better reviews and letters from 2* journals than from this 4* one.

I discussed it with my co-author, and since I was in faculty training on the new AI tools (2023, remember), I decided to try one of these tools and asked it to review our paper (after checking with my co-author). The review was way better: more substantive and more detailed. That was my first time using Claude, and I’ve stuck with it ever since.

These days, the tools are better and my instructions more precise. I don’t take all the points at face value; they prompt further thought and refinement. Not accepting an AI review may be a significant mistake. Not because it is always right, but because it can improve your research and your thinking.

As I am working on papers for submission, I have uploaded a more recent version to Claude and asked for an academic review for the specific target journal. What I got back was well-reasoned, detailed, and really pushed me to address some of the weaknesses of the piece and take strategic decisions on risky bits. My next step is to trial Refine.ink, as I've heard positive things about it. That’s mostly from quants, though. Trying out the platform, I thought it looked promising. By all means, check it out:

Refine.ink

Where are we? We should not judge scholarship by its AI content. We should not judge the quality of a review based on whether it is AI-generated. If you are good at using AI, you can use it in helpful ways. Constantly stigmatising AI use means fewer of us dare to try it, and those who know how to use it well and responsibly will not share how to do so.

This is not developmental — these are gatekeeping dynamics.

Because judgment and evaluation are human, and we should not outsource them to a simple heuristic: neither by having an AI system replace human judgment, nor by reducing human judgment to a denial of anything AI.

So what about AI and peer review?

I know that the above leaves a lot unanswered:

  • What to do if you suspect your reviewer used AI to review your article?

  • Should you use AI to help with your reviews?

I don’t promise all the answers, but I have a few suggestions after the paywall ;-)
