Session 3 · Evidence

Evidence, and what to do with uncertainty.

The honest summary first: the evidence is genuinely mixed, some of the strongest studies are already aging, and nobody has the long-term data. What follows is what the corpus actually supports — and where it doesn't.

~16 min read · ~50-paper base
The grid again — reframed

From “misbehavior” to “susceptibility.”

The Session 2 parade of horribles, seen through the evidence, is better read as three kinds of susceptibility. The columns are not about bad students. They are about how a jagged system meets a human mind inside a social institution.

AI jaggedness                   | Human susceptibility       | Social susceptibility
Bias                            | Over-reliance              | Existential / power concentration
Hallucination                   | Cognitive offloading       | Bad-actor empowerment
Jagged intelligence / stupidity | Cognitive surrender        | Systemic de-skilling
Sycophancy                      | Psychological attachment   | Labor losses
Prompt-injection vulnerability  | Illusion of understanding  | Environmental cost
Sandbagging / gaming            | Plagiarizing               | Slop-ification
Seven findings the corpus supports

What we can say with some confidence.

Each links into the full synthesis. Read these as well-supported tendencies, not laws.

1. Heavy AI use during learning has measurable cognitive costs

In an EEG essay-writing study, 83% of ChatGPT-assisted writers could not quote from the essay they had just written (vs. 11% of unassisted writers); neural connectivity and sense of ownership both fell with AI support. In a survey of 319 knowledge workers, reported effort dropped across every category of Bloom's taxonomy.

2. Confidence calibration is the critical variable

Self-confidence is protective; confidence in the AI is corrosive. Professionals rated AI “equally helpful” on tasks where its real benefit ranged from large to zero — they could not feel the difference.

3. The “leveling” effect is replicated across domains

AI reliably lifts weaker performers and can actively depress the strongest, flattening the top of the distribution. Some studies report drops of up to twenty percentile points among the best students once AI is allowed.

4. Sequencing matters more than “AI yes / AI no”

When AI builds scaffolding that the human then transforms, a legal-education RCT found no atrophy and better later unaided work. When AI produces the final artifact, cognitive debt appears. Same students, different sequence.

5. Adoption is far outrunning pedagogy

Tutoring is among the largest single uses of the world's most-used AI. Roughly 90% of students used it for homework within two months of launch, and in one UK study fully AI-written work slipped past markers 97% of the time.

6. AI is structurally bad at the reasoning students must learn to spot

Reasoning models reduce effort as problems get harder, can't follow a supplied algorithm at scale, can't reliably notice what is missing, and don't revise hypotheses against disconfirming evidence. This is teachable content, not a footnote.

7. Working well with AI is itself a teachable skill, distinct from subject knowledge

“Joint ability” is statistically separate from “solo ability.” The strongest predictor of getting good help from AI is theory of mind: modelling what the machine knows and how to clarify things for it. Because that skill varies moment to moment, it can be trained.

The unsettled parts

What we genuinely don't know.

The honest stance: we cannot wait for certainty and we cannot pretend we have it. So we reason from the best available evidence plus disciplined intuition, course by course, assignment by assignment, and we stay willing to be wrong.
Responses

So what do we actually do?

Hold Session 2's question in view: what are we trying to teach, and is the struggle still happening? Then ask whether AI obviates a struggle you believe is necessary.

The convergence across very different institutions is striking: build unaided skill first, introduce AI second, make verification of AI output the new assessed skill, drop detection enforcement in favour of disclosure and redesign, and treat “working with AI” as a separable, trainable competency.

Plagiarism, honor codes, honesty

And the detection question.

Academic-honesty norms, at their most defensible, are about honesty — not about turning a writing process into a test of willpower. Ask what the rule is for before asking how to enforce it. And note the research question hiding inside the classroom one: what about AI use in our own research?

On detection itself, the conventional and expert wisdom runs from unreliable to impossible. There are now some more reliable products, but it is a cat-and-mouse game, and false positives are career-damaging.

Whatever the state of AI detection today, better models will emerge that may later reveal work as AI-produced or AI-aided. For a professional reputation, that asymmetry alone is reason to build on disclosure and design, not on policing.

— the case against detection-as-strategy