Session 3 · Evidence

Evidence, and what to do with uncertainty.

The honest summary first: the evidence is genuinely mixed, some of the strongest studies are already aging, and nobody has the long-term data. What follows is what the corpus actually supports — and where it doesn't.

~16 min read · ~50-paper base
The grid again — reframed

From “misbehavior” to “susceptibility.”

The Session 2 parade of horribles, seen through the evidence, is better read as three kinds of susceptibility. The columns are not about bad students. They are about how a jagged system meets a human mind inside a social institution.

AI jaggedness                   | Human susceptibility       | Social susceptibility
Bias                            | Over-reliance              | Existential / power concentration
Hallucination                   | Cognitive offloading       | Bad-actor empowerment
Jagged intelligence / stupidity | Cognitive surrender        | Systemic de-skilling
Sycophancy                      | Psychological attachment   | Labor losses
Prompt-injection vulnerability  | Illusion of understanding  | Environmental cost
Sandbagging / gaming            | Plagiarizing               | Slop-ification
Seven findings the corpus supports

What we can say with some confidence.

Each links into the full synthesis. Read these as well-supported tendencies, not laws.

1. Heavy AI use during learning has measurable cognitive costs

In an EEG essay-writing study, 83% of ChatGPT-assisted writers could not quote from the essay they had just written (vs. 11% of unassisted writers); neural connectivity and sense of ownership both fell with AI support. In a survey of 319 knowledge workers, reported effort dropped across every category of Bloom's taxonomy.

2. Confidence calibration is the critical variable

Self-confidence is protective; confidence in the AI is corrosive. Professionals rated AI “equally helpful” on tasks where its real benefit ranged from large to zero — they could not feel the difference.

3. The “leveling” effect is replicated across domains

AI reliably lifts weaker performers and can actively depress the strongest, flattening the top of the distribution. Some studies report drops of up to twenty percentile points among the best students once AI is allowed.

4. Sequencing matters more than “AI yes / AI no”

When AI builds scaffolding that the human then transforms, a legal-education RCT found no atrophy and better later unaided work. When AI produces the final artifact, cognitive debt appears. Same students, different sequence.

5. Adoption is far outrunning pedagogy

Tutoring is among the largest single uses of the world's most-used AI. Roughly 90% of students used it for homework within two months of launch, and in one UK study fully AI-written work slipped past markers 97% of the time.

6. AI is structurally bad at the reasoning students must learn to spot

Reasoning models reduce effort as problems get harder, can't follow a supplied algorithm at scale, can't reliably notice what is missing, and don't revise hypotheses against disconfirming evidence. This is teachable content, not a footnote.

7. Working well with AI is itself a teachable skill, distinct from subject knowledge

“Joint ability” is statistically separate from “solo ability.” The strongest predictor of getting good help from AI is theory of mind: modelling what the machine knows and how to clarify things for it. Because that skill varies moment to moment, it can be trained.

The unsettled parts

What we genuinely don't know.

The honest stance: we cannot wait for certainty and we cannot pretend we have it. So we reason from the best available evidence plus disciplined intuition, course by course, assignment by assignment, and we stay willing to be wrong.
Responses

So what do we actually do?

Hold Session 2's question in view: what are we trying to teach, and is the struggle still happening? Then ask whether AI obviates a struggle you believe is necessary.

The convergence across very different institutions is striking: build unaided skill first, introduce AI second, make verification of AI output the new assessed skill, drop detection enforcement in favour of disclosure and redesign, and treat “working with AI” as a separable, trainable competency.

Plagiarism, honor codes, honesty

And the detection question.

Academic-honesty norms, at their most defensible, are about honesty — not about turning a writing process into a test of willpower. Ask what the rule is for before asking how to enforce it. And note the research question hiding inside the classroom one: what about AI use in our own research?

On detection itself, the conventional and expert wisdom runs from unreliable to impossible. There are now some more reliable products, but it is a cat-and-mouse game, and false positives are career-damaging.

Whatever the state of AI detection today, better models will emerge that may later reveal work as AI-produced or AI-aided. For a professional reputation, that asymmetry alone is reason to build on disclosure and design, not on policing.

— the case against detection-as-strategy