The Generative Turn in the Classroom
The release of ChatGPT in November 2022 triggered the most acute crisis of academic integrity that higher education had faced in decades, and arguably ever. Within weeks of its public launch, students at universities worldwide were submitting AI-generated essays, problem sets, and examination responses. Within months, faculty were reporting encounters with suspiciously fluent, suspiciously confident, and suspiciously similar student submissions across every discipline. Within a year, the institutional responses — from blanket bans to wholesale revision of assessment practices — varied so widely that they revealed not just disagreement about tactics but fundamental uncertainty about what education was for and what credentials were supposed to certify.
Stokel-Walker (2023), writing in Nature, captured the initial disorientation of academic institutions confronting what ChatGPT actually was: not a better search engine or a more capable calculator, but a system that could produce, on demand, a competent essay on almost any topic in almost any academic register, without any of the cognitive struggle that the essay assignment was designed to provoke. The question that immediately arose — "should professors worry?" — quickly revealed itself as the wrong question. The right question was: what are assignments for, what do they certify, and what happens to those purposes when the work can be trivially delegated to a machine?
Rudolph, Tan, and Tan (2023), in one of the first academic papers to directly examine ChatGPT's implications for higher education assessment, characterised the dilemma bluntly. The standard essay and written assignment formats that constitute the majority of undergraduate assessment were designed to measure a combination of knowledge, analytical skill, argumentation, and written communication — all of which could now be produced, at least superficially, by a sufficiently capable language model. The credential issued on the basis of such assessments would, if AI assistance was not controlled, certify the student's access to AI tools rather than their own competencies. Whether that credential still meant what institutions claimed it meant was a question that could not be deferred.
Cotton, Cotton, and Shipway (2024) provided a systematic review of the academic integrity implications of ChatGPT and related tools, finding that the challenge was qualitatively different from previous generations of academic fraud technology. Contract cheating — the purchase of essays from human essay mills — had been a serious and growing problem before ChatGPT, with estimates suggesting that the essay mill industry was generating hundreds of millions of dollars annually in some markets. AI-generated content differed from contracted essays in three important ways: it was effectively free, available to any student with internet access; it was instantaneous; and it was, in its most capable forms, extremely difficult to distinguish from competent student work by reading alone. These differences of cost, speed, and detectability changed the scale of the challenge categorically rather than incrementally.
Perkins (2023) examined how different disciplinary contexts within higher education were affected differently by the arrival of capable generative AI. Disciplines that relied heavily on text production — humanities, social sciences, business — faced the most acute integrity challenges, since essay-based assessment was their primary mechanism. STEM disciplines faced somewhat different challenges: AI tools could assist with coding, with mathematical problem-solving, and with the written components of lab reports, but the authentic demonstration of technical competence through extended problem sets or laboratory work remained harder to fully delegate. Professional disciplines — law, medicine, teacher education — faced questions not only about integrity in assessment but about the fitness-for-purpose of practitioners who had been trained in ways that AI had rendered obsolete.
UNESCO's 2023 guidance for generative AI in education and research attempted to provide a framework for institutional response, emphasising the importance of updating pedagogical approaches rather than simply banning tools, the need to address equity implications of differential AI access, and the requirement for human oversight of AI use in ways that maintain educational values. The guidance was useful as a framework but acknowledged that implementation would require institution-by-institution judgment about disciplinary norms, student populations, and assessment purposes that no single document could prescribe.
Academic Integrity in the Age of ChatGPT
Academic integrity frameworks — the policies, norms, and enforcement mechanisms through which universities define and respond to dishonesty — were built on assumptions that AI fundamentally challenged. The International Center for Academic Integrity's definition of academic integrity rested on values including honesty, trust, fairness, respect, responsibility, and courage. The application of these values to AI use required resolving questions that the definition did not address: Was it dishonest to use an AI tool to improve a draft if the policy did not prohibit it? Was it a violation of trust to use AI for brainstorming if the final text was genuinely the student's own? Was it fair that some students had access to premium AI tools while others did not?
Institutional responses ranged from explicit prohibition to explicit endorsement, with a wide range of ambiguous middle positions. Some institutions updated their academic integrity policies to define unauthorised AI use as a form of academic misconduct equivalent to plagiarism. Others revised their policies to permit or even encourage AI use, with requirements for disclosure and reflection on how AI had been used. Many institutions — the majority, in most surveys of faculty practice — operated in a state of policy ambiguity, with individual faculty setting their own course-level expectations in the absence of institutional guidance, creating inconsistency that disadvantaged students who were uncertain about norms.
The credential integrity problem was not purely hypothetical. A medical student who used AI to complete clinical case analyses without developing genuine diagnostic reasoning; a law student who used AI to write legal briefs without developing genuine legal analysis skills; a teacher education student who used AI to write lesson plans without developing genuine instructional design capability — all would hold credentials that implicitly certified competencies they might not actually possess. The downstream consequences — for patients, for clients, for schoolchildren — of credentialing gaps produced by AI-assisted fraud were serious and difficult to dismiss.
The question of what constituted "use" versus "assistance" versus "fraud" was genuinely difficult to resolve cleanly. Using a spell-checker was universally accepted; using a grammar checker was almost universally accepted; using a tool that improved sentence structure and clarity was widely accepted; using a tool that generated content from scratch was, in most institutional policies, a violation. But the line between these uses was not crisp. Grammarly's "rewrite" features, available for years before ChatGPT, could substantially reshape prose while leaving the student as the nominal author. The novelty of ChatGPT was one of degree rather than kind — but the degree was sufficient to change the practical question from "did this student write this?" to "is there any meaningful sense in which this work is this student's?"
Floridi and colleagues (2018), developing an ethical framework for AI, identified autonomy as one of the core values that AI systems should support rather than undermine. Applied to education, this framing was clarifying: AI use in assessment was educationally harmful to the extent that it substituted for, rather than augmented, the development of student autonomy — the capacity to identify problems, formulate questions, gather evidence, reason through uncertainty, and produce conclusions independently. Assessment formats that could be fully delegated to AI were, by this framing, revealing themselves to be poor instruments for evaluating the capacities education was supposed to develop.
The sociological dimension of the academic integrity crisis was not limited to individual student dishonesty. Institutional incentives shaped the response in ways that were not always educationally optimal. Universities dependent on tuition revenue had structural reasons to avoid rigorous enforcement of integrity policies that might result in student dismissals or degree revocations. Reputational incentives similarly discouraged public acknowledgment of integrity problems at scale. The result was that institutional responses were often more performative than substantive — policy updates that signalled concern without creating the infrastructure for meaningful enforcement.
Detection Tools and Their Failures
The market for AI detection tools — software designed to identify AI-generated text — expanded rapidly after the release of ChatGPT, driven by faculty demand for technical solutions to the integrity problem. Tools including Turnitin's AI writing detection feature, GPTZero, Originality.ai, and several others attracted significant institutional adoption. The performance of these tools, evaluated against independent test sets rather than vendor-provided benchmarks, revealed substantial limitations that institutional adopters were frequently not adequately informed about.
The core technical challenge was that AI-generated text and high-quality human writing shared many statistical properties. Language models trained to produce fluent, coherent, contextually appropriate text necessarily produced text that was statistically similar to the fluent, coherent, contextually appropriate text in their training data — which was primarily written by humans. Detection tools attempting to identify AI-generated text based on statistical properties of the text itself were therefore operating on a signal that was inherently noisy, and whose noise level increased as AI models became more capable.
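To make that statistical framing concrete, the toy sketch below (a purely illustrative heuristic, not any vendor's actual method) scores a passage on two crude signals often discussed in this literature: variation in sentence length, sometimes called burstiness, and lexical diversity. The point is not that these signals work; it is that careful human prose and fluent machine output can produce very similar values, which is why the signal is inherently noisy.

```python
import re
from statistics import mean, pstdev

def detection_signals(text: str) -> dict:
    """Two crude, illustrative signals sometimes cited in discussions of
    AI-text detection. This is a toy, not a real detector: polished or
    deliberately plain human writing scores much like machine output."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())

    # "Burstiness": variation in sentence length. Very uniform sentence
    # lengths are sometimes treated as a machine-like trait.
    lengths = [len(re.findall(r"[a-z']+", s.lower())) for s in sentences]
    burstiness = (pstdev(lengths) / mean(lengths)) if len(lengths) > 1 and mean(lengths) > 0 else 0.0

    # Lexical diversity: distinct words over total words. Conventional,
    # repetitive phrasing lowers this, whether the author is human or not.
    diversity = len(set(words)) / len(words) if words else 0.0

    return {"burstiness": round(burstiness, 3), "lexical_diversity": round(diversity, 3)}

if __name__ == "__main__":
    sample = (
        "The essay assignment was designed to provoke struggle. "
        "It rewarded students who wrestled with their sources. "
        "A fluent summary, however polished, is not the same thing."
    )
    print(detection_signals(sample))
```

Careful, accessible academic prose, including that of many second-language writers, can sit squarely in the range such heuristics penalise, which is the false-positive problem taken up next.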
Independent evaluations found false positive rates — cases where genuine student writing was incorrectly flagged as AI-generated — that were practically significant and systematically skewed. Researchers at Stanford and elsewhere found that writing by non-native English speakers was disproportionately flagged as AI-generated, because the characteristics that detection models associated with AI output — simplified vocabulary, clear sentence structure, conventional phrasing — also characterised the writing of competent non-native English writers producing academic text. A student who had learned English as a second language and had developed a careful, accessible academic writing style was at significantly greater risk of wrongful accusation than a native speaker writing in a more idiomatic, varied register.
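A hedged sketch of how that skew can be audited: given hypothetical detector scores for submissions already known to be human-written, compute the rate at which each group of writers would be wrongly flagged. The group labels, scores, and 0.5 threshold below are invented for illustration; a real audit would use verified human-authored texts and the vendor's actual outputs.

```python
from collections import defaultdict

def false_positive_rates(records, threshold=0.5):
    """Share of known human-written submissions that a detector would wrongly
    flag as AI-generated, broken down by writer group. Inputs are hypothetical."""
    flagged, total = defaultdict(int), defaultdict(int)
    for group, ai_score in records:
        total[group] += 1
        if ai_score >= threshold:  # the detector would flag this human-written text
            flagged[group] += 1
    return {group: flagged[group] / total[group] for group in total}

# Invented scores for illustration only.
human_written = [
    ("native_speaker", 0.12), ("native_speaker", 0.35),
    ("native_speaker", 0.55), ("native_speaker", 0.20),
    ("second_language", 0.62), ("second_language", 0.48),
    ("second_language", 0.71), ("second_language", 0.30),
]
print(false_positive_rates(human_written))
# {'native_speaker': 0.25, 'second_language': 0.5}: the same tool, applied
# uniformly, accuses one group of writers twice as often.
```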
Turnitin, whose AI writing detection feature was integrated into plagiarism-detection software already widely deployed at many institutions, acknowledged in its documentation that its detection capability came with false positive risks and should not be used as sole evidence for academic misconduct proceedings. This caveat was reasonable but frequently not reflected in institutional practice. Cases documented by student advocacy organisations described students being accused of AI use solely on the basis of Turnitin AI detection scores, without the additional evidence that the tool's documentation indicated was necessary.
The detectability of AI-generated text was further undermined by the ease with which it could be modified to evade detection. "Humanisation" tools — products specifically designed to alter AI-generated text to defeat detection algorithms — proliferated in the same period. Paraphrasing an AI-generated text, adding specific examples from personal experience, adjusting sentence variety, and introducing deliberate minor errors could all reduce AI detection scores substantially. A student motivated to cheat and aware of these techniques faced essentially no effective technical barrier.
Cotton, Cotton, and Shipway (2024) argued, on the basis of this evidence, that the pursuit of detection-based solutions to AI academic fraud was strategically misguided — a technological arms race in which the fundamental advantage lay with the generator rather than the detector, and in which the collateral damage of false positives fell disproportionately on vulnerable students. The more productive direction, they argued, was redesigning assessment rather than trying to police existing assessment formats against a technology that made them indefensible.
The Widening Digital Divide
The narrative around AI and education has often assumed, implicitly, a world in which all students have equivalent access to AI tools — in which the relevant policy question is how institutions should respond to a technology that is universally available. This assumption was inaccurate when ChatGPT was released and became more so as AI tools diversified into free and premium tiers with significantly different capabilities.
Noble (2018), examining algorithmic systems and racial inequality, demonstrated how technical systems embedded in unequal social structures tended to reproduce and amplify existing inequalities rather than transcending them. The AI access divide in education followed this pattern. Students at wealthy institutions, in wealthy countries, with high-bandwidth internet access and the disposable income to pay for premium AI subscriptions, had access to substantially more capable AI assistance than students in resource-constrained environments. The quality of AI assistance available through free tiers was meaningful but materially inferior to that available through paid subscriptions.
The World Bank's 2022 World Development Report documented the extent of the digital divide in educational contexts globally. In low- and middle-income countries, a substantial proportion of students lacked reliable internet access, appropriate hardware, and the electricity supply to power devices consistently. Remote and rural students in high-income countries faced similar constraints. The COVID-19 pandemic, which forced abrupt transitions to online learning, had revealed the depth of these inequalities in ways that generated policy attention but did not produce commensurate policy response.
Selwyn (2019), in his examination of digital sociology in educational contexts, drew attention to how digital technologies in education had consistently reproduced rather than disrupted social stratification. The pattern was not one of technology causing inequality — the causation ran in the opposite direction, with existing social inequality determining how technological benefits were distributed. AI tools in education appeared to follow this pattern. Students with greater social capital, more educated parents, access to tutoring, and experience navigating academic expectations were better positioned to use AI tools effectively — to use them as genuine cognitive tools rather than as crutches — than students without those advantages.
The credential implications of differential AI access were serious. If assessments became implicitly competitions in AI-assisted performance, the credential would certify, in part, the quality of AI assistance available to the student rather than the student's own capabilities. This would represent a new mechanism for reproducing existing social inequality through educational credentialing — adding to, rather than replacing, existing mechanisms including differential access to tutoring, to preparatory schools, and to the social capital that facilitates navigation of educational institutions.
Language access was a related dimension of the digital divide in AI-enabled education. The most capable AI language models were primarily trained on English-language text, and their performance in other languages varied significantly — generally declining with decreasing representation of those languages in training data. Students working in less-resourced language environments had access to AI tools of qualitatively lower capability, compounding other forms of educational disadvantage. This was not a temporary developmental problem that would be resolved as AI improved; it was a structural consequence of the global distribution of digital text production.
Surveillance Technologies in Education
Institutional responses to academic integrity challenges increasingly involved not only assessment redesign and policy revision but surveillance — deploying technology to monitor students in ways designed to deter or detect dishonest behaviour. The COVID-19 pandemic dramatically accelerated this development, as institutions requiring remote examinations adopted automated proctoring tools at scale and with minimal deliberation about their implications.
Proctorio, one of the leading automated proctoring platforms, attracted sustained controversy through 2020–2022 that was extensively documented by the Electronic Frontier Foundation (EFF). The platform's standard feature set included continuous video recording of the student and their screen, eye-tracking analysis flagging gaze deviations as potential cheating indicators, keystroke analysis, and algorithmic flagging of "suspicious" behaviour for human review. Academic research and student advocacy documentation identified multiple serious problems: the systems produced false positives at rates that resulted in significant numbers of innocent students being flagged and subjected to disciplinary proceedings; the systems performed poorly for students with disabilities including attention disorders, motor disabilities, and visual impairments; and the collection of biometric data — faces, eye movements, keystroke patterns — raised privacy concerns that institutions had not adequately considered before deploying the tools.
Proctorio's response to critical coverage was notably aggressive. The company sued a university learning technology specialist who publicly criticised its product and pursued copyright claims against a student who posted excerpts of its materials, a dispute that drew the Electronic Frontier Foundation into litigation against the company. These responses, independent of their legal merits, illustrated a concerning dynamic: the companies marketing surveillance tools to educational institutions had commercial interests in suppressing critical research and discourse about their products' limitations and harms. Academic institutions, as customers, were poorly positioned to independently evaluate claims made by vendors whose tools they had already contractually deployed.
The broader category of educational surveillance extended beyond examination proctoring. Learning management systems collected detailed data on student behaviour — login times, document access patterns, discussion contributions, assignment submission timing — that institutions used to identify "at-risk" students for intervention. Student information systems tracked academic performance, attendance, financial aid status, and housing circumstances in databases that were shared across institutional departments and, in some cases, with third-party vendors. Campus networks monitored network traffic in ways that captured information about students' digital activities beyond their academic work.
Floridi and colleagues (2018) argued that the right to privacy was a component of human dignity and a precondition for personal autonomy. Applied to the educational context, this framing highlights a tension between the institutional interest in monitoring student behaviour and the student's interest in a space for intellectual and personal development that is not comprehensively observed. Education has historically understood itself as creating conditions for cognitive risk-taking — the willingness to attempt difficult tasks, to make and learn from errors, to develop positions that might be wrong. Comprehensive surveillance is inconsistent with the psychological conditions for that kind of risk-taking.
What Assessment Means When AI Can Answer Everything
The disruption of standard assessment formats by AI was not an accident or an oversight. It was the predictable consequence of a disjunction between what assessments were formally designed to measure and what they were actually capable of measuring. Most written assessments — essays, problem sets, case analyses, literature reviews — were designed to measure knowledge and analytical capability by proxy: if a student could produce a sufficiently competent discussion of a topic, they probably understood it. The proxy worked when producing the discussion required genuine engagement with the material. When AI could produce the discussion without genuine engagement, the proxy broke.
The deeper question this revealed was whether the written product was ever an adequate proxy for the capability it purported to measure, or whether it had always been an imperfect proxy whose limitations had simply not been exposed because the technology needed to exploit those limitations did not exist. Research on deep versus surface approaches to learning in higher education had suggested, for decades before AI arrived, that many students could perform adequately on standard assessments without developing the durable conceptual understanding and transferable skills that were the stated objectives of their education. AI made this gap visible rather than creating it.
The distinction that assessment theorists drew between "performance on a task" and "evidence of capability" became critically important in the AI era. A performance on a task that could be delegated to AI was not evidence of the capability the task was designed to assess. The validity of the assessment — whether it actually measured what it claimed to measure — was undermined not by dishonest students but by the existence of a tool that could produce the required performance without the required capability.
Several alternative assessment formats attracted attention as more robust to AI assistance: oral examinations, which required real-time synthesis and communication; practical demonstrations, which required the physical performance of skills; reflective portfolios documenting a learning process over time; in-class handwritten assessments without access to digital tools; collaborative projects assessed through contribution analysis and process documentation. Each of these formats had genuine advantages and genuine disadvantages in terms of equity, scalability, accessibility, and the specific capabilities they could assess. None was universally applicable across disciplines and student populations.
Perkins (2023) noted that assessment reform in response to AI risked creating new forms of inequity if not carefully designed. In-person oral examinations, for example, favoured students with strong verbal communication skills and high comfort with interpersonal evaluation — skills that were themselves unevenly distributed across student populations by socioeconomic background, language history, and disability status. Portfolio-based assessment required sustained self-organisation and metacognitive capacity that were developed differentially across student populations. The response to AI's disruption of written assessment could not be to simply substitute other formats without attending to the equity implications of those formats for the full range of students who would be assessed by them.
Reimagining Pedagogy for the AI Era
Beyond the immediate crisis of assessment integrity lay a more fundamental question: what should education do in a world where AI can perform many of the cognitive tasks that formal education has historically trained people for? The question was not new — similar questions had been raised by calculators, by the internet, by search engines — but AI's generality of capability gave it a scope that previous tools had not had. A calculator automated arithmetic; the internet automated information retrieval; AI was automating significant portions of reasoning, synthesis, and communication. The scope of what remained uniquely human in the cognitive tasks that education prepared people for had narrowed significantly.
UNESCO's (2023) guidance proposed a framework centred on "AI literacy" — developing in students the ability to understand, evaluate, and effectively use AI tools as part of a broader set of cognitive capacities. This was more than teaching students to use ChatGPT; it involved developing the critical judgment to evaluate AI outputs, the awareness of AI limitations and biases, the ethical reasoning to make appropriate decisions about when AI use was appropriate, and the metacognitive capacity to understand what they were and were not learning when they used AI assistance.
Selwyn (2019) argued that digital technologies in education should be evaluated not primarily on their technical characteristics but on their social implications — on who benefits, on what kinds of learning they enable and disable, on how they change the social relationships of education. Applied to AI, this framing directs attention away from the technical capabilities of AI tools and toward the social conditions in which they are deployed: the power relationships between students and institutions, the economic interests of technology vendors, the cultural assumptions embedded in AI systems trained on particular bodies of text.
Floridi and colleagues (2018) identified several principles for beneficial AI that translated into educational contexts. Beneficence — AI should benefit students, not merely be efficient for institutions. Non-maleficence — AI should not harm students, including through privacy violations, false accusation, or the production of competency gaps. Autonomy — AI should support the development of student agency and independent judgment, not substitute for it. Justice — the benefits and burdens of AI in education should be equitably distributed.
The most productive pedagogical response to AI appeared, from emerging evidence, to involve treating AI as a subject of study as much as a tool for study — developing in students the ability to work with AI critically, to understand its outputs as products of specific training processes with specific limitations, and to situate their own intellectual work in relation to what AI can and cannot do. This required faculty development at a scale that most institutions had not undertaken. Educators who had not themselves developed facility with AI tools could not effectively design AI-integrated pedagogy or guide students through the ethical and intellectual challenges AI posed.
The credential question remained unresolved. If educational credentials were to continue signifying genuine competency — rather than access to AI tools — the competencies they certified would need to be either inherently difficult to delegate to AI or verified through assessment processes that could not be meaningfully assisted by AI. Both requirements pointed in the same direction: toward the development of deeper, more authentic, more transferable competencies rather than the performance of tasks on demand. Whether educational institutions had the political will, the economic resources, and the institutional flexibility to make that shift was uncertain. What was certain was that the question of what education was for — and what it owed the students and societies it served — had never been more urgently in need of a serious answer.
References
- Rudolph, J., Tan, S., & Tan, S. (2023). ChatGPT: Bullshit spewer or the end of traditional assessments in higher education? Journal of Applied Learning and Teaching, 6(1).
- Perkins, M. (2023). Academic Integrity considerations of AI Large Language Models in the post-ChatGPT era. Journal of University Teaching and Learning Practice, 20(2).
- Stokel-Walker, C. (2023). AI bot ChatGPT writes smart essays — should professors worry? Nature.
- Cotton, D.R.E., Cotton, P.A., & Shipway, J.R. (2024). Chatting and cheating: Ensuring academic integrity in the era of ChatGPT. Innovations in Education and Teaching International, 61(2), 228–239.
- Floridi, L., et al. (2018). An ethical framework for a good AI society. Minds and Machines, 28(4), 689–707.
- Noble, S.U. (2018). Algorithms of Oppression. NYU Press.
- Electronic Frontier Foundation (2020–2022). Documentation of Proctorio Inc. litigation history. EFF.
- UNESCO (2023). Guidance for Generative AI in Education and Research. UNESCO.
- World Bank (2022). World Development Report 2022: Finance for an Equitable Recovery. World Bank Group.
- Selwyn, N. (2019). What is Digital Sociology? Polity Press.