Introduction

When generative AI enters the conversation about assessment, the reaction is predictable. Cut through the noise and it typically resolves into three concerns: a threat to integrity, an impact on the development of understanding and critical thinking, and a perceived need to tighten controls on assessment as a practice. There is truth in each of these concerns, but do they answer the right question?

Instead of putting up barriers to AI, we should be asking questions about assessment as a practice. The reality is that the issues now attributed to AI did not begin with it. Questions of authenticity, independence, and the reliability of assessment outcomes have existed for decades. Essays written the night before, rehearsed observations, retrospective portfolios, practical skills demonstrated against known and practised tasks, and compliance-based sign-offs have long been embedded within the system.

Before we tackle how to manage AI in assessments, we need to review what the purpose of assessment actually is.

The Misdiagnosis

Much of the current debate rests on the premise that assessment was fundamentally sound until AI disrupted it. That assumption is difficult to sustain, because assessment has always operated through proxies. We ask learners to reproduce knowledge under controlled conditions, perform skills in structured environments, or assemble evidence after the fact. These approaches were adopted because they were manageable, scalable, and broadly defensible, not because they were perfect representations of capability. Over time, those compromises have become embedded as the accepted way to assess.

The result is a system that often measures the wrong thing. We typically measure performance rather than understanding, compliance rather than capability, recall rather than application, and skills stripped of context. These limitations were tolerable when alternatives were scarce. They are far more visible in a world where access to information is immediate and where the tools available to learners fundamentally change how thinking is distributed.

AI has intensified the scrutiny rather than introduced the weakness, and treating it as the problem risks reinforcing a model that was already under strain. It directs attention toward control rather than improvement, and toward restriction rather than redesign.

The Real Question

Stripped back to its purpose, assessment exists to answer a simple question: what does the learner know, or what can they do?

Every method, framework, and instrument is an attempt to produce an answer that is credible, consistent, and fair. The difficulty lies in how that answer is constructed.

The prevailing model assumes that isolated assessment events can produce a reliable picture of capability developed over time, and that this picture represents what matters. The underlying theme is that standardisation equates to fairness. The assumption is pragmatic, but is it entirely sound?

A growing body of educational research points toward a different emphasis. Researchers including David Boud on sustainable assessment and Royce Sadler on evaluative judgement share a central argument: the capacity to understand quality and judge one's own performance is not a byproduct of assessment. It is its most important outcome.

To achieve this, is it time to look at assessment from a different perspective?

The Proposition

What if the assessment task itself became part of the assessment?

Not simply the product a learner produces, or the performance they deliver under observation, but the design of the assessment that sits behind it.

In this model, the learner is required to construct the criteria by which their own work will be judged. They must define what quality looks like within the domain, determine how it should be measured, and justify the weighting of those criteria against the intended outcomes. They then produce evidence against that framework.

Both elements are assessed.

The design becomes a measure of understanding. The evidence becomes a measure of capability.

This is not a departure from established practice so much as an extension of it. Portfolio-based approaches, widely used across education and professional contexts, already recognise that capability develops over time and is best evidenced through accumulated work. At higher levels, such as Master's and doctoral study, the combination of a substantial body of work and a critical interrogation of the thinking behind it provides a depth of insight that no single assessment event can achieve.

The model proposed here applies that logic more broadly.

A learner who can define meaningful criteria, align them to purpose, and produce work that meets those standards is demonstrating a level of understanding that extends beyond completion of a task. They are showing that they understand what matters within the domain, not simply how to respond to a given prompt.

The incentive structure shifts accordingly. Designing a weak or superficial assessment does not advantage the learner. It exposes a lack of understanding. Designing a rigorous, well-justified framework raises the standard against which they must perform.

The approach applies equally to theoretical understanding and skills-based capability. Where the primary evidence is physical or performed, the learner still designs the criteria against which their execution will be judged and is assessed on both the quality of that design and the quality of the performance it generates.

Assessment, in this sense, becomes less about passing a test and more about constructing and meeting a standard. The CIPD skills survey of professional occupations found that 43% of employers say applicants do not have the required skill levels despite holding relevant qualifications, a position validated by the Skills England report in 2024. It is all too easy to say that qualifications are not meeting employer needs, but the reality is more nuanced: it is the assessment tools behind them that fail to provide an appropriate recognition of capability.

Why AI Matters

This is where we must look at AI as an enabler of something that has been difficult to achieve at scale, rather than as a threat to assessment. One of the persistent challenges in assessment has been consistency. Human judgement, particularly when applied across complex criteria and large cohorts, is inherently variable. This is not a criticism of practitioners but a function of cognitive load and the limits of sustained evaluative judgement.

AI offers the potential to apply clearly defined criteria with a level of consistency that is difficult to replicate manually. When used transparently and in support of human oversight, it can reduce drift, apply standards evenly, and provide detailed feedback aligned to those standards.

This shifts human evaluation to where that judgement is most valuable. The critical work moves to the design of the criteria, the framing of the standards, and the interpretation of outputs; AI supports the application of those standards rather than replacing the expertise required to construct them.

The concern that AI may reduce cognitive engagement is legitimate. A 2025 MIT Media Lab study measuring brain activity during AI-assisted writing found that passive AI use significantly reduced cognitive load and retention (Kosmyna et al., 2025). However, where AI is used as a structured partner in thinking, requiring explanation, iteration, and critical engagement, the effect is different.

How AI is designed into assessment will determine whether it becomes a tool for deeper understanding or a shortcut around it.

Implications

While radical, this approach would have a meaningful impact on learners, practitioners, and the system itself.

For learners, assessment becomes something they are involved in developing rather than something imposed upon them. It introduces a level of responsibility requiring deeper engagement and a clearer understanding of what good looks like. It also builds the decision-making skills that industry consistently identifies as critical.

The practitioner role evolves accordingly. The focus moves from delivering and marking assessments to supporting framework design and guiding learners through constructing an assessment that presents the fullest and most honest account of what they know and can do. This is a more demanding role, requiring stronger subject expertise and sharper pedagogical judgement.

For the system, the potential benefit is greater alignment between what is assessed, what is valued, and what demonstrates genuine capability. A model that captures both the thinking behind the work and the work itself provides a more complete account than any single assessment event can achieve. It is harder to replicate without understanding, and more reflective of how knowledge is applied in practice.

Risks and Constraints

The model is not without challenge. Learners require sufficient domain knowledge to construct meaningful criteria. Without appropriate scaffolding, the approach risks creating uncertainty rather than agency. The transition must be developmental, with increasing levels of independence as capability grows. Introducing the approach during early formative assessment enables learners to develop the required skills both within their learning experiences and beyond, in the world of work.

The quality of assessment is dependent on the quality of the criteria. Poorly constructed frameworks, even when applied consistently, produce poor outcomes. This places significant responsibility on those designing the system.

There are also questions of equity. Effective practitioner support will be essential in enabling different learners to engage meaningfully with the model. Without careful consideration, there is a risk of reinforcing existing disparities. Addressing this requires that the scaffolding, guidance and tools supporting the model are designed with equity as a baseline condition rather than an afterthought.

Finally, existing qualification frameworks are not designed for this approach. Change at that level will be gradual and contested. These are not reasons to dismiss the model but are conditions that must be addressed if it is to be implemented effectively.

Conclusion

Assessment has remained structurally consistent for over a century. It has adapted at the margins, but its underlying assumptions have largely persisted. AI has not rendered those assumptions invalid. It has made their limitations more visible.

We can choose to reinforce existing models, introduce tighter controls, and continue to treat assessment as a problem of containment. That approach may preserve familiarity, but it does little to address the underlying question. Alternatively, we can reconsider what assessment is designed to achieve and whether current methods remain the most effective way of achieving it.

The proposition outlined here is not a finished model. It is a direction of travel.

If assessment is intended to provide an honest account of what a learner knows and can do, then the ability to define, apply, and meet meaningful standards must sit at its centre. The tools to support that shift are already emerging. The learners entering education and training now have grown up with a fundamentally different relationship to knowledge and learning than the one the model was designed for. The question is whether the system is prepared to meet them.

References

Boud, D. and Falchikov, N. (2006) Aligning assessment with long-term learning. Assessment and Evaluation in Higher Education, 31(4), pp. 399–413.

CIPD (2022) Skills Survey. London: Chartered Institute of Personnel and Development.

Kosmyna, N. et al. (2025) Your Brain on ChatGPT: Accumulation of cognitive debt when using an AI assistant for essay writing tasks. MIT Media Lab. arXiv preprint.

Sadler, D.R. (1989) Formative assessment and the design of instructional systems. Instructional Science, 18(2), pp. 119–144.

Skills England (2024) Skills England Report. London: Department for Education.