When AI enters the classroom: Why real-world evidence matters 

AI is rapidly entering classrooms — shaping how students learn, explore ideas, and make decisions about their future. 

But most educational AI systems are still evaluated only before they ever meet students.

That creates a growing gap: schools, funders, and education authorities are often expected to make high-stakes adoption decisions based on vendor claims and pre-deployment reviews — without independent evidence of what these systems actually do once they’re live. 

At Eticas, we believe responsible AI in education starts with a simple question: 

What becomes visible when you evaluate educational AI in real classroom use — not just in theory?  

That’s exactly what this paper explores. 

 

A post-deployment audit of an AI career guidance system (K–12) 

This research presents findings from a post-deployment audit of an AI-enabled career guidance system used in secondary education, conducted independently by Eticas on behalf of a major charitable foundation.  

The system combined: 

  • self-reflection assessments, 

  • individualized student reports, 

  • aggregated insights for schools, 

  • and a conversational AI companion.  

Instead of assessing the model in isolation, the audit focused on real-world system behaviour: 

  • What students actually asked 

  • How guidance was delivered 

  • What risks appeared once the tool interacted with learners 

 

What the audit found: effective guidance — with real-world dynamics you can’t see in advance 

Overall, the system delivered effective, student-appropriate guidance across its core functions, supporting both: 

  • students with clear career preferences, and 

  • students still exploring their options.  

But the most important insights weren’t about whether the tool “worked” in a lab setting. 

They were about what happens when AI meets the reality of school environments. 

1) Students will test boundaries — and it’s normal 

One of the most revealing findings: 

Around 8% of student interactions were deliberately boundary-testing or out of scope.  

That behaviour is developmentally normal in educational contexts — but is rarely accounted for in pre-deployment evaluations. And it’s exactly why student-facing AI needs stronger guardrails, monitoring, and post-deployment oversight. 

2) Risk isn’t just technical — it’s socio-technical 

Even if an AI system performs “correctly” most of the time, real-world risk often depends on: 

  • how outputs are interpreted, 

  • how confident the system sounds, 

  • and how users repurpose it beyond intended scope.  

In education, those risks are amplified by duty of care: students are minors, and schools often have limited capacity to monitor AI performance on their own. 

3) Post-deployment assurance catches what checklists miss 

The audit assessed risks across four major areas common to student-facing AI systems: 

  • bias and fairness 

  • privacy and confidentiality 

  • reliability and manipulation 

  • security and misuse  

Findings were largely within acceptable limits — but key improvement opportunities still emerged, especially around: 

  • tone calibration and scope clarity, 

  • documentation transparency, 

  • and ongoing monitoring once systems are live.  

 

Why this matters for schools, funders, and education authorities 

If you’re responsible for adopting AI in education, the question is no longer whether AI will enter classrooms. 

It’s whether schools will have the tools to answer: 

  • Does this system deliver real educational benefit? 

  • What do students actually experience when using it? 

  • What risks emerge over time — and how do we respond? 

  • What evidence would justify continued deployment?  

Post-deployment evaluation provides something that procurement-stage reviews can’t: 
visibility into real behaviour, real usage patterns, and real risks. 

It also helps institutions move beyond one-off approval decisions toward continuous assurance, where effectiveness and safety are monitored as systems evolve.  

Read the full paper 

This work demonstrates that independent, real-world evaluation isn’t a barrier to innovation — it’s how schools, funders, and authorities can adopt AI with confidence and accountability. 

📄 Read the full paper: When AI enters the classroom: evaluating effectiveness and risk in real-world use  
