Nigerian universities are grading AI in the classroom all wrong
By Aboki Forex
Over six weeks in 2024, about 800 senior secondary students in Edo State sat in computer labs twice a week and learned English with the help of Microsoft Copilot. By the end, their test scores had improved by close to two years of learning compared with classmates who were not in the program. Girls, who started behind the boys, caught up.
The World Bank ran the experiment as a randomized controlled trial and treated it as one of the first real tests of generative AI as a tutor in a low-resource setting. A few years ago, that result would have sounded impossible. It is now one of the most cited education studies to come out of Africa.
Most of the coverage has fixed on the numbers. The numbers earn attention. But they are not the lesson. The lesson is in how the thing was done.
Copilot was not built to teach. It was built to draft emails and, in plenty of classrooms, to let students hand in essays they did not write. The same tool produced close to two years of learning in Edo State for one reason. The people running the pilot did not point it at answers. They pointed it at thinking.
Teachers opened each session with the topic and a starting prompt, stayed in the room to mentor and add prompts as students worked, and closed with a short reflection. The World Bank called the teachers 'orchestra conductors.' The AI was one section of the orchestra. It was not the conductor, and it was not the score.
That distinction is the whole argument, and we keep getting it backwards. Nowhere more so than in how students are taught to write code.
The wrong scoreboard
Walk into a university computer lab today, in Lagos or anywhere else, and you will find students who are not so much writing assignments as negotiating with a chatbot until working code appears. Ask it to explain recursion, fix a broken loop, or write a whole solution, and it will. The technology is cheap and capable, and it is not going anywhere.
The question for computer science education was never whether students would use it. The question is how we judge what happens when they do. Right now, we judge it by the answer. Does the code compile? Does it pass the tests? Is it clean enough to submit? Those are fair questions for an engineer choosing a tool. They are the wrong questions for a teacher measuring learning.