OPINION: GAI IS FORCING US TO DEVELOP NEW TYPES OF EXAMINATION
As students can now generate top-grade exam papers with artificial intelligence, universities are being forced to change how they examine. The obvious solution is more oral examinations. However, we need to develop new formats that do not overburden the individual examiner or departmental resources, writes Jens Bennedsen, Senior Professor of Engineering.
This is an opinion piece. The views expressed in the column are the writer’s own.
In an Opinion article published in Omnibus on 23 September, Niels Lauritzen, a pioneer in using generative AI (GAI) in his teaching, writes that he has now seen GAI generate perfect assignments with references to the syllabus.
OPINION: AU's GAI recommendations are a year old – but have already been overridden by technology
Yes, development has progressed VERY rapidly, even faster than I had imagined. But given the enormous amounts of money being invested in developing language models, perhaps we shouldn't be surprised. It was probably to be expected that we’d reach this point within the foreseeable future. And that brings me to my point: our method of assessment is not optimal and hasn't been for a long time.
We evaluate the students' process (how they solve a task) by evaluating their product. The problem is that with GAI, students now have a digital assistant that can produce a product of a quality that will typically earn the highest grade. In other words, we can no longer assume a correlation between the students' process and the product they create.
New types of examination are required
What is the solution? We need to develop types of examination that allow us to evaluate the students' processes directly. This requires that we can observe the process, or at least get the student to discuss it based on questions such as: What did you do to make the product? What alternatives are there? What considerations led you to choose this product? Why does it make sense to use this theory? This will probably work until someone develops micro hearing aids with integrated GAI that can be set up to feed the student GAI-generated answers to the examiner's questions.
Why do we continue to evaluate products rather than processes? My best guess is: 1) It’s what we usually do, 2) we have just built a huge building to accommodate it, 3) it is the most time-efficient. When I talk to colleagues, they emphasise point three in particular.
We need resources to change the exams
What could be a specific solution? The obvious solution is, of course, to change the exam to an oral one (perhaps with a preliminary assignment that the student must then argue for). And we must ensure that there are resources for this change.
I can well understand that my colleagues who have classes of over 200 students do not think it is ideal to have to hold 200 x 20-minute exams = around 67 hours (equivalent to around 11 days of exams with 6 hours of exams per day). When you have 1-2 other courses at the same time, it means that you will be busy conducting exams throughout the entire exam period. This calls for us to find more efficient types of examination, where we focus on the process but don’t use as many resources.
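The arithmetic behind that estimate can be sketched in a few lines of Python. This is only an illustration: the function name `oral_exam_load` is made up for this example, and the inputs are simply the figures quoted above (200 students, 20 minutes each, 6 exam hours per day).

```python
def oral_exam_load(students, minutes_per_student, exam_hours_per_day):
    """Return (total examiner hours, exam days) for individual oral exams.

    Hypothetical helper for illustration; it ignores breaks, grading
    discussions and changeover time between students.
    """
    total_hours = students * minutes_per_student / 60
    days = total_hours / exam_hours_per_day
    return total_hours, days


# Figures from the text: 200 students, 20 minutes each, 6 hours per day.
hours, days = oral_exam_load(200, 20, 6)
print(f"{hours:.1f} hours, {days:.1f} days")  # prints "66.7 hours, 11.1 days"
```

Changing any of the three inputs shows how quickly the load grows with class size, which is the core of the resource argument.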
We must dare to experiment with the format
I teach a large introductory programming course (approximately 175 students). Here, the type of examination has been changed from a written exam to a kind of oral exam – 12 students are assessed at the same time for two hours and are given a series of tasks to complete (possibly using GAI). The examiner and assessor walk around asking students to explain their solutions and other key concepts in the course. This means that I spend 10 minutes per student, or about 5 days on my course, which is half the time of a traditional oral exam. There is definitely room for improvement with this format – for example, letting the students work on the assignments for an hour and then discussing their solutions in the last hour. This way, we could have two overlapping groups and save even more time.
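The numbers for the group format can be checked the same way. The sketch below is an illustration under stated assumptions: the function name `group_exam_load` is invented for this example, the 6 exam hours per day is carried over from the estimate for individual oral exams, and the comparison uses the 20-minute individual exam mentioned there.

```python
import math


def group_exam_load(students, group_size, session_hours, exam_hours_per_day):
    """Return (sessions, total hours, minutes per student, exam days).

    Hypothetical helper for illustration; assumes every session runs the
    full length, even if the last group is not full.
    """
    sessions = math.ceil(students / group_size)
    total_hours = sessions * session_hours
    minutes_per_student = session_hours * 60 / group_size
    days = total_hours / exam_hours_per_day
    return sessions, total_hours, minutes_per_student, days


# Figures from the text: about 175 students, 12 per group, 2-hour sessions.
# Assumption: 6 exam hours per day, as in the individual oral exam estimate.
sessions, total_hours, minutes_each, days = group_exam_load(175, 12, 2, 6)
print(sessions, total_hours, minutes_each, days)  # prints "15 30 10.0 5.0"

# For comparison: individual 20-minute oral exams for the same class.
traditional_hours = 175 * 20 / 60
print(f"group format uses {total_hours / traditional_hours:.0%} of the time")
```

With these inputs the group format needs 15 sessions and 30 hours, against roughly 58 hours for individual 20-minute exams, which matches the "half the time" estimate.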
I'm sure there are many other creative lecturers out there who have different ideas. This requires that the boards of studies are willing to "experiment" with the type of examination, and that we systematically collect experience to demonstrate its quality and determine when it should be used.
It also requires greater focus on knowledge sharing about types of examination. I’d like to urge the CED (Centre for Educational Development, ed.) to make this one of their top priorities. Let a thousand experiments bloom – but remember that the stakes are high for the students in their exams, so the experiments must be well thought through.
This text was machine translated and post-edited by Lisa Enevoldsen.