AU professor warns: the AI chatbot ChatGPT invents references

ChatGPT dishes up the titles of seminal articles by well-known researchers in prestigious journals at the drop of a hat. There’s just one hitch: sometimes ChatGPT makes this stuff up. Jens-Bjørn Riis Andresen, an associate professor of anthropology at AU, who’s tested the chatbot, can bear this out. And AU Library has been contacted by several students who are having trouble finding non-existent articles suggested by the chatbot.

Over the past few months, AU Library has been contacted by several students who were searching in vain for materials that don’t exist. Photo: Jesper Rais/AU Foto

First, he asked ChatGPT to explain what characterizes Iron Age pottery from Denmark. Then he asked it to describe regional differences in pottery styles from the Roman Iron Age in Denmark. To say the least, ChatGPT’s performance was mediocre on both counts, Andresen told us:

“I got partially correct answers – in other words, a mixture of true and false information. The more specific I asked it to be, the more incorrect the answers became.”

But when Andresen asked ChatGPT for references to articles about Roman Iron Age pottery in Denmark, he was in for a surprise. First, he asked for references without specifying the language, and then he asked specifically for Danish references. ChatGPT spat out a list of five references both times. And what surprised Andresen was that he couldn’t recognize a single one of them.

“At first my professional pride was a little wounded – naturally, I think I’m pretty well versed in the field, but somehow these articles had slipped under my radar.”

But when he took a closer look at the references, he discovered the real reason he hadn’t been able to recognize them: “ChatGPT invents – or constructs – references,” he said.

“At first glance, the references look completely credible. The names of the authors of the articles are respected researchers in the field, and the same goes for the names of the journals. The titles seem plausible, and there’s a short summary of each article. But it’s all pure fabrication. The articles don’t exist when you search for them.”

This discovery worries him, because it’s a shortcut to a failing grade, he said: if students use ChatGPT uncritically in an academic paper, they could potentially be guilty of cheating, for example if they include fabricated references. Andresen said:

“The way ChatGPT is implemented right now, as I see it, it transgresses absolutely fundamental principles of scientific ethics about integrity and transparency. AI experts say that ChatGPT ‘hallucinates’ – I don’t think that’s something we need in research.”

The associate professor has shared his experiences with his students. And even though he gets the sense that some of them are less alarmed by ChatGPT than he is, he is still deeply uneasy about the technology:

“It reminds me of the whole discussion about fake news, and I’m leaning towards thinking it ought to be shut down,” he said.

Students are asking AU Library for help with non-existent ChatGPT references

Andresen isn’t the only person at AU to encounter the ‘hallucinated’ references. Over the past few months, AU Library has been contacted by several students who were searching in vain for materials that don’t exist. While the libraries don’t keep tabs on how many ChatGPT-related enquiries they receive, they’ve received at least three in the past month, according to Marianne Tind, a manager at AU Library:

“Here at the Bartholins Allé branch, I have a colleague who was contacted by a student who couldn’t find a reference and said the reference was from ChatGPT,” she said. “My colleague, who’s a really good librarian, searched and searched, but couldn’t find it.”

In addition to these examples, Tind also found a similar enquiry sent in to the Royal Danish Library’s ‘Ask the library’ service by a student, who wrote that they had used ChatGPT to locate the reference. The third example is from AU’s campus in Emdrup: a bewildered student contacted a librarian after their request to borrow four articles was rejected by AU’s article service. The library rejected the order after checking the references: while the journal exists online, where the student could easily have accessed it, ChatGPT had fabricated the references to the articles, down to the year of publication and page numbers.

Tind has a feeling that incidents of this kind will increase, she said, adding that librarians at AU will keep a sharp eye out for ChatGPT-related enquiries:

“We have to be aware of this. Clearly, we’re now at a place where we’re going to engage students in more active dialogue if we get a request for help with a reference they can’t find. First, of course, we’ll try to find the reference, but if we can’t, we’ll ask if they found it there (ChatGPT, ed.).”

“We always ask where students got their references if we can’t find them. Because then we can find the source of the reference and double-check that there aren’t any mistakes in what they wrote.”
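
A reference produced by a chatbot can be sanity-checked against a bibliographic index before anyone spends hours hunting for it. As a minimal sketch, the Python snippet below runs a free-text citation through the public Crossref database (the api.crossref.org endpoint and its query.bibliographic parameter are real; the example citation and the script itself are our own illustration, not a tool AU Library actually uses):

```python
# Sketch: check whether a citation resolves to a real, indexed article
# via the public Crossref REST API (https://api.crossref.org/works).
import requests

def lookup_reference(citation, rows=5):
    """Return the closest indexed matches for a free-text citation."""
    response = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": citation, "rows": rows},
        timeout=10,
    )
    response.raise_for_status()
    items = response.json()["message"]["items"]
    return [
        {
            "title": (item.get("title") or ["(no title)"])[0],
            "doi": item.get("DOI"),
            "year": (item.get("issued", {}).get("date-parts") or [[None]])[0][0],
        }
        for item in items
    ]

# A purely illustrative citation of the kind discussed in the article.
for hit in lookup_reference("Regional pottery styles in the Roman Iron Age in Denmark"):
    print(hit)
```

If none of the top hits matches the claimed title, authors and year, the reference deserves a much closer look before it goes anywhere near a bibliography.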

Like Jens-Bjørn Riis Andresen, Tind is concerned by how plausible the references the chatbot invents are. She had anticipated that students might use the chatbot to cut corners, which is why she finds it surprising that they are actually trying to find the references it gives them, she said:

“We thought it might become a problem if students just put these references in their bibliography, and then their supervisor reads them and thinks: ‘This reference is quite plausible. The researcher exists and has written about this.’ So I’m very surprised that the students actually look for these references.”

Expert: ChatGPT has been trained to give you an answer and keep the conversation going

The incorrect references are the result of the nature of the chatbot’s ‘knowledge’ combined with the fundamental principle that it’s designed to answer your questions, according to Peter Dalsgaard, professor of interaction design at the School of Communication and Culture. ChatGPT has been trained to guess what words come next in a sequence. It does this based on the patterns in the millions of texts it’s been fed. The more text there is on a particular subject, the more likely it is that ChatGPT will give you a correct answer, Dalsgaard explained:

“The reason it makes things up, for example references and citations, is that it’s trained to give you an answer and keep the conversation going. So while it may not be able to find an exact reference, it will find some authors who have written something within the field. And maybe it’s also been trained with the titles of the things these authors have written, and it knows that they sometimes co-author. So it makes something up, and that’s what can make it particularly problematic. It will give you a qualified guess about an article the author could have written – but which they have not actually written.”

“It’s not plucked out of thin air, but it’s just what it calculates is the most likely thing that will appear in the next series of words.”
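
To make that principle concrete, here is a toy sketch of next-word prediction. It is our own illustration, not how ChatGPT works internally: real models use large neural networks trained on vastly more text, but the basic idea of continuing a sequence with a statistically likely next word is the same:

```python
# Toy illustration of next-word prediction: a bigram frequency model.
from collections import Counter, defaultdict

corpus = (
    "iron age pottery from denmark shows regional styles . "
    "pottery from the roman iron age shows regional variation ."
).split()

# Count which word follows which in the training text.
followers = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    followers[current_word][next_word] += 1

def most_likely_next(word):
    """Return the statistically most common continuation of `word`."""
    candidates = followers.get(word)
    return candidates.most_common(1)[0][0] if candidates else "."

# The model happily continues any prompt it has data for, whether or not
# the resulting statement is true. Nothing in it checks facts.
word = "pottery"
for _ in range(5):
    print(word, end=" ")
    word = most_likely_next(word)
print(word)
```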

Dalsgaard said that this phenomenon is known as ‘AI hallucination’. Though it might seem as if the chatbot is imagining things, in reality it’s just trying to piece something together based on some material, because that’s what it’s designed to do. Dalsgaard himself tested ChatGPT’s ability to write a research article: in response, it invented a fictitious study and even described its fictitious results. He said:

“So it goes further than just making up titles. It can more or less make up articles. That’s exciting, of course, but we need to be extremely careful. Because this also means that in practice, it can’t be used for much of anything having to do with matters of fact, because you constantly have to spend time fact-checking it.”

On the other hand, there are other situations where he can easily see how chatbots might be useful. For example, a student might want input on how to structure their exam paper, or someone might need inspiration for a text they’re writing.

Dalsgaard also pointed out that OpenAI, the developer behind ChatGPT, discloses the chatbot’s limitations on its website, and that in many cases it will explain to users that it can’t answer certain types of questions because that’s not what it was developed for.

Dalsgaard said: “I believe what will happen is that fact-checking will be built into it, so the results it comes up with can be linked to sources and references, and it can also refer to where the articles can be found.”

“There are different technical solutions for doing this. But one phenomenon that’s familiar from other types of AI is that you let two AIs talk to each other, so one replies and the other checks the reply, until the reply passes the test. So you don’t get a reply until that process is complete.”
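
Schematically, the loop Dalsgaard describes might look like the sketch below. This is our own illustration of the idea, not a description of any existing product; generate and verify are stubs standing in for calls to two separate AI models:

```python
# Schematic of a generate-and-verify loop: one model drafts an answer,
# a second model checks it, and nothing is released until a draft passes.
from dataclasses import dataclass

@dataclass
class Verdict:
    passed: bool
    critique: str

def generate(question, feedback=""):
    # Stand-in for the answering model; a real system would call an LLM here.
    suffix = f" (revised after: {feedback})" if feedback else ""
    return f"Draft answer to: {question}{suffix}"

def verify(question, draft):
    # Stand-in for the checking model, e.g. one that confirms every cited
    # reference actually resolves in a bibliographic database.
    return Verdict(passed="revised" in draft, critique="check the references")

def answer_with_verification(question, max_rounds=3):
    draft = generate(question)
    for _ in range(max_rounds):
        verdict = verify(question, draft)
        if verdict.passed:
            return draft                      # only verified replies are released
        draft = generate(question, feedback=verdict.critique)
    return "No verified answer available."    # fail closed instead of guessing

print(answer_with_verification("Which articles discuss Roman Iron Age pottery in Denmark?"))
```

The key design choice is that the loop fails closed: if no draft passes verification within the allowed rounds, the user gets an explicit refusal rather than an unverified guess.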