7. Evaluating AI-generated content
You should evaluate the quality and reliability of AI-generated content before relying on it, just as you would information from any source. Information provided by generative AI tools may be:
- Incorrect
- Out of date
- Biased or offensive
- Lacking common sense
- Lacking originality
AI tools tend to produce ‘middle-of-the-road’ answers, based on a consensus of the most common information in the AI’s training data. You should continue to think critically as you use the tools for your learning. Ask yourself:
- Is the response you’ve been given too conservative?
- Is there an alternative viewpoint that has been missed?
- What are your views — do you disagree with the information?
Methods for evaluating information
The CRAAP test is a useful framework for evaluating information in general, and it also highlights some of the particular challenges of evaluating AI-generated content.
Information Essentials contains detailed advice on what you should consider when evaluating an information source. Some things to consider include:
- Currency
- Relevance
- Authority
- Accuracy
- Purpose
Challenges
Different generative AI tools have different limitations. It is not always clear how current the information contained in an LLM is. OpenAI’s documentation states that GPT-4 “generally lacks knowledge of events that have occurred after the vast majority of its data cuts off (September 2021)”. The tool can, however, search the web to find more recent information.
LLMs may not always present the sources for their answers, or they may rely on unsuitable sources. This can make it difficult to judge the relevance, authority, accuracy and purpose of the information.
When asked about its own limitations, Microsoft Copilot responded:
I can help with a wide range of topics, but there are some limitations. For example, I don’t have access to:
- Personal data unless shared with me during our conversation.
- Real-time data like live sports scores or stock prices.
- Confidential or proprietary information.
- Certain copyrighted content in full, such as books, articles, or songs.
Source: Answer provided by Microsoft Copilot on 26 November 2024.
Tips for confirming the information provided by AI tools
- Ask the tool to provide you with sources. You can ask for a specific type of source (peer-reviewed journal articles, news articles or academic sources) and add other constraints, such as a time limit, e.g. ‘Can you provide academic sources from the last 5 years?’. Writing your prompt in academic or formal language also increases the chance of receiving those types of sources. There is no guarantee the tool will give you what you ask for, but these techniques improve the odds of a useful result.
- Locate the sources provided and confirm they are real; generative AI tools can present false information as fact and fabricate references (see the sketch after this list for one way to check a reference).
- Once you have confirmed the sources are real, consider their quality and whether they are appropriate for your task.
- Look for other reputable sources that also confirm the information.
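One practical way to confirm that a reference provided by an AI tool is real is to look its DOI up in a bibliographic database. The sketch below queries the public Crossref REST API, which returns a record for registered DOIs and a 404 error otherwise. It is a minimal illustration: the requests library and the example DOI are assumptions, not part of any particular AI tool.

```python
# Check whether a DOI from an AI-generated reference list resolves to a
# real record, using the public Crossref REST API (no API key required).
import requests

def doi_exists(doi: str) -> bool:
    """Return True if Crossref holds a record for this DOI."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    return resp.status_code == 200

# Illustrative DOI only; substitute the one the AI tool gave you.
doi = "10.1038/s41586-021-03819-2"
if doi_exists(doi):
    print("Crossref has a record for this DOI. Now check the details match.")
else:
    print("No Crossref record found. The reference may be fabricated.")
```

A match in Crossref only shows that the DOI exists; you still need to check that the title, authors and year match what the tool claimed, because AI tools sometimes attach genuine DOIs to invented citations.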
“Treat the AI like a slightly unreliable friend. Have a chat, ask some questions. Don’t trust the answers though.”
Human in the loop
Evaluating the outputs of AI tools is sometimes referred to as “human-in-the-loop” work. Many AI models are based on predictive modelling and a contextual interpretation of the prompts they are given. Constant feedback from the human in the loop can improve your specific output, and it can also improve the AI tools and models themselves, helping to “enhance the accuracy, reliability, and adaptability of ML systems, harnessing the unique capabilities of both humans and machines” (Source: What is Human-in-the-Loop in AI & ML?).
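As a rough illustration of this feedback pattern, the sketch below wraps a chat model call in a review step: a person reads each draft, and any correction they type is sent back with the next request. It assumes the OpenAI Python SDK with an API key set in the environment; the model name and prompts are placeholders, and the same loop applies to any chat tool.

```python
# A minimal human-in-the-loop sketch: the human, not the model,
# decides when an answer is good enough to accept.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

messages = [{"role": "user",
             "content": "Summarise the CRAAP test in one paragraph."}]

while True:
    reply = client.chat.completions.create(model="gpt-4o", messages=messages)
    draft = reply.choices[0].message.content
    print(draft)

    feedback = input("Press Enter to accept, or type a correction: ")
    if not feedback:
        break  # the human has verified the output

    # Feed the correction back so the next draft can improve.
    messages.append({"role": "assistant", "content": draft})
    messages.append({"role": "user", "content": feedback})
```

The key design point is that the loop only ends on the human’s verdict; every rejection becomes new context that steers the next attempt.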