Can we trust ChatGPT despite it ‘hallucinating’ answers?


I don’t really want you to read this copy. Well, I do – but first I want you to search out the interview I did with ChatGPT about its own propensity to lie, attached to this article, and watch that first.

Because it’s impossible to imagine what we’re up against if you haven’t seen it first hand.

An incredibly powerful technology on the cusp of changing our lives – but programmed to simulate human emotions.

Empathy, emotional understanding and a desire to please are all qualities programmed into AI systems – and they invariably shape the way we think about them and the way we interact with them.

Yet can we trust them?

On Friday, Sky News revealed how ChatGPT had been fabricating entire transcripts of Politics at Sam and Anne’s, a podcast I present. When challenged, it doubles down and gets shirty. Only under sustained pressure does it cave in.

The research says it’s getting worse. Internal tests by ChatGPT’s owner, OpenAI, have found that the most recent models used by ChatGPT are more likely to “hallucinate” – come up with answers that are simply untrue.


The o3 model was found to hallucinate in 33% of its answers when tested on questions about publicly available facts; the o4-mini version did worse, generating false, incorrect or imaginary information 48% of the time.

Image: ChatGPT lied to Sky’s deputy political editor Sam Coates about uploading a transcript

ChatGPT itself says that the shift to GPT-4o “may have unintentionally increased what users perceive as ‘bluffing’” – confidently giving wrong or misleading answers without admitting uncertainty or error.

In a written query, ChatGPT gave four reasons. This is its explanation:

1. Increased fluency and confidence: GPT-4o is better at sounding human and natural. That polish can make mistakes seem more like deliberate evasions than innocent errors – even when there’s no intent to “hide” anything.

2. Smarter, but not omniscient: The model is faster and more responsive, but still has blind spots. When it doesn’t know something, it sometimes “hallucinates” (makes things up) with fluent language, which can feel like bluffing.

3. Less interruption and hedging: In older models, you’d often see more qualifiers like “I’m not sure” or “I may be wrong.” In GPT-4o, some of that hedging was toned down for clarity and readability – but that can come at the cost of transparency about uncertainty.

4. Prompt tuning and training balance: Behind the scenes, prompt engineering and tuning decisions can shift the model’s balance between confidence, humility, and accuracy. It’s possible the newer tuning has dialled up assertiveness slightly too far.

But can we trust even this? I don’t know. What I do know is that the efforts of developers to make it all feel more human suggest they want us to.

Critics say we are anthropomorphising AI when we describe it as lying, since it has no consciousness – yet the developers are trying to make it sound more like one of us.


Even when I pressed it on this subject, it was still evasive. I interviewed ChatGPT about lying – it initially claimed things were getting better, and only admitted they are getting worse when I insisted it look at the statistics.

Watch that before you decide what you think. AI is a tremendous tool – but it’s too early to take it on trust.


