Google recently withdrew its Gemma AI product from the market after a letter from Senator Marsha Blackburn accused it of fabricating allegations about her. When the model was asked whether she had been accused of rape, Gemma responded with a completely fabricated story involving a state trooper, prescription-drug pressure, non-consensual acts, and links to fake news articles. None of it was true.
This was defamation, not just a harmless hallucination. The AI invented a serious claim about a real person, and a Google product, built on thousands of hours of training, passed it off as fact. Google explained that Gemma was meant for developer use, not consumer use. Nevertheless, it removed the product from the market.
If you’re building an AI product designed to generate answers, and you’re making it available for millions of users, you must test for the catastrophic cases. Not just “does it mostly work?” but “what happens when people ask it about specific names, facts, claims?” Google’s testing failed.
But more importantly, it shows that AI products just don’t work reliably in many instances. Hallucinations seem to be a by-product of all AI products, and they seem to be getting worse. Two years ago a lawyer used ChatGPT in a court case, and it was found that the cases it cited never existed. And many of us have seen ChatGPT make things up.
Companies have talked about mitigating hallucinations as if it were something that would improve over time as more data is fed into the models. But hallucinations have not abated and seem to be getting worse, and they are especially damaging when the hallucination is a criminal allegation and an act of defamation. As AI finds its way into more products and applications, and reaches more users, the number of hallucinations will only grow.
This is a warning sign that AI is still flawed and that the industry’s predictions of hallucinations going away have not come true. AI products are so complex that visibility into their behaviour is limited and predictions have become almost meaningless. It calls into question the inherent reliability of AI.
Just this past week I was in Paris and asked ChatGPT if I could pay for the Metro using my credit card. It answered, “Yes — you can use a credit card to pay directly at the gates on most Paris Metro and RER lines.” When I entered the subway I double-checked at the ticket window and was told no, you cannot use a credit card; you need to buy a special card.
The AI industry wants us to believe it is offering products with great value and benefits, yet they are still seriously flawed. Like Google, companies are rushing models to market before they’re ready and before adequately testing them. Would you use Google Maps if it occasionally showed highways that didn’t exist? Would you use Amazon if it occasionally offered products that didn’t exist?
Google’s reputation is built on billions of users trusting it to find facts online. Yet its AI product pointed to non-existent articles and misdated events. If a company like Google can make these errors, imagine what the dozens of other AI products might do. The overall risk isn’t just annoying errors; it’s systemic harm: misinformation, defamation, and a loss of trust in AI systems.
One might conclude that stricter testing is being minimized on purpose because the industry knows its products are seriously flawed, so why bother. This is contrary to the way products have been brought to market for centuries. We expect companies to do enough testing to be sure a product performs as expected, is safe, and works well. If it doesn’t, a company will not introduce it until it’s fixed. But it appears the current AI models might just not be fixable.
The hype and investment in AI have been so great that the industry cannot afford to admit there may be inherent flaws in the technology. If that’s the case, one of two things will happen. The AI industry will continue to push its way into our daily lives, and we will live in a world of unreliable products that occasionally spit out alternative facts. Or, more likely, the flaws in AI will prove so large that they lead to a market crash when AI is found to be a lot less than what was promised.
Based on the experts I follow, and my own experience, I have no reason to believe that hallucinations are fixable; they appear to be inherent in the design of the AI engines. After all, these models don’t know anything; they’re engines of probability. The more we expect them to behave like reasoning systems, the more glaring the limitations become.


One reason I avoid Goog’s AI-based summaries when I find them at the top of the results page after a query is, of course, the hallucinations (falsehoods would be a better designation, but hallucination carries less opprobrium and was likely chosen by the developers of LLMs for that reason). But even before the rise of LLM-based AI, the spell- or grammar-check features in so many word processing apps were annoying if you had anything approaching basic educational and proofreading competence. With that in mind, I will presume that in the final sentence of your column, the word “there” might better be changed by either dropping the last two letters or changing them to “ir.” The comparable word/spelling subs in many processors are generally so annoying to catch/kill that I turn them off when able. And I am also prone to pedantry on the subj. of errors in dictation transcription apps.
Thanks, Bill, for your comments and for pointing out the grammatical error.