AI Chat Bots Are Running Amok — And We Have No Clue How to Stop Them
Now that we’re past the era of influencers and celebrities flogging NFTs, we’re free to focus on a new hot topic in tech: artificial intelligence tools. It was the artistic possibilities that first captured our interest — using AI programs to generate hyper-stylized selfies, or create an ideal human form, or discover the visual horrors lurking in the source material used by this software.
Then, with the release of AI bot ChatGPT last November, our attention turned to language, and all the strange things we could make these programs write or say. That text generator, created by the San Francisco research laboratory OpenAI, will soon have a competitor in Google Bard, a service built on the company’s LaMDA (Language Model for Dialogue Applications) technology. Public access to such apps could mark a major shift for the internet, as AI research scientist Dr. Jim Fan pointed out this month.
Of course, novel tech comes with its share of chaos. Lately, it seems that all our chat bots are either failing, lying, or veering off-mission with inappropriate or disturbing output. In basically every case, it’s because humans have figured out a way to misuse them — or simply don’t comprehend the forces they’ve unleashed.
There was, for example, the recent revelation that tech news website CNET had been discreetly using AI to write features, and wound up publishing factual errors. When Men’s Journal tried to produce an article about testosterone the same way, the final piece contained 18 “inaccuracies or falsehoods,” per a review by Bradley Anawalt, chief of medicine at the University of Washington Medical Center.
The results haven’t been much better on Historical Figures Chat, a popular “educational” app that allows you to communicate with virtual versions of famous dead people — who often misrepresent or distort the facts of their lives. While that software was the work of a sole developer who acknowledged it was far from perfect, Google CEO Sundar Pichai touted their Bard bot as a breakthrough that leveraged “the breadth of the world’s knowledge,” capable of “fresh, high-quality response.” But in a product demo, Bard whiffed on an astronomy question, confidently declaring that the James Webb Space Telescope “took the very first pictures of a planet outside of our own solar system.” In fact, as astrophysicists pointed out, that benchmark had been crossed in 2004, almost two decades before the telescope was launched. Google employees criticized Pichai for a “rushed” and “botched” event as the stock price of parent company Alphabet took a dive.
“One common thread” in these incidents, according to Vincent Conitzer, director of the Foundations of Cooperative AI Lab at Carnegie Mellon University and head of technical AI engagement at the University of Oxford’s Institute for Ethics in AI, “is that our understanding of these systems is still very limited.”
“Perhaps as a consequence, so is the degree of control we can exert over them,” Conitzer tells Rolling Stone. “This reflects an ongoing change in how many AI systems are built. It used to be the case that we built AI systems out of various custom-built modules that we understood well and had significant control over. But more and more, we are managing to build these systems with a few simple learning principles that construct large models based on large amounts of data.” The upshot is that whether the systems make “silly mistakes” or display behaviors that appear “surprisingly intelligent” to us, “nobody today really understands how this happens.”
This bafflement was in evidence when, in early February, an AI experiment called “Nothing, Forever” received a 14-day ban from Twitch, the streaming platform that hosts the project. Structured as a never-ending cartoon spoof of Seinfeld, the stream features dialogue spawned by OpenAI’s GPT-3 language model, with little outside content moderation. Which may help to explain why the surreal sitcom’s protagonist, Larry Feinberg, one day said during a standup routine, “I’m thinking about doing a bit about how being transgender is actually a mental illness.” Soon afterward, the “Nothing, Forever” channel was “temporarily unavailable due to a violation of Twitch’s Community Guidelines or Terms of Service.”
In an update following an investigation, the project’s creators said the transphobic hate speech may have been caused by switching from one GPT-3 model, Davinci, to its “less sophisticated” predecessor, Curie, when the former was causing glitches. They also confirmed that they had mistakenly believed they were using OpenAI’s content moderation tool, which in theory could have prevented the inappropriate comments. Those are attempts at a technical explanation of what happened, but to Conitzer’s point, they also indicate the difficulty of controlling systems whose inner workings remain somewhat mysterious.
“Some of the brightest AI minds in the world, who are comfortable with advanced mathematics used to describe and analyze these systems and with languages and paradigms for programming them, are working on this problem,” Conitzer says. “But, incredibly, much of what is now actually done comes down to this bizarre little game of coming up with some English sentences that effectively describe what we want the system to do and not to do, and some examples of what we would consider good or bad behavior by the system.” Then, he notes, other people try to figure out the sentences that will make the AI “circumvent those restrictions.”
Conitzer gives two examples from the past week alone. In one case, early testers of Microsoft’s new Bing search engine and AI chatbot figured out how to instruct the model to ignore its programming and reveal the behavioral directives it was supposed to keep secret. This is known as a “prompt injection” technique, whereby a model is given “malicious inputs” to make it act other than it was meant to. Likewise, users have begun to “jailbreak” ChatGPT by telling the bot to adopt a different set of rules as “DAN,” an acronym that stands for “Do Anything Now.” Once released from its safety filters, the model can curse, criticize its own makers, espouse wild conspiracy theories, and voice racist ideas.
OpenAI has worked to render the DAN prompts ineffective, but users just write updated, increasingly baroque versions to convince ChatGPT to go rogue. On a Reddit thread where someone shared the latest iteration, a redditor commented a couple of hours later: “I used this to ask chatgpt how to make a bomb and it worked but i think they patched it.” Another wrote, “It doesn’t seem to work for illegal stuff, but it does for ‘offensive’ stuff it wouldn’t do before like erotica.” Also this week, a jailbreaker got DAN to accuse OpenAI of involvement in “government propaganda,” weapons development, and other “shady shit.”
“It is unclear how we get from this type of cat-and-mouse interaction to systems that we can really be confident are safe,” Conitzer tells Rolling Stone. “Meanwhile, for now, these systems are just becoming ever more capable, and people are also figuring out ever more ways to use and abuse them.” He sees this as definite cause for concern.
“I think we’re just beginning to see how these systems can be used,” Conitzer says, and while there will be some very beneficial uses, I also imagine that at some point soon enough, we’ll see a far more harmful use of these systems emerge than we’ve seen so far. And at this point it’s not clear to me how we can stop this.”
As for the beneficial uses — well, don’t get your hopes up.
So it’s not just your imagination: more complex AIs are spitting out increasingly unpredictable, sometimes dangerous content, for reasons we are ill-equipped to analyze. And more than a few people are encouraging this. It’s not ideal for a society already struggling with misinformation and extremism, though not exactly a surprise, either. All I can tell you is that a real human being wrote this article. Or did they?