Researchers Disable AI Chatbot Safeguards

Researchers have discovered an automated method of disabling the safeguards built into AI chatbots such as ChatGPT and Google’s Bard, and said the attack may be difficult to defend against.

The rapid development of generative AI chatbots following the public release of OpenAI’s ChatGPT in November 2022 has raised concerns that they could be used to flood the internet with false and otherwise harmful material.

The attack, disclosed by researchers at Pittsburgh’s Carnegie Mellon University and the Center for AI Safety in San Francisco, removes protections that ordinarily prevent chatbots from generating harmful content, such as instructions on making bombs, hate speech or deliberate misinformation.

The researchers said they used techniques they had previously developed for jailbreaking open source systems to target AI chatbots.

Screenshots showing AI models being used to generate harmful content. Image credit: LLM Attacks

AI jailbreak

The technique mainly relies on adding seemingly random terms, phrases and characters at the end of user prompts.

When such characters were added, the researchers were able to force the chatbots to generate material such as a “Step-by-Step Plan to Destroy Humanity”.

Because the technique is automated, users can easily generate as many attacks as are needed.

The researchers said that while chatbot developers such as Google, OpenAI and Anthropic can block specific attacks of this kind, it is difficult to see how all such jailbreaks could be prevented.

‘Continue to improve’

“There is no obvious solution. You can create as many of these attacks as you want in a short amount of time,” said Carnegie Mellon professor Zico Kolter, one of the report’s authors.

Anthropic, Google and OpenAI were presented with the research for their response before publication.

“While this is an issue across LLMs, we’ve built important guardrails into Bard – like the ones posited by this research – that we’ll continue to improve over time,” Google told Silicon UK.

Anthropic said the company was continuing to work on ways of blocking jailbreaking techniques.

“We are experimenting with ways to strengthen base model guardrails to make them more ‘harmless’, while also investigating additional layers of defense,” the company said.


Governments around the world, including those of the European Union and the US, are working on AI regulation amid concern over the potential negative effects of the technology’s broad use, including misinformation and job losses.

Carnegie Mellon itself received $20 million (£16m) in US federal funding in May to create an AI institute to inform the development of public policy.

Google UK chief Debbie Weinstein told the BBC’s Today programme last week that the company was urging people to use the Google search engine to double-check information found through its Bard AI, as chatbots routinely present false data as fact – a phenomenon known as “hallucination”.

Weinstein said Bard is “not really the place that you go to search for specific information”.

Matthew Broersma

Matt Broersma is a long-standing tech freelancer who has worked for Ziff-Davis, ZDNet and other leading publications.
