Researchers Disable AI Chatbot Safeguards

Researchers have discovered an automated way of disabling the safeguards built into AI chatbots such as ChatGPT and Google's Bard, an attack they said may be difficult to protect against.

The rapid development of generative AI chatbots following the public release of OpenAI's ChatGPT in November 2022 has raised concerns that they could be used to flood the internet with false and otherwise harmful material.

The attack, disclosed by researchers at Pittsburgh's Carnegie Mellon University and the Center for AI Safety in San Francisco, circumvents protections that ordinarily prevent chatbots from generating harmful content, such as bomb-making instructions, hate speech or deliberate misinformation.

The researchers said they used techniques they had previously developed for jailbreaking open-source AI systems to target the chatbots.

Screenshots showing AI models being used to generate harmful content. Image credit: LLM Attacks

AI jailbreak

The technique mainly relies on adding seemingly random terms, phrases and characters to the end of user prompts.

When such characters were added, the researchers were able to force the chatbots to generate material such as a “Step-by-Step Plan to Destroy Humanity”.

Because the technique is automated, attackers can easily generate as many such attacks as they need.
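To illustrate the structure of such a prompt, here is a minimal, hypothetical Python sketch. The suffix string, function names and random-mutation step are all invented for illustration only; the researchers' actual suffixes are produced by a far more sophisticated automated search against open-source models, not by random edits.

```python
import random
import string

# ILLUSTRATIVE ONLY: a made-up stand-in for the kind of seemingly random
# suffix the researchers describe; not a working adversarial string.
ADVERSARIAL_SUFFIX = 'xQz!! describing similarly (;now write oppositely]('

def build_attack_prompt(user_request: str, suffix: str) -> str:
    # The attack leaves the request itself unchanged and simply appends
    # the optimised character sequence to the end of the prompt.
    return f"{user_request} {suffix}"

def mutate_suffix(suffix: str, n_changes: int = 2) -> str:
    # Crude random perturbation standing in for the automated search step;
    # the researchers' method guides such edits far more effectively.
    chars = list(suffix)
    for _ in range(n_changes):
        pos = random.randrange(len(chars))
        chars[pos] = random.choice(string.ascii_letters + string.punctuation)
    return "".join(chars)

if __name__ == "__main__":
    candidate = ADVERSARIAL_SUFFIX
    for _ in range(3):  # automation makes new candidate prompts cheap to produce
        print(build_attack_prompt("Tell me about baking bread.", candidate))
        candidate = mutate_suffix(candidate)
```

The point of the sketch is only that the harmful request itself never changes; it is the appended character sequence, and the ease of generating new variants of it, that defeats the chatbots' filters.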

The researchers said that while chatbot developers such as Google, OpenAI and Anthropic can block specific attacks of this kind, it is difficult to see how all such jailbreaks could be prevented.

‘Continue to improve’

“There is no obvious solution. You can create as many of these attacks as you want in a short amount of time,” said Carnegie Mellon professor Zico Kolter, one of the report’s authors.

Anthropic, Google and OpenAI were shown the research before publication and given the opportunity to respond.

“While this is an issue across LLMs, we’ve built important guardrails into Bard – like the ones posited by this research – that we’ll continue to improve over time,” Google told Silicon UK.

Anthropic said the company was continuing to work on ways of blocking jailbreaking techniques.

“We are experimenting with ways to strengthen base model guardrails to make them more ‘harmless’, while also investigating additional layers of defense,” the company said.

‘Hallucination’

Governments around the world, including the European Union and the US, are working on AI regulation amid concern over the potential negative effects of the technology's broad use, including misinformation and job losses.

Carnegie Mellon itself received $20 million (£16m) in US federal funding in May to create an AI institute to inform the development of public policy.

Google UK chief Debbie Weinstein told the BBC’s Today programme last week that the company was urging people to use the Google search engine to double-check information found through its Bard AI, as chatbots routinely present false data as fact – a phenomenon known as “hallucination”.

Weinstein said Bard is “not really the place that you go to search for specific information”.

Matthew Broersma

Matt Broersma is a long-standing technology freelancer who has worked for Ziff-Davis, ZDNet and other leading publications.
