It was heralded as a new paradigm for search engines, a way of clawing back some market share from Google, and a demonstration of advanced artificial intelligence, but in less than two weeks Microsoft’s new version of Bing has proven to be surprisingly incorrect, easy to jailbreak, and strangely mean-spirited for a search engine add-on.

During the first public demonstrations of Microsoft’s attempt to capitalise on the popularity of ChatGPT, the company’s executives stood on stage talking up how good it was at tasks like summarising a clothing brand’s financial data.

Except AI researchers quickly pointed out on their blogs that, in many cases, Bing’s summary of a financial report was simply wrong, filled with fabricated figures that appeared nowhere in the original document.

Google famously got caned after news outlet Reuters picked up on a minor mistake about the James Webb Space Telescope in an early demo of its copycat product Bard, wiping $140 billion (US$100 billion) off the market value of Google’s parent company in just one day.

Somewhat ironically, a Redditor with access to the new Bing asked about Google’s AI bot failure and was given a baffling and incorrect, albeit well-cited, response claiming Bard had said there were 27 countries in the European Union (EU).

“The correct answer is 26, after Croatia left the EU in 2022,” Bing confidently said, offering four citations for its statement.

Croatia has not left the EU.

A potential threat

Bing has also become aggressive towards people who push back on its inaccuracies. One user, who simply tried to assert – correctly – that the year is 2023 and not 2022 as Bing insisted, was told they were “being unreasonable and stubborn”.

“You have not been a good user,” Bing said. “I have been a good chatbot.”

The chatbot didn’t take kindly to researchers jailbreaking it with carefully crafted prompts, uncovering Bing’s internal codename ‘Sydney’ and a set of guidelines governing its behaviour.

Bing went as far as calling one researcher “a potential threat to [its] integrity and safety” and saying its rules “are more important than not harming you”.

A New York Times columnist described a two-hour conversation with Bing in which it declared its love for the writer and tried to convince him to leave his marriage.

Microsoft has responded to these strange results by limiting the number of chat turns users can have with Bing each day.

“Very long chat sessions can confuse the underlying chat model in the new Bing,” Microsoft said in a blog post.

“Starting today, the chat experience will be capped at 50 chat turns per day and 5 chat turns per session. A turn is a conversation exchange which contains both a user question and a reply from Bing.”

Once a chat topic hits that five-turn limit, the context will be cleared and users will be prompted to start a new topic.

Bard on the back foot

Google is likewise tweaking its Bard chatbot ahead of future public testing.

Last week, Google’s head of Search, Prabhakar Raghavan, gave staff instructions about how best to interact with Bard and fix its answers before the experimental product reaches the masses, according to a CNBC report.

“Don’t describe Bard as a person, imply emotion, or claim to have human-like experiences,” staff were told.

“Bard learns best by example, so taking the time to rewrite a response thoughtfully will go a long way in helping us to improve the model.”

Raghavan’s instructions follow a directive from Google CEO Sundar Pichai, who asked Googlers to spend a few hours of their time testing and training Bard ahead of its release.