A bug in an open source library has led to ChatGPT's first major data leak, exposing limited chat history and payment information from an undetermined number of users.
AI unicorn OpenAI, owner of the widely popular conversational tool ChatGPT, said it discovered an issue early last week when a bug enabled some users to view the titles (and possibly the first message) of conversations from another user's chat history.
"We took ChatGPT offline earlier this week due to a bug in an open-source library which allowed some users to see titles from another active user’s chat history," said OpenAI.
"It’s also possible that the first message of a newly-created conversation was visible in someone else’s chat history if both users were active around the same time," it added.
On 21 March, OpenAI temporarily took down ChatGPT to investigate the vulnerability, leaving the AI tool inaccessible for over an hour and its chat history feature offline for most of the day.
Deeper investigation later revealed the issue had exposed not only user chat history but also a range of user payment information.
"During a nine-hour window on March 20, 2023 [in Pacific Time], another ChatGPT user may have inadvertently seen your billing information when clicking on their own ‘Manage Subscription’ page," said a notification to impacted customers.
"In the hours before we took ChatGPT offline on Monday, it was possible for some users to see another active user’s first and last name, email address, payment address, the last four digits (only) of a credit card number, and credit card expiration date," said OpenAI.
The company explained the "unintentional visibility" of payment-related data may have impacted 1.2 per cent of ChatGPT Plus subscribers who were active during a specific nine-hour window, and stressed no "full credit card numbers" were exposed at any time.
Caching bug to blame
OpenAI attributed its data leak to an issue involving Redis, a piece of software which companies such as OpenAI use to cache user information and ease the load on their databases.
The company discovered a bug in the Redis client open-source library, redis-py, which created a caching issue and ultimately enabled users to view data from other accounts.
OpenAI offered a full technical breakdown of the problem, but in short: a Redis request cancelled at the wrong moment could leave its shared connection in a corrupted state, so that the data it had asked for was handed to the next, unrelated request using that connection, even if that request came from another user.
In most cases this simply produced a server error, but where the stray data happened to match the type the second request expected, it got through, enabling users to view other people's payment data and chat history in specific circumstances.
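To illustrate the failure mode described above, here is a minimal Python sketch. It is not OpenAI's or redis-py's actual code: the `SharedConnection` class, the `fetch` helper and the key names are hypothetical, and the sketch assumes a toy connection that returns replies strictly in the order commands were sent.

```python
import asyncio
from collections import deque


class SharedConnection:
    """Hypothetical stand-in for one pooled connection (not redis-py)."""

    def __init__(self):
        self._replies = deque()

    async def send(self, key):
        # The pretend server queues one reply per command, in order.
        self._replies.append(f"value-for:{key}")

    async def read(self):
        # Simulated network latency; a task can be cancelled here.
        await asyncio.sleep(0.01)
        return self._replies.popleft()


async def fetch(conn, key):
    await conn.send(key)
    return await conn.read()


async def main():
    conn = SharedConnection()

    # User A's request is sent, then cancelled before its reply is read.
    task_a = asyncio.create_task(fetch(conn, "user-a:billing"))
    await asyncio.sleep(0)  # let task_a run until it waits inside read()
    task_a.cancel()
    try:
        await task_a
    except asyncio.CancelledError:
        pass

    # The connection is reused with user A's reply still unread,
    # so user B's request receives the stale reply instead of its own.
    return await fetch(conn, "user-b:billing")


leaked = asyncio.run(main())
print(leaked)
```

In this toy model the second request asks for user B's data but is handed the reply left behind by user A's cancelled request, which is the general shape of the mix-up OpenAI described.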
Caching bugs such as this are not unprecedented – in 2015, a notorious Christmas Day incident at digital distribution service Steam resulted in users seeing the personal data of other accounts due to a similar caching issue.
Given the specific circumstances needed to trigger the bug and the limited timeframe in which it was active, OpenAI suggested the number of users whose data was actually revealed to someone else is "extremely low".
While the bug was live, however, users on Twitter and Reddit were quick to share screenshots of their ChatGPT sidebars displaying surface-level chat histories from other users.
The company went on to notify affected users whose payment information may have been exposed, and said it was "confident" there is no ongoing risk to users' data.
OpenAI CEO Sam Altman expressed feeling "awful" in a tweet acknowledging the issue, and the company has since issued fervent apologies.
"Everyone at OpenAI is committed to protecting our users’ privacy and keeping their data safe. It’s a responsibility we take incredibly seriously. Unfortunately, this week we fell short of that commitment, and of our users’ expectations," an OpenAI statement read.
"We apologise again to our users and to the entire ChatGPT community and will work diligently to rebuild trust."
The company said it has taken steps to improve its systems following the incident, including extensive testing of its fix for the underlying bug, improved logging measures, and redundancy checks on future cached data.
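OpenAI has not published how its redundancy checks work. As a rough sketch of the general idea only, a cache entry can carry the ID of the user it belongs to, and the reader can refuse any entry whose owner does not match the requester. All names below (`wrap_for_cache`, `read_from_cache`, `OwnerMismatch`) are hypothetical.

```python
import json


class OwnerMismatch(Exception):
    """Raised when a cached entry belongs to a different user."""


def wrap_for_cache(user_id, payload):
    # Store the owner alongside the payload so it can be verified on read.
    return json.dumps({"owner": user_id, "payload": payload})


def read_from_cache(user_id, raw):
    record = json.loads(raw)
    # Redundant check: never return data tagged with another user's ID.
    if record.get("owner") != user_id:
        raise OwnerMismatch(f"entry not owned by {user_id}")
    return record["payload"]


entry = wrap_for_cache("user-a", {"last4": "4242"})
print(read_from_cache("user-a", entry))
```

With a check of this kind, a reply that reached the wrong request would be rejected rather than displayed, at the cost of a few extra bytes per cache entry.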