Artificial intelligence research company OpenAI has deemed it too risky to publicly release its Voice Engine AI model, which it says can reproduce a person’s voice using a 15-second sample of their speech.
San Francisco-based OpenAI said Voice Engine could create “emotive and realistic voices” and had already been used to power the voice features of its ChatGPT chatbot, but the technology still carried “serious risks”.
Voice Engine was first developed in late 2022, but the company has only shared it with “a small group of trusted partners” due to “the potential for synthetic voice misuse”, it said in a blog post.
OpenAI said it believed the technology could be used to provide reading assistance, translate content, support people who are non-verbal or recovering their voices, and assist with service delivery in remote areas.
But the company cites November’s US election as one reason releasing its voice technology carries risks, and why it is gathering feedback “from across government, media, entertainment, education, civil society and beyond”.
“Based on these conversations and the results of these small-scale tests, we will make a more informed decision about whether and how to deploy this technology at scale,” the company said.
It added that broader steps may need to be taken to protect people, including phasing out voice-based authentication for accessing banks and other services, improving audiovisual content tracking, educating the public about AI, and exploring policies to protect individuals’ voices.
The company is also yet to publicly release its Sora text-to-video generator, due to the potential for malicious use.
Opening ChatGPT ‘to make AI accessible’
On Monday, just days after it confirmed it would not yet publicly release Voice Engine, OpenAI announced it would allow anyone to access ChatGPT without needing to sign up for an OpenAI account.
The company said it was opening up the service “to make AI accessible to anyone curious about its capabilities”.
Using ChatGPT without an account will still have drawbacks, as users can’t save their conversations and will face “additional content safeguards … such as blocking prompts and generations in a wider range of categories”, OpenAI said.
The firm may also use data provided by users without accounts to improve its AI models — but users can opt out of this.
OpenAI CEO Sam Altman says the company is developing AI in a responsible way. Photo: Shutterstock
Rebecca Johnson, a generative AI researcher at the University of Sydney, told Information Age that while many other AI chatbots are publicly available, opening up ChatGPT brings an already popular system to a new level of scale and accessibility — which carries its own risks.
“Every time you remove a little layer like that, and make it even more widely accessible, and for people to remain more anonymous, then you're just exacerbating the problem,” she says.
‘A perfect storm’
Johnson says that while it’s positive that OpenAI is being cautious with “powerful technology”, she believes the company should open up its work to a wider range of researchers.
“They are really limiting — they always have been — how much access they give to academic researchers and to scholars,” she says.
“So it makes it difficult for people to independently test and research these products before they get out into the public.”
Johnson cites the potential for election interference, AI bias, and invasions of personal security and privacy as some of the key risks of generative AI, and wants greater public education.
More than 50 countries, containing around half of the world’s population, are due to hold national elections in 2024.
“We're standing at a point in time where what we're seeing coinciding is this huge election year with widespread generative AI technologies — text, images and audio — and what I'm seeing is a perfect storm,” Johnson says.
"In this new world that we're standing in, unfortunately, individuals need to be alert to how generated content may be used.
“We know that seeing is no longer believing, and hearing it with your own ears now carries less weight.”
OpenAI investor Microsoft unveiled a similar voice model called VALL-E in January 2023, which it said could mimic speech using only a three-second audio sample.