OpenAI Unveils Text-to-Voice Tool, Raises Concerns Over Misuse


OpenAI has reached another milestone in artificial intelligence (AI) development with the introduction of a text-to-voice tool capable of generating natural speech from a mere 15-second clip of someone’s voice. This innovation allows the generated speech to closely resemble the original speaker, ushering in a new era of eerily human-like AI capabilities.

However, the company is wary of potential misuse and has opted not to release the tool, called Voice Engine, publicly, limiting its availability to early testers for now.

In a statement, the San Francisco-based company acknowledged the serious risks associated with generating speech that mimics individuals’ voices, particularly highlighting concerns in an election year.

AI voice cloning is not new and has already been used in troubling ways. Ahead of the New Hampshire primary in January, AI-generated robocalls imitating President Joe Biden reached thousands of voters, urging them to stay home and not vote.

That incident prompted the US Federal Communications Commission (FCC) to act, banning AI-generated voices in robocalls just last month.

Yet the implications extend beyond elections: voice cloning, like other deepfake technologies, opens avenues for fraud such as extortion scams.

Nevertheless, there are positive applications of this technology. OpenAI has demonstrated how it can aid patients with sudden or degenerative speech conditions by restoring their voices using videos or audio recordings from before their speech impairment.

Moreover, the tool has the potential to provide a natural-sounding voice to individuals who cannot speak or face difficulties in doing so, offering a more humane alternative to robotic voices.

OpenAI emphasizes that Voice Engine is currently accessible only to select partners who have agreed to strict usage policies prohibiting the impersonation of individuals or organizations without consent.

Among these partners are Age of Learning, an education technology company; HeyGen, a visual storytelling platform; and Lifespan, a health system.

To address concerns regarding misuse, OpenAI has implemented various safety measures, including watermarking to trace the origin of generated audio and requiring explicit and informed consent from the original speaker.

The company also advocates voice authentication experiences that verify the original speaker is knowingly adding their voice to the service. It additionally proposes a “no-go voice list” to detect and prevent the creation of voices that resemble prominent figures without authorization.

As OpenAI continues to advance its AI capabilities, it remains imperative to strike a balance between innovation and ethical considerations to harness the full potential of such technologies while mitigating associated risks.

A text-to-voice tool is a software application or service that converts written text into spoken words. It takes input text in various formats, such as plain text, documents, or web pages, and generates audio output that replicates human speech.