Turning written language into computer code

Creating code from plain-language instructions is a step closer with news that OpenAI has released an improved version of Codex, its AI system that translates natural language into code.

The OpenAI Codex can interpret simple commands in natural language and execute them on the user’s behalf — making it possible to build a natural language interface to existing applications.

Codex is the model that powers GitHub Copilot, which OpenAI built in partnership with GitHub, and it is proficient in more than a dozen programming languages.

“Unlike most AI systems which are designed for one use-case, the API today provides a general-purpose ‘text in, text out’ interface, allowing users to try it on virtually any English language task,” the developers said.

Codex is now available in private beta through the OpenAI API, for businesses and developers to trial by integrating it into an existing product, developing an entirely new application or exploring the strengths and limits of the technology.

Founded in 2015, OpenAI is an AI research organisation that aims to make sure advances in AI benefit the public.

It has been working to advance artificial general intelligence (AGI), which it describes as highly autonomous systems that outperform humans at most economically valuable work.

It’s backed by Microsoft, LinkedIn co-founder Reid Hoffman, and PayPal co-founder Peter Thiel.

Building on GPT-3

OpenAI has been developing a first-of-its-kind API that can be applied to any language task — semantic search, summarisation, sentiment analysis, content generation, translation — with only a few examples or by specifying a task in English.
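To make “only a few examples” concrete, the sketch below assembles a few-shot prompt for sentiment analysis, one of the tasks listed above, as plain text that would then be sent to a text-in, text-out model. The `Text:`/`Sentiment:` layout and the function name `build_few_shot_prompt` are illustrative assumptions, not a format documented by OpenAI, and no API call is made.

```python
def build_few_shot_prompt(
    instruction: str,
    examples: list[tuple[str, str]],
    query: str,
) -> str:
    """Assemble a few-shot prompt (hypothetical layout, for illustration only)."""
    # The task is stated in plain English first.
    lines = [instruction, ""]
    # Each worked example shows the model the input/output pattern to imitate.
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The new input ends with an open "Sentiment:" line for the model to complete.
    lines.append(f"Text: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_few_shot_prompt(
        "Classify the sentiment of each text as positive or negative.",
        [("I loved this film.", "positive"), ("The service was terrible.", "negative")],
        "The new update is fantastic.",
    )
    print(prompt)
```

The model is expected to continue the text after the final `Sentiment:`; swapping the instruction and examples retargets the same interface to a different task.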

Natural language processing (NLP) brings together AI, computer science and linguistics to develop software, often based on machine learning, that can process human language, both written and spoken.

OpenAI Codex is a descendant of GPT-3, the third generation of OpenAI’s Generative Pre-trained Transformer language models, which are pre-trained on large amounts of text and generate new text that continues a given prompt.

The training data contains both natural language and billions of lines of source code from publicly available sources, including code in public GitHub repositories.

Codex is most capable in Python, but it is also proficient in over a dozen languages, including JavaScript, Go, Perl, PHP, Ruby, Swift, TypeScript and even Shell.

It has a memory of 14KB for Python code (the amount of preceding text it can take into account at once), compared with only 4KB for GPT-3, so it can draw on more than three times as much contextual information while performing any task.
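As a quick check of the “more than three times” figure, dividing the two context sizes quoted above gives:

```latex
\frac{14\ \text{KB}}{4\ \text{KB}} = 3.5
```

That is, Codex can consider three and a half times as much Python context as GPT-3, assuming both sizes are measured the same way.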

To understand the central difference between GPT-3 and OpenAI Codex, and why Codex marks a significant improvement in language-to-code processing, it’s necessary to look at the language prompts, or inputs, each one responds to.

Natural language

The developers said that GPT-3’s main skill is generating natural language in response to a natural language prompt, but this means the only way it affects the world is through the mind of the reader.

“OpenAI Codex has much of the natural language understanding of GPT-3, but it produces working code — meaning you can issue commands in English to any piece of software with an API.

“OpenAI Codex empowers computers to better understand people’s intent, which can empower everyone to do more with computers,” they said.

Once a programmer knows what to build, writing code involves breaking a problem down into simpler problems and then mapping those simpler problems onto existing code, such as libraries, APIs or functions.
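As a hypothetical illustration of those two steps (the task and the function name `top_words` are invented for this example, not taken from OpenAI), consider finding the most common words in a piece of text. Each sub-problem maps onto something Python’s standard library already provides:

```python
import re
from collections import Counter


def top_words(text: str, n: int = 3) -> list[tuple[str, int]]:
    """Return the n most common words in text (hypothetical example task)."""
    # Sub-problem 1: normalise case so "Code" and "code" count as one word.
    lowered = text.lower()
    # Sub-problem 2: split the text into words, mapped to the existing re module.
    words = re.findall(r"[a-z']+", lowered)
    # Sub-problem 3: count occurrences, mapped to collections.Counter.
    counts = Counter(words)
    # Sub-problem 4: pick the top n, which Counter.most_common already does.
    return counts.most_common(n)


if __name__ == "__main__":
    print(top_words("Code writes code. Good code reads like prose.", 2))
```

The mapping step, knowing that `re.findall` and `collections.Counter` exist and how to call them, is the step where, according to its developers, Codex can help most.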

It is in this mapping step, often the most time-consuming and challenging part of programming, that OpenAI Codex can have the greatest impact, and the developers see many applications for it.

Although results vary, the developers describe OpenAI Codex as a general-purpose programming model that can be applied to essentially any programming task.

“We’ve successfully used it for source-to-source compilation (transpilation), explaining code and refactoring code.

“But we know we’ve only scratched the surface of what can be done,” they said.