EleutherAI claims new NLP model approaches GPT-3-level performance


AI-powered language systems have transformative potential, particularly in the enterprise. They're already being used to drive chatbots, translate natural language into structured query language, create application layouts and spreadsheets, and improve the accuracy of web search products. OpenAI's GPT-3, which may be the best-known AI text generator, is currently used in more than 300 apps by tens of thousands of developers and produces 4.5 billion words per day.

As business interest in AI rises, advisory firm Mordor Intelligence forecasts that the natural language processing (NLP) market will more than triple its revenue by 2025. But noncommercial, open source efforts are concurrently gaining steam, as evidenced by the progress made by EleutherAI. A grassroots collection of AI researchers, EleutherAI this week released GPT-J-6B (GPT-J), a model the group claims performs nearly on par with an equivalent-sized GPT-3 model on various tasks. Contributor Ben Wang led the work.

“We think it's probably fair to say this is currently the best open source autoregressive language model you can get by a pretty wide margin,” Connor Leahy, one of the founding members of EleutherAI, told VentureBeat.

GPT-J is what's known as a Transformer model, which means it weighs the influence of different parts of input data rather than treating all the input data the same. Transformers don't need to process the beginning of a sentence before the end. Instead, they identify the context that confers meaning on a word in the sentence, enabling them to process input data in parallel.
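The weighing mechanism described above is self-attention. A minimal sketch (not EleutherAI's implementation, just an illustration of the idea) shows how every token's representation becomes a weighted mix of all the others, computed for all positions at once:

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over a sequence of token vectors.

    Each output row is a weighted combination of every input row, so all
    positions are processed in parallel rather than left to right.
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)  # pairwise influence between tokens
    # softmax so each token's attention weights sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ X  # context-weighted representations

X = np.random.default_rng(0).normal(size=(4, 8))  # 4 tokens, 8 dims each
out = self_attention(X)
print(out.shape)  # (4, 8): one context-aware vector per token
```

Real Transformers learn separate query, key, and value projections and stack many such layers, but the parallel, context-mixing step is the same.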

The Transformer architecture forms the backbone of language models that include GPT-3 and Google's BERT, but EleutherAI claims GPT-J took less time to train compared with other large-scale model developments. The researchers attribute this to the use of Jax, Google's Python library designed for machine learning research, as well as training on Google's tensor processing units (TPUs), application-specific integrated circuits (ASICs) developed specifically to accelerate AI.

Training GPT-J

EleutherAI says GPT-J contains roughly 6 billion parameters, the parts of the machine learning model learned from historical training data. It was trained over the course of five weeks on 400 billion tokens from a dataset created by EleutherAI called The Pile, an 835GB collection of 22 smaller datasets including academic sources (e.g., arXiv, PubMed), communities (StackExchange, Wikipedia), code repositories (GitHub), and more. (Tokens are a way of separating pieces of text into smaller units in natural language, and they can be words, characters, or parts of words.)
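To make the token parenthetical concrete, here is a toy greedy longest-match tokenizer (a simplification; GPT-J actually uses a learned byte-pair-encoding vocabulary) showing how a word can split into subword pieces:

```python
def tokenize(text, vocab):
    """Greedy longest-match subword tokenization over a toy vocabulary.

    Falls back to single characters, so any string can be encoded and
    tokens end up as whole words, word pieces, or lone characters.
    """
    tokens = []
    i = 0
    while i < len(text):
        # try the longest vocabulary entry that matches at position i
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character: emit as-is
            i += 1
    return tokens

vocab = {"token", "iz", "ation"}
print(tokenize("tokenization", vocab))  # ['token', 'iz', 'ation']
```

Production tokenizers build their vocabulary from corpus statistics so frequent strings become single tokens, which is why a 400-billion-token corpus is far smaller than 400 billion words of raw text might suggest.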

Above: GPT-J can solve basic math problems.

Image Credit: EleutherAI

For compute, EleutherAI was able to leverage the TPU Research Cloud, a Google Cloud initiative that supports projects with the expectation that the results of the research will be shared via code and models. GPT-J's code and the trained model are open-sourced under the Apache 2.0 license and can be used for free via EleutherAI's website.

GPT-J is more capable than the two previously released EleutherAI models: GPT-Neo 1.3B and GPT-Neo 2.7B. For example, it can perform addition and subtraction and prove simple mathematical theorems, like “Any cyclic group is abelian.” It can also answer yes/no reading-comprehension questions from a popular test dataset (BoolQ) and generate pseudocode.


Above: GPT-J proving a theorem.

Image Credit: EleutherAI

“[OpenAI's] GPT-2 was about 1.5 billion parameters and doesn't have the best performance since it's a bit old. GPT-Neo was about 2.7 billion parameters but somewhat underperforms equal-sized GPT-3 models. GPT-J, the new one, is now 6B, sized similar to the Curie model of OpenAI, we believe,” Leahy said.

Looking ahead

EleutherAI plans to eventually deliver the code and weights needed to run a model similar, though not identical, to the full DaVinci GPT-3. (Weights are parameters within a neural network that transform input data.) Compared with GPT-J, the full GPT-3 contains 175 billion parameters and was trained on 499 billion tokens from a 45TB dataset.
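The parenthetical about weights can be illustrated with a single linear layer (the numbers below are purely illustrative, not from any real model):

```python
import numpy as np

# A single linear layer: the learned weights W transform an input vector.
W = np.array([[0.5, -1.0],
              [2.0,  0.0]])  # 2x2 weight matrix (illustrative values)
x = np.array([1.0, 3.0])     # input vector
y = W @ x                    # the weights transform the input
print(y)  # [-2.5  2. ]
```

A 6-billion-parameter model like GPT-J is, at heart, a very large stack of such learned transformations interleaved with attention and nonlinearities, which is why releasing the weights is what makes a model usable, not just its code.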

Language models like GPT-3 often amplify biases encoded in data. A portion of the training data is not uncommonly sourced from communities with pervasive gender, race, and religious prejudices. OpenAI notes that this can lead to placing words like “naughty” or “sucked” near female pronouns and “Islam” near words like “terrorism.” Other studies, like one published in April by Intel, MIT, and Canadian Institute for Advanced Research (CIFAR) researchers, have found high levels of stereotypical bias in some of the most popular models.


Above: GPT-J answering a word problem.

Image Credit: EleutherAI

But EleutherAI claims to have performed extensive bias analysis on The Pile and made tough editorial decisions to exclude datasets it felt were “unacceptably negatively biased” toward certain groups or views.

While EleutherAI's model might not be cutting edge in terms of its capabilities, it could go a long way toward solving a common tech problem: the disconnect between research and engineering teams. As Hugging Face CEO Clément Delangue told VentureBeat in a recent interview, tech giants provide black-box NLP APIs while also releasing open source repositories that can be hard to use or aren't well-maintained. EleutherAI's efforts could help enterprises realize the business value of NLP without having to do much of the legwork themselves.

