The ultimate achievement for some in the AI industry is to create a system with artificial general intelligence (AGI), or the ability to understand and learn any task a human can. Long relegated to the realm of science fiction, AGI, it has been suggested, would yield systems with the ability to reason, plan, learn, represent knowledge, and communicate in natural language.
Not every expert is convinced that AGI is a realistic goal – or even possible. But one could argue that DeepMind, the Alphabet-backed research lab, took a step in that direction this week with the release of an AI system called Gato.
Gato is what DeepMind describes as a “general-purpose” system, one that can be taught to perform many different kinds of tasks. DeepMind researchers trained Gato to complete 604 of them, to be precise, including captioning images, engaging in dialogue, stacking blocks with a real robotic arm, and playing Atari games.
Jack Hessel, a research scientist at the Allen Institute for AI, points out that a single AI system that can solve many tasks is not new. For example, Google recently started using a system in Google Search called multitask unified model, or MUM, that can process text, images, and videos to perform tasks from finding interlingual variations in the spelling of a word to relating a search query to an image. But what is potentially newer here, Hessel says, is the diversity of the tasks being tackled and the training method.
“We’ve seen evidence before that individual models can handle surprisingly diverse sets of inputs,” Hessel told TechCrunch via email. “I think the key question when it comes to learning to multitask is whether the tasks complement each other or not. You could imagine a duller case if the model implicitly separates the tasks before solving them, for example: ‘If I detect task A as input, I will use subnet A. If I detect task B instead, I will use another subnet B.’ For that null hypothesis, similar performance could be achieved by training A and B separately, which is disappointing. If training A and B together leads to improvements for either (or both!), then things get more exciting.”
To that end, like all AI systems, Gato learned from billions of words, images from real and simulated environments, button presses, joint torques, and more, all in the form of tokens. These tokens represented the data in a way Gato could process, allowing the system to — say — tease out the mechanics of Breakout, or which combination of words in a sentence might make grammatical sense.
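The serialization described above — flattening text, images, and robot signals into one shared token stream — can be sketched roughly as follows. This is an illustrative toy, not DeepMind’s actual scheme (which uses a SentencePiece text vocabulary and mu-law binning for continuous values); all names, bin counts, and offsets here are assumptions.

```python
# Toy sketch of flattening heterogeneous data into one token stream.
# Vocabulary, bin count, and offset are illustrative, not Gato's real values.

def tokenize_text(text, vocab):
    # Map each word to an integer ID from a fixed vocabulary.
    return [vocab[w] for w in text.split()]

def tokenize_continuous(values, n_bins=1024, lo=-1.0, hi=1.0, offset=32000):
    # Discretize continuous signals (e.g. joint torques) into bins, then
    # shift the bin IDs into a token range disjoint from the text vocabulary.
    tokens = []
    for v in values:
        clipped = min(max(v, lo), hi)
        bin_id = int((clipped - lo) / (hi - lo) * (n_bins - 1))
        tokens.append(offset + bin_id)
    return tokens

vocab = {"stack": 0, "the": 1, "red": 2, "block": 3}
# Text instruction and two torque readings end up in the same sequence,
# so a single transformer can be trained on both at once.
stream = tokenize_text("stack the red block", vocab) + \
         tokenize_continuous([0.12, -0.85])
print(stream)
```

Because every modality lands in one integer sequence, a single next-token predictor can, in principle, be trained across all of them without per-task architectures.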
Gato doesn’t necessarily do these tasks well, however. For example, when chatting with a person, the system often responds with a superficial or factually incorrect answer (such as “Marseille” in response to “What is the capital of France?”). When captioning photos, Gato sometimes misgenders people. And the system stacks blocks correctly with a real robot only 60% of the time.
But on 450 of the 604 tasks listed above, DeepMind claims that Gato outperforms an expert more than half the time.
“If you think we need general [systems], which is lots of folks in the AI and machine learning space, then [Gato is] a big deal,” Matthew Guzdial, an assistant professor of computer science at the University of Alberta, told TechCrunch via email. “I think people saying it’s a major step toward AGI are overhyping it somewhat, as we’re still not at human intelligence and likely not to get there soon (in my opinion). I’m personally more in the camp of many small models [and systems] being more useful, but there are definitely benefits to these generalist models in terms of their performance on tasks outside their training data.”
Oddly enough, from an architectural standpoint, Gato isn’t dramatically different from many of the AI systems in production today. It shares features with OpenAI’s GPT-3 in that it is a ‘transformer’. Dating back to 2017, the Transformer has become the architecture of choice for complex reasoning tasks, demonstrating an aptitude for summarizing documents, generating music, classifying objects in images, and analyzing protein sequences.
Perhaps more remarkable, Gato is orders of magnitude smaller than single-task systems such as GPT-3 in terms of the number of parameters. Parameters are the parts of the system learned from training data that essentially determine its ability to handle a problem, such as generating text. Gato has only 1.2 billion, while GPT-3 has more than 170 billion.
DeepMind researchers purposely kept Gato small so that the system could control a robotic arm in real time. But they hypothesize that Gato — if scaled up — could tackle any “task, behavior, and embodiment of interest.”
Assuming this turns out to be the case, several other hurdles would have to be overcome for Gato to beat advanced single-task systems at specific tasks, such as Gato’s inability to learn continuously. Like most Transformer-based systems, Gato’s knowledge of the world comes from its training data and remains static. If you ask Gato a date-sensitive question, such as who the current U.S. president is, chances are it will answer incorrectly.
The Transformer – and Gato, by extension – has another limitation in its context window, or the amount of information the system can “remember” in the course of a given task. Even the best Transformer-based language models can’t write a long essay, let alone a book, without forgetting important details and losing track of the plot. The forgetting happens in any task, be it writing or controlling a robot, which is why some experts have called it the “Achilles’ heel” of machine learning.
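The context-window limitation above can be made concrete with a minimal sketch: a model with a fixed window conditions only on the most recent tokens, so everything earlier is simply invisible to it. The window size and token list here are illustrative assumptions.

```python
# Minimal illustration of a fixed context window: a transformer conditions
# only on the most recent `window` tokens, so older ones are "forgotten".

CONTEXT_WINDOW = 5  # real models use hundreds to thousands of tokens

def visible_context(history, window=CONTEXT_WINDOW):
    # Everything outside the last `window` tokens is invisible to the model.
    return history[-window:]

history = ["The", "detective", "found", "a", "key", "under", "the", "mat"]
# The opening of the "story" ("The detective found") has scrolled out of view.
print(visible_context(history))
```

In a long essay, or a long-horizon robot task, the tokens that scroll out of the window are exactly the early details and goals the system then loses track of.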
For these reasons and others, Mike Cook, a member of the research collective Knives & Paintbrushes, cautions against assuming that Gato is a path to truly universal AI.
“I think the result is somewhat prone to misinterpretation. It sounds exciting that the AI is able to do all of these tasks that sound very different, because to us it sounds like writing text is very different from controlling a robot. But in reality this isn’t all that different from GPT-3 understanding the difference between ordinary English text and Python code,” Cook told TechCrunch via email. “Gato receives specific training data on these tasks, just like any other AI of its kind, and learns how patterns in the data relate to one another, including learning to associate certain kinds of inputs with certain kinds of outputs. This isn’t to say it’s easy, but to the outside observer it might sound like the AI can also make a cup of tea or easily learn another ten or fifty other tasks, and it can’t. We know that with current approaches to large-scale modeling, several tasks can be learned at the same time. I think it’s a nice piece of work, but it doesn’t strike me as a major stepping stone on the way to anything.”