When computer scientists at Microsoft started to experiment with a new artificial intelligence system last year, they asked it to solve a puzzle that should have required an intuitive understanding of the physical world.
"Here we have a book, nine eggs, a laptop, a bottle, and a nail," they asked. "Please tell me how to stack them onto each other in a stable manner."
The researchers were startled by the ingenuity of the AI system's answer. Put the eggs on the book, it said. Arrange the eggs in three rows with space between them. Make sure you don't crack them.
"Place the laptop on top of the eggs, with the screen facing down and the keyboard facing up," it wrote. "The laptop will fit snugly within the boundaries of the book and the eggs, and its flat and rigid surface will provide a stable platform for the next layer."
The clever suggestion made the researchers wonder whether they were witnessing a new kind of intelligence. In March, they published a 155-page research paper arguing that the system was a step toward artificial general intelligence, or AGI, which is shorthand for a machine that can do anything the human brain can do.
Microsoft, the first major tech company to release a paper making such a bold claim, stirred one of the tech world's testiest debates: Is the industry building something akin to human intelligence? Or are some of the industry's brightest minds letting their imaginations get the best of them?
"I started off being very skeptical – and that evolved into a sense of frustration, annoyance, maybe even fear," said Peter Lee, who leads research at Microsoft. "You think: Where the heck is this coming from?"
Microsoft's research paper, "Sparks of Artificial General Intelligence," goes to the heart of what technologists have been working toward – and fearing – for decades. If they build a machine that works like the human brain or even better, it could change the world. But it could also be dangerous.
And it could also be nonsense.
Making AGI claims can be a reputation killer for computer scientists. What one researcher believes is a sign of intelligence can easily be explained away by another, and the debate often sounds more appropriate to a philosophy club than a computer lab.
But some believe the industry has in the past year or so inched toward something that can't be explained away: a new AI system that is coming up with humanlike answers and ideas that weren't programmed into it.
Microsoft has reorganized parts of its research labs to include multiple groups dedicated to exploring the idea. One will be run by Sebastien Bubeck, who was the lead author on the Microsoft AGI paper.
About five years ago, companies like Google, Microsoft and OpenAI began building large language models, or LLMs. Those systems often spend months analyzing vast amounts of digital text, including books, Wikipedia articles and chat logs. By pinpointing patterns in that text, they learned to generate text of their own, including term papers, poetry and computer code. They can even carry on a conversation.
The technology the Microsoft researchers were working with, OpenAI's GPT-4, is considered the most powerful of those systems. Microsoft is a close partner of OpenAI and has invested $13 billion in the San Francisco company.
The researchers included Bubeck, a 38-year-old French expatriate and former Princeton University professor. One of the first things he and his colleagues did was ask GPT-4 to write a mathematical proof showing that there are infinitely many prime numbers, and to do it in a way that rhymed.
The technology's poetic proof was so impressive – both mathematically and linguistically – that he found it hard to understand what he was chatting with.
"At that point, I was like: What is going on?" he said in March during a seminar at the Massachusetts Institute of Technology.
For several months, he and his colleagues documented complex behavior exhibited by the system and believed it demonstrated a "deep and flexible understanding" of human concepts and skills.
When people use GPT-4, they are "amazed at its ability to generate text," Lee said. "But it turns out to be way better at analyzing and synthesizing and evaluating and judging text than generating it."