Picture Lee Unkrich, one of Pixar’s foremost animators, as a seventh grader. He stares at an image of a train engine on the screen of his school’s first computer. Wow, he thinks. Some of the magic wears off, however, when Lee discovers that the image didn’t appear simply by asking for “a picture of a train.” It had to be painstakingly coded and rendered – by hard-working people.
Now imagine Lee stumbling upon DALL-E 43 years later: an artificial intelligence that generates original works of art from human-supplied prompts that can literally be as simple as “a picture of a train.” As he types words to create image after image, the Wow is back. Only this time it won’t go away. “It feels like a miracle,” he says. “When the results appeared, it took my breath away, and tears welled up in my eyes. It’s that magical.”
Our machines have crossed a threshold. All our lives we’ve been reassured that computers weren’t capable of being truly creative. Yet suddenly, millions of people are using a new breed of AI to generate stunning, never-before-seen images. Most of these users aren’t professional artists like Lee Unkrich, and that’s the point: they don’t have to be. Not everyone can write, direct, and edit an Oscar winner like Toy Story 3 or Coco, but everyone can launch an AI image generator and type in an idea. What appears on screen is stunningly realistic and detailed. Hence the universal response: Wow. On four services alone – Midjourney, Stable Diffusion, Artbreeder, and DALL-E – people working with AIs now collectively create more than 20 million images a day. Brush in hand, artificial intelligence has become an engine of wow.
Because these surprising AIs have learned their art from billions of images made by humans, their output hovers around what we expect pictures to look like. But because they are alien AIs, fundamentally mysterious even to their creators, they structure new images in ways no human is likely to imagine, filling in details most of us wouldn’t have the artistry to conceive, let alone the skill to execute. They can also be instructed to generate endless variations of something we like, in seconds, in whatever style we want. In the end, this is their greatest advantage: they can create new things that are recognizable and understandable, yet at the same time completely unexpected.
In fact, these new AI-generated images are so unexpected that – in the quiet awe that immediately follows the Wow – another thought occurs to just about everyone who encounters them: man-made art must now be over. Who can compete with the speed, cheapness, scale, and, yes, wild creativity of these machines? Is art yet another human endeavor we must surrender to robots? And the next obvious question: if computers can be creative, what else can they do that we were told they couldn’t?
I’ve spent the past six months using AIs to create thousands of eye-catching images, often losing a night’s sleep to the endless search for one more beauty hidden in the code. And after interviewing the creators, power users, and other early adopters of these generators, I can make a very clear prediction: generative AI will change the way we design just about everything. Oh, and no human artist will lose their job because of this new technology.
It is no exaggeration to call images generated with the help of AI co-creations. The sobering secret of this new power is that its best uses come not from typing a single prompt, but from very long conversations between human and machine. Progress on each image comes from many, many iterations, back-and-forths, detours, and hours, sometimes days, of teamwork – all made possible by years of advances in machine learning.
AI image generators were born from the marriage of two separate technologies. One was a historical lineage of deep learning neural networks that could generate coherent, realistic images; the other was a natural language model that could serve as an interface to the image engine. The two were combined into a language-controlled image generator. Researchers scraped the Internet for images with adjacent text, such as captions, and used billions of these examples to match visual shapes to words and words to shapes. With this new combination, human users could enter a series of words – the prompt – describing the image they were looking for, and the model would generate an image based on those words.