This Online AI Tool Takes Your Words And Turns Them Into Nightmares

By Rhett Jones

We’ve seen lots of machine learning systems create strange new phrases and dreamlike images after being trained on large amounts of data. But a new website lets you do the generating, and the results are just as bizarre as you’d expect:

Image: Hudson Hongo/T2i

The web applet, built by researcher Cristóbal Valenzuela, is based on a new paper from another team of researchers. Their machine learning algorithm is called AttnGAN (Attentional Generative Adversarial Network). It's meant to improve upon other text-to-image AI by refining images at the word level. For now, the results are closer to surrealist art:

Image: Hudson Hongo/T2i

Machine learning, as you probably know by now, is the process researchers use to train algorithms on large datasets, allowing them to solve complex problems like "what is this a picture of?" on their own. These algorithms can also do the opposite, creating new images out of words. The new paper explains that older text-to-image programs built images from a single summary of the entire sentence, so details tied to individual words could get lost. AttnGAN instead sketches a rough image from the whole sentence, then refines it region by region, paying attention to the specific words most relevant to each part of the picture.
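For the curious, here's a toy sketch of what "paying attention to specific words" can look like in code. It's loosely modeled on the word-level attention idea described in the paper, not the researchers' actual implementation: each image region scores every word by a dot product, a softmax turns those scores into weights, and the region gets a weighted blend of the word vectors. The function names and the tiny hand-made vectors are hypothetical; in the real model these features are learned, and the blended "word-context" vectors feed into the next, higher-resolution drawing stage.

```python
import math


def softmax(scores):
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def word_context(region_features, word_features):
    """Hypothetical helper: for each image region, return a pair of
    (attention weights over the words, weighted word-context vector).

    Assumes region and word vectors already share the same dimension.
    """
    dim = len(word_features[0])
    results = []
    for h in region_features:
        # Score every word against this region, then normalize to weights.
        weights = softmax([dot(h, e) for e in word_features])
        # Blend the word vectors; words with high weight dominate the result.
        context = [sum(w * e[k] for w, e in zip(weights, word_features))
                   for k in range(dim)]
        results.append((weights, context))
    return results


# Toy example: two "words" and two image regions, each region
# resembling one of the words.
words = [[2.0, 0.0], [0.0, 2.0]]    # e.g. "red", "bird" (made up)
regions = [[3.0, 0.0], [0.0, 3.0]]
for weights, context in word_context(regions, words):
    print([round(w, 3) for w in weights])
```

With these made-up vectors, the first region puts nearly all of its weight on the first word and the second region on the second word, which is the behavior the researchers are after: each part of the image leans on the words that matter for it.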

The researchers trained the network on the COCO, or Common Objects in Context, dataset. It's a good reference source for images of common objects, like stop signs, animals, and... Modest Mouse lyrics.

Image: Ryan Mandelbaum/T2i

Image: Hudson Hongo/T2i

Image: Ryan Mandelbaum/T2i

Valenzuela’s tool excelled at creating the stuff of fever dreams in response to Gizmodo staffers’ twisted requests. Our own Hudson Hongo got especially good at getting the images he wanted.

Image: Kelly Bouret/T2i

Image: Hudson Hongo/T2i

Image: Hudson Hongo/T2i

Unsurprisingly, Janelle Shane’s AI Weirdness blog is where we found out about AttnGAN, so we asked her what it says about the current state of AI.

"This demo is a really interesting way of showing how much a state-of-the-art image recognition algorithm understands about images and text," she told Gizmodo. "What does it understand about what 'dog' means? Or 'human'?" But she noticed that structure is difficult for these algorithms. "If it sees a human arm pointing toward it versus to the side, it looks really different in a 2D image."

Shane also pointed out that the algorithm drew birds really well when it only needed to draw birds, but things got worse as more was expected of it: the version of AttnGAN on Valenzuela's site tries to draw whatever a user types in. She compared it to self-driving cars, which have many more tasks to perform and obstacles they need to recognise.

Gizmodo reached out to the study's first author, Ph.D. student Tao Xu at Lehigh University, and will update the post when we hear back.

But please, have fun with this one and show us your worst in the comments.

Image: Jennings Brown/T2i

Image: Marina Galperina/T2i

As a final thought, these would make really good Dixit cards.

[arXiv/T2i via AI Weirdness]

Update: The study's first author, Tao Xu, a graduate student at Lehigh University, responded to Gizmodo's request for comment by email. She explained that AttnGAN is a significant improvement over the best previously reported results:

Nowadays, with the recent advances in deep learning, computer vision systems are so powerful that they can, for example, diagnose diseases from medical images and identify humans and cars for autonomous driving. However, we still cannot conclude that those systems truly understand the visual world, because if machines had such "intelligence," they should not only recognize images but also be able to generate them.

Our AttnGAN incorporates the Attention mechanism with Generative Adversarial Networks (GANs), which significantly boosts the text-to-image generation performance. As attention is a human concept, our AttnGAN learns such “intelligence” and is able to draw like humans, i.e., repeatedly refer to the text description and pay more attention to relevant words while drawing a certain region of the image.

Although AttnGAN greatly outperforms the state of the art for text-to-image synthesis, generating realistic images with objects from multiple categories is still an open problem in the community, and we would like to investigate this direction further in the future.