These days, thinking of ways to communicate with your friends and family is just too difficult. Thanks to Google, software will now do that for us.
Yesterday, Google unveiled Allo, a new messaging app that comes with Google Assistant built in. It’s got some really cool features, but it’s unclear whether Allo will be able to surpass market leaders like WhatsApp or iMessage.
There is one feature that really creeps me out, though: Photo Reply. The premise of Photo Reply is that software will analyse photos sent to you and auto-generate appropriate replies, letting a computer do the thinking for you.
There’s no doubt that this technology is impressive. The fact that an app can figure out whether I’m looking at a picture of a baby or a pile of shit — and then almost instantly generate a response — is crazy.
But it feels weird. Photo Reply removes the most human element of a conversation, letting an algorithm emote for you. And what about the person sending the picture? How will they know whether the reply they received was a genuine, heartfelt, typed-out response or just something an app suggested?
The developers behind the feature published a blog post yesterday explaining their rationale, and it doesn’t do much to make Photo Reply seem less robotic:
We utilize Google’s image recognition technology, developed by our Machine Perception team, to associate images with semantic entities — people, animals, cars, etc. We then apply a machine learned model that maps those recognized entities to actual natural language responses. Our system produces replies for thousands of entity types that are drawn from a taxonomy that is a subset of Google’s Knowledge Graph and may be at different granularity levels. For example, when you receive a photo of a dog, the system may detect that the dog is actually a labrador and suggest “Love that lab!”. Or given a photo of a pasta dish, it may detect the type of pasta (“Yum linguine!”) and even the cuisine (“I love Italian food!”).
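The entity-to-reply mapping the post describes can be pictured in miniature like this. The replies are taken from the quoted examples, but the entity names, the lookup-table structure, and the function are purely illustrative assumptions — Google’s actual model is learned, not hand-written:

```python
# Toy sketch of mapping recognized image entities (at different
# granularity levels) to suggested replies. The entity keys and the
# simple dict lookup are illustrative; only the example replies come
# from the quoted blog post.

REPLIES = {
    "labrador": "Love that lab!",
    "linguine": "Yum linguine!",
    "italian_cuisine": "I love Italian food!",
}

def suggest_replies(recognized_entities):
    """Return canned replies for the entities detected in a photo,
    preserving the order in which they were recognized."""
    return [REPLIES[e] for e in recognized_entities if e in REPLIES]

print(suggest_replies(["linguine", "italian_cuisine"]))
# ['Yum linguine!', 'I love Italian food!']
```

The interesting part, per the post, is that one photo can trigger replies at several levels of specificity — the dish and the cuisine at once.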
So, this raises a few questions. Are there pictures that Google won’t auto-generate a response for? Can it tell the difference between a wedding and a funeral and then generate an appropriate response?
At runtime, Photo Reply recognizes entities in the shared photo and triggers responses for the entities. The model that maps entities to natural language responses is learned offline using Expander, which is a large-scale graph-based semi-supervised learning platform at Google. We built a massive graph where nodes correspond to photos, semantic entities, and textual responses. Edges in the graph indicate when an entity was recognized for a photo, when a specific response was given for a photo, and visual similarities between photos. Some of the nodes are “labeled” and we learn associations for the unlabeled nodes by propagating label information across the graph.
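The label-propagation idea described above can be sketched in a few lines. This is a generic textbook version, not Expander: the function, the toy graph, and the node names are all illustrative assumptions. Labeled nodes keep their labels fixed; every other node repeatedly averages the label scores of its neighbors, weighted by edge strength:

```python
# Minimal sketch of graph-based semi-supervised label propagation, in
# the spirit of the Expander description quoted above. Everything here
# (names, graph, weights) is illustrative, not Google's actual system.
from collections import defaultdict

def propagate_labels(edges, seed_labels, iterations=10):
    """Spread label scores from labeled nodes to unlabeled neighbors.

    edges: list of (node_a, node_b, weight) undirected edges
    seed_labels: dict of node -> {label: score} for the labeled nodes
    """
    neighbors = defaultdict(list)
    for a, b, w in edges:
        neighbors[a].append((b, w))
        neighbors[b].append((a, w))

    labels = {n: dict(s) for n, s in seed_labels.items()}
    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seed_labels:           # labeled nodes stay fixed
                updated[node] = dict(seed_labels[node])
                continue
            scores, total = defaultdict(float), 0.0
            for nbr, w in nbrs:               # weighted-average neighbors
                for label, score in labels.get(nbr, {}).items():
                    scores[label] += w * score
                total += w
            if total:
                updated[node] = {l: s / total for l, s in scores.items()}
        labels = updated
    return labels

# Toy graph: photos connect to recognized entities and to replies that
# people actually sent, mirroring the node types in the quoted post.
edges = [
    ("photo1", "entity:labrador", 1.0),
    ("photo1", "response:Love that lab!", 1.0),
    ("photo2", "entity:labrador", 1.0),   # new photo, same entity
]
seeds = {"response:Love that lab!": {"dog_reply": 1.0}}
result = propagate_labels(edges, seeds)
```

After a few rounds, `photo2` picks up a nonzero `dog_reply` score purely through its shared entity with `photo1` — which is, roughly, how an unlabeled photo can end up with a suggested reply it was never directly paired with.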
This is some interesting technology, but it’s a little too much for me. It’s probably a sign of the times that software that instantly processes our photos and forms reactions on our behalf is celebrated as a cool future. But remember: this software is also invading some of the most intimate and personal aspects of our lives. [Google Research Blog]