How Microsoft is Using AI to Tackle Fake News

By James O Malley

Imagine someone telling you in the late 90s that by 2018, Microsoft would be a plucky underdog. You’d be forgiven for thinking they were crazy. And that’s before they tell you who the President is.

Perhaps the category where this is clearest is search, which is dominated by one company so thoroughly that its brand has committed “genericide”, and we now talk about “googling” things as a generic term for searching the internet.

But Microsoft still isn’t giving up. Eight years ago, it launched Bing - and the company is still attempting to go toe-to-toe with Google, investing in some super-smart artificial intelligence technology to take on its rival - as well as the challenge of fake news.

“We’ve been using deep learning in a more intense way in our product and so we are able to develop some interesting features and scenarios”, says Jordi Ribas, Microsoft’s Corporate Vice President in charge of Bing.

To the user, searching looks much as it always has: a search box and a “Go” button - so what’s new? How would a search today differ, say, from a search made five years ago?

“I think the main difference is that the algorithms that we use, use much longer models”, says Jordi.

“We’ve always been using AI and neural networks for search but at this point we’re able to use a higher computation and higher models. You might be familiar with the work that we’ve done with Intel where we leverage FPGAs [ie: special chips] which are integrated in our Data Centre machines, and what those FPGAs allow us to do for deep learning is we are able to have very, very long models that in essence allow us to come up with more relevant results, again leveraging machine reading comprehension and some more advanced deep learning technology.”

But what does this deep learning mean in practice? “Leveraging machine reading comprehension, we can come up with what we call intelligent answers, which basically for a given question we try to come up with a single best answer, or sometimes multiple best answers - a multi-answer.”

Fighting Fake News

Fake news is clearly on Jordi’s mind. One of Bing’s most recent features is called “perspectives”, and works similarly to how Google will often just show the answer you’re looking for in a box at the top of the results - no need to click. The difference, though, is that if an answer is deemed to have multiple perspectives, the box won’t be definitive, but will offer differing opinions.

“What we find for certain questions is there are different clusters of documents after we apply the sentiment analysis”, Jordi says, giving the example of a search for “Is coffee good for you?”

This is all driven by algorithms acting automatically, Jordi says, though there can be a human quality control element too. But the core is deep-learning models applying sentiment analysis to passages on the web, to give Bing a better semantic understanding of what the text is saying.

“Automatically the machine learning algorithm finds two sets of clusters of documents, some that are positive [and] some that are negative. And that basically tells our algorithm that ‘hey, you know this is a multi-perspective answer where we should have alternative results for both perspectives’. And then we in Bing feel like we should have responsibility to provide this more objective result”.
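The article doesn’t describe Bing’s actual models, but the decision Jordi outlines - score passages for sentiment, see whether they split into positive and negative clusters, and only then treat the question as multi-perspective - can be sketched in a few lines. Everything here (the word lists, the scoring, the threshold) is an illustrative toy, not Bing’s pipeline:

```python
# Toy sketch of the multi-perspective decision: the lexicons and
# thresholds below are invented for illustration.
POSITIVE = {"benefit", "good", "improves", "healthy", "protective"}
NEGATIVE = {"risk", "bad", "harmful", "anxiety", "insomnia"}

def sentiment(passage: str) -> int:
    """Crude lexicon score: +1 per positive word, -1 per negative word."""
    words = passage.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def is_multi_perspective(passages, min_per_side=2):
    """True when the retrieved passages cluster into both sentiments."""
    pos = [p for p in passages if sentiment(p) > 0]
    neg = [p for p in passages if sentiment(p) < 0]
    return len(pos) >= min_per_side and len(neg) >= min_per_side

docs = [
    "coffee improves alertness and may be protective for the liver",
    "moderate coffee is healthy and has antioxidant benefit",
    "too much coffee is harmful and can cause insomnia",
    "caffeine can worsen anxiety and insomnia risk",
]
print(is_multi_perspective(docs))  # True
```

A real system would use learned sentiment models over web-scale passages, but the branching logic - one cluster means one answer, opposing clusters mean a perspectives box - is the same shape.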

While this might be fine for something like “is coffee good for you?”, it is easy to imagine how such a system could conceivably be manipulated: you only have to look at how, say, climate change deniers, rather than insisting that mainstream climate science be dismissed, often urge the need for “debate” to legitimise their position. By creating a false equivalence, a desire for balance can actually make things worse. So does Bing weight by authority, I wondered? So that it knows to, say, trust the BBC more than InfoWars?

Luckily, Jordi has thought about this too.

“There’s some common features that we can extract that tell us whether the BBC is more authoritative”, he says. “A lot of it has to do with the quality of the content, also the way the links across the web refer to each other - like typically authoritative sources refer to other authoritative sources and vice versa.”

“And so we do leverage all those signals. And so when we determine whether to offer multi-perspective or not, the authoritativeness of the documents makes a big difference. And so if we find different classes of documents but [only one side] has authoritative documents and the other does not, then we will only show one result for that answer.”
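Jordi’s observation - that authoritative sources tend to link to other authoritative sources - is the intuition behind link-analysis algorithms such as PageRank. A minimal power-iteration sketch over a tiny hypothetical link graph (the sites, graph, and damping factor are illustrative assumptions, not Bing’s signals):

```python
# Minimal PageRank-style power iteration over a made-up link graph.
links = {
    "bbc": ["reuters"],
    "reuters": ["bbc"],
    "fringe-blog": ["bbc"],   # links out, but no reputable site links back
}
pages = list(links)
d = 0.85                       # damping factor
rank = {p: 1 / len(pages) for p in pages}

for _ in range(50):
    new = {}
    for p in pages:
        # Each page's score is shared equally among the pages it links to.
        inbound = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
        new[p] = (1 - d) / len(pages) + d * inbound
    rank = new

# Sources that reputable sources link to end up ranked highest.
print(sorted(rank, key=rank.get, reverse=True))  # ['bbc', 'reuters', 'fringe-blog']
```

Even though the fringe blog links to the BBC, nothing reputable links back to it, so its score stays at the floor - which matches the behaviour Jordi describes, where only the side with authoritative documents gets shown.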

One example of this is indeed climate change - ask about that, and brilliantly, Bing will only show you the science.

“In the beginning the search engines were not very good at distinguishing what were some of the good documents versus some of these bad ones that were trying to game the system”, Jordi says. “So we need to do the same thing with the information on the web to make sure that the more credible and the more objective and more reputable sources really come at the top.”

So to what extent does Microsoft have an editorial responsibility over search results? Should it be acting like a newspaper editor - having a responsibility to choose what people see and don’t see?

“I wouldn’t say editorial but definitely I would say that search engines have a responsibility to provide as comprehensive and as objective results as possible. In the past some of our competitors basically blamed the algorithm and would say, ‘Hey, you know it’s data on the web,’ but I think we need to do better.”

“I think our mission should be to provide trustworthy results and it might take extra effort, it might take deeper models, might take more of the sentiment analysis that I was telling you about. Determining this authoritativeness signal better. A combination of all this is what we’re working on and I think ultimately we should have that – not only aspiration – but that responsibility to provide, again, these comprehensive and objective results. Because again in this world of fake news and people trying to game the public, I think objectivity in search couldn’t be more important.”

The G-Word

It was at this point in the interview - getting towards the end - that I realised I’d managed to get this far without uttering the name of Bing’s major competitor. But inevitably, I now had to drop the G-bomb and mention Google.

Why? Because aren’t machine learning and big data reliant on who has the largest dataset? Google has many times the amount of raw data being fed into it from users, and many times the amount of feedback from users clicking on things - so surely Google must, by the logic of AI training, be better placed to understand what users are looking for, right? How can Bing ever hope to compete?

“Definitely the more data the better”, says Jordi, but he says, “the question is how much of that do we need to be competitive?”

To be fair to Bing, it isn’t being completely crowded out by Google. Jordi points out that Bing accounts for 23.8% of searches on PCs - actually up from the 4% share it had at launch. And this still translates to millions of queries every day.

“So we’ve got to a point where we have enough data, especially in [the PC market], where we can really get very strong signal - and the product has been improving”, Jordi argues. “More people have been coming to the product and so I think that as much as more data is better, we have been able to make very good use of the data that we have to continue to grow, and also we have been able to push the limits on making sure that the algorithms are as dense as possible to again provide as relevant as possible results.”

“I think the main thing, if you like, for what we’re trying to do is, by using this higher level intelligence in our algorithm, try to come up with results that are ultimately more comprehensive and more objective. And especially the work that we’re doing in intelligent answers, in this world of fake news and misinformation on the web, we’re trying to come up with results that are more objective - and if anything, I think objectivity in search couldn’t be more important today.”