Algorithms to Detect Hoaxes
Verifying a piece of information without context is something of a gamble. According to a 2006 study conducted by Texas Christian University, we are not much better than chance at detecting deception.1 A decade on, social networks have exposed us to such a massive flow of data that sorting through it has become even more difficult.
Numerous research teams have thus developed tools to identify hoaxes and other items of fake news, misleading information that can range from a simple joke to large-scale political manipulation. At Irisa,2 Vincent Claveau, a CNRS researcher, and Ewa Kijak, a lecturer at the Université de Rennes 1, are working with doctoral student Cédric Maigrot on automating the hunt for falsified images and phoney stories.
"Unless they are looking at a basic montage, humans cannot detect alteration or reuse of a photograph," reckons Claveau. "Only information technology can do so." According to him, automation has two objectives: to process a mass of news that humans cannot manage, and to offer a machine's perspective, which is less biased than a human being's. Indeed, the researcher notes that "one is less likely to question a piece of information if it confirms their opinion."
The origin of information
Verification can be performed in three different ways. First, network analysis makes it possible to trace a message's trail. Does it come from a respected press agency, or a website that mass-produces phoney content? The research team also monitors sites that act as amplifiers, relaying content without necessarily producing any.
Some social networks do not allow the full origin of a piece of information to be traced, primarily in order to keep their own algorithms secret. Researchers are nevertheless undeterred. A team from the ISC-PIF3 and CAMS,4 for example, has implemented the Politoscope project, which maps the diffusion of tweets. The system reveals the formation and evolution of political communities based on how Twitter accounts behave with regard to the content in circulation, and even shows which group is the quickest to react to and share each new message. The platform achieved this by scrutinizing more than 80 million tweets.
The next step, looking at readers' comments, provides clues about the validity of the publication.
Finally, the content itself is naturally also subject to analysis, especially if it combines text and images. Was the photo altered or distorted? Is the message related to the image? A document's linguistic register can also betray its origin, through cues such as the presence of emoticons, an overabundance of exclamation and question marks, an absence of quotes, or an excess of phrases in the first and second persons.
An algorithm can identify and isolate these elements, along with the names and dates that structure the information, some of which are directly present in keywords or hashtags.
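As a rough illustration of the surface cues listed above, here is a minimal Python sketch. It is not the Irisa team's method: the function name `style_cues`, the pronoun list, and the emoticon pattern are all assumptions chosen for the example, and a real system would weigh such cues statistically rather than simply count them.

```python
import re

# Hypothetical cue definitions: the article names the kinds of cue,
# not any exact vocabulary or pattern.
FIRST_SECOND_PERSON = {"i", "me", "my", "we", "us", "our", "you", "your"}
EMOTICON = re.compile(r"[:;=]-?[)(DPp]")
QUOTE_CHARS = '"\u201c\u201d\u00ab\u00bb'
HASHTAG = re.compile(r"#\w+")


def style_cues(text):
    """Count simple stylistic cues in one message."""
    words = re.findall(r"[a-z']+", text.lower())
    pronouns = sum(1 for w in words if w in FIRST_SECOND_PERSON)
    return {
        "emoticons": len(EMOTICON.findall(text)),
        "exclamation_or_question": text.count("!") + text.count("?"),
        "has_quote": any(ch in text for ch in QUOTE_CHARS),
        "pronoun_ratio": pronouns / len(words) if words else 0.0,
        "hashtags": HASHTAG.findall(text),
    }


# Invented sample message, for illustration only.
sample = "You won't believe this!!! Share before they delete it :) #shark #storm"
cues = style_cues(sample)
```

The resulting dictionary only describes the message. In keeping with the approach described in this article, it would serve to flag a text for closer reading, not to declare it false.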
A search engine to detect falsification
Irisa researchers focus especially on images. Aside from actual photomontages, it is possible to fool readers by using an authentic image with a modified caption. This was the case for old photographs of victims of a bombing, "recycled" to accuse a party in an entirely different conflict.
While the general public can trace a photograph on Google Images, researchers have created their own search engine for images. "Google very poorly processes some of the simplest and quickest alterations, such as reversing left and right, changing hue, or cropping..." explains Claveau. "Irisa's search engine is much more robust, and less susceptible to these ruses."
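To give a sense of how an image signature can survive a left-right flip or a change in brightness, here is a minimal Python sketch based on the classic "average hash" idea. It is an assumption made for illustration, not a description of Irisa's search engine, and it does not handle cropping. The function names and the toy 4x4 image are hypothetical.

```python
def average_hash(pixels):
    """Return one bit per pixel: 1 if brighter than the image mean, else 0."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def mirror(pixels):
    """Flip a grayscale image left to right."""
    return [list(reversed(row)) for row in pixels]


def flip_invariant_hash(pixels):
    """Keep the smaller of the two hashes so an image and its mirror agree."""
    return min(average_hash(pixels), average_hash(mirror(pixels)))


def hamming(a, b):
    """Number of differing bits between two equal-length hashes."""
    return sum(x != y for x, y in zip(a, b))


# Toy 4x4 grayscale image (values 0-255) standing in for a downscaled photo.
original = [
    [200, 190, 30, 20],
    [210, 180, 40, 10],
    [220, 170, 50, 25],
    [230, 160, 60, 15],
]
flipped = mirror(original)
brighter = [[min(255, v + 20) for v in row] for row in original]

# The plain hash is fooled by mirroring; the flip-invariant one is not.
plain_distance = hamming(average_hash(original), average_hash(flipped))
robust_distance = hamming(flip_invariant_hash(original), flip_invariant_hash(flipped))
```

Working on grayscale values and comparing each pixel to the image's own mean is what makes the signature indifferent to a uniform brightness shift. Taking the minimum over the image and its mirror is one simple way to make both versions map to the same signature.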
It can scrutinize the elements of a photo and detect whether they come from different images that were subsequently combined. "The same photo is used for every storm," the researcher cites as an example. "It shows a strip of flooded motorway with a swimming shark. We were able to separately find the real photos of the flood and the shark."
Technical details can also help identify certain photomontages. For instance, traces of double compression in the file indicate that part of the photo comes from another image that had itself already been compressed. The search engine also analyzes the text that accompanies the image. Extracting the most important keywords, such as the names of places and people, makes it possible to compare captions and detect signs of misuse or tampering.
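The caption comparison described above can be sketched as follows in Python. This is a deliberately crude stand-in for real keyword extraction, not the researchers' method: it treats capitalized words that do not open a sentence, together with four-digit years, as keywords. The function names, the captions, and the place names are invented for the example.

```python
import re


def keywords(caption):
    """Crude keyword extraction: capitalized words not opening a sentence, plus years."""
    found = set(re.findall(r"\b(?:19|20)\d{2}\b", caption))
    for sentence in re.split(r"[.!?]\s+", caption):
        words = re.findall(r"[^\W\d_]+", sentence)
        # Skip the first word, which is capitalized whether or not it is a name.
        found.update(w for w in words[1:] if w[0].isupper())
    return found


def keyword_overlap(caption_a, caption_b):
    """Jaccard overlap of the two keyword sets (0.0 if neither has keywords)."""
    a, b = keywords(caption_a), keywords(caption_b)
    if not (a | b):
        return 0.0
    return len(a & b) / len(a | b)


# Invented captions: the same photo, first as archived, then as recirculated.
archive = "A flooded street in Riverton after the storm of 2011."
viral = "Look at the damage in Springfield today, 2017."

overlap = keyword_overlap(archive, viral)
```

A low overlap between the caption found with the oldest copy of an image and the caption now circulating is only a hint that the picture may have been recycled. It still needs to be checked by a reader.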
(Artificial Intelligence algorithms can now sync videos to an audio source. The German artist Mario Klingemann was able to automatically create a sequence where the French singer Françoise Hardy gives a rendition of a controversial speech by Kellyanne Conway, advisor to US President Donald Trump. While this clip obviously looks fake, the rapid progress in the field promises to further blur the line between fact and fiction.)
Alerting rather than ruling
How should these advances be used in the fight against false information? Ideally, they should be built into the very architecture of social networks in order to spot hoaxes as early as possible, although this option depends on the goodwill of companies. For example, in some countries Facebook has launched an option enabling users to forward information that they deem suspicious to established media for verification. Yet there is a risk that the public will not report material that supports its opinions. In fact, a recent Yale study showed that this system had no positive effect against the spread of fake news.5 Irisa researchers are therefore leaning toward extensions that are integrated into web browsers.
"Plug-ins would skim through webpages, tweets, and Facebook posts and indicate anything that seems dubious," Claveau suggests. "The idea is to prioritize decision-making by the reader. The machine does not determine the truth, but instead provides leads."
The question of legitimacy indeed surfaces regularly, and some efforts have been criticized. When Les Décodeurs, the fact-checking team at the French daily Le Monde, launched Décodex, which color-codes news websites based on their reliability, it received mixed reviews.
"An algorithm will, perhaps wrongfully, be seen as more impartial than a media outlet judging other media outlets," adds Claveau. "Still, the Décodex team led a useful effort in helping us assess the quality of information."
Regardless, hoaxes follow certain cycles, and tend to increase after each major event that receives heavy media coverage. The crudest ones are easy to spot, but others raise more philosophical questions regarding the evaluation of truth. The purpose of algorithms is therefore to warn rather than to judge, the latter being left to the reader.
- 1. Pers Soc Psychol Rev. 2006, vol. 10 (3): 214-34.
- 2. Institut de recherche en informatique et systèmes aléatoires (CNRS / Université Rennes 1 / ENS Rennes / Insa Rennes / Université Bretagne Sud / Inria / CentraleSupélec / IMT Atlantique).
- 3. Institut des systèmes complexes de Paris Île-de-France (CNRS).
- 4. Centre d’analyse et de mathématiques sociales (CNRS / EHESS).
- 5. G. Pennycook and D. G. Rand, Assessing the Effect of "Disputed" Warnings and Source Salience on Perceptions of Fake News Accuracy, SSRN, online on 15/09/2017.
A graduate of the School of Journalism in Lille, Martin Koppe has worked for a number of publications including Dossiers d’archéologie, Science et Vie Junior and La Recherche, as well as the website Maxisciences.com. He also holds degrees in art history, archaeometry, and epistemology.