Does Big Data Cause Pollution?

Energy-hungry information technology has an environmental cost that is by no means virtual. The computer science specialists Jean-Marc Pierson and Laurent Lefèvre explain the nature of this pollution.

From commerce and energy to finance, health, transportation, culture, and even science, digital data is for many the oil of the twenty-first century. Each day the Big Data industry grows larger, already representing more than 4 million direct jobs worldwide! Yet an essential piece of information may have been overlooked in the process: everything has a cost. And the cost of Big Data is ecological. Behind this virtualized, distributed, and distant technology there is real infrastructure with high energy consumption and a large carbon footprint. To put it plainly, Big Data pollutes.

Colossal Infrastructure

To start from the beginning, Big Data designates the ability to produce or collect digital data, as well as to store, analyze, and present it. It is very often defined by its "3V" characteristics (volume, velocity, variety): data arrives in massive amounts, especially with the simultaneous emergence of the Internet of Things, at unprecedented speed and in greater variety than in the past. In 2015, the global pool of data will reach 8 zettabytes (a zettabyte is 10^21 bytes). Colossal infrastructure is therefore already necessary to store this avalanche of data, not to mention what is needed to process it. Analyzing data, whether it comes from environmental observations, scientific experiments, or marketing, requires very powerful computing resources, concentrated in large centers and supercomputers.
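
To give a sense of scale, here is a minimal Python sketch that converts the 8 zettabyte figure into a count of storage drives. The 1 terabyte drive size is purely an illustrative assumption, not a figure from the authors, and the sketch ignores replication, backups, and compression.

```python
# One zettabyte is 10^21 bytes (decimal SI prefix).
ZETTABYTE = 10**21
# Hypothetical drive capacity chosen for illustration: 1 terabyte = 10^12 bytes.
ASSUMED_DRIVE_BYTES = 10**12


def drives_needed(total_bytes: int, drive_bytes: int = ASSUMED_DRIVE_BYTES) -> int:
    """Return the smallest number of drives able to hold total_bytes."""
    if drive_bytes <= 0:
        raise ValueError("drive_bytes must be positive")
    # Ceiling division using integers only, so no rounding error creeps in.
    return -(-total_bytes // drive_bytes)


global_data_bytes = 8 * ZETTABYTE
print(drives_needed(global_data_bytes))
```

Under that assumption the result is eight billion drives, roughly one for every person on Earth, before any copy is made for safety.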

Google must keep a record of millions of web pages in its data center.

Take for example Google, which indexes millions of documents to facilitate and accelerate search and retrieval. Its data is also analyzed to provide users with advertising content, an approach that forms the basis of its business model. To perform this processing, Google is estimated to have more than a million servers, as do the three other Internet giants: Amazon, Microsoft, and Facebook. Even more modest companies and institutions, whether they own their infrastructure or lease it, often use thousands of interconnected pieces of equipment, from the capture of data to its analysis.

Containing a growing energy need 

Research studies, such as the one conducted by the think tank Écoinfo, created by the CNRS, emphasize how energy-hungry information technology can be, and how much greenhouse gas it releases at every stage of its life cycle, from the design and transportation of equipment to usage and end-of-life. In the usage phase, the essential elements of Big Data can be divided into three categories: terminal equipment, networks, and data centers. Each consumes a similar amount of electrical power, on the order of 40 gigawatts in 2013, which is the equivalent of about forty nuclear units [1]. This figure obviously has repercussions for the climate, even if the resulting carbon footprint depends on the energy mix of the user country (34 grams of CO2 per kWh in France in February).
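
The link between power, energy, and carbon footprint can be made concrete with a rough back-of-the-envelope sketch in Python. It assumes a constant 40 gigawatt draw sustained over a full year and about 1 gigawatt per nuclear unit (the ratio implied by the comparison above). The 500 g/kWh value is a hypothetical carbon-heavy mix used only for contrast; it is not a figure from the Écoinfo study.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years
ASSUMED_GW_PER_NUCLEAR_UNIT = 1.0  # assumption implied by "40 GW is about forty units"


def nuclear_unit_equivalent(power_gw: float) -> float:
    """Number of nuclear units matching a power draw, under the 1 GW assumption."""
    return power_gw / ASSUMED_GW_PER_NUCLEAR_UNIT


def annual_energy_twh(power_gw: float) -> float:
    """Energy in terawatt-hours used by a constant draw of power_gw for one year."""
    return power_gw * HOURS_PER_YEAR / 1000.0  # GWh -> TWh


def annual_emissions_mt(energy_twh: float, grams_co2_per_kwh: float) -> float:
    """CO2 emissions in million tonnes for a given energy use and carbon intensity."""
    kwh = energy_twh * 1e9              # 1 TWh = 10^9 kWh
    grams = kwh * grams_co2_per_kwh
    return grams / 1e12                 # 1 million tonnes = 10^12 grams


energy = annual_energy_twh(40.0)
low_carbon = annual_emissions_mt(energy, 34.0)     # French mix cited in the article
carbon_heavy = annual_emissions_mt(energy, 500.0)  # hypothetical mix, for contrast
print(f"{energy:.1f} TWh per year")
print(f"{low_carbon:.1f} Mt CO2 at 34 g/kWh, {carbon_heavy:.1f} Mt CO2 at 500 g/kWh")
```

Under these assumptions, the same 350 or so terawatt-hours per year would emit about 12 million tonnes of CO2 with the French mix and about 175 million tonnes with the hypothetical carbon-heavy one, which is why the energy mix matters as much as the consumption itself.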

Information technology produces greenhouse gases at every step of its life cycle.

So will the (provocative) scenario put forward by the company Cisco, in which only certain machines have the right to communicate (those with an even IP address, for example), ever become reality, like road space rationing based on car number plates? Fortunately, a multitude of responses and alternatives are being put in place. Codes of conduct have been proposed to industrial actors and web hosting providers to improve their infrastructure for large-scale computation, communication, and storage. Innovation is improving both the design of equipment and its potential for recycling. Research on energy efficiency has helped find new ways of limiting the environmental impact of information technology. All these initiatives are essential to contain the growing energy needs of IT infrastructure while guaranteeing good service quality for users.

Reducing other sources of pollution?

If Big Data is itself polluting, can it nonetheless help limit other sources of pollution? A number of examples seem to support this hypothesis. For instance, analysis of massive amounts of data makes it possible to optimize industrial processes, and therefore to reduce the associated polluting emissions. Similarly, farmers can receive real-time information about their crops from sensors and satellites, in order to use just the necessary quantity of water and the right amount of pesticides.

Big Data analysis can help reduce industrial emissions.

Gathering data and making it available can also lead to changes in behavior. In Portland (US), a community of citizens deploys sensors measuring the air quality in their neighborhood. The results, whose precision and quality exceed those of the Environmental Protection Agency, are analyzed and presented on a website. This open data will encourage new uses and behaviors that are less reliant on cars during pollution peaks. A last example comes from China, where the city of Beijing and IBM have teamed up to launch a program aimed at reducing urban pollution—whose extent is well known. To achieve this, information is collected by a number of sensors, coupled with meteorological data from satellites, and analyzed by large-scale artificial intelligence systems. The ultimate goal is to produce a pollution prediction map 72 hours in advance.

Big Data is the modern gold rush. Like its glorious forerunner, it generates great expectations—founded or not—and raises important issues, while allowing the development of new territory. It is up to society to ensure that the environmental cost of these technologies is offset—at least in part—by progress in the fight against global warming and pollution.

The analysis, views and opinions expressed in this section are those of the authors and do not necessarily reflect the position or policies of the CNRS.

Footnotes
  • 1. A nuclear unit comprises a nuclear reactor and the associated electricity generation system, primarily a turbine and generator.