“On average, performance doubles every three years.” That is how Patrick Le Callet, a professor at the Université de Nantes and a researcher at the Laboratory of Digital Sciences of Nantes (LS2N),1 summarizes the evolution of video streaming technology. Doubling performance makes it possible to double or even quadruple the number of points in an image at the same data rate, or to halve the rate at the same resolution. This is why it is now possible to watch videos on a smartphone at a quality that compares very favorably with that of old DVDs.
Researchers in Nantes working with Netflix
French teams have acquired globally recognized expertise in this field, which is why the video-on-demand streaming giant Netflix—with more than one hundred million subscribers worldwide—approached LS2N two years ago. The laboratory has developed skills in three areas, a fairly rare combination: the development of compression methods, or algorithms; the automatic prediction of the video quality perceived by the audience; and the creation and standardization of protocols for evaluating subjective video quality, which are applicable to all image compression technologies. During the Mobile World Congress in Barcelona (Spain) last March, Netflix presented a new video coding tool produced in collaboration with LS2N, one that offers high-quality images at a rate of 100 kilobits per second2—40 times less than High Definition (HD) television—and is compatible with mobile telephone networks. Beyond this contractual collaboration, Netflix is also a patron of LS2N, a first for the company outside the United States. It has provided financing for research, on the sole condition that the results be released in open access under a free license.
1001 ways to encode images
Compression is necessary because an HD camera’s raw images constitute a stream with a rate of a few hundred megabits per second, which is incompatible with the capacity of storage media as well as the bandwidth of both landline and mobile telecommunication networks. Compression involves eliminating as much unnecessary or redundant data as possible, while limiting image distortion. This process requires such heavy calculations that the quality of a live televised video stream will never match that of an on-demand film, for which the distributor has had plenty of time to compress! Of course, the chosen method must conform to industry standards in order to ensure that devices are able to decode the videos. Over the last twenty years, coding algorithms have enabled image size to increase sharply, with content producers now starting to broadcast in "4K," which is approximately 4,000 pixels wide and contains 8 million points—or pixels—per image. This is a long way from the 400,000 pixels of the MPEG-2 images on our old DVDs, even though the data rates are very similar, on the order of 5 Mbit/s!
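To make these orders of magnitude concrete, the arithmetic can be sketched in a few lines. The frame rate and the 12 bits per pixel (typical of 4:2:0 chroma subsampling) are illustrative assumptions, not figures from the article:

```python
def raw_bitrate(width, height, bits_per_pixel, fps):
    """Uncompressed video bit rate, in megabits per second."""
    return width * height * bits_per_pixel * fps / 1e6

# Assumptions: 4:2:0 chroma subsampling (12 bits/pixel), 25 frames/s.
hd = raw_bitrate(1280, 720, 12, 25)      # ~276 Mbit/s — "a few hundred"
uhd = raw_bitrate(3840, 2160, 12, 25)    # ~2,488 Mbit/s for "4K"

# Compression ratio needed to reach a 5 Mbit/s delivery rate:
print(f"HD raw: {hd:.0f} Mbit/s, ratio {hd / 5:.0f}:1")
print(f"4K raw: {uhd:.0f} Mbit/s, ratio {uhd / 5:.0f}:1")
```

The roughly 500:1 ratio for 4K at 5 Mbit/s is what makes the coding choices described below so consequential.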
“The most recent HEVC (High Efficiency Video Coding) standard is like a toolbox,” explains Le Callet. Engineers and researchers have complete freedom for coding, as long as their video streams ultimately conform to the standard. As a result, everyone is developing their own formula. “There are millions of ways to proceed for each image or fragment of an image, with each one impacting the quality of that image as well as the ones that follow.” Furthermore, the processing must be able to distinguish between a static and a moving scene: “When the camera is not moving, the texture of the grass in a stadium is visible. But when the camera moves, if we’re not careful the grass will suddenly look like a billiard cloth with no relief!” The encoding method therefore has to constantly adapt—and must do so automatically—as it processes the images, while predicting the impact a coding choice will have on the viewing quality of the ensuing images. “Optimization is an art that requires striking, at each and every moment, the right balance between the information rate and perceptual image distortion.”
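The balance Le Callet describes is commonly formalized as a Lagrangian rate–distortion cost, J = D + λR: for each block, the encoder picks the coding choice that minimizes J, where D is distortion, R the bit cost, and λ sets the trade-off. A minimal sketch under that formulation (the mode names and their distortion/bit figures are made up for illustration, not HEVC internals):

```python
# Hypothetical per-block coding "modes", each with a distortion (D)
# and a bit cost (R). The numbers are purely illustrative.
modes = [
    {"name": "skip",  "distortion": 9.0, "bits": 2},
    {"name": "intra", "distortion": 1.0, "bits": 96},
    {"name": "inter", "distortion": 2.0, "bits": 30},
]

def best_mode(modes, lam):
    """Pick the mode minimizing the Lagrangian cost J = D + lam * R."""
    return min(modes, key=lambda m: m["distortion"] + lam * m["bits"])

# A small lambda favors quality; a large lambda favors a low bit rate.
print(best_mode(modes, 0.01)["name"])  # "intra": spend bits on quality
print(best_mode(modes, 1.0)["name"])   # "skip": save bits, accept blur
```

Real encoders evaluate such costs millions of times per image, which is why live streams, encoded under time pressure, cannot match the quality of pre-compressed on-demand films.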
Relying on visual experience
LS2N has turned to sample groups of users in its efforts to model image perception. Such tests feed databases that are used to test and validate compression algorithms. “We are also studying aspects related to artistic intention, for instance to prevent compression from betraying the emotion that the images were meant to convey. To do so, we use an eye tracker—an instrument known as an oculometer—to follow the viewer’s gaze, in an effort to improve our understanding of the quality of experience, a research area with a bright future.” The laboratory is also studying, among other things, a technique for increasing the contrast of images known as HDR. “We’ve observed, for instance, that depending on HDR rendering, certain screens can make some details more prominent, or on the contrary reduce their perception.”
HDR (High Dynamic Range) is, along with compression, one of the specializations of the Laboratory of Signals and Systems (L2S) located in Gif-sur-Yvette (Paris, France), where Frédéric Dufaux works as a CNRS senior researcher.3 This technique, which is also used in photography, uses more bits to code each pixel in order to show more detail in the dark and light areas of an image. “The challenge for television and video on demand is to combine this improvement with the broadcasting of 4K images, without increasing the data rate,” Dufaux explains.4 “We relied on the characteristics of human vision, and therefore devoted more information to what the eye perceives best, and less to what it perceives less well. We also took different situations into consideration, since light conditions can change rapidly, for example when a camera emerges from a building into broad daylight.” L2S has therefore developed an algorithm that combines HDR and HEVC. “It offers the same performance as conventional methods in non-complex lighting conditions, and provides better results for images with high contrast changing over time.” The laboratory is also studying the coding of 3D images, which requires specific compression methods as well.
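The idea of devoting more code values to what the eye perceives best can be illustrated with a toy transfer curve. The sketch below uses a simple power law as a stand-in for real HDR transfer functions (such as SMPTE PQ, which is far more elaborate); the exponent and the 10-bit depth are assumptions chosen for illustration:

```python
# Simplified sketch: encode linear luminance (0..1) into 10-bit code
# values with a power-law curve. NOT the actual SMPTE ST 2084 formula;
# the exponent is an illustrative assumption.
GAMMA = 1 / 2.4

def encode(luminance, bits=10):
    """Map linear luminance in [0, 1] to an integer code value."""
    codes = (1 << bits) - 1          # 1023 for 10 bits
    return round(codes * luminance ** GAMMA)

# Because the curve is steep near black, a large share of code values
# is spent on the darkest luminance levels, where the eye is sensitive.
dark_share = encode(0.01) / encode(1.0)
print(f"{dark_share:.0%} of code values cover the darkest 1% of luminance")
```

With a uniform (linear) mapping, the darkest 1% of luminance would get only 1% of the code values; the non-linear curve gives it roughly fifteen times more, which is the sense in which “more information” goes to what the eye perceives best.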
Applications from the defense industry to healthcare
Compression is not limited to the broadcasting of television and on-demand video streams. “It is also widely used in video surveillance, defense, and medicine, and plays an increasing role in the automotive industry with reversing cameras, collision alerts, lane-departure warnings, and the detection of road signs,” Dufaux points out. Each application requires specific compression methods: a television viewer will not accept distortion, but a member of the military analyzing a surveillance drone image will adapt, as long as it allows him or her to make the right decision. “In medical imagery, doctors did not want to hear of image compression for fear of losing details important for establishing a diagnosis. Yet with progress, compression has become widely accepted!” Just as it has in our devices: any smartphone today can do a much better job in this area than the large computers from the end of the last century!
- 1. CNRS / Université de Nantes / École Centrale de Nantes / Institut Mines-Télécom Atlantique / Inria.
- 2. The bit is the basic unit of information. The size of an image is the product of the number of points it contains (approximately 2 million points) and the number of bits used to represent each of these points.
- 3. CNRS / CentraleSupélec / Université Paris Sud.
- 4. Co-author, with Patrick Le Callet, of “High Dynamic Range Video,” published by Elsevier in 2016.