Untangling the Jungle of Viruses
RNA viruses can prove to be formidable infectious agents, as shown by the major Ebola epidemic that affected West Africa between December 2013 and March 2016. During that period, the virus caused more than 11,000 deaths among the 28,000 people infected. Although the reasons for such a toll remain complex, the impossibility of defining the characteristics of the viral strain that caused the epidemic partly explains its exceptional severity: “When a virus such as Ebola, HIV or Zika infects humans, its RNA undergoes a succession of mutations in order to escape the immune response of the host. These mutations give rise to a multitude of genomic variants that are very similar to one another, qualified as quasi-species,” explains Éric Rivals, a bioinformatics scientist at the Montpellier Laboratory of Informatics, Robotics and Microelectronics (LIRMM)1 and Director of the Computational Biology Institute (IBC).2
This adaptive strategy seriously hampers the work of scientists who are trying to sequence viral genomes. Sequencing generates millions of fragments of about a hundred nucleotides each, which are then assembled computationally to reconstruct the whole genome.
“These bioinformatics techniques are extremely efficient for reconstructing the single genome of a plant or animal,” explains Éric Rivals. “Yet faced with a virus displaying a range of genomic variants, they prove ineffective as they tend to bring together RNA fragments from different quasi-species.”
Scientists at the LIRMM and Centrum voor Wiskunde en Informatica (CWI) in Amsterdam (the Netherlands) have succeeded in assembling the genomes of these viral quasi-species in order to distinguish them.3 To achieve this, they combined an indexing method capable of identifying reliable overlaps between fragments of viral nucleotides with an algorithm that enables gradual and simultaneous reconstruction of the genomes of quasi-species. Called SAVAGE, the software is intended for use after the sequencing phase, which consists of breaking the genome into several pieces of RNA. Its principal task is to prevent errors during the next step, in which the fragments of the same viral quasi-species are placed end-to-end in order to reconstruct the genome. “By applying overlaps that are three times larger than those achieved with standard assembly methods between these sequences, SAVAGE should make it possible to differentiate genetically similar viral sub-species while precisely estimating their respective proportions within the same viral population,” explains the researcher. But it was still necessary to measure the efficiency of this new assembly strategy on real viral genomes.
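To see why longer overlaps help keep quasi-species apart, consider the following toy sketch in Python. It is not SAVAGE's implementation: the two variant sequences, the read length, the overlap thresholds and the function names are all invented for illustration, and overlaps are found by brute force rather than with an index. The idea is simply that a short overlap can fall entirely within a stretch shared by two variants, whereas a longer one is more likely to span a position where they differ.

```python
def overlap_length(a: str, b: str, min_len: int) -> int:
    """Longest suffix of `a` equal to a prefix of `b`; 0 if shorter than `min_len`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def overlap_edges(reads: list[str], min_len: int) -> dict[tuple[int, int], int]:
    """Brute-force overlap graph: (i, j) -> overlap length, for all ordered read pairs."""
    edges = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                k = overlap_length(a, b, min_len)
                if k:
                    edges[(i, j)] = k
    return edges


def tile(genome: str, read_len: int = 12, step: int = 3) -> list[str]:
    """Cut a genome into evenly spaced, error-free reads (a strong simplification)."""
    return [genome[s:s + read_len] for s in range(0, len(genome) - read_len + 1, step)]


# Two hypothetical variants that differ only at positions 7 and 14.
variant_a = "GATTACAACCGTGGATTAGCA"
variant_b = "GATTACATCCGTGGCTTAGCA"

reads = tile(variant_a) + tile(variant_b)
origin = ["a"] * len(tile(variant_a)) + ["b"] * len(tile(variant_b))


def cross_variant(edges: dict[tuple[int, int], int]) -> list[tuple[int, int]]:
    """Edges that would join reads coming from different variants."""
    return [(i, j) for (i, j) in edges if origin[i] != origin[j]]


short_edges = overlap_edges(reads, min_len=3)  # permissive threshold
long_edges = overlap_edges(reads, min_len=9)   # threshold three times larger

print("cross-variant edges, min overlap 3:", len(cross_variant(short_edges)))
print("cross-variant edges, min overlap 9:", len(cross_variant(long_edges)))
```

On this toy data the permissive threshold links reads from different variants, while the stricter one leaves only overlaps between reads of the same variant. Real reads contain sequencing errors and uneven coverage, which this sketch ignores.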
As a first step, the scientists modeled the genomic structure of viral populations (hepatitis C, HIV, Zika, Ebola) that each contained around ten quasi-species. Having simulated the breakdown of these different variants, they used the SAVAGE software and succeeded in reconstructing 90% of the genome of all the quasi-species, in each of the viral species tested.
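A simulation of this kind, and a figure such as "90% of the genome", can be pictured with the minimal Python sketch below. The article does not say how the reads were simulated or how the percentage was computed, so both functions are assumptions: `simulate_reads` draws error-free fragments at random positions, and `genome_fraction` counts the share of reference positions covered by an exact copy of an assembled piece (a contig). The sequences are made up.

```python
import random


def simulate_reads(genome: str, read_len: int, n_reads: int,
                   rng: random.Random) -> list[str]:
    """Draw fixed-length, error-free fragments from uniformly random positions."""
    last_start = len(genome) - read_len
    starts = (rng.randint(0, last_start) for _ in range(n_reads))
    return [genome[s:s + read_len] for s in starts]


def genome_fraction(reference: str, contigs: list[str]) -> float:
    """Share of reference positions covered by an exact occurrence of some contig."""
    covered = [False] * len(reference)
    for contig in contigs:
        if not contig:
            continue
        start = reference.find(contig)
        while start != -1:
            for pos in range(start, start + len(contig)):
                covered[pos] = True
            start = reference.find(contig, start + 1)
    return sum(covered) / len(reference)


# Hypothetical 20-nucleotide variant and two contigs an assembler might return.
reference = "ACGTACGGTCAAGCTTAGCA"
contigs = ["ACGTACGGTC", "GCTTAGCA"]

reads = simulate_reads(reference, read_len=8, n_reads=50, rng=random.Random(0))
print(len(reads), "simulated reads, e.g.", reads[0])
print("genome fraction:", genome_fraction(reference, contigs))
```

Here the two contigs cover 18 of the 20 reference positions. A realistic evaluation would also tolerate small mismatches and would be run separately for each quasi-species in the simulated population.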
In order to confirm the excellent performance of their technique, the scientists then compared their findings with real data. To do so, they used a set of existing genome sequences obtained by inoculating five HIV quasi-species into cells in culture, each introduced at a different proportion. Based on the viral RNA extracted from these cultures and previously sequenced, the team then sought to determine the genome of all the quasi-species inoculated using different assembly techniques, including SAVAGE.
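Once variant genomes have been reconstructed, their relative abundance can be estimated from the reads. The Python sketch below shows one simple way of doing so, which is an assumption and not the method used by SAVAGE: each read is assigned to a variant only if it occurs in exactly one of them, and ambiguous reads are ignored. The variant names, sequences and reads are invented.

```python
from collections import Counter


def estimate_proportions(reads: list[str], variants: dict[str, str]) -> dict[str, float]:
    """Estimate variant proportions from reads that match exactly one variant."""
    counts = Counter({name: 0 for name in variants})
    for read in reads:
        hits = [name for name, seq in variants.items() if read in seq]
        if len(hits) == 1:  # skip reads that fit several variants or none
            counts[hits[0]] += 1
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in variants}
    return {name: counts[name] / total for name in variants}


# Two hypothetical variants differing at a single position.
variants = {
    "A": "GATTACAACCGTGG",
    "B": "GATTACATCCGTGG",
}

# Three reads specific to A, one specific to B, one shared by both.
reads = ["ACAACC", "CAACCG", "TACAAC", "ACATCC", "GATTAC"]

print(estimate_proportions(reads, variants))
```

With these reads, the estimate is three quarters for variant A and one quarter for variant B. Counting only unambiguous reads is crude, since variants with more distinguishing positions attract more such reads, but it conveys the principle of recovering proportions from sequencing data.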
They found that only this new software proved capable of establishing both the genomic diversity of the HIV population and the respective proportions of the five variants in the cell culture. “All these tests confirmed the ability of our software to efficiently manage this viral complexity in the absence of a reference genome, which is very often the case during an emerging epidemic,” emphasizes Rivals. Without reliable genetic information on the nature of the enemy to be controlled, viral infections—such as the Ebola outbreak of 2014 or the Zika epidemic in Brazil in 2015—spread rapidly and cannot be contained. Yet by combining this new and promising assembly software with the high-throughput sequencing of viral genomes, it now seems possible to develop therapies that target the main viral sub-species causing these outbreaks.
- 1. CNRS / Université de Montpellier.
- 2. (CNRS / Université de Montpellier / Cirad / Inria / IRD / Inra / SupAgro Montpellier / Inserm). The IBC in Montpellier develops innovative methods and software to analyze, integrate and contextualize large-scale biological data for use in the fields of health, agriculture and the environment.
- 3. Jasmijn A. Baaijens et al., “De novo assembly of viral quasi-species using overlap graphs”, Genome Research, 2017. 27: 1-14.
After first studying biology, Grégory Fléchet graduated with a master's degree in science journalism. His areas of interest include ecology, the environment and health. From Saint-Etienne, he moved to Paris in 2007, where he now works as a freelance journalist.