The software, called SAVAGE (Strain Aware Viral Genome Assembly) and developed by scientists from the Montpellier Laboratory of Informatics, Robotics and Microelectronics (LIRMM) and the Centrum voor Wiskunde en Informatica (CWI) in Amsterdam, could become a useful tool for developing efficient medical responses to contain epidemics. In an exclusive interview with RFI’s Dhananjay Khadilkar, Eric Rivals, a scientist at the National Center for Scientific Research (CNRS) and director of the Computational Biology Institute in Montpellier, explains how the software works and how it can be used. Excerpts:
Q1. How does SAVAGE work?
Eric Rivals: SAVAGE is a bioinformatics program that performs genome assembly: it reconstructs the target genome sequences. It works from the output of sequencing machines, which read short fragments of DNA molecules. This output consists of tens of millions of short sequences, each of which generally covers about 300 base pairs of DNA. From this input, SAVAGE uses a special algorithm to compute the genome sequences of the individual variants of a virus present in the sample.
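To make the idea of assembly concrete, here is a toy Python sketch of its simplest form: greedily merging short reads by their longest suffix-prefix overlap. This is an illustrative assumption, not SAVAGE’s method; SAVAGE’s own algorithm, which is designed to keep variants apart, is considerably more sophisticated. The reads are hypothetical 7- and 8-letter strings rather than real reads of about 300 base pairs, and the function names `overlap` and `greedy_assemble` are invented for this example.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if no such overlap of at least `min_len` characters exists.
    """
    start = 0
    while True:
        # Look for the first min_len characters of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Accept the hit only if the rest of a from there is a prefix of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembler (hypothetical helper, not SAVAGE):
    repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            # No overlaps left: the remaining reads are the assembled contigs.
            return reads
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)


if __name__ == "__main__":
    # Hypothetical reads sampled from the made-up sequence GATTACAGGCTTCA.
    reads = ["GATTACA", "TACAGGC", "AGGCTTCA"]
    print(greedy_assemble(reads))  # ['GATTACAGGCTTCA']
```

Because this greedy merge always joins whichever pair overlaps most, reads drawn from two closely related variants could be glued into a single mixed sequence, which is the kind of merging a strain-aware assembler tries to avoid.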
Q2. Why is it important to obtain genome sequences of individual variants?
ER: When a virus infects a host, the host usually tries to fight the infection with its immune system. In turn, the viral genome mutates, and several variants appear within the host. Each of these variants differs in its genome sequence. If you want to understand how the virus responds to the immune system, you have to detect those variants and their genomes.
It’s important to know at which positions of the genome those variants differ from each other. These mutations may create protein variants that change the virus and its ability to escape the immune system. And if a virus is sensitive to a drug, a mutation may make it resistant to that drug. During treatment, it’s important to look for new variants of the virus, because one of them may acquire mutations that let the virus survive in the host despite the drug treatment. Such variants may then infect other hosts or prolong the illness in the same host.
Q3. How different is SAVAGE from other genome assembly software tools?
ER: Some other tools aim to assemble a single genome sequence for all the variants. The sequence they reconstruct is therefore a consensus of the sequences of the variants present in the host. As a result, you don’t know exactly which nucleotide belongs to which variant.
We don’t know how many variants are in the sample. It could be 3 or 15. We also don’t know how much their sequences differ from one another: it may be one nucleotide in every 1,000 or one in every 10,000.
SAVAGE aims to compute an individual sequence for each variant. With it, you can obtain the genome sequences of, say, 10 variants, recovering the precise sequence of each one rather than a merged sequence of two different variants. This is important because if you take several samples at different times during an outbreak, you can find out which of those variants is adapting to the host and which will become the prominent strain. The most dangerous variant will emerge and spread until it is the majority variant in infected people. By looking at samples taken at different times, you understand how these strains evolve.
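The consensus problem Rivals describes can be shown with a toy Python example. It assumes three short, already aligned variant sequences (invented for illustration) and a simple per-position majority vote, which is only one possible way of building a consensus; the helper names `consensus` and `variable_positions` are hypothetical. The resulting consensus matches none of the three variants, and the list of variable positions (counted from 0) is exactly the information a single consensus sequence hides.

```python
from collections import Counter


def consensus(variants: list[str]) -> str:
    """Majority-vote consensus of equal-length, pre-aligned sequences (hypothetical helper)."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*variants)
    )


def variable_positions(variants: list[str]) -> list[int]:
    """0-based positions where the aligned variants do not all agree."""
    return [i for i, column in enumerate(zip(*variants)) if len(set(column)) > 1]


if __name__ == "__main__":
    # Three invented, pre-aligned variant sequences.
    variants = ["ACGTA", "ACCTT", "ATGTT"]
    cons = consensus(variants)
    print(cons)                          # ACGTT
    print(variable_positions(variants))  # [1, 2, 4]
    print(cons in variants)              # False: the consensus is none of the variants
```

Each variant disagrees with the consensus at exactly one position, so reading only the consensus tells you neither how many variants there are nor which nucleotide belongs to which.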
Q4. Why is it important to sequence all the variants, even those in the minority?
ER: When a virus that comes from, say, a chimp, a rat or another distant host infects a human, the variant is adapted to the biology of that original host (a rat, for example), not to human biology. So when it infects a new host for the first time, it’s not very effective. But as time goes by, the infection evolves and new variants appear that are better adapted to humans than to rats. You want to know about them at an early stage, because these variants become more dangerous for a human host. At first, such a variant is in the minority. But because it fights the human immune system better, this minority variant then becomes the dominant one. That’s why looking at different samples from the same host after the first infection is really important.
Q5. Can SAVAGE be used to contain epidemics?
ER: Rapid sequencing of the different variants of viruses and bacteria is especially useful during an epidemic, because you want to know how the outbreak variant differs from earlier ones in order to understand why the epidemic is spreading so fast and why the host cannot resist this new variant. This enables the development of effective treatments against the epidemic.
Though it hasn’t yet been deployed during an outbreak, we have tested it on real HIV, Ebola, Zika and hepatitis C virus data. Comparing SAVAGE’s results with those of some existing solutions, we showed that it recovers variant genomes with much higher precision and reliability. We are confident that it can prove useful during an epidemic or in other infectology studies, and more generally for understanding how viruses evolve.