Millions of Genomes

This was the title of a talk recently given by Richard Durbin at the Wellcome Trust Sanger Institute. Excitement and expectation, reassured by a continuous trend of exponential growth, made inspired listeners feel the way Google or Facebook employees must have felt at their company’s peak.

Some numbers presented by Richard gave context to the startling prediction that by 2015 millions of individual genomes will have been sequenced. This is in fact the expected number if the current pattern of growth continues. Ten years have now passed since the draft of the first Human Genome was released in 2001. By 2006, with next generation sequencing in full swing and sequencing centers churning out many gigabases per week, tens of genomes had been sequenced. Today the number of individual genomes is in the order of thousands, and a 4 fold growth is predicted every year. Extrapolating this estimate to five years from now puts the number of genomes sequenced at 1024 times (4^5) our current number, hence millions of genomes.
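
The arithmetic behind that extrapolation can be sketched in a few lines of Python. This is only an illustration: the function name and the starting counts of 1,000 and 5,000 genomes are assumptions standing in for "in the order of thousands", not figures from the talk.

```python
def projected_genomes(current: int, yearly_factor: int, years: int) -> int:
    """Compound a yearly growth factor over a number of years."""
    return current * yearly_factor ** years


# A 4 fold yearly growth sustained for five years multiplies the total by 4^5.
growth = 4 ** 5
print(growth)

# Hypothetical starting points for "thousands of genomes today".
for start in (1_000, 5_000):
    print(start, "->", projected_genomes(start, yearly_factor=4, years=5))
```

Any starting count in the thousands lands in the millions after five years, which is where the title of the talk comes from.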

Having such an incredible amount of data will clearly create challenges that we are only beginning to discover. How are we going to hold all this data when processing capacity in computers “only” grows 2 fold every year? The answer is that as more genomes become available, an individual’s data will not be stored in its totality; only the differences that define his or her particular variations will be kept.
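
As a rough illustration of storing only the differences, the sketch below keeps just the positions where a sample departs from a shared reference. It is a toy under loud assumptions: both sequences have the same length and differ only by single-base substitutions, and the function names and example sequences are made up for this post. Real variant formats also have to handle insertions, deletions and much more.

```python
def encode_variants(reference: str, sample: str) -> dict[int, str]:
    """Return {position: base} for every position where sample differs."""
    if len(reference) != len(sample):
        raise ValueError("this toy sketch only handles equal-length sequences")
    return {i: s for i, (r, s) in enumerate(zip(reference, sample)) if r != s}


def decode_variants(reference: str, variants: dict[int, str]) -> str:
    """Rebuild the full sample sequence from the reference plus its variants."""
    bases = list(reference)
    for position, base in variants.items():
        bases[position] = base
    return "".join(bases)


# Hypothetical sequences, far shorter than any real genome.
reference = "ACGTACGTAC"
sample = "ACGTTCGTAA"

variants = encode_variants(reference, sample)
print(variants)
print(decode_variants(reference, variants) == sample)
```

The more closely an individual matches the reference, the fewer entries need to be stored, which is why this approach scales better than keeping every genome in full.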

Although many genomes may have been sequenced by now, accessing them is not a trivial matter. Stored in many different places, with different restrictions and inconsistent levels of detail, the bulk of this data is likely to remain at least mildly challenging to handle. Results of investigations will certainly be accessible, but think of the effort it could take to access every single database containing public individual genome data. I do not believe that a great number of genomes will be optimally researched unless more straightforward and standardized access protocols are put in place, something that is currently lacking. The excitement is reasonably justified, yet base-pair-to-bedside medicine may be delayed if current data sharing procedures are not streamlined.

