Genome Diary

Whole genome sequencing data from a relative deceased 4 years ago, now in the public domain

Finally, I have received the whole genome sequencing results for my Aunt, who passed away 4 years ago. Genomic DNA was purified from her hair roots and fragmented, and a DNA library was prepared and sequenced on an Illumina HiSeq using a 2×150 bp paired-end (PE) configuration. To our knowledge, this is the first time a direct-to-consumer experiment has been conducted to sequence the genome of a deceased relative. The purposes of this study are:

  • To identify likely mutations in my Aunt's genome that might have contributed to her death from melanoma 4 years ago
  • To understand how the genome of the dead can inform those of the living

Regarding consent for this experiment: she had already taken a 23andMe test, consented to putting its results online, and research has since been performed on that dataset.

In addition, the sample from which the DNA was purified (36 hairs from one of her combs) was provided, with consent, by her next of kin, my uncle.

Following the open access tradition of the experiments performed with the Corpasome, we are making all data publicly available under a CC0 public domain license. I have put as much of the data as possible on figshare, under the DOI in the link below.

Follow this link to access the data:

Here I include only the files that I could upload to figshare: the BAM index file, the checksum file, a PDF report, a CSV file containing the resequencing summary report, and 3 VCF files (a standard VCF, a gVCF, and a VCF of structural variants).
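The checksum file lets anyone confirm that their downloads are intact. Its format is not described here, so the sketch below assumes the common `md5sum` layout (`<hex digest>  <filename>` on each line); that assumption and the helper names are ours, not part of the figshare deposit.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(checksum_lines, base_dir="."):
    """Return {filename: matched?} for each entry in md5sum-style lines.

    Assumes each non-empty line looks like '<md5 hex>  <filename>' (the md5sum
    default layout); a leading '*' on the filename marks binary mode and is stripped.
    """
    results = {}
    for line in checksum_lines:
        line = line.strip()
        if not line:
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")
        actual = md5_of_file(Path(base_dir) / name)
        results[name] = actual.lower() == expected.lower()
    return results
```

For example, `verify_checksums(open("checksums.md5"), base_dir="downloads")`, with both names hypothetical, returns a dictionary mapping each listed file to whether its digest matched.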

Because of the size limits on files I can upload to figshare, I have not yet uploaded the FASTQ files or the BAM file to a public host.

During the analysis, we noticed a low percentage of reads mapping to the human reference genome and a higher-than-usual duplication rate. The low input amount and fragmented state of the DNA are likely contributing factors.

According to the report, around 89M reads aligned, which is about 18.5% of all reads (read 1 and read 2). The mean coverage is 1.2x. Figure 1 shows the distribution of the depth of coverage.


Figure 1: Distribution of the depth of coverage of my Aunt's genome, using Illumina sequencing.
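The report's headline numbers can be sanity-checked with a little arithmetic, and a distribution like Figure 1 can be recomputed from the BAM once it is public. The sketch below is ours, not the provider's pipeline: it assumes per-base depths come from `samtools depth -a` run on a single BAM (three tab-separated columns: chromosome, position, depth), assumes a genome size of about 3.1 Gb, and the function names are hypothetical.

```python
from collections import Counter


def implied_total_reads(aligned_reads, aligned_fraction):
    """Total reads (read 1 + read 2) implied by an aligned count and its fraction."""
    return aligned_reads / aligned_fraction


def naive_mean_coverage(aligned_reads, read_length, genome_size):
    """Lander-Waterman style estimate C = N * L / G.

    Counts every aligned read at full length, ignoring duplicates, clipping
    and overlapping mates, so it is an upper bound rather than the report's figure.
    """
    return aligned_reads * read_length / genome_size


def depth_histogram(depth_lines, max_bin=10):
    """Summarise per-base depths from `samtools depth -a` style text.

    Each line is 'chrom<TAB>pos<TAB>depth'. Depths above max_bin are pooled
    into the max_bin bucket. Returns (mean_depth, {depth: fraction of positions}).
    """
    counts = Counter()
    total_depth = 0
    positions = 0
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        depth = int(fields[2])
        counts[min(depth, max_bin)] += 1
        total_depth += depth
        positions += 1
    if positions == 0:
        return 0.0, {}
    histogram = {d: counts[d] / positions for d in sorted(counts)}
    return total_depth / positions, histogram


if __name__ == "__main__":
    # Figures from the report; the ~3.1 Gb genome size is an assumption.
    total = implied_total_reads(89e6, 0.185)
    upper = naive_mean_coverage(89e6, 150, 3.1e9)
    print(f"implied total reads: {total / 1e6:.0f}M")
    print(f"naive upper-bound coverage: {upper:.1f}x")
```

Run as a script, it prints about 481M implied total reads and a naive upper bound of about 4.3x. That this exceeds the reported 1.2x mean is in the direction one would expect given the high duplication rate, since duplicates and overlapping mates add aligned reads without adding unique coverage; exactly how the report computes its mean is not stated here. For the histogram, something like `samtools depth -a aunt.bam > depth.txt` (file name hypothetical) keeps zero-depth positions, which matter at around 1x: a Poisson model alone would leave roughly 30% of bases uncovered at 1.2x.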

We identified 1,542,437 single nucleotide variants (SNVs), 50.43% of which are present in dbSNP. Among them are 893 stop-gain and 17 stop-loss SNVs. We also found a total of 3 copy number variants overlapping genes.
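Counts like these can be reproduced from the standard VCF on figshare. The sketch below is a simplified stand-in for whatever the sequencing provider used: it assumes dbSNP membership is signalled by an `rs` identifier in the ID column and that functional consequences appear in the INFO field as Sequence Ontology terms (`stop_gained`, `stop_lost`), as in SnpEff- or VEP-style annotation. Both assumptions and the function name are ours; if the file is not annotated that way, the stop counts will come out as zero.

```python
def summarise_snvs(vcf_lines):
    """Count SNVs, the fraction carrying a dbSNP rs ID, and stop-gain/stop-loss calls.

    Assumptions (not taken from the report): an SNV is a record whose REF and
    every ALT allele are single bases A/C/G/T; dbSNP membership is read from an
    'rs' identifier in the ID column; consequences are found by searching the
    INFO field for the Sequence Ontology terms 'stop_gained' and 'stop_lost'.
    """
    bases = set("ACGT")
    snvs = in_dbsnp = stop_gained = stop_lost = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF data line
        ref, alts, info = fields[3], fields[4].split(","), fields[7]
        if ref.upper() not in bases or not all(a.upper() in bases for a in alts):
            continue  # indels, symbolic alleles, multi-base substitutions
        snvs += 1
        if any(i.startswith("rs") for i in fields[2].split(";")):
            in_dbsnp += 1
        if "stop_gained" in info:
            stop_gained += 1
        if "stop_lost" in info:
            stop_lost += 1
    fraction = in_dbsnp / snvs if snvs else 0.0
    return {"snvs": snvs, "dbsnp_fraction": fraction,
            "stop_gained": stop_gained, "stop_lost": stop_lost}
```

Point it at the standard VCF rather than the gVCF, whose `<NON_REF>` alleles this simple SNV test would reject.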

We realise that only limited confidence can be placed in this genomic data:

  • It has been 4 years since she passed away, and DNA degrades over time
  • Her hair, from whose roots we purified the DNA, was dyed; we think this may also have affected the DNA

However, we have a few good additional sources of data that we will use to infer potential clinical information from this personal genome. She died of melanoma, a skin cancer. We still have her 23andMe genotype data and exome variants for 4 of her relatives, one of them her sister (my Mum).
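One simple first step in combining these sources is to check how often the low-coverage sequencing calls agree with her 23andMe array genotypes at shared rsIDs. The sketch below assumes the usual 23andMe raw-data layout (tab-separated `rsid`, `chromosome`, `position`, `genotype`, with `#` comment lines and `--` for no-calls) and assumes genotypes have already been extracted from the sequencing VCF into a `{rsid: genotype}` dictionary, a step not shown. The helper names are hypothetical.

```python
def parse_23andme(lines):
    """Read 23andMe raw-data lines into {rsid: genotype}, dropping '--' no-calls."""
    genotypes = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # malformed line
        rsid, genotype = fields[0], fields[3]
        if genotype != "--":
            genotypes[rsid] = genotype
    return genotypes


def concordance(genotypes_a, genotypes_b):
    """Return (fraction of shared rsIDs with identical unordered genotypes, shared count)."""
    shared = set(genotypes_a) & set(genotypes_b)
    if not shared:
        return 0.0, 0
    # Sorting the allele letters makes 'AG' and 'GA' compare equal.
    same = sum(sorted(genotypes_a[r]) == sorted(genotypes_b[r]) for r in shared)
    return same / len(shared), len(shared)
```

At around 1.2x coverage, many heterozygous sites will be covered by a single read and so look homozygous, so agreement with the array is likely to be modest even if the sample is correct; the same comparison could be run against the relatives' exome variants.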

Initially we intended to use my Aunt's genome to inform the genomes of the living, but now it will probably be the other way round: we are going to use the genomes of the living to inform the clinical interpretation of my Aunt's genome. We hope these data will let us infer some of the genetic variants that may have predisposed her to melanoma, offering the pioneering experience of providing a genetic diagnosis for her disease 4 years after she passed away.


Samuel Benavides, Mingo Bioinformatics


