You may have read about the release of my raw personal exome data in a previous entry. Although users were not required to report back any findings derived from this data, my hope was that some of them would return with interesting results. The response to this call has been overwhelmingly positive, and in less than a week Oxford Gene Technology (OGT) kindly provided me with a report to facilitate the analysis of my personal exome. OGT’s donation has made it possible to start the “My Personal Exome Analysis” series on this blog. In Part I, I will share some data and preliminary metrics gathered from OGT’s exome analysis services. I will continue to report further findings and insights as I keep exploring my personal exome at the deepest level that technology (and budget) currently allows.
In addition, I am releasing the following sequence-derived data from OGT’s services under a CC0 license: a) the aligned and processed BAM file, b) the BAM file index and c) the compressed VCF file. The BAM file (.bam) is the binary version of a tab-delimited text file (SAM) that contains sequence alignment data. The BAM file index (.bai) provides fast random access to the BAM file. The compressed VCF file (.vcf.gz) describes variant calls in text format. These formats are industry standards and can be used in a variety of research contexts involving genome visualization and analysis.
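For readers who want to peek inside the compressed VCF file without installing any bioinformatics tools, here is a minimal sketch using only the Python standard library. The function name `iter_vcf_records` and the file name in the usage note are my own placeholders, not part of OGT’s deliverables, and the sketch only reads the first five standard VCF columns (CHROM, POS, ID, REF, ALT).

```python
import gzip


def iter_vcf_records(path):
    """Yield one dict per variant line of a gzip-compressed VCF file.

    Hypothetical helper for illustration: it reads only the first five
    standard columns and ignores QUAL, FILTER, INFO and genotype fields.
    """
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Lines starting with '#' are metadata or the column header.
            if line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:
                continue  # skip blank or malformed lines
            chrom, pos, var_id, ref, alt = fields[:5]
            yield {
                "chrom": chrom,
                "pos": int(pos),
                "id": var_id,
                "ref": ref,
                # ALT may list several alleles separated by commas.
                "alt": alt.split(","),
            }
```

Something like `sum(1 for _ in iter_vcf_records("my_exome.vcf.gz"))` would then count the variant lines in the file (the file name is an assumption). For serious work, a dedicated library is a better choice than this hand-rolled parser.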
Looking at the summary metrics in OGT’s report, my personal exome produces:
- 30,702 variants relative to the reference genome (GRCh37)
- 5,565 non-synonymous coding variants with consequences
- At least 61.42% of the on-target regions covered at a depth of 20x or more (remember that this data was sequenced by BGI)
- A total of 2.54 gigabases of sequence data read and aligned at high quality
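To make the coverage metric concrete: "covered with a depth of at least 20x" can be read as the share of on-target positions where 20 or more reads overlap. The sketch below shows that arithmetic on made-up depth values. The function name `fraction_covered` is hypothetical, and I do not know exactly how OGT computes the figure in its report.

```python
def fraction_covered(depths, min_depth=20):
    """Return the fraction of positions whose read depth is >= min_depth.

    `depths` is a sequence of per-position read depths (illustrative
    input; real values would come from the BAM file).
    """
    if not depths:
        return 0.0
    covered = sum(1 for depth in depths if depth >= min_depth)
    return covered / len(depths)


# Toy example with invented depths for ten positions.
toy_depths = [5, 12, 20, 25, 31, 40, 18, 22, 0, 60]
print(f"{fraction_covered(toy_depths):.0%} of positions at 20x or more")
```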
Figure 1 is a screenshot of the OGT report showing the summary of all variants identified, including those in dbSNP release 132.
Figure 2 summarizes all novel variants identified by OGT, that is, after filtering out those already present in dbSNP release 132.
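The split between known and novel variants can be sketched as follows. This is only an illustration under an assumption of mine: that known variants carry a dbSNP "rs" identifier in the VCF ID column while novel ones show the missing-value marker ".". OGT’s actual filtering pipeline is not described in the report summary, and the function names are placeholders.

```python
def is_known(var_id):
    """Return True if a VCF ID field holds at least one dbSNP-style rs ID.

    The VCF ID column may list several identifiers separated by ';'.
    """
    return any(part.startswith("rs") for part in var_id.split(";"))


def split_known_novel(ids):
    """Count known and novel variants from an iterable of VCF ID fields."""
    known = 0
    novel = 0
    for var_id in ids:
        if is_known(var_id):
            known += 1
        else:
            novel += 1
    return known, novel


# Toy example with invented identifiers.
known, novel = split_known_novel(["rs123", ".", "rs1;rs2", "."])
print(f"known: {known}, novel: {novel}")
```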
Download of BAM and VCF Files
You are allowed to use my personal exome’s BAM and VCF files under CC0, a completely free license. You can add this data to any database or resource with no need for attribution. Any usage or finding derived from this data that is communicated back to me will be shared (if considered noteworthy) through this blog or in a publication, with due attribution or a request for co-authorship in papers.