Converting Genes and Genomic Features From NCBI36 to GRCh37


The Human Genome is like a map onto which genes and other features are placed. As techniques improve, the resolution of that map increases and new versions are released every few years. When a new coordinate reference map (or assembly) for the Human Genome is released, it causes headaches for those who work in the field, because the locations of genes, chromosomal bands and other features such as Single Nucleotide Polymorphisms (SNPs) or Copy Number Variations (CNVs) change.

To work with the most up-to-date set of genes and features for the Human Genome, it is sometimes necessary to convert coordinates from one assembly to another. In the past I have written a tutorial on how to remap from the NCBI36 to the GRCh37 human assembly using liftOver. In this tutorial I present a simple step-by-step guide for feature remapping using NCBI’s remapping tool.

Important:

Please make sure you know in advance which assembly your aberration data is currently mapped to. If you mistakenly treat an aberration that is already in GRCh37 as NCBI36 and remap it to GRCh37, you will get new coordinates that point to the wrong region.

The NCBI provides a web facility to convert coordinates from one assembly into another. To convert coordinates using their genome remapping service do the following:

  1. Make sure that your data is in BED format, e.g. “chr3 100000 999990 myId0000123” (a CNV aberration in NCBI36/hg18)
  • Please note that each field is separated by a tab and each line ends with a line break. Please follow this strictly or the remapping tool may throw an error.
  • Add one line for each aberration you would like to remap
  2. Go to the NCBI Remap page:
  3. Select “Organism for source data” Homo sapiens, “Source Assembly” NCBI36 (hg18) and “Target Assembly” GRCh37 (hg19)
  4. Please leave all “Remapping Options” (Minimum ratio of bases that must remap, etc.) at their default values
  5. Select “Input format” BED and “Output format” Same as input
  6. Paste your aberrations in the input box where it says “Paste data here” and hit submit at the bottom of the page
  7. Wait until the results are returned
  8. To retrieve the results, download the “Mapping Report”, which is in Excel format, or alternatively use the “Mapping report Sample” shown on the results page
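Since the tool is strict about tabs and line breaks, it can help to generate the BED text with a small script rather than typing it by hand. The following is a minimal sketch in Python; the function names `to_bed_line` and `to_bed_text` are my own and not part of any NCBI tool, and the check that the end is greater than the start is an assumption that suits CNV regions rather than a rule of the BED format itself.

```python
def to_bed_line(chrom, start, end, name):
    """Return one tab-separated BED line (without a trailing line break)."""
    start, end = int(start), int(end)
    # Assumption: a CNV region has a positive length.
    if start < 0 or end <= start:
        raise ValueError(f"invalid interval for {name}: {start}-{end}")
    # Stray spaces inside a field would break the tab-separated layout.
    if any(ch.isspace() for ch in f"{chrom}{name}"):
        raise ValueError("chromosome and name must not contain whitespace")
    return "\t".join([chrom, str(start), str(end), name])


def to_bed_text(records):
    """Join several (chrom, start, end, name) records, one per line."""
    return "\n".join(to_bed_line(*r) for r in records) + "\n"


if __name__ == "__main__":
    # The example aberration from step 1, ready to paste into the input box.
    print(to_bed_text([("chr3", 100000, 999990, "myId0000123")]), end="")
```

The printed text can be pasted directly into the input box, or saved to a file if you prefer to keep a record of what was submitted.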

Please note that your aberration may remap to more than one location. I recommend that you manually check the coordinates and select the most appropriate of the candidate locations in the new assembly. Please also note that your aberration may not remap at all, because the region is partially or entirely deleted, or split, in GRCh37. In this case I recommend that you try another start or end position, for example the start/end of alternative probes, until you find a region that maps.
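When an aberration remaps to several locations, a rough ordering of the candidates can make the manual check quicker. The sketch below is only one possible heuristic and is my assumption, not something the NCBI tool does: it puts candidates on the original chromosome first and, among those, prefers the one whose length is closest to the original. The function name `rank_candidates` and the coordinates in the example are made up for illustration.

```python
def rank_candidates(original, candidates):
    """Order candidate (chrom, start, end) tuples for manual review.

    Heuristic (an assumption): same chromosome as the original first,
    then the smallest absolute difference in length.
    """
    o_chrom, o_start, o_end = original
    o_len = o_end - o_start

    def key(candidate):
        chrom, start, end = candidate
        return (chrom != o_chrom, abs((end - start) - o_len))

    return sorted(candidates, key=key)


if __name__ == "__main__":
    original = ("chr3", 100000, 999990)      # NCBI36 region
    candidates = [                           # made-up remapped locations
        ("chr5", 200000, 1099990),
        ("chr3", 100500, 1000600),
        ("chr3", 100500, 500000),
    ]
    for chrom, start, end in rank_candidates(original, candidates):
        print(chrom, start, end, sep="\t")
```

The ordering is a starting point for inspection only; the top candidate still needs to be checked by eye.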

Another possibility is to look at the genes in the region in the old assembly and select a region in GRCh37 that includes the same genes as in NCBI36. Each of these solutions requires careful deliberation and may not be applicable to your particular case (e.g. genes on different chromosomes would not allow remapping based on genes).
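The gene-based approach can be sketched as follows, assuming you have already looked up the GRCh37 coordinates of the genes in question. The function name `span_of_genes`, the gene names and the coordinates are all placeholders of mine, not real annotations. The function returns nothing when the genes lie on different chromosomes, which is the case mentioned above where this approach does not work.

```python
def span_of_genes(gene_names, gene_coords):
    """Return the smallest (chrom, start, end) region covering all genes.

    gene_coords maps a gene name to its (chrom, start, end) in the target
    assembly. Returns None if the list is empty or the genes are on
    different chromosomes.
    """
    chroms = {gene_coords[g][0] for g in gene_names}
    if len(chroms) != 1:
        return None
    start = min(gene_coords[g][1] for g in gene_names)
    end = max(gene_coords[g][2] for g in gene_names)
    return (chroms.pop(), start, end)


if __name__ == "__main__":
    # Placeholder genes and coordinates, for illustration only.
    grch37 = {
        "GENE_A": ("chr3", 120000, 180000),
        "GENE_B": ("chr3", 400000, 650000),
        "GENE_C": ("chr7", 10000, 20000),
    }
    print(span_of_genes(["GENE_A", "GENE_B"], grch37))
    print(span_of_genes(["GENE_A", "GENE_C"], grch37))
```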

Categories: Bioinformatics, Genomics, Tutorials

3 replies »

  1. One note: Remap can take formats other than BED, including GFF, GFF3, GVF and plain locations (like chr1:1000-2000). But you are correct: we do validate the file format, so random spaces and errors can cause a problem.