Reference genome and builds

Every variant is described relative to a reference genome, a standardised sequence. The two common versions, GRCh37 (hg19) and GRCh38 (hg38), number positions differently, so the same variant has a different coordinate in each. Knowing the build is essential to reading a position correctly.

Figure: Reference genome, GRCh37 vs GRCh38. The same variant, rs429358, lies at chr19:45,411,941 in GRCh37 (hg19) and at chr19:44,908,684 in GRCh38 (hg38); liftover converts between the two. Same variant, two coordinates: the rsID stays the same, while a number needs the build label.

Why a reference

For findings to be comparable, they need a shared coordinate system: the reference genome. It is an assembled standard sequence, not the genome of a single person. Every position is named against it, and every variant is described as a deviation from it, with a reference allele and an alternative allele.
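The idea of a variant as a deviation from the reference can be sketched in a few lines of Python. This is a minimal illustration, not a real variant format: the `Variant` class, the `apply_variant` function, the chromosome name `chrT` and the ten-base sequence are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int   # 1-based position on the reference
    ref: str   # allele the reference carries at that position
    alt: str   # allele observed instead


def apply_variant(reference: str, variant: Variant) -> str:
    """Return the reference sequence with the alternative allele substituted in."""
    start = variant.pos - 1            # convert the 1-based position to a 0-based index
    end = start + len(variant.ref)
    if reference[start:end] != variant.ref:
        # The stated reference allele must agree with the reference sequence,
        # otherwise the position was probably read against the wrong sequence.
        raise ValueError("reference allele does not match the reference sequence")
    return reference[:start] + variant.alt + reference[end:]


toy_reference = "ACGTACGTAC"            # invented sequence, not real genome data
snv = Variant("chrT", 4, "T", "C")      # the reference has T at position 4, the sample has C
print(apply_variant(toy_reference, snv))
```

The check on the reference allele is the useful part: if a position is read against the wrong sequence, the expected reference allele will often not be found there.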

GRCh37 and GRCh38

The reference genome is improved over time. Two versions are in common use: the older GRCh37, also called hg19, and the newer, more complete GRCh38, also called hg38. Because sequence was inserted and corrected between the two builds, the numbering shifts: the same variant has a different coordinate in each build. Converting coordinates from one build to the other is called liftover.
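Why inserted or removed sequence shifts the numbering can be shown with a toy liftover. The sketch below assumes the old build can be cut into blocks that each map to the new build with a constant offset. The block table, the chromosome name `chrT` and the function `toy_liftover` are invented for illustration and contain no real GRCh37 or GRCh38 data. Real conversion tools work from alignments between the two assemblies and handle more cases than this.

```python
from typing import NamedTuple, Optional


class Block(NamedTuple):
    chrom: str
    old_start: int  # first position of the block in the old build (1-based, inclusive)
    old_end: int    # last position of the block in the old build (inclusive)
    offset: int     # new position = old position + offset


# Invented blocks for a toy chromosome; not real assembly data.
TOY_BLOCKS = [
    Block("chrT", 1, 1000, 0),       # unchanged region
    Block("chrT", 1001, 5000, 250),  # 250 bases were inserted before this region
    # old positions 5001-5100 were removed in the new build: no block covers them
    Block("chrT", 5101, 9000, 150),  # +250 from the insertion, -100 from the removal
]


def toy_liftover(chrom: str, pos: int, blocks=TOY_BLOCKS) -> Optional[int]:
    """Map an old-build position to the new build, or None if it has no counterpart."""
    for block in blocks:
        if block.chrom == chrom and block.old_start <= pos <= block.old_end:
            return pos + block.offset
    return None


print(toy_liftover("chrT", 2000))  # lies in the shifted region
print(toy_liftover("chrT", 5050))  # lies in the removed region
```

A position in a removed region has no counterpart in the new build, so a conversion can fail as well as shift. This is one more reason to record which build a number belongs to.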

Why rsIDs are more robust

A bare numeric coordinate is ambiguous without its build label, which makes it a common source of error. An rsID, by contrast, points to the variant itself, regardless of where a build places it. That is why this wiki names markers by their rsID. Anyone working with coordinates should always state the build alongside them.
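One way to make the build label hard to forget is to store it with the number. The following is a minimal sketch under stated assumptions: the `Coordinate` class and the `parse_coordinate` function are hypothetical, and treating hg19 and hg38 as plain aliases of GRCh37 and GRCh38 follows the usage in this article.

```python
import re
from dataclasses import dataclass

KNOWN_BUILDS = {"GRCh37", "GRCh38"}
ALIASES = {"hg19": "GRCh37", "hg38": "GRCh38"}  # aliases as used in this article


@dataclass(frozen=True)
class Coordinate:
    chrom: str
    pos: int
    build: str

    def __str__(self) -> str:
        # Always print the build next to the number.
        return f"{self.chrom}:{self.pos:,} ({self.build})"


def parse_coordinate(text: str, build: str) -> Coordinate:
    """Parse 'chr19:45,411,941' or 'chr19:45.411.941'; the build is mandatory."""
    build = ALIASES.get(build, build)
    if build not in KNOWN_BUILDS:
        raise ValueError(f"unknown build: {build!r}")
    match = re.fullmatch(r"(chr\w+):([\d.,]+)", text.strip())
    if match is None:
        raise ValueError(f"not a coordinate: {text!r}")
    chrom, digits = match.groups()
    # Accept either dots or commas as thousands separators.
    pos = int(digits.replace(".", "").replace(",", ""))
    return Coordinate(chrom, pos, build)


a = parse_coordinate("chr19:45.411.941", "hg19")
b = parse_coordinate("chr19:45,411,941", "GRCh38")
print(a)       # the number together with its build
print(a == b)  # same number, different build: not the same coordinate
```

Because the build is part of the value, two coordinates with the same number but different builds never compare as equal.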

What Genome measures. Genome works in a defined build. A position like chr19:44,908,822 only makes sense together with its build. The rsIDs the wiki names are build-independent.

Sources

  1. Church et al., 2011. Modernizing reference genome assemblies. PLoS Biology 9:e1001091. doi.org/10.1371/journal.pbio.1001091
  2. Schneider et al., 2017. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Research 27:849–864. doi.org/10.1101/gr.213611.116