rsIDs, microarray and SNP lookup

A SNP (single-nucleotide polymorphism) is a single position in the genome where people differ by one letter. An rsID is the stable address of that position. Consumer tests such as 23andMe measure several hundred thousand such positions; Genome reads the same positions from full sequence data and looks up individual rsIDs. This page explains how that works and how to read the result.

[Figure: Anatomy of an rsID. rs429358 marks one genome position, shown with reference allele C and alternative allele T. You inherit two copies, and the genotype names both: C/C (homozygous reference), C/T (heterozygous), T/T (homozygous variant). Which effect an allele has depends on the gene; strand orientation can vary by provider.]

What an rsID is

rs stands for Reference SNP. The number is an entry in the NCBI dbSNP database that records a precise position and the variants known there. The advantage: while coordinates shift between reference builds (GRCh37, GRCh38), the rsID stays the same. This lets a variant be named unambiguously and found again in the literature.
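
A minimal sketch of that idea: one rsID used as the key, with a separate coordinate per reference build. The `Locus` class and `locate` function are hypothetical names chosen for illustration, and the coordinates for rs429358 are quoted as commonly listed in dbSNP; check them against dbSNP before relying on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    chromosome: str
    position: int  # 1-based coordinate on the named build


# The rsID is the stable key; the coordinate depends on the build.
# Values as commonly listed in dbSNP (assumption: verify before use).
COORDINATES = {
    "rs429358": {
        "GRCh37": Locus("19", 45411941),
        "GRCh38": Locus("19", 44908684),
    },
}


def locate(rsid: str, build: str) -> Locus:
    """Return the coordinate of an rsID on one reference build."""
    return COORDINATES[rsid][build]


for build in ("GRCh37", "GRCh38"):
    print(build, locate("rs429358", build))
```

The same identifier resolves to a different position on each build, which is why literature and tools refer to the rsID rather than to a bare coordinate.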

Reference, alternative and effect allele

The reference allele is the letter present in the reference genome; the alternative allele is the differing variant. A third role matters: the effect allele is the one for which a study reports an effect. It need not be the rare or the alternative allele. A statement like 'the T allele raises risk' is only interpretable with this information.

How to read a genotype

You carry two copies of every autosomal position, one inherited from each parent, so the genotype names both alleles, for example C/C, C/T or T/T. C/C and T/T are homozygous (two identical alleles); C/T is heterozygous. Many effects are dose-dependent: one copy of a risk allele typically has a weaker effect than two. Some providers report alleles on the minus strand, so C/T appears as G/A. That is the same variant, written as its complement.
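
The reading rules above can be sketched in a few lines of Python. All function names here are hypothetical and exist only for illustration; the sketch assumes a genotype written as two letters with an optional slash, and it does not decide which strand a provider actually used.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def parse_genotype(text):
    """Split 'C/T' or 'CT' into a tuple of two uppercase alleles."""
    alleles = tuple(text.replace("/", "").upper())
    if len(alleles) != 2 or any(a not in COMPLEMENT for a in alleles):
        raise ValueError(f"not a two-allele SNP genotype: {text!r}")
    return alleles


def zygosity(genotype):
    """Two identical alleles are homozygous, otherwise heterozygous."""
    a, b = genotype
    return "homozygous" if a == b else "heterozygous"


def flip_strand(genotype):
    """Rewrite a genotype on the opposite strand (C/T becomes G/A)."""
    return tuple(COMPLEMENT[a] for a in genotype)


def effect_allele_dose(genotype, effect_allele):
    """Count copies (0, 1 or 2) of the allele a study reports an effect for."""
    return sum(1 for a in genotype if a == effect_allele)


gt = parse_genotype("C/T")
print(zygosity(gt))                 # heterozygous
print(flip_strand(gt))              # ('G', 'A')
print(effect_allele_dose(gt, "T"))  # 1
```

The dose only makes sense once genotype and effect allele are on the same strand. For A/T and C/G variants the complement looks identical to the original, so a flip cannot be detected from the letters alone and needs the provider's strand information.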

Microarray versus full sequencing

A microarray measures only selected, predefined positions (typically a few hundred thousand). Full sequencing (WGS) reads the whole genome and can reproduce each of these positions, plus everything in between. Genome uses this: it reads the alleles at each desired site from the alignment and writes them in the format of 23andMe, AncestryDNA or GEDmatch so your own data fit into familiar tools. None of this leaves the machine.
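
As a rough illustration of the export step, the sketch below writes genotype calls as tab-separated lines in the style of a 23andMe raw data file (rsid, chromosome, position, genotype, with `--` for a missing call). This is not Genome's actual implementation: `write_23andme_style` is a hypothetical helper, the genotype in the example is made up, and real provider files carry additional header lines that are omitted here.

```python
import io


def write_23andme_style(calls, out):
    """Write (rsid, chromosome, position, genotype) calls as tab-separated lines.

    genotype is a pair of alleles such as ("C", "T"), or None for a no-call.
    """
    out.write("# rsid\tchromosome\tposition\tgenotype\n")
    for rsid, chromosome, position, genotype in calls:
        text = "".join(genotype) if genotype else "--"
        out.write(f"{rsid}\t{chromosome}\t{position}\t{text}\n")


# Hypothetical example data: the genotype is invented for illustration,
# and the position must be on the build the target tool expects.
example_calls = [
    ("rs429358", "19", 45411941, ("C", "T")),
    ("rs7412", "19", 45412079, None),  # no call at this position
]

buffer = io.StringIO()
write_23andme_style(example_calls, buffer)
print(buffer.getvalue(), end="")
```

Writing to any file-like object keeps the sketch usable both for an in-memory check, as here, and for a real file opened with `open(path, "w")`.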

Limits

A single SNP is rarely a verdict. Most effects are small, depend on ancestry, lifestyle and other genes, and come from association studies, not proof of causation. Array calls can also be wrong at individual positions. The dedicated pages on the vitamins and genes explain, for important markers, what is actually documented and what is not.

Vitamins and micronutrients

Each vitamin has its own page with the documented markers, effect alleles and sources: vitamin A (BCMO1), vitamin B6 (ALPL), vitamin B9 / folate (MTHFR), vitamin B12 (FUT2, CUBN, TCN2), vitamin C (SLC23A1), vitamin D (GC, DHCR7, CYP2R1, VDR), vitamin E (CYP4F2, SCARB1), vitamin K (VKORC1) and the antioxidant enzymes (SOD2, GPX1, CAT, NQO1). They all follow the same structure: markers, meaning, context.

Further SNP panels in Genome

Beyond vitamins and enzymes, Genome provides built-in panels for pharmacogenetics (including CYP2C19, CYP2C9, CYP2D6, DPYD, TPMT, SLCO1B1 and HLA hypersensitivity), disease-associated markers (MTHFR, APOE, cardiovascular, autoimmunity, cancer), iron metabolism (HFE C282Y and H63D, TMPRSS6), detoxification (GSTP1, COMT, CYP1A1), sports genetics (ACTN3, PPARGC1A), sleep (DEC2, ADA, CLOCK) and lactose/caffeine. Each panel contains rsIDs, genotype interpretations and the underlying PubMed sources.

What Genome measures: the genotype (both alleles) at a position named by rsID, read from the alignment. It also exports many such positions in the format of common microarray providers.

Sources

  1. Sherry et al., 2001. dbSNP: the NCBI database of genetic variation. Nucleic Acids Research 29:308–311. doi.org/10.1093/nar/29.1.308
  2. Buniello et al., 2019. The NHGRI-EBI GWAS Catalog of published genome-wide association studies. Nucleic Acids Research 47:D1005–D1012. doi.org/10.1093/nar/gky1120
  3. Bush & Moore, 2012. Chapter 11: Genome-wide association studies. PLoS Computational Biology 8:e1002822. doi.org/10.1371/journal.pcbi.1002822