Introduction to Mouse Genetics

Mouse genetics has passed many milestones over the past century, from the generation in the early 1900s of inbred lines that consistently exhibited a coat color trait or susceptibility to tumors, up to routine gene manipulation, conditional targeting, and whole genome sequencing or transcript profiling of individuals. As an experimental system, the mouse has provided remarkable insight into gene function, allowed dissection of gene-disease relationships, and yielded a wealth of other data.

Genes, Alleles and Genotypes

The study of genetics, in its most basic form, is the study of genes: what they are, where they are, how they work, how they differ, and how this forms the basis of inherited traits. Inherited traits run the gamut from metabolic activity and disease susceptibility to body patterning, hair color, blood type and more.

The code for these traits resides in an organism's DNA. When a gene is activated, the DNA is transcribed into messenger RNA (mRNA), which is then translated into protein. Gene activation is regulated by transcription factors, which bind to non-coding promoter or enhancer elements. These recruit RNA polymerase and other factors to the transcriptional start site, where the DNA sequence is copied into a new RNA molecule. The regions of the gene that will eventually code for protein are called exons and are spliced together, with introns removed, as part of mRNA processing. The canonical splice site consensus sequences in pre-mRNA are "MAG|GTRAGT" for the 5' splice site (donor) and "CAG|G" for the 3' splice site (acceptor), where M is A or C, R is A or G, and "|" marks the exon-intron boundary [1]. Additional post-transcriptional regulation may also occur through non-coding microRNAs (miRNA), small interfering RNAs (siRNA), small nuclear or nucleolar RNAs (snRNA and snoRNA), or other processes. mRNA is capped, polyadenylated and exported from the nucleus to the ribosomes, where it is translated into protein.
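As a rough illustration of how a consensus such as MAG|GTRAGT at the 5' splice site can be used, the sketch below expands the ambiguity codes M and R into a regular expression and scans a DNA-alphabet sequence for exact matches. The function names and the toy sequence are made up for this example, and real splice sites frequently deviate from the consensus, so exact matching is a simplification rather than a splice-site predictor.

```python
import re

# Ambiguity codes needed for the consensus quoted in the text (DNA alphabet).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "[AC]", "R": "[AG]"}

FIVE_PRIME_CONSENSUS = "MAGGTRAGT"  # MAG|GTRAGT, boundary after the third base
BOUNDARY_OFFSET = 3


def consensus_to_regex(consensus):
    """Translate a consensus string into a compiled lookahead regex.

    The lookahead lets overlapping matches be reported as well.
    """
    body = "".join(IUPAC[base] for base in consensus)
    return re.compile("(?=" + body + ")")


def find_five_prime_sites(sequence):
    """Return 0-based positions of the first intronic base for each exact match."""
    pattern = consensus_to_regex(FIVE_PRIME_CONSENSUS)
    return [m.start() + BOUNDARY_OFFSET for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    toy = "TTCAGGTAAGTCC"  # hypothetical sequence containing CAG|GTAAGT
    print(find_five_prime_sites(toy))
```

The returned index points at the G of the intronic GT, which is where the "|" in the consensus falls.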

Figure 1. The central dogma of biology: DNA to mRNA to protein.
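The flow from DNA to mRNA to protein can be sketched in a few lines. This is a deliberately minimal model: it uses the standard genetic code, starts reading at the first base rather than searching for a start codon, and ignores splicing, capping and polyadenylation. The function names and the toy coding sequence are illustrative only.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand):
    """Return the mRNA that matches the coding (sense) DNA strand."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Translate an mRNA from its first base until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    dna = "ATGAAATGGTGA"  # toy open reading frame
    mrna = transcribe(dna)
    print(mrna, translate(mrna))
```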

Figure 2. Infographic illustrating the hierarchy between species-level genomes, genes (functional units), the sequence variation which defines alleles, and the resulting individual genotypes which, when expressed, dictate phenotypes. See the text for details. The bottom photo shows an obese C57BL/6J-Lep<ob>/Lep<ob> mouse on the right, with a lean heterozygous Lep<ob>/Lep<+> littermate on the left.

Within a species, all members carry the same set of genes, or functional units. The mouse reference genome contains approximately 24,500 protein-coding genes; its initial sequence was published in 2002 [2]. Structurally, these genes are distributed across 19 pairs of autosomes plus the X and Y sex chromosomes. For comparison, humans have 23 pairs of chromosomes (22 pairs of autosomes plus XX or XY), which contain an estimated 20,687 protein-coding genes [3]. Over 90% of the mouse and human genomes can be aligned into regions of shared synteny, that is, blocks where homologous genes are conserved in the same relative order, indicative of common ancestry. At the gene level, a mouse homolog has been identified and classified for 17,096 human genes.

Variability between individuals within a species is due to variant gene forms called alleles. Some changes alter the expression or function of genes, resulting in observable phenotypic differences such as coat color, disease resistance or susceptibility, or metabolism, but many other alleles result in little to no detectable variation beyond the sequence level. Variant alleles may include single nucleotide polymorphisms (SNPs), insertions and deletions (indels), or copy number variants (CNVs). In the case of laboratory model organisms, targeted alleles also exist where genetic engineering techniques have been used to specifically alter, delete or introduce DNA sequences. A single gene may have multiple alleles with different biological consequences, and every allele in MGI is given a unique identifier, which is appended to the gene symbol as a superscript. See the section below on Nomenclature for more.

Barring the unusual case of chimeras, every nucleated cell in a mouse’s body carries the same DNA code. However, different gene expression patterns allow these cells to develop into a variety of tissues and organs, as well as respond to stimuli. The Gene Expression Database at MGI allows you to search for genes which are expressed in different mouse anatomical structures, with a particular emphasis on embryonic development.

The end result, whether a single measurement or the composite of all of an individual's observable or measurable traits, is called the phenotype. In MGI, phenotypes are annotated to the complete genotype, which is composed of both the allele(s) of special interest and the background. Just as different specific alleles may be expected to have different characteristics and biological impacts (consider missense coding variants versus knockout alleles versus conditional or reporter targeted alleles), other alleles in the strain background, whether known or unknown, can have modifier effects on a trait or set of traits. For example, the Lep<ob> allele shows background sensitivity: homozygous Lep<ob>/Lep<ob> mice on a C57BL/6J background exhibit severe obesity with a pre-diabetes-like syndrome, while Lep<ob>/Lep<ob> mice on a C57BLKS background become severely diabetic and infertile in addition to obese, with a significantly shortened life expectancy.

Quick Glossary

For a full glossary, see: MGI Glossary

Term | Definition
Inbred strain | A strain that is essentially homozygous at all loci. In mice, requires brother-sister matings for at least 20 sequential generations. Within an inbred strain, individuals are genetically identical to one another, though different inbred strains each carry a unique set of alleles.
Allele | One of the variant forms of a gene, differing from other forms in its nucleotide sequence. Diploid organisms carry two alleles at each locus, which may be the same (homozygous) or different (heterozygous). Hemizygous is used to describe alleles where you would not expect two alleles for an organism in the "normal" state (e.g. Y-chromosomes, X-linked genes in males, or insertion of foreign DNA). Includes natural variation and targeted variants.
Phenotypic Allele | An allelic variant which produces an observable change. May be a spontaneous mutation or targeted.
Targeted Allele | An allele created using laboratory techniques to specifically disrupt, alter or introduce DNA sequences such that the “targeted” allele will be heritable in the germ line of offspring. Includes knockouts, knock-ins, reporters, conditional alleles, and more.
Transgene | Any DNA sequence or combination of sequences that has been introduced via a construct into the germ line of the animal by random integration.
Genotype | 1. The complete set of genetic information carried by an individual: all alleles, natural and targeted, along with background. 2. A description of the genetic information carried by an organism. In the simplest case, genotype may refer to the information carried at a single locus, as in A/A, A/a, or a/a.
Phenotype | 1. The observable or measurable characteristics of an organism or individual. 2. In MGI, also refers to a structured descriptive vocabulary (see Mammalian Phenotype Ontology).
Locus/Loci (pl.) | Literally "place". Refers to chromosomal position or coordinates, or a gene/cluster which can be mapped there.
Endogenous | That which originates from within the organism, e.g. gene sequences which have not been targeted.
Homolog | 1. Two or more genes related by descent from a common ancestral DNA sequence. May be applied to genes in different species (ortholog), or gene duplications within a species (paralog). 2. One of a pair of chromosomes that segregate from one another during the first meiotic division. 3. A morphological structure in one species related to that in a second species by descent from a common ancestral structure.
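The homozygous, heterozygous and hemizygous terms in the glossary can be illustrated with the single-locus notation used there (A/A, A/a, a/a). The sketch below is only a toy classifier. Treating "Y" or "0" as the placeholder for the missing second allele in a hemizygous genotype is an assumption made for this example, not a rule stated in this document.

```python
# Assumed placeholders for "no second allele expected" (illustrative only).
HEMIZYGOUS_PLACEHOLDERS = {"Y", "0"}


def zygosity(genotype):
    """Classify a single-locus genotype written as 'allele1/allele2'."""
    alleles = genotype.split("/")
    if len(alleles) != 2:
        raise ValueError("expected exactly two alleles separated by '/'")
    first, second = alleles
    if second in HEMIZYGOUS_PLACEHOLDERS:
        return "hemizygous"
    return "homozygous" if first == second else "heterozygous"


if __name__ == "__main__":
    for g in ("A/A", "A/a", "a/a", "A/Y"):
        print(g, zygosity(g))
```

Allele symbols are compared as exact, case-sensitive strings, so "A" and "a" count as different alleles.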

Inbred Strains

To understand mouse model genetics, it is important to begin at the roots: inbred lines. An inbred strain is one in which mice have been carefully bred, either brother-sister or parent-progeny, for a minimum of 20 sequential generations to produce a line which is genetically and phenotypically consistent. Most laboratory strains have now been inbred for hundreds of generations. Within an inbred mouse strain, all individuals have essentially the same genotype and are homozygous (carry a single allelic variant) at essentially all loci. Members of an inbred strain are essentially genetic clones of one another, which allows good experimental reproducibility for genetically influenced traits and provides a consistent and uniform animal model for study, even across multiple generations or different laboratories.

Figure 3. Derivation of inbred strains. Unique strains of laboratory mice are generated by repeated intercrossing (brother-sister) and/or backcrossing (parent-offspring) until mice are homozygous at all loci and progeny are identical.

Figure 4. Consequences of inbreeding for 20 generations. Points on the solid line indicate the portion of the genome that will be homozygous in any individual animal at each generation. Points on the dashed line indicate the portion of the genome that will be fixed identically in the two animals chosen to generate the following generation. From (Silver, 1995).
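The trend described in the Figure 4 caption can be approximated with the standard textbook recurrence for the inbreeding coefficient under continuous brother-sister mating, F(t) = (1 + 2·F(t-1) + F(t-2)) / 4, where F is the expected fraction of loci that are homozygous by descent. This sketch assumes unrelated, non-inbred founders and is not necessarily the exact calculation used to draw the figure.

```python
def inbreeding_coefficients(generations):
    """Return [F_0, F_1, ..., F_n] for n generations of full-sib mating.

    F_0 is the founding generation (assumed unrelated and non-inbred).
    """
    f_prev2, f_prev1 = 0.0, 0.0  # F before the founders, and F_0
    coefficients = [f_prev1]
    for _ in range(generations):
        f_next = (1 + 2 * f_prev1 + f_prev2) / 4
        coefficients.append(f_next)
        f_prev2, f_prev1 = f_prev1, f_next
    return coefficients


if __name__ == "__main__":
    f = inbreeding_coefficients(20)
    for generation in (1, 5, 10, 20):
        print(generation, round(f[generation], 4))
```

Under these assumptions the expected homozygous fraction is still below 99% at generation 20, which is why inbred strains are described as "essentially" rather than absolutely homozygous at that point.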


While all mice are genetically homogeneous within an inbred line, different inbred lines carry different sets of fixed alleles and exhibit different phenotypic characteristics as a result. Choosing the right inbred strain for a particular experiment or control is an important consideration when working with laboratory mice.

Having a well characterized and consistent background is also a valuable tool when it comes to examining the effects of a targeted allele or manipulation. By comparing the phenotype of the targeted line to its parental strain, it is possible to determine which biological systems have been affected and draw inferences about the function of the single, modified gene.

Inbred strains are given names to identify the specific genetic background. These names are composed of a combination of capital letters and/or numbers and may also have substrain designations (letters or numbers following a forward slash) and/or lab codes to indicate where the strain is being maintained. See the nomenclature tutorial video (used with permission from The Jackson Laboratory), or browse the Full Guidelines for Nomenclature of Mouse and Rat Strains.

The most commonly used inbred strain is C57BL/6J which was also the strain sequenced to generate the “reference” mouse genome. For more information on specific inbred strains, see the strain detail pages on the Mouse Phenome Database[4].
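As a small illustration of the naming pattern described above, the sketch below splits a strain name at the first forward slash into the parent strain and whatever follows it. It does not attempt to separate a substrain designation from a lab code, which would require the registry of lab codes. The function name and the returned keys are hypothetical.

```python
def split_strain_name(name):
    """Split a strain name at the first '/' into parent strain and suffix.

    The suffix may hold a substrain designation, a lab code, or both;
    this sketch leaves it unparsed.
    """
    parent, separator, suffix = name.partition("/")
    return {"parent_strain": parent, "substrain": suffix if separator else None}


if __name__ == "__main__":
    print(split_strain_name("C57BL/6J"))
    print(split_strain_name("C57BLKS"))
```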

Targeted Alleles

As mentioned above, one of the advantages of inbred strains in genetic research is that they provide a stable framework to study the role of a single gene by using mutants or knockout alleles. Unlike population-based studies in humans, where many genetic factors are segregating, a new, known mutation on a predictable background allows researchers to attribute new phenotypes to that specific gene and draw inferences about its normal function. Because a stable, predictable background is required, mutant phenotypes in MGI are attributed to the complete genotype of a mouse (specific allele on a defined inbred background). Many mutant alleles exhibit the same phenotype regardless of background, but others have altered penetrance or display a range of phenotypes in a background-dependent fashion as a result of other pathway modifiers.

One of the most useful aspects of the mouse as an experimental genetic model is this ability to easily generate targeted alleles. The canonical targeted allele is a gene knockout, where all or part of the coding region of a targeted gene is replaced by a selection cassette, eliminating function. This is achieved by introducing a vector engineered to contain homologous sequences on either side of the targeting cassette into embryonic stem (ES) cells. The cell’s own machinery pairs the vector’s homologous arms with the mouse’s endogenous sequence and, if recombination occurs on both sides, the cassette is incorporated into the ES cell genome. Correctly targeted ES cells can be selected, injected into host blastocysts and transferred into a pseudopregnant female mouse, where they will grow and develop. The illustration in Figure 5 (C) shows a heterozygous targeted allele.

Figure 5. Generation of targeted alleles by homologous recombination. Targeted alleles of all kinds can be generated in embryonic stem cells by taking advantage of the normal processes of homologous recombination in the cell. (A) First, a targeting vector is constructed which contains the cassette to be introduced (engineered cassette) flanked both upstream and downstream by DNA sequences that match perfectly (i.e. are homologous to) the upstream and downstream regions around the endogenous gene or region an investigator wants to replace. In the case of a knockout allele, the engineered cassette will typically replace coding exons with a selectable marker such as a neomycin resistance gene (NEO cassette) to facilitate selection and enrichment of positive ES cell clones. Often a negative selection marker will be built into the vector downstream of the homologous region to allow screening against incomplete (single arm) or random recombinations. (B) The targeting vector is introduced into the nucleus of a pluripotent embryonic stem cell, where the homologous arms align with their matching sequences in the endogenous gene. In a process similar to what occurs during meiosis, the aligned regions cross over, incorporating the engineered construct into the endogenous locus and removing the enclosed endogenous sequence. (C) The complete, properly targeted cell now carries the engineered construct in its genome in place of the reference sequence. ES cells can be screened for proper targeting using the selectable markers, sequencing, and/or Southern blotting; positive clones can then be injected into host blastocysts and implanted in a pseudopregnant recipient mouse, where they will contribute to the developing embryos.
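The replacement step in panels (A) to (C) can be mimicked with plain strings: locate the two homology arms in a toy "genome" and swap whatever lies between them for the cassette. This is only a cartoon of the outcome, not of the recombination mechanism. The function name, the sequences, and the "NEO" placeholder (a label, not a real resistance gene sequence) are all invented for illustration.

```python
def target_locus(genome, left_arm, right_arm, cassette):
    """Replace the sequence between two homology arms with a cassette.

    Returns the targeted sequence, or None when either arm cannot be found
    in order (standing in for a failed or single-arm event).
    """
    left_start = genome.find(left_arm)
    if left_start == -1:
        return None
    left_end = left_start + len(left_arm)
    right_start = genome.find(right_arm, left_end)
    if right_start == -1:
        return None
    return genome[:left_end] + cassette + genome[right_start:]


if __name__ == "__main__":
    left, exon, right = "ACGTACGG", "TTTTTTTT", "GGCCTTAA"  # toy sequences
    genome = "CC" + left + exon + right + "AT"
    print(target_locus(genome, left, right, "NEO"))
```

Both arms are retained in the result and only the enclosed segment changes, which mirrors the double crossover shown in panel (B).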

This same strategy can also be used to introduce specific new variation into a mouse. This may be a single, specific engineered substitution designed to duplicate a mutation found in human clinical cases (e.g. M390R in Bbs1<tm1Vcs>), a recombinase recognition sequence to allow in vivo genome manipulation (loxP, FRT), or the introduction of a complete cDNA from another species, which is known as a transgene.

Figure 6. Schematic for four of the most common targeted allele types in mice. The top row depicts the endogenous (or unaltered) allele, where exons are properly spliced (see dashed lines) and a full length, functional protein is produced. Below, a knock-out allele (KO) uses homologous recombination to replace certain functional exons with a targeting cassette containing a selectable marker. This gene is no longer able to produce a functional protein. A conditionally targeted allele, also commonly referred to as "floxed" for "flanked by LoxP", introduces recombinase recognition sequences both upstream and downstream of the targeted exons. When no recombinase enzymes are expressed, these genes function as normal; however, if the correct recombinase is expressed within the same cell (illustrated with the scissors), it is able to act on these sites and produce an altered allele, most commonly excising the exon to give a truncated, non-functional protein. For a reporter allele, a cassette is introduced which contains a detectable marker, commonly a fluorescent tag or lacZ gene, which can be visualized under a microscope. These are often used to detect patterns of gene expression, as the marker will indicate when the target promoter has been activated. They often disrupt normal gene splicing or protein conformation, so may act as nulls or have only partial activity. The bottom row depicts a transgene, in which foreign (non-mouse) DNA is introduced into the germ line with the intention of producing a new, functional (non-mouse origin) protein.

Table 1. Common engineered allele types. Several of the most common engineered alleles used by the research community and included in MGI. Other allele types exist, as well as combinations and variations on those indicated below.

Allele type | Generation | Description | Most common uses (others may apply)
Knockout | Targeted, homologous recombination | Replaces the coding region of a known gene (complete gene or specific exons) with a targeting cassette. | Complete loss-of-function. Used to study function of the targeted gene by comparison to non-targeted controls.
Knock-in (nucleotide substitution) | Targeted, homologous recombination | Replaces the coding region of a known gene with an alternative sequence of the same gene containing a specific mutation. | Used to study the phenotypic impact of this variation, often pulled from human clinical observations. Functional impact is allele specific.
Knock-in (insertion) | Targeted, homologous recombination | A targeting construct is used to introduce a new variant into a specific locus. May be an alternative variant of the targeted gene or a complete new cDNA. | Used to study the role of the variant and/or express the desired variant or new gene product, e.g. humanized mice or a recombinase under the control of an endogenous promoter.
Floxed | Targeted, homologous recombination | LoxP recombinase recognition sequences are introduced flanking a targeted gene, exon or portion of a construct. Co-expression of cre recombinase in the same cell will remodel the flanked region according to the directionality of the LoxP sites. In the absence of cre the gene is typically expected to function as wild type. See section on Conditional Alleles and cre-Lox. | Permits tissue specific excision or expression of a gene. Allows study of genes required for developmental viability that would be embryonic lethal in a knock-out.
FRT | Targeted, homologous recombination | FRT recombinase recognition sequences are introduced flanking a targeted gene, exon or portion of a construct. | Similar to floxed, but sensitive to co-expression with Flp recombinase (flippase).
Targeted Reporter | Targeted, homologous recombination | In frame insertion of a reporter (GFP, LacZ, luciferase, etc.) gene into a known locus following some regulatory sequence. May disrupt an endogenous gene, though this is not required. | Used to monitor transcriptional activity of an endogenous promoter in vivo. If used as a tag on a functional protein, may be used to study protein localization or cell trafficking of the gene product.
Gene trap | Random insertion | A known cassette with a selectable marker is randomly incorporated into the germ line. See the International Gene Trap Consortium. | High throughput gene disruption. Often cassettes have reporter function, allowing researchers to determine transcriptional activation of the endogenous promoter once the site of insertion has been determined.
Transgenic (cre/Flp) | Typically random insertion, may be targeted | Introduction of cre or Flp recombinase genes under the control of a promoter. | Used to specify which tissues will express a site-specific recombinase enzyme, and therefore when and/or where within an organism a floxed/FRT gene will be remodeled. See section on Conditional Alleles and cre-Lox.
Transgenic (expressed) | Typically random insertion, may be targeted | Introduction of a foreign-origin gene for expression in the mouse. | Expression of a non-mouse protein of interest.
Transgenic (reporter, cre) | Typically random insertion, may be targeted | Introduction of a reporter gene with a loxP-flanked STOP codon before the coding sequence. | Allows researchers to assess the transgene activity of a cre recombinase transgenic line. Co-expression with cre removes the STOP codon and activates the reporter, marking tissues where cre activity is present.

Conditional Alleles and the cre-Lox System

For many genes, a complete null allele can have very serious consequences for embryonic development or viability. In addition, investigators often wish to examine the role of a gene exclusively in the context of a particular organ system or cell type without causing abnormalities elsewhere. Site-specific recombinase technology can be used to efficiently cause deletions, translocations and inversions in genomic DNA with high fidelity when a genome-remodeling recombinase enzyme is co-expressed. By restricting expression of the recombinase to specific tissues or cell lineages - or by placing it under the control of a drug-inducible promoter - the genome remodeling can also be restricted to specific tissues or time points.

The most commonly used system in laboratory mice is called "cre-lox", which involves the cre recombinase enzyme, originally cloned from the P1 bacteriophage, and recombinase recognition sites referred to as "loxP sites" (locus of X-over P1). LoxP sites are 34 base pair sequences consisting of an asymmetric 8 base pair core flanked by two 13 base pair inverted repeats.


The asymmetric core gives the loxP site its directionality. If the two flanking loxP sites (upstream and downstream of the target) are oriented in the same direction, the floxed segment will be excised along with one of the loxP sites (as shown in Figure 5). If the loxP sites are in opposite orientations (facing each other), the segment will be inverted. If the loxP sites are located on different chromosomes (a trans arrangement), the recombinase will mediate chromosomal translocations.
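The orientation rules above can be sketched with strings. The example below uses the commonly cited loxP sequence, finds exactly two sites in a toy construct, and returns the excised product when the sites share an orientation or the inverted product when they face each other. Translocation between chromosomes is not modeled, recombination is assumed to go to completion, and the function names and flanking sequences are invented for illustration.

```python
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"  # commonly cited loxP sequence
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


LOXP_RC = revcomp(LOXP)  # the same site in the opposite orientation


def find_sites(seq):
    """Return (position, orientation) for every exact loxP match."""
    size = len(LOXP)
    sites = []
    for i in range(len(seq) - size + 1):
        window = seq[i:i + size]
        if window == LOXP:
            sites.append((i, "+"))
        elif window == LOXP_RC:
            sites.append((i, "-"))
    return sites


def recombine(seq):
    """Apply cre-mediated recombination to a sequence with two loxP sites."""
    sites = find_sites(seq)
    if len(sites) != 2:
        raise ValueError("this sketch expects exactly two loxP sites")
    (first, first_dir), (second, second_dir) = sites
    size = len(LOXP)
    if first_dir == second_dir:
        # Same orientation: drop the flanked segment and one of the sites.
        return seq[:first + size] + seq[second + size:]
    # Opposite orientation: invert the flanked segment in place.
    return seq[:first + size] + revcomp(seq[first + size:second]) + seq[second:]


if __name__ == "__main__":
    up, exon, down = "GGGCCC", "ATGGCCTTTAAGTGA", "TTTAAA"  # toy sequences
    print(recombine(up + LOXP + exon + LOXP + down))
    print(recombine(up + LOXP + exon + LOXP_RC + down))
```

In this model inversion is reversible: applying `recombine` twice to the opposite-orientation construct returns the starting sequence, whereas excision leaves a single site behind.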

Typically, cre and loxP strains are developed separately and crossed together to produce a cre-lox strain.

  1. mouse with a targeted "floxed" allele, carrying two loxP sites flanking a targeted region
  2. transgenic mouse with a cre recombinase enzyme expressed under the control of a promoter

The offspring of this cross will carry both the floxed targeted allele for the gene of interest as well as the recombinase. In tissues where the recombinase is expressed, the region flanked by loxP sites will be excised (or otherwise recombined) generating a tissue-restricted knockout. In tissues where the cre promoter is inactive, the recombinase will not be expressed and the floxed gene will remain functionally intact.

Figure 7. Tissue specific excision of a floxed allele by co-expression with cre-recombinase.

Recombinase activity of a given cre transgene allele can be determined by crossing the cre mouse to a reporter strain which carries a loxP-flanked STOP cassette in front of a reporter gene (lacZ or GFP, for example). If cre recombinase is active in a tissue, the STOP will be excised and the reporter expressed.
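The tissue-restricted logic described in this section, for both a floxed target gene and a loxP-STOP reporter, reduces to a per-tissue lookup of whether the promoter driving cre is active. The sketch below assumes recombination is complete wherever cre is expressed, which real lines do not guarantee. The tissue names, function name and output labels are hypothetical.

```python
def predict_tissue_states(tissues, cre_active_tissues):
    """Predict floxed-gene and STOP-reporter status for each tissue.

    A tissue listed in cre_active_tissues is assumed to express enough cre
    to recombine every loxP-flanked segment it contains.
    """
    states = {}
    for tissue in tissues:
        cre_on = tissue in cre_active_tissues
        states[tissue] = {
            "floxed_gene": "excised" if cre_on else "intact",
            "stop_reporter": "expressed" if cre_on else "silent",
        }
    return states


if __name__ == "__main__":
    result = predict_tissue_states(["liver", "heart", "brain"], {"liver"})
    for tissue, state in result.items():
        print(tissue, state)
```

The two columns correspond to two different crosses with the same cre line: one to the floxed strain of interest and one to a reporter strain.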

To locate a cre recombinase strain that expresses in a particular tissue type, or is under the control of a specific known promoter, use MGI's Recombinase (cre) Portal, or search using the Phenotypes, Alleles and Disease Models query form, specifying "transgenic (Cre/Flp)" in the Categories section.

Structured vocabularies and ontologies

Structured, controlled vocabularies in biology allow data to be organized and accessed using standardized, hierarchical terms. They also allow databases such as MGI to annotate similar findings under similar headings, so that researchers do not need to browse multiple keywords at different levels of precision.
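The benefit of hierarchical terms can be sketched with a toy vocabulary: a query for a broad term also returns anything annotated to its more specific descendants. The term names, genotype labels and single-parent structure below are illustrative assumptions. Real ontologies allow a term to have several parents, and these names have not been checked against any actual vocabulary.

```python
# Toy child -> parent hierarchy (illustrative term names only).
PARENT = {
    "cardiovascular system phenotype": None,
    "abnormal heart morphology": "cardiovascular system phenotype",
    "enlarged heart": "abnormal heart morphology",
    "abnormal blood vessel morphology": "cardiovascular system phenotype",
}

# Hypothetical genotype labels, each annotated to the most specific term.
ANNOTATIONS = {
    "genotype_1": "enlarged heart",
    "genotype_2": "abnormal blood vessel morphology",
}


def term_and_ancestors(term):
    """Yield a term followed by each of its ancestors up to the root."""
    while term is not None:
        yield term
        term = PARENT[term]


def genotypes_for(query_term):
    """Return genotypes annotated to the query term or any of its descendants."""
    return sorted(
        genotype
        for genotype, term in ANNOTATIONS.items()
        if query_term in term_and_ancestors(term)
    )


if __name__ == "__main__":
    print(genotypes_for("cardiovascular system phenotype"))
    print(genotypes_for("abnormal heart morphology"))
```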

MGI annotates data using three major vocabularies:
  1. The Mammalian Phenotype (MP) Ontology
    Provides standardized terms for annotating mammalian phenotypic data. It is currently in use by MGI and the Rat Genome Database (RGD). The phenotype tables on allele detail pages and the affected physiological systems and phenotypes in the Human-Mouse:Disease Connection results are populated by MP terms. The highest level terms represent physiological systems (for example, "cardiovascular system", "growth/size", "renal/urinary", "reproductive", "tumorigenesis", etc.), with hierarchically organized child terms for increasingly specific annotations.
    Citation: Smith CL, Goldsmith CA, Eppig JT. The Mammalian Phenotype Ontology as a tool for annotating, analyzing and comparing phenotypic information. Genome Biology. 2005;6(1):R7. PMID: 15642099

  2. The Gene Ontology (developed as part of the Gene Ontology consortium)
    Provides standardized terms for a range of gene product properties and describes the normal behavior of these gene products within a cellular context. GO is species neutral and aims to be applicable across a wide range of organisms, from prokaryotes to complex multicellular organisms. In MGI, GO terms associated with a gene product can be found in the Gene Ontology (GO) classifications ribbon on gene detail pages. GO terms fall into three domains:
    • Cellular component: the parts of a cell or its extracellular environment
    • Biological process: operations or sets of molecular events with defined beginning and end, pertinent to the functioning of integrated living units - cells, tissues, organs and organisms
    • Molecular function: the elemental activities of a gene product at the molecular level, such as binding or catalysis
    Citation: Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genetics 2000 May;25(1):25-9. PMID: 10802651

  3. The Mouse Developmental Anatomy Browser.
    Provides standardized terms for tissues, organs and anatomical structures found in the mouse throughout embryonic development and as an adult. The Anatomical Dictionary is part of the e-Mouse Atlas Project (EMAP). Mouse development is divided into Theiler Stages (TS) during which specific processes are expected to occur. Stages 1-26 are embryonic, stage 27 is perinatal and stage 28 is postnatal (adult).
    Citations: Hayamizu TF, Baldock RA, Ringwald M. Mouse anatomy ontologies: enhancements and tools for exploring and integrating biomedical data. Mammalian Genome 2015 Jul 25 PMID: 26208972
    Hayamizu TF, Wicks MN, Davidson DR, Burger A, Ringwald M, Baldock RA. EMAP/EMAPA ontology of mouse developmental anatomy: 2013 update. Journal of Biomedical Semantics 2013 Aug 26;4(1):15 PMID: 23972281


1. Baralle D, Baralle M. Splicing in action: assessing disease causing sequence changes. J Med Genet. 2005 Oct;42(10):737-48. PMID: 16199547

2. Mouse Genome Sequencing Consortium, Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, et al. Initial sequencing and comparative analysis of the mouse genome. Nature. 2002 Dec 5;420(6915):520-62. PMID: 12466850

3. ENCODE Project Consortium, Bernstein BE, Birney E, Dunham I, Green ED, Gunter C, Snyder M. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012 Sep 6;489(7414):57-74. PMID: 22955616

4. Grubb SC, Bult CJ, Bogue MA. Mouse Phenome Database. Nucleic Acids Res. 2013 Nov 15. PMID: 24243846