This Help document answers the following questions about the SNP Data in MGI report:
What SNP data does the MGI database contain?
- The MGI SNP data load consists of mouse reference SNPs (rs) and the corresponding submitted SNP assays (ss), where at least one ss assay of a reference SNP has strains defined for the alleles submitted.
- MGI loads SNPs that map to multiple locations only if all locations for the SNP are on the same chromosome. (SNPs that map to more than one chromosome, and SNPs that map to more than two locations on the same chromosome, are not loaded.)
What SNP data does the MGI database NOT contain?
MGI does not load the following data:
- SNPs (rs and ss) that have no submitted assays with defined strain/allele relationships.
- Submitted SNP (ss) strain alleles where the value of the strain allele is N (N = Allele determination failed for that strain).
- SNPs (rs) where all submitted SNP strain alleles have a value of N.
- SNPs (rs and ss) that map to more than one chromosome in the C57BL/6J genome.
- SNPs (rs and ss) that are unmapped in the C57BL/6J genome.
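The inclusion and exclusion rules above can be read as a single filter applied to each reference SNP. The following is a minimal sketch of that reading, not MGI's actual load code: the `RefSnp` and `SubmittedSnp` records, the `exclusion_reason` function, the reason strings, and the order in which the rules are checked are all hypothetical. The two-location cap comes from the parenthetical note in the first answer.

```python
from dataclasses import dataclass, field


@dataclass
class SubmittedSnp:
    """Hypothetical record for one submitted SNP assay (ss)."""
    ss_id: str
    strain_alleles: dict[str, str] = field(default_factory=dict)  # strain -> allele


@dataclass
class RefSnp:
    """Hypothetical record for one reference SNP (rs)."""
    rs_id: str
    assays: list[SubmittedSnp]
    locations: list[tuple[str, int]]  # (chromosome, coordinate) in C57BL/6J


MAX_LOCATIONS = 2  # assumption taken from the text: more than two is not loaded


def usable_alleles(ss):
    """Drop strain alleles whose value is N (allele determination failed)."""
    return {strain: a for strain, a in ss.strain_alleles.items() if a != "N"}


def exclusion_reason(rs):
    """Return why an rs would be skipped, or None if it passes every rule."""
    if not rs.locations:
        return "unmapped"
    if len({chrom for chrom, _ in rs.locations}) > 1:
        return "multiple chromosomes"
    if len(rs.locations) > MAX_LOCATIONS:
        return "too many locations"
    if not any(ss.strain_alleles for ss in rs.assays):
        return "no strain alleles"
    if not any(usable_alleles(ss) for ss in rs.assays):
        return "all alleles N"
    return None
```

An rs for which `exclusion_reason` returns `None` would be loaded, with any N-valued strain alleles dropped from its assays by `usable_alleles`.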
Where does MGI get mouse SNP data and when does it load this data?
MGI loads mouse SNP data from dbSNP at NCBI in conjunction with new dbSNP build releases.
How can I tell which dbSNP build the data comes from?
The dbSNP build number appears at the top of the SNP Data in MGI report, in the Provider/Version column.
What's in the SNP Data in MGI report?
The report is a summary of the dbSNP data in the MGI database and contains information about the:
- dbSNP build number
- number of submitted SNP assays (ss) loaded
- number of reference SNPs (rs) loaded
- average number of ss per rs
- total number of strains with SNPs loaded
- number of ss per submitter
- number of rs per strain, by chromosome.
The report contains links to:
- dbSNP home page at NCBI
- dbSNP build statistics
- assay submitter details at dbSNP.
How do I interpret the data in this report?
|The name of the provider (dbSNP at NCBI) and, beside it, a link to the provider's web site. Beneath this is the number of the dbSNP build from which the SNP data in MGI were loaded.
||Details about MGI SNP data, Reference SNPs (rs), submitted SNP assays (ss), and what SNP data the MGI database does and does not contain. Beneath these details are links to statistics for this build at NCBI dbSNP and to a list of all mouse Reference SNPs from dbSNP not loaded (at MGI) plus the reason for the omission.
||All data values are from the most current dbSNP build loaded into the MGI database. Note: Some values for number of SNPs in MGI may differ from Provider values. See Description for details. For more information, see dbSNP Data Statistics.
|Total Assays (ss):
||Number of submitted SNP assays in the MGI database.
|Total RefSNPs (rs):
||Number of reference SNPs in the MGI database.
|Assays per RefSNP (ss/rs)
||Average number of submitted SNP assays per reference SNP: the number of submitted SNP assays divided by the number of reference SNPs in the data load.
|Total Strains with SNP data
||There are 86 strains, each of which should appear in the RefSNPs (rs) per Strain by Chromosome table. Strains without SNP data are not represented.
|Assays (ss) per submitter
||Name of each submitter, linked to the submitter's details at dbSNP, followed by the number of assays submitted.
|RefSNPs (rs) per Strain by Chromosome
||The table consists of:
- a row for each of the 86 mouse strains, plus one at the top with the heading All Strains.
- a column for each of the 21 mouse chromosomes.
Strains are sorted by total SNP count across all chromosomes. Strains appear in the same order as on the Mouse SNP Query Form.
Rows sum but columns do not:
- The number in the All Chromosomes column (on the far right) is the sum of the row values under Chr 1 through Chr Y. The first row gives the SNP total for All Strains; the rows below it give the SNP total for each strain.
- The number in the All Strains row (across the top) is NOT the sum of the per-strain SNP totals in the column beneath it. For example, the Chromosome 1 counts for all 86 strains do not add up to this number, because a SNP typed in multiple strains is counted once for each of those strains.
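The difference between summing a column and counting unique SNPs can be seen with a small toy data set. The sketch below uses invented rs IDs and an assumed data layout (`snps` maps chromosome to strain to a set of rs IDs); none of it comes from the report itself.

```python
# Hypothetical toy data: chromosome -> strain -> set of rs IDs typed in that strain.
snps = {
    "1": {"A/J": {"rs1", "rs2"}, "DBA/2J": {"rs2", "rs3"}},
    "X": {"A/J": {"rs4"}, "DBA/2J": {"rs4"}},
}


def per_strain_count(chrom, strain):
    """One table cell: rs count for a strain on a chromosome."""
    return len(snps[chrom][strain])


def all_strains_count(chrom):
    """All Strains cell: unique rs on a chromosome across every strain."""
    return len(set().union(*snps[chrom].values()))


def row_total(strain):
    """All Chromosomes cell: sum of a strain's per-chromosome counts."""
    return sum(per_strain_count(chrom, strain) for chrom in snps)


# Summing the Chr 1 column counts rs2 twice, once for each strain it was typed in.
column_sum_chr1 = sum(per_strain_count("1", strain) for strain in snps["1"])

print(all_strains_count("1"), column_sum_chr1)  # 3 4
print(row_total("A/J"))                         # 3
```

Rows add up because each loaded SNP lies on exactly one chromosome, so no SNP is counted in two columns. Columns do not add up because the same SNP can appear in many strain rows.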
Why do the numbers returned by the query form differ from those in this summary?
When you use the Mouse SNP Query Form to search for SNPs on a given strain or chromosome, the number returned is the number of genome locations for the matching SNPs, not the number of unique SNPs.
- By default, the number of SNP locations is the same as the number of unique SNPs, since SNPs that map to multiple locations are not returned under the default Output options settings.
- If you click Include SNPs that map to multiple locations (Output options --> Multiple Locations on the SNP Query Form), then the number of SNP locations can differ from the number of unique SNPs returned, because some returned SNPs might map to multiple locations in the genome.
- By contrast, the numbers displayed in the Statistics section of this summary (SNP Data in MGI) represent unique SNP counts.
- Unique SNP identifiers are counted once, whether the SNPs map to a single location or to multiple locations in the genome.
- Chromosome X, All Strains
  unique rs, single location: 9876
  unique rs, multiple locations: 76
  total unique rs: 9952
  locations for rs with single location: 9876
  locations for rs with multiple locations: 168
  total locations for rs: 10044
- Chromosome X, strain 129S1/SvImJ
  unique rs, single location: 750
  unique rs, multiple locations: 15
  total unique rs: 765
  locations for rs with single location: 750
  locations for rs with multiple locations: 36
  total locations for rs: 786
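The arithmetic behind the two examples above can be checked with a few lines of code. This is only an illustrative sketch: the `summarize` function and its parameter names are hypothetical, and the inputs are the figures quoted in the examples.

```python
def summarize(single_rs, multi_rs, multi_locations):
    """Return (total unique rs, total locations) for one strain/chromosome cell.

    single_rs:       unique rs that map to exactly one location
    multi_rs:        unique rs that map to more than one location
    multi_locations: locations contributed by the multi-location rs
    """
    unique_total = single_rs + multi_rs
    # Each single-location rs contributes exactly one location.
    location_total = single_rs + multi_locations
    return unique_total, location_total


print(summarize(9876, 76, 168))  # (9952, 10044)  Chromosome X, All Strains
print(summarize(750, 15, 36))    # (765, 786)     Chromosome X, 129S1/SvImJ
```

The first value of each pair is the unique SNP count that this summary reports. The second is the location count the query form can return when SNPs that map to multiple locations are included.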
How is this report sorted?
The default sort order for the MGI SNP Summary report is by strain. The 23 most common inbred strains appear in alphanumeric order, followed by a list of any additional mouse strains with strain allele data in dbSNP, in alphanumeric order.