The goal of the Human–Mouse: Disease Connection (HMDC) is to provide seamless human-to-mouse 
data traversal, enabling clinical and translational researchers to take advantage of the wealth 
of data and annotations from mouse models, and allowing mouse researchers to connect 
their findings directly to genetic associations reported in human disease.
The mouse is genetically and physiologically similar to humans, is tractable as a 
laboratory animal, has a fully sequenced and well-annotated genome, and is supported by a readily 
available set of powerful molecular technologies for manipulating its genome in very precise ways. 
Now, clinical researchers whose primary focus is on human genetic disease, variants, and 
natural mutations have a highly accessible way to explore experimentally characterized mouse 
mutants for a spectrum of associated phenotypes, as well as known disease models developed 
by the greater research community. Investigators can begin with genes (symbols, names or IDs) 
or gene lists, genome positions (as coordinates from human or mouse, or as .vcf files), OMIM diseases 
or phenotypes, and retrieve a list of genes, annotated mammalian phenotypes, associated human 
diseases and/or available mouse models for further research, along with comprehensive 
supporting references. If you are new to working with mouse models, or encounter terms 
which are unfamiliar to you, we also encourage you to browse the 
Introduction to Mouse Genetics in order to aid with the interpretation of your results.
Beginning your search
Entry into the HMDC is designed to be straightforward. On the homepage, three boxes appear 
where investigators may choose to enter (1) genes, either individually or as a list, (2) 
a genome location or set of genomic regions, and/or (3) disease name(s) or mammalian phenotype(s).
	- Searching genes:
			Gene-based queries will match official symbols, full gene names and synonyms in both human and 
			mouse. An asterisk (*) can be used as a wildcard at the beginning or end of a term. 
			Separate multiple entries using commas, spaces or new lines.  
	- Searching genome locations:
			To search positions, please ensure that you are using the current genome 
			assembly (build) and have selected the appropriate species using the radio 
			buttons above the search window. If you need to convert your data, a 
			simple online 
			remapping tool has been provided by NCBI. Currently 
			only base pair (bp) positions are supported, not megabase (Mb) or 
			linkage positions (i.e. centimorgans or band position). Separate multiple entries 
			using spaces, commas or new lines; please do not use commas within coordinate 
			positions as these will be misinterpreted as item separators for independent 
			genome locations. 
	- Searching disease and phenotype terms:
			Disease and phenotype terms may be searched using text-matching by simply 
			typing in the box, or you can take advantage of structured vocabulary by 
			selecting from the autocomplete list that appears. Multiple entries are 
			supported; results must match at least one term, but are not required to match 
			all terms. Use the filters once the grid has been generated to restrict your list.
			 The shaded grey text to the right of autocomplete terms indicates the vocabulary 
			 that an annotation applies to:
			
				- Mammalian Phenotype: will match all 
					genes where a mouse model has been reported to exhibit this phenotype. 
					These are hierarchical, so selecting a broad general term will also 
					bring back genes annotated to more specific child terms. See the 
					Mammalian Phenotype (MP) browser.
 
				- OMIM: from the 
					Online Mendelian Inheritance in Man database. Contains human diseases 
					with associated human genes and is cross-referenced externally and within 
					MGI for mouse genetic models of these human diseases. 
						
							- Some terms are very specific and may reference only a 
								subtype of the disease (e.g. "Alzheimer disease, familial, 
								5"). In these cases, it may be preferable to use the 
								text-based matching without selecting an autocomplete value.
 
						
					 		
			
			 
		- Upload a vcf:
			Variant call format (.vcf) file upload is also supported for phenotype and 
			disease annotation. This tool is not equipped to do functional analysis of 
			variants or filtering, so we recommend uploading a trimmed candidate list. 
			See Exomiser for a filtering tool designed to process human data. 
			The default is to discard all SNPs with a 
			known dbSNP identifier (rs#). If you want to keep these SNPs in your results, 
			clear the contents of the third column (ID), but do not delete it, so as to preserve 
			standard column structure. See the section "Uploading a vcf File" for more 
			information. 
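As a rough illustration of the separator and wildcard rules described in the list above, the following Python sketch splits search-box text on commas, spaces and new lines, and matches an asterisk at the start or end of a term. It is not the HMDC's own code: the function names are hypothetical, and the coordinate string is an invented example in an assumed chromosome:start-end style, used only to show why commas inside a coordinate are read as separators.

```python
import re


def split_entries(text):
    """Split search-box text on commas, spaces, or new lines."""
    return [item for item in re.split(r"[,\s]+", text) if item]


def wildcard_match(pattern, symbol):
    """Case-insensitive match where * stands for any run of characters."""
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.fullmatch(regex, symbol, flags=re.IGNORECASE) is not None


# The example gene list used on this page splits into four entries.
genes = split_entries("trp53 proc, apc\ncdkn2a")

# Hypothetical coordinate written with thousands separators: the commas
# break it into unrelated fragments, so they must be left out.
broken = split_entries("Chr1:1,000,000-2,000,000")

print(genes)
print(broken)
print(wildcard_match("trp*", "Trp53"))
```

Running the sketch shows the coordinate being cut into five fragments, which is the misinterpretation the instructions warn about.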
			
				
 
For many experimental questions, a single search box will be sufficient, but if two boxes 
have search terms entered, the results will reflect cases matching on both categories (boolean 
AND). While this can be an efficient shortcut, if a very precise phenotype or disease annotation 
is used, some potentially relevant results may be omitted.
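The matching logic can be pictured with sets. This is a minimal sketch under stated assumptions, not a description of how the HMDC is implemented: terms entered in the disease/phenotype box are combined with OR (a union), and the boxes that were filled in are combined with AND (an intersection). The function names and the gene names are placeholders.

```python
def terms_match(term_hits):
    """Union: a gene matches the box if it matches at least one term."""
    matched = set()
    for hits in term_hits:
        matched |= set(hits)
    return matched


def combine_boxes(*box_results):
    """Intersection across the search boxes that were actually filled in.

    A box that was left empty is passed as None and is ignored.
    """
    filled = [set(result) for result in box_results if result is not None]
    if not filled:
        return set()
    return set.intersection(*filled)


# Placeholder gene names, not real query results.
gene_box = {"GeneA", "GeneB", "GeneC"}
disease_box = terms_match([{"GeneB"}, {"GeneC", "GeneD"}])

# The genome location box is left empty (None) in this example.
both = combine_boxes(gene_box, None, disease_box)
print(sorted(both))
```

Because the boxes are intersected, a very narrow term in one box can remove genes that the other box would have returned on its own.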
Whichever search box is used, the same set of results tables will appear. To follow along
with this example, enter “trp53 proc apc cdkn2a” into the Genes search box (not 
case sensitive). 
Search Results
Along the top, an orange banner remembers your original query and allows you to quickly 
modify your results by adding, removing or replacing search terms. A second banner below 
serves as a header for your results, and indicates which type of matching was run. The results 
themselves are arranged into three tabbed tables for: Gene homology x Phenotypes/Diseases 
(shown), Genes, and Diseases.
 
1. Gene homology x Phenotypes/Diseases tab
	
		On this tab, a list of gene homologs (column 1: human, column 2: mouse) 
		and gene-associated phenotype (left side) or disease (right side) terms which matched 
		your search are returned as an interactive grid. Gene rows are only returned where 
		at least one phenotype or disease term has been annotated, and columns are only 
		displayed if at least one gene association has been reported. Matching 
		transgenes (denoted by "Tg(promoter-gene)lab_code"), where an artificial 
		construct has been introduced and expressed in a live mouse, will also appear, but only 
		in the mouse column. This table compresses multiple alleles of a gene (if they exist) 
		into a single row.
		
		
		- Filtering: Results can be filtered by clicking the boxes that appear 
			adjacent to each row and above each column. This will filter
			all tabs.
				
					- Click to place a checkmark in the rows and/or columns you would like 
					to keep and click on the "Apply Filters" button or filter icon in the 
					top left of the grid. If a physiological system, disease or gene symbol 
					match is not relevant to your experimental question, leave it unselected to hide it.
					
 
					- Filters can be removed by clicking on the "Remove filters" 
					note which will appear above all tabs, just below the orange Results header 
					bar.
					
 
				
			 
		- Left side of grid: If a mutant allele of a gene has been reported 
			to affect a particular anatomical/physiological system, a system-level phenotype 
			column will appear with a blue filled cell at the intersection of the gene and 
			phenotype. These use a hierarchical structured vocabulary; navigate the 
			Mammalian phenotype browser to find specific terms, or click on the filled box 
			to see more details. 
				
					- Darker shades of blue indicate more annotations to aberrant phenotypes 
						within this anatomical/physiological system.
 
					- Clear cells indicate no data.
 
					- If an aspect of a system has been specifically examined in the 
						context of mouse gene mutants and found to be normal, an N 
						will appear (with blue background fill). 
 
					- If researchers determined that the mouse carrying a particular 
						mutation appeared overall ‘normal’, a normal phenotype 
						column will be displayed on the far right of the Mammalian Phenotypes section.
 
				
		
			 
		- Filled boxes on Mammalian Phenotypes side: Click individual boxes to 
			generate a pop-up window with genetic 
			and phenotypic details. The specific allele pairs and exact phenotypes will be 
			displayed. Clicking on an allele symbol or row will generate a new window 
			with the complete list of all phenotype annotations in all systems with 
			supporting references (J:#s). For help interpreting Allele Detail pages, see here.
			
			
			
			
 
		
		- Right side of grid: If mutations in this gene have been associated 
			with a human disease or reported as a mouse model of a human disease, columns 
			will appear to the right with colored fill indicating the species. 
				
					- Orange filled cells are used if the Gene x Disease association is 
					supported by human data. Human data annotations come from 
					OMIM, 
					NCBI curation, 
					Gene Reviews, 
					or Gene Tests.
 
					- If mutant mice have been reported as genetic models for this disease, 
					a blue fill will be used. MGI curators annotate this data based on author statements 
					in peer reviewed publications.
 
					- In cases where both mouse genetic models and human clinical cases 
					support involvement of orthologous (same gene in different species) genes, 
					a two-toned fill will appear. 
 
					- Clear indicates that the intersection of a given gene and human 
					disease has not been reported. 
 
				
 
		- Filled boxes on Human Disease associations side: Click individual boxes 
			to generate a pop-up with genetic and disease details. The specific allele 
			pairs used to model the disease in mice will be displayed. Clicking on an 
			allele symbol or row will generate a new tab with the complete list of all 
			phenotype annotations in all systems within this disease model, along with 
			supporting references (J:#s). In the disease box, click on the J:# to find 
			disease-specific references, on the disease name to open a 
			Human Disease and Mouse Model Detail page, or on the OMIM ID to open 
			the OMIM entry for that disease.
			
			
			
			
 		
		- Unexpected extra diseases may appear in the grid if a phenotype or disease term was 
			used. 
			For example, a Disease search for "Alzheimer" will also return "Breast 
			Cancer" and "Schizophrenia" on the grid. These are returned because 
			the specific allele pair(s) which match models for "Alzheimer Disease" have 
			also been reported to exhibit the characteristic phenotypes of these other diseases: 
			Cav1<tm1Mls>/Cav1<tm1Mls> for both Alzheimer Disease and Breast Cancer, and 
			Plcb1<tm1Hssh>/Plcb1<tm1Hssh> for Alzheimer Disease and Schizophrenia. 
			Use the filters to hide these if they are not of interest, or go to the Diseases tab 
			to see only those diseases which matched the original search terms with their 
			gene annotations. 
	
 	
2. Genes tab
	
		On the Genes tab, the complete list of genes will be returned, with human and mouse 
		homologs listed on separate rows (see column 1: Organism). You may also note 
		that standard human gene nomenclature is in uppercase (e.g. AMER1) while mouse genes are 
		written in sentence case (e.g. Amer1). Genes matching your query but with no reported 
		phenotypes or diseases will be included on this tab only, so you may see genes here 
		that do not appear on the grid view.
		
		
	
		- To filter this list, apply filters on the "Gene Homology x Phenotypes/Diseases" 
		tab.
 
		- This table may be sorted by using the arrows in certain column headers, and 
		data on this table (filtered or unfiltered) can be downloaded using the button indicated.
		Columns in the download are tab delimited and multiple "Associated Human Diseases" or "Abnormal 
		Mouse Phenotypes Reported in these Systems" terms are pipe separated (|).
 
		- Click on the Gene Symbol in the second column to go to a mouse Gene Detail 
		Page on MGI in rows where "Organism: mouse" or 
		Vertebrate Homology Page in rows where "Organism: human". Example: 
		mouse Apc
		and human APC.
 
		- The Associated Human Diseases column in a row that corresponds to Organism: 
		mouse will list diseases where mutant mice have been reported to display 
		phenotypes and symptoms matching the human disease. This corresponds to blue 
		filled cells in the right half of the "Gene homology x Phenotypes/Diseases" grid on the 
		previous tab.
 
		- The Associated Human Diseases column in a row that corresponds to Organism: 
		human indicates that mutations have been reported in human clinical cases of 
		this disease, or that variant associations have been made in human populations. 
		This corresponds to orange filled cells in the right half of the grid.
 
		- The References in MGI column provides a list of all MGI-curated references for 
		a gene, as well as a sub-list of Disease Relevant publications, where a mouse was 
		specifically reported by the authors as a model for one of the diseases represented 
		in the Associated Human Diseases column. Reference pages contain the full citation, 
		abstract, curated data, and direct links to the paper itself. These references 
		will focus on mouse genetic models as MGI does not curate human-only data.
 
		- The Find Mice (IMSR) column on the Genes tab contains links to the 
		International Mouse Strain Resource (IMSR) 
		which is a database indexing the major public and commercial mouse repositories. 
		The hyperlinked number indicates how many mouse strains carrying mutant alleles of 
		a gene are available for purchase. This may link to multiple unique alleles 
		and allele types. Use the Alleles column to see which allele is present 
		before placing an order to ensure that the expected phenotypes will be found. Order forms for each 
		strain are linked in the Repository column in the IMSR. If the 
		"Find Mice (IMSR)" column on the Genes tab is blank, or the repositories listed with the IMSR do not 
		distribute your preferred model, it may be possible to obtain 
		mice by directly contacting the corresponding author of a publication which developed 
		the line. See the Original Reference at the bottom of MGI's Allele Detail pages.
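The Genes tab download described in the list above is tab-delimited with pipe-separated multi-value columns, so it can be read with a few lines of Python. This is a sketch under assumptions: the two multi-value column names are taken from this page, but the exact header text in the downloaded file, the other column names, and the sample row are hypothetical placeholders.

```python
import csv
import io

# Column names as given on this page; the real file's headers may differ.
MULTI_VALUE_COLUMNS = (
    "Associated Human Diseases",
    "Abnormal Mouse Phenotypes Reported in these Systems",
)


def read_genes_download(handle):
    """Read a tab-delimited download and split pipe-separated cells into lists."""
    rows = []
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        for column in MULTI_VALUE_COLUMNS:
            cell = row.get(column) or ""
            row[column] = [part.strip() for part in cell.split("|") if part.strip()]
        rows.append(row)
    return rows


# Placeholder content standing in for a downloaded file.
sample = (
    "Organism\tGene Symbol\tAssociated Human Diseases\t"
    "Abnormal Mouse Phenotypes Reported in these Systems\n"
    "mouse\tGeneA\tdisease one|disease two\tsystem one\n"
)
rows = read_genes_download(io.StringIO(sample))
print(rows[0]["Associated Human Diseases"])
```

For a real download, open the saved file with `open(path, newline="")` and pass the file object in place of the `io.StringIO` sample.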
 
	
 
3. Diseases tab
	
		Similar to the Genes tab, the Diseases tab returns a complete list of gene-associated 
		human disease terms, compiled from both human data and mouse models. If a 
		gene or transgene is listed in the Associated Mouse Markers column, this 
		indicates a mouse disease model has been reported for that Gene x Disease pair. If 
		a gene is listed in the Associated Human Markers column, a variant of this 
		gene has been implicated in human association studies or clinical cases. If an OMIM 
		autocomplete term was used as the starting search term, the list will be restricted 
		to the disease(s) specified, even if no gene associations have been reported. 
		If a phenotype term, gene symbol or position was used to run the search, all 
		diseases associated with the matched genes will be displayed.
		
		
		- The data displayed on this table can be downloaded using the button indicated.
 
		- The references column contains a list of publications where authors have 
		specifically reported that mutant or transgenic mice have been used to model the 
		characteristic phenotypes of the disease.
 
		- Click on the hyperlinked disease term to go to a Human Disease and Mouse 
		Model Detail page where gene links and genotype or allele specific information 
		can be viewed by clicking in the Mouse Models column.
		
			
				
					
				- The Human Disease and Mouse Model Detail page on MGI displays genes 
				associated with human data or mouse models as before, this time delineated 
				into up to three classes for "Associated in both", "Associated in mouse models", 
				and "Associated in human", indicated by the human and/or mouse graphics.
 
				- The OMIM entry for this disease can be reached by clicking on the 
				OMIM ID number just beneath the header.
 
				- Clicking on the links in the Mouse Models column will 
				reveal details of the mouse genotype used as a model, at least one 
				supporting reference and a link to a view of all phenotypes 
				that have been observed in mice with this combination of mutant 
				alleles and transgenes (where applicable).
			
		
 
	
		
 
Uploading a vcf File
A vcf (variant call format) file stores sequence variants by position. Typically, 
these are the result of whole genome or exome sequencing. The format must include:
	- Column 1: CHROM - chromosome number
 
	- Column 2: POS - variant position in base pairs. 
	PLEASE ENSURE THAT THESE POSITIONS ARE BASED ON GRCh37 (HUMAN) OR GRCm38 (MOUSE).
	[Convert]
  
	- Column 3: ID - if the variant is known, a refSNP or other reference identifier 
		will appear. If unknown, this column may be blank or contain "." 
 
	- Column 4: REF - the reference allele based on alignment to a reference genome.
 
	- Column 5: ALT - the alternative allele that was detected in the sample
 
	- Column 6: QUAL - a quality score for sequence reads and base calling
 
	- Column 7: FILTER - indicates whether the call is of sufficient confidence for 
		the filters and thresholds applied during analysis. This column may contain "PASS", 
		some other filter status, or ".", or it may be blank.
 
	- Column 8: INFO - includes descriptors of the variant
 
	- Column 9: FORMAT - includes descriptors of the genotyping depth and quality
 
	- Column 10... Sample1 - sample data. If more than one individual was sequenced,
		each sample is arranged in a new column.
 
If you edit your file, please clear column data rather than removing columns.
See sample files for:
Human and 
Mouse.
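Clearing a column while keeping the column structure can be done with a short script. The sketch below, which is only one possible approach and not an MGI tool, replaces the contents of column 3 (ID) with "." on every data line and leaves header lines untouched. The function name and the sample line, including the placeholder identifier rs0000, are hypothetical.

```python
def clear_id_column(line):
    """Blank column 3 (ID) of a VCF data line without removing the column."""
    if line.startswith("#") or not line.strip():
        return line  # header, comment, or empty line: leave unchanged
    body = line.rstrip("\r\n")
    ending = line[len(body):]  # preserve the original line ending
    fields = body.split("\t")
    if len(fields) > 2:
        fields[2] = "."  # "." marks an unknown ID
    return "\t".join(fields) + ending


# Placeholder data line; rs0000 is not a real identifier.
original = "1\t12345\trs0000\tA\tG\t50\tPASS\t.\n"
cleared = clear_id_column(original)
print(cleared, end="")
```

The number of tab-separated fields is the same before and after, so the remaining columns stay in their expected positions.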
		
In this beta release...
The HMDC does not offer filtering or variant-specific analyses in its initial release, 
but rather applies phenotype and disease terms to the genes and homologs which are represented. 
Note that this will return phenotypes associated with genes even if the SNP variant is 
benign. For this reason, we have set the default to filter out all variants that have a 
known rs# or other identifier in column 3 (ID), and to reject all variants that 
are not tagged as "PASS" in column 7 (FILTER). We also strongly 
encourage users to pre-filter by presumed functional impact or on the basis of linkage 
(where possible) using a tool such as 
Exomiser.
These filters focus phenotypic annotation on de novo, unreported (rare) or private 
gene variants. If your experimental design requires that known variants be included, you can 
bypass the ID filter by clearing column 3 in your uploaded file.
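To preview which lines of a file would survive the default filters described above, something like the following Python sketch can be used. It only mirrors the two rules stated on this page (ID must be empty or ".", FILTER must be "PASS") and is an assumption about their effect, not the HMDC's actual implementation. The function names and sample lines are hypothetical, and rs0000 is a placeholder identifier.

```python
def passes_default_filters(line):
    """Return True if a VCF data line has no known ID and is tagged PASS."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 7:
        return False  # too few columns to evaluate ID and FILTER
    variant_id, filter_status = fields[2], fields[6]
    id_is_unknown = variant_id in ("", ".")
    return id_is_unknown and filter_status == "PASS"


def kept_variants(lines):
    """Keep data lines that pass both filters; skip headers and empty lines."""
    kept = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        if passes_default_filters(line):
            kept.append(line)
    return kept


# Placeholder lines: one novel PASS variant, one with an ID, one non-PASS.
lines = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tG\t50\tPASS\t.\n",
    "1\t200\trs0000\tA\tG\t50\tPASS\t.\n",
    "1\t300\t.\tA\tG\t10\tq10\t.\n",
]
kept = kept_variants(lines)
print(len(kept))
```

In this example only the first data line is kept. Clearing column 3 of the second line would let it through as well.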
Maximum upload file size is 25MB. Only the first 100,000 lines will be processed.
This form has only been tested with VCF v4.0 and higher standard formats.
Please contact MGI
User Support to suggest important features that you would like to see implemented in 
future releases.