Using the MGI Batch Query
This help document answers the following questions about the batch query tool:
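Use the MGI Batch Query to retrieve data about many MGI genes simultaneously. Currently, the tool retrieves the gene attributes and additional information described under Are there choices for how to view query results?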
Future versions may include additional options.
Given a set of gene symbols or input identifiers (e.g., MGI accession, RefSNP, VEGA IDs) from a spreadsheet, you can:
You can also use the batch query to find duplications, ambiguities, and variations in data. You could check, for example, that an input list of MGI identifiers returns the same number of Ensembl, Entrez Gene, and VEGA IDs on different days or that switching the input and output IDs returns identical data. See Are there any examples? for additional ways you might use this query.
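The day-to-day consistency check described above can be sketched in Python against two saved tab-delimited exports. This is a minimal sketch, not part of the tool: the function names are hypothetical, and it assumes each export has a header row, the input ID in the first column, and the output ID in the second. The sample values are placeholders, not real identifiers.

```python
import csv
import io


def load_pairs(tsv_text, input_col=0, output_col=1):
    """Return the set of (input ID, output ID) pairs in a tab-delimited export."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the header row (assumed to be present)
    needed = max(input_col, output_col)
    return {(row[input_col], row[output_col]) for row in reader if len(row) > needed}


def compare_exports(old_text, new_text):
    """Report the pairs that appear in only one of the two exports."""
    old, new = load_pairs(old_text), load_pairs(new_text)
    return {"only_in_old": sorted(old - new), "only_in_new": sorted(new - old)}


# Placeholder data; real input would be the text of two saved exports.
first_day = "Input\tOutput\ninput1\tout1\n"
second_day = "Input\tOutput\ninput1\tout1\ninput2\tout2\n"
diff = compare_exports(first_day, second_day)
print(diff)
```

An empty result in both lists means the two exports contain the same input/output associations.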
Note: Be sure to remove any quotation marks or other non-alphanumeric characters from any list you enter or upload. The only valid delimiters are tab, comma/space, space, and new line.
When typing or pasting IDs into the input box (you can copy and paste a column from an Excel spreadsheet):
For a given query, your list can be either mixed or all of the same type. You can also enter (mouse) gene symbols, synonyms, or orthologs.
MGI:96677 Trp53 Pax-6 P53 16590 ENSMUSG00000005672 OTTMUSG00000015949 247073 MI0000248 10379153
Input identifier type | Representative example
MGI Gene/Marker ID | MGI:96677
Current Symbols Only | Trp53, Pax6, D11Mit10
All Symbols/Synonyms/Orthologs (includes both current and old symbols) | Pax6, Pax-6, Trp53, P53
Entrez Gene ID | 16590
Ensembl ID | ENSMUSG00000028530
VEGA ID | OTTMUSG00000015949
UniGene ID | 247073
MiRBase ID | MI0000248
GenBank/RefSeq ID* | NM_001122899, AK033644, NP_666257
UniProt ID | P48356, A2AKJ2
GO (Gene Ontology) | GO:0019221
RefSNP ID | rs3021544
Probeset ID | 10379153
* GenBank IDs are for nucleotide sequences only. RefSeq IDs are for either nucleotide or protein sequences.
The Batch Query also recognizes the following additional ID types, which are not listed on the pulldown menu:
Input identifier type | Representative example
BayGenomics | LST083
DoTS (Database Of Transcribed Sequences) | DT.529646
DFCI (Dana-Farber Cancer Institute) | TC1572427
EC (Enzyme Commission) | 2.7.10.1
FHCRC (Fred Hutchinson Cancer Research Center) | FHCRC-GT-S15-11C1
Homologene | 20151
Lexicon | OST128284
NIA Mouse Gene Index | U167412
PDB (Protein Data Bank) | 1HU8
Does the input list have to be in any particular form?
Yes.
- If there are spaces in any list entry (for example, Dickie's small eye, a synonym for Pax6), you must enclose that entry in double quotation marks (i.e., "Dickie's small eye").
- The Batch Query tool accepts space-, tab-, and newline-separated lists and removes any trailing commas from such lists.
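The two rules above can be applied to a list before pasting it into the input box. The following is a minimal sketch: the function name is hypothetical, and it assumes one entry per list item. It trims whitespace and trailing commas, drops repeated entries, and wraps any entry containing a space in double quotation marks.

```python
def format_input_list(entries):
    """Return entries one per line, quoting any entry that contains a space."""
    seen = set()
    lines = []
    for raw in entries:
        # Trim whitespace, trailing commas, and any existing surrounding quotes.
        entry = raw.strip().rstrip(",").strip().strip('"').strip()
        if not entry or entry in seen:
            continue  # skip blanks and repeated entries
        seen.add(entry)
        lines.append(f'"{entry}"' if " " in entry else entry)
    return "\n".join(lines)


formatted = format_input_list(["Trp53,", " Pax6 ", "Dickie's small eye", "Trp53"])
print(formatted)
```

Newlines are used as the separator here because they are among the delimiters the tool accepts and cannot be confused with spaces inside a quoted entry.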
What does "Search all input types" accomplish?
Because the Batch Query's default option is to search all input types, you do not have to identify what you enter or upload into the Batch Query. The tool determines whether the entries are of one type or a combination of IDs and/or symbols. You may, however, select from the pulldown menu if you wish to constrain your query to a single type. See also When would I constrain my search to a single input type? below.
When would I constrain my search to a single input type?
You may wish to select a single input type from the Type list when you want the Batch Query to return:
- Only current mouse gene symbols with MGI gene associations (for example, Acat1).
- Only current or former mouse symbols, synonyms, and orthologs with MGI gene associations.
- Results from one database and not another. Numerical identifiers (those without an alphabetic character) may recur in several databases (e.g., UniGene and Entrez Gene; 12192 is one example).
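To see which entries in a list are purely numeric, and so might match more than one database, you could pre-screen the list locally. The sketch below is an illustration only: the patterns are inferred from the representative examples in this document, the labels and function name are hypothetical, and this is not how the Batch Query itself determines types.

```python
import re

# Patterns inferred from the representative examples above (assumptions).
PATTERNS = [
    ("MGI Gene/Marker ID", re.compile(r"MGI:\d+")),
    ("Ensembl ID", re.compile(r"ENSMUSG\d+")),
    ("VEGA ID", re.compile(r"OTTMUSG\d+")),
    ("GO (Gene Ontology)", re.compile(r"GO:\d+")),
    ("RefSNP ID", re.compile(r"rs\d+")),
    ("MiRBase ID", re.compile(r"MI\d+")),
    ("numeric (ambiguous)", re.compile(r"\d+")),
]


def guess_type(identifier):
    """Return a rough label for an identifier based on its shape alone."""
    for label, pattern in PATTERNS:
        if pattern.fullmatch(identifier):
            return label
    return "symbol or other"


for item in ["MGI:96677", "Trp53", "16590", "OTTMUSG00000015949", "rs3021544"]:
    print(item, "->", guess_type(item))
```

If many entries fall under the numeric label, selecting a single input type from the Type list is a reasonable way to avoid matches from unintended databases.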
How many IDs can I enter at one time?
There is no limit to the number of identifiers that you can enter at once, but there is a limit to how many results different browsers can display, and there is a time constraint for very large files.
- Splitting long lists into sections speeds up return times. See the table under Why can I select all the choices under Gene Attributes, but I'm limited to only one choice under Additional Information? for examples of the amount of data that you can expect to return, depending on your input/output choices.
- Changing the default from Search all input types to the specific ID type is a good idea when you have a very large file containing one type of identifier (e.g., all symbols/synonyms/orthologs).
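Splitting a long list into sections, as suggested above, can be done before pasting or uploading. This is a minimal sketch with a hypothetical function name. The chunk size is an arbitrary choice for illustration, since this document does not state an optimal size.

```python
def split_into_chunks(ids, chunk_size):
    """Split a list of identifiers into consecutive sections of at most chunk_size."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [ids[i:i + chunk_size] for i in range(0, len(ids), chunk_size)]


symbols = ["Pax1", "Pax2", "Pax3", "Pax4", "Pax5"]
for section in split_into_chunks(symbols, 2):
    print("\n".join(section))
    print("---")  # visual separator between sections
```

Each section can then be submitted as its own query.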
Are there choices for how to view query results?
Yes. You can customize your results in three areas: Gene Attributes, Additional Information, and Format. Click to make your selections.
- Gene Attributes:
You can retrieve gene/marker data for any or all of the following gene attributes or identifiers by clicking the boxes beside them. Each selected item adds one or more columns to the report.
When you select... | the resulting columns contain...
Nomenclature | symbol, name, marker type
Genome Location | chromosome, start coordinate, end coordinate, strand, genome build
Ensembl ID | Ensembl ID
Entrez Gene ID | Entrez Gene ID
VEGA ID | VEGA ID
- Additional Information:
You can find data about objects associated with (or annotated to) genes. Select one of the following options per query. The default is not to return any of these data.
When you select... | the resulting columns contain...
Gene Ontology (GO) | ID, term, code. NOT annotations do not appear in the results.
Mammalian Phenotype Ontology (MP) | ID, term. Normal annotations do not appear in the results. See How do I interpret Mammalian Phenotype (MP) results?
Alleles | MGI allele identifier and allele symbols.
Gene Expression | Anatomical structure (Theiler stage and tissue name), assay results (total number of times the structure was examined, number of times expression was detected in the structure, number of times expression was found to be absent in the structure). These numbers summarize results obtained from wild type and mutant specimens. The detected counts also include specimens for which detected = ambiguous or not specified (as well as present).
RefSNP ID | RefSNP identifiers. Results include RefSNPs within 2 KB of the gene/marker.
GenBank/RefSeq ID | GenBank (nucleotide) or RefSeq (nucleotide or protein) sequence identifiers.
UniProt ID | UniProt sequence identifiers.
Human Disease (OMIM) | ID, term. See How do I interpret Human Disease (OMIM) results?
- Format:
You can view the output of your query results in several formats:
- Web (HTML)
- Tab-delimited text
- Excel
The results are initially returned in web format; you can then choose to Export them in tab-delimited text or spreadsheet formats.
Why can I select all the choices under Gene Attributes, but I'm limited to only one choice under Additional Information?
The purpose of the single Additional information choice is to limit results to a reasonable size. There is quite an increase in the amount of data returned when you select an Additional Information category. For example, if you enter symbols for 9 paired box genes (your input list is Pax1, Pax2, Pax3, ... Pax9), and you select:
Nomenclature and ... | the MGI Batch Query returns...
Genome Location | 9 rows, one for each gene, Pax1 - Pax9
UniProt ID | 30+ matching rows
Gene Ontology (GO) | 200+ matching rows
Mammalian Phenotype (MP) | 650+ matching rows
Gene Expression | 1400+ matching rows
RefSNP ID | ~2900 matching rows
Can I edit my original options and requery?
Yes, you can.
- If you want to edit a list or change output options:
  - When your results appear, click the banner at the top of the page (Click to modify search). (The banner title changes to Click to hide search.) The window that appears contains your original search criteria.
  - Edit the input list or change any Output options.
  - Click Search.
  - Repeat these steps until the desired results appear.
  See Are there examples? for a sample of a query modification.
- If you want to upload a different file:
  - When your results appear, click the banner at the top of the page (Click to modify search). (The banner title changes to Click to hide search.) The window that appears contains your original search criteria.
  - Click the Upload File tab and then the Choose File button.
  - Select the appropriate file. See How do I use this tool? for more information about uploading files.
Notes:
- You can requery as many times as you like.
- Alternatively, click Reset to return to the original Batch Query form.
How do I interpret results?
MGI Batch Query results
All Batch Query results appear in the form of a table in either web (HTML) format, or in tab-delimited text depending on your Output selection.
- A summary of your query parameters appears at the top of the page under You searched for.... It lists:
- the total number of IDs/symbols you entered (or uploaded)
- the input identifier type of those IDs (e.g., Search all input types, MGI Gene/Marker ID, Current Symbols Only, and so on)
- your Output options (e.g., Nomenclature, Genome Location, UniProt ID, and so on)
- For each ID entered, at least one row is returned for that ID, its corresponding MGI gene/marker ID, plus columns for whichever attributes or additional information you selected (see Are there choices for how to view query results?).
- If a gene has more than one associated identifier, a row returns for each association (for example, there may be more than one Ensembl, Entrez Gene or VEGA ID; many GO or MP terms; lots of RefSNPs).
- Background row shading alternates by marker symbol. For example, if you were to enter Pax6 and Kit, all Pax6 associations would appear in one shading (e.g., light) and all Kit associations would appear in a contrast shading (e.g., dark).
- Entries in the Feature Type column (beneath Nomenclature) identify the category and/or subcategory of the marker (e.g., snoRNA Gene, QTL, Pseudogene, Complex/Cluster/Region, and so on.)
- When you select Gene Ontology (GO), three columns return: ID (with an entry such as GO:0005525), term (with an entry such as GTP binding), and code (with an entry such as IEA). See the Guide to GO Evidence Codes at the Gene Ontology website for the definitions.
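Because a gene with several associations returns one row per association, it can help to count rows per input after exporting to tab-delimited text. The following is a minimal sketch: the function name is hypothetical, and it assumes the export has a header row and the input ID in the first column. Adjust `input_col` to match your own export. The sample rows are placeholders.

```python
import csv
import io
from collections import Counter


def rows_per_input(tsv_text, input_col=0):
    """Count how many result rows were returned for each input ID."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the header row (assumed to be present)
    return Counter(row[input_col] for row in reader if len(row) > input_col)


# Placeholder export text: geneA has two associations, geneB has one.
export = "Input\tTerm\ngeneA\tterm1\ngeneA\tterm2\ngeneB\tterm3\n"
counts = rows_per_input(export)
print(counts.most_common())
```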
Mammalian Phenotype (MP) results
The resulting list of Mammalian Phenotype Ontology terms associated with a gene is a combination of all terms associated with all mutant alleles of that gene.
- Mammalian Phenotype (MP) terms appear by gene.
- Each term describes a mouse phenotype with some mutation in that gene.
- The term does not necessarily imply that mutations in that gene contribute to or cause the phenotype.
- Analyzed mice may have causative mutations in other genes.
- Wide phenotypic variation exists due to homozygotes vs. heterozygotes and different strain backgrounds.
See also MGI Batch Query results for information on the other fields.
For detailed information, use the Phenotypes, Alleles & Disease Models Query Form to find your gene of interest and Mammalian Phenotype terms associated with specific genotypes and strains.
Human Disease (OMIM) results
Human Disease terms appear by gene, followed by an ID and the Mammalian Phenotype vocabulary term entry.
- Each term listed indicates that a mutant allele of this gene is involved in a mouse genotype used as a disease model.
- The term does not necessarily imply that mutations in that gene contribute to or cause the disease.
- Analyzed mice may have causative mutations in other genes.
- Wide variation exists due to homozygotes vs. heterozygotes and different strain backgrounds.
See also MGI Batch Query results for information on other fields.
For detailed information, use the Phenotypes, Alleles & Disease Models Query Form to find your gene of interest and view Human Disease terms as they are associated with specific allelic mutations and strains.
Gene Expression results
If there is expression data from a curated reference in the Gene Expression Database, the anatomical structure examined (listed by Theiler stage and structure name) appears, followed by columns indicating:
- the total number of assay results for this gene/tissue
- the number of positive assay results (+)
- the number of negative assay results (−).
- the detected counts also include specimens for which detected = ambiguous or not specified (as well as present).
Note: Present appears if expression is reported as detected but the author did not specify the level. Not specified is used when the authors do not report whether a gel band is present or absent. Ambiguous is used when the curators cannot discern from the authors’ description whether expression is present or absent.
See also MGI Batch Query results for information on the other fields.
Use the Genes and Markers Query Form to find a gene of interest and view additional expression results (e.g., Literature Summary, Data Summary, Theiler Stages, Assay Types, cDNA source data, External Resources).
What do the acronyms in the Gene Ontology Code column mean?
See Guide to GO Evidence Codes at the Gene Ontology website.
What can I do to speed up my query?
There are several options for speeding up a query.
- Break the ID or symbol list into pieces and run several queries.
- Eliminate Gene Attributes (clear the checkbox next to Nomenclature) or select fewer of them (for example, only Ensembl ID, only Entrez Gene ID, or only VEGA ID).
- Click None beneath Additional Information or select something other than RefSNP ID. If you want RefSNP IDs, using smaller input files will speed up the query.
My query returned no results or No associated gene in one of the report columns. Why?
- No results are returned if you upload a data file and do not:
- specify the column containing your IDs or symbols correctly.
- select the correct file type (tab delimited or comma separated).
- No associated gene appears:
- In the row entry under MGI Gene/Marker ID, when the search does not find any annotations to/associations for that ID.
- For some (or all) entries, if a file is not tab delimited or comma separated. Hint: Save as tab delimited or comma separated and rerun the query.
- If you do not enclose any gene synonym (or identifier) containing spaces in quotation marks (e.g., change Dickie's small eye to "Dickie's small eye").
Note: If you get an error message before your query completes, try using a smaller list of IDs or selecting fewer output categories.
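Several of the causes above come down to a mismatch between the file and the File Type or column settings. A file can be checked locally before uploading. This is a minimal sketch with a hypothetical function name. It reports the lines where the chosen column is missing or empty under the chosen delimiter, and makes no claim about how the tool parses files.

```python
import csv
import io


def check_id_column(text, delimiter, column_number):
    """Return 1-based line numbers whose ID column is missing or empty."""
    problems = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for line_no, row in enumerate(reader, start=1):
        if len(row) < column_number or not row[column_number - 1].strip():
            problems.append(line_no)
    return problems


sample = "a\tMGI:96677\nb\tMGI:96677\n"
print(check_id_column(sample, "\t", 2))  # tab-delimited, IDs in column 2
print(check_id_column(sample, ",", 2))   # wrong delimiter chosen
```

If every line is reported, the delimiter or column number probably does not match the file.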
Are there examples?
I want to see all phenotype annotations for two genes.
- From Gene Attributes, click any other data you want to see in the output report (or leave the default, Nomenclature).
- From Additional Information, click Human Disease.
- Change Input Type to Current Symbols Only (or leave the default, Search all input types).
- Type the gene symbols in the ID/Symbols List box.
- Click Search.
I want to add another item to an input list and see if there are different results depending on whether I use the default (Search all input types) or select an input type.
- Follow the instructions for the first example and click Modify when the results appear.
- Add an item to the list.
- Leave the default.
- Click Search again. The ID type is current symbol for some items and human synonym for others.
- Click Modify again.
- Don't edit the list but do change the Input Type to Current Symbols Only and click Search. All results are identified as current symbol.
I have a text file of MGI Marker IDs and I want corresponding GenBank/RefSeq IDs.
- Input box:
- Next to Type, leave the default (Search all input types).
- Beneath Source, click the Upload File tab. A menu appears.
- Beneath File Type, check that the delimiter choice (tab-delimited or comma separated) and ID/Symbol Column number match the content of your text file.
- Beneath ID/Symbols File, click Browse.
- Locate and select your text file, then check that the filename that appears (beside Browse) is correct.
- Output box: Beneath Additional Information, click GenBank/RefSeq ID.
- Bottom of form: Click Search.
I want a list of MGI alleles for Entrez Gene IDs located in the third column of an Excel spreadsheet.
- In the Input box, next to Type, click the arrow and select Entrez Gene ID (or leave the default, Search all input types) on the pulldown list.
- In the output box, beneath Additional Information: click Alleles.
- In Excel, click the top of the column to select it, then Edit->Copy.
- In your browser, click inside the input box, and then Edit->Paste.
- In the Input box, click EntrezGene on the pulldown list (or leave the default, Search all input types).
- On the bottom of form, click Search.
I used the Genes and Markers Query Form and selected tab-delimited output for my results, and I now have a summary report with 67 matching items. How do I get a column of this data into the MGI Batch Query?
Note: The steps below do not work if you are using the Firefox browser. Check their website for a workaround or use a different browser for saving MGI Batch Query results in Excel.
- In the summary report:
- Note the column number containing the data of interest.
- Name the file and save the file.
- Under Save as type:, select Text (Tab delimited).
- On the MGI Batch Query form, Input box:
- Beneath Source, click Upload File, and then Browse to locate the file.
- Click to select the File Type (tab-delimited), enter the column number (e.g., 6) and, if desired, make a selection from the list next to Type (e.g., MGI Gene/Marker ID) or leave the default (Search all input types).
- In the Output box, click the desired Gene Attributes and Additional Information options.
- On the bottom of the form, click Search.
- To re-query:
- Click Modify.
- Change the Input and/or Output option(s).
- Click Search again.
Note: If your initial data are not in tab-delimited or comma-separated format, copy and paste the file into a spreadsheet, save it in one of those formats, and then use the MGI Batch Query to upload the desired column (be sure to identify the proper File Type and column).
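As an alternative to the spreadsheet route in the note above, data aligned with runs of spaces can be converted to tab-delimited text with a short script. This is a minimal sketch with a hypothetical function name. It assumes no field contains an internal space, so it is not suitable for entries such as multi-word synonyms. The sample values are placeholders.

```python
import csv
import io


def to_tab_delimited(text):
    """Convert whitespace-aligned text to tab-delimited text, skipping blank lines."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for line in text.splitlines():
        fields = line.split()  # split on any run of whitespace
        if fields:
            writer.writerow(fields)
    return out.getvalue()


converted = to_tab_delimited("geneA   id1\n\ngeneB  id2\n")
print(converted)
```

Save the converted text to a file, then upload it with File Type set to tab-delimited and the appropriate column number.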