This help document answers the following questions about the batch query tool:
Use the MGI Batch Query to retrieve data about many genes in MGI simultaneously. Currently, the tool retrieves gene/marker attributes (e.g., nomenclature; genome location; Ensembl, Entrez Gene, or VEGA IDs) and annotations (e.g., Gene Ontology or Mammalian Phenotype terms; gene expression tissue/assay data; MGI allele, GenBank/RefSeq, RefSNP, or UniProt IDs). Future versions may include additional options.
You can also use the batch query to find duplications, ambiguities, and variations in data. You could check, for example, that an input list of MGI identifiers returns the same number of Ensembl, Entrez Gene, and VEGA IDs on different days or that switching the input and output IDs returns identical data. See Are there any examples? for additional ways that you might use this query.
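One way to run the day-to-day consistency check described above is to save the tab-delimited output of each run and compare the files with a short script. The following Python sketch is only an illustration: the function names, the column headers, and the sample rows are made-up placeholders, not real MGI output.

```python
import csv
import io


def load_rows(tsv_text):
    """Return the data rows of tab-delimited text as a set of tuples."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the header line
    return {tuple(row) for row in reader if row}


def compare_runs(earlier_text, later_text):
    """Report rows that appear in only one of two saved result files."""
    earlier, later = load_rows(earlier_text), load_rows(later_text)
    return {
        "only_in_earlier": sorted(earlier - later),
        "only_in_later": sorted(later - earlier),
    }


# Made-up placeholder rows (hypothetical headers and IDs, not real MGI data).
monday = "Input\tEnsembl ID\nGENE_A\tID_1\nGENE_B\tID_2\n"
friday = "Input\tEnsembl ID\nGENE_A\tID_1\nGENE_B\tID_3\n"
print(compare_runs(monday, friday))
```

Two empty lists mean that the two runs returned identical rows.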
The Batch Query tool lets you either type or copy and paste in a list of IDs or upload a file containing that list.
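In either case, no special characters are allowed in the list, so be sure to remove any stray quotation marks or other non-alphanumeric characters. The one exception is the pair of double quotation marks required around an entry that contains spaces. Valid delimiters are tab, comma/space, space, and new line.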
The process for using the Batch Query Tool is:
When typing or pasting IDs into the input box (you can copy and paste a column from an Excel spreadsheet):
If you are uploading a file:
For a given query, your list can be either mixed or all of the same type. You can also enter (mouse) gene symbols, synonyms, or orthologs.
MGI:96677 Trp53 Pax-6 P53 16590 ENSMUSG00000005672 OTTMUSG00000015949 247073 MI0000248 10379153
| Input identifier type | Representative example |
|---|---|
| MGI Gene/Marker ID | MGI:96677 |
| Current Symbols Only | Trp53, Pax6, D11Mit10 |
| All Symbols/Synonyms/Orthologs (includes both current and old symbols) | Pax6, Pax-6, Trp53, P53 |
| Entrez Gene ID | 16590 |
| Ensembl ID | ENSMUSG00000028530 |
| VEGA ID | OTTMUSG00000015949 |
| UniGene ID | 247073 |
| MiRBase ID | MI0000248 |
| GenBank/RefSeq ID* | NM_001122899, AK033644, NP_666257 |
| UniProt ID | P48356, Q3UFR6, A2AKJ2 |
| GO (Gene Ontology) | GO:0019221 |
| RefSNP ID | rs3021544 |
| Probeset ID | 10379153 |

* GenBank IDs are for nucleotide sequences only. RefSeq IDs are for either nucleotide or protein sequences.

The Batch Query also recognizes the following additional types of IDs, which are not listed on the pulldown menu:

| Input identifier type | Representative example |
|---|---|
| BayGenomics | LST083 |
| DoTS (Database Of Transcribed Sequences) | DT.529646 |
| DFCI (Dana-Farber Cancer Institute) | TC1572427 |
| EC (Enzyme Commission) | 2.7.10.1 |
| FHCRC (Fred Hutchinson Cancer Research Center) | FHCRC-GT-S15-11C1 |
| Homologene | 20151 |
| Lexicon | OST128284 |
| NIA Mouse Gene Index | U167412 |
| PDB (Protein Data Bank) | 1HU8 |
| TIGEM (Telethon Institute of Genetics and Medicine) | GC0196 |
| TreeFam | TF320146 |

Does the input list have to be in any particular form?
Yes.
- If there are spaces in any list entry (for example, Dickie's small eye, a synonym for Pax6), you must enclose that entry in double quotation marks (i.e., "Dickie's small eye").
- The Batch Query tool accepts space, tab, and newline-separated lists and removes any trailing commas from such lists.
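If you prepare input lists with a script, the rules above can be mimicked before you paste or upload anything. The following Python sketch is an illustration only, not the tool's actual parser: the function name `parse_id_list` and the regular expression are assumptions. It splits on spaces, tabs, commas, and new lines, and keeps a double-quoted entry together as one item.

```python
import re

# Either a double-quoted entry (kept whole) or a run of characters
# that contains no whitespace and no comma.
TOKEN = re.compile(r'"([^"]*)"|([^\s,]+)')


def parse_id_list(text):
    """Split raw input into entries, honoring double-quoted entries."""
    entries = []
    for quoted, bare in TOKEN.findall(text):
        entry = quoted or bare
        if entry:  # ignore empty quoted strings
            entries.append(entry)
    return entries


raw = 'MGI:96677, Trp53\t"Dickie\'s small eye"\nPax6,'
print(parse_id_list(raw))
```

Counting the parsed entries is a quick way to confirm that a synonym containing spaces was not split into several items.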
What does "Search all input types" accomplish?
Because the Batch Query's default option is to search all input types, you do not have to identify what you enter or upload. The tool determines whether the entries are of one type or a combination of IDs and/or symbols. You may, however, select a type from the pulldown menu if you wish to constrain your query to a single type. See also When would I constrain my search to a single input type? below.
When would I constrain my search to a single input type?
You may wish to select a single input type from the Type list when you want the Batch Query to return:
- Only current mouse gene symbols with MGI gene associations (for example, Acat1).
- Only current or former mouse symbols, synonyms, and orthologs with MGI gene associations.
- Results from one database and not another. Numerical identifiers (those without an alphabetic character) may recur in several databases. For example, 12192 could be either a UniGene ID or an Entrez Gene ID.
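To see why purely numerical identifiers are ambiguous, consider the following Python sketch. It guesses an input type from the shape of an identifier, using only the representative examples listed in this document. The patterns and the function name are assumptions made for illustration; they are not how the Batch Query itself classifies input.

```python
import re

# Patterns inferred from the representative examples in this document.
PREFIX_PATTERNS = [
    ("MGI Gene/Marker ID", re.compile(r"^MGI:\d+$")),
    ("Ensembl ID", re.compile(r"^ENSMUSG\d+$")),
    ("VEGA ID", re.compile(r"^OTTMUSG\d+$")),
    ("GO (Gene Ontology)", re.compile(r"^GO:\d+$")),
    ("RefSNP ID", re.compile(r"^rs\d+$")),
]


def guess_input_types(identifier):
    """Return the candidate input types for one identifier."""
    if identifier.isdigit():
        # No prefix to go on: several databases use plain numbers.
        return ["Entrez Gene ID", "UniGene ID", "Probeset ID"]
    for name, pattern in PREFIX_PATTERNS:
        if pattern.match(identifier):
            return [name]
    return ["symbol, synonym, or other ID"]


for item in ["MGI:96677", "12192", "Trp53"]:
    print(item, "->", guess_input_types(item))
```

When an identifier yields more than one candidate, selecting a single input type from the pulldown menu removes the ambiguity.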
How many IDs can I enter at one time?
There is no limit to the number of identifiers that you can enter at one time, but browsers differ in how many results they can display. In general, you may wish to split long lists into sections in order to speed up the return time. See the table under Why can I select all the choices under Gene Attributes, but I'm limited to only one choice under Additional Information? for examples of the amount of data that you can expect to be returned, depending on your input/output choices.
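Splitting a long list into sections is easy to script. The following Python sketch shows one way to do it; the function name and the batch size are arbitrary choices for illustration, not values recommended by MGI.

```python
def split_into_batches(identifiers, batch_size):
    """Split a list of identifiers into consecutive batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        identifiers[start:start + batch_size]
        for start in range(0, len(identifiers), batch_size)
    ]


pax_genes = [f"Pax{n}" for n in range(1, 10)]  # Pax1 through Pax9
for batch in split_into_batches(pax_genes, 4):
    print(", ".join(batch))
```

Each printed line can then be pasted into the input box as a separate query.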
Are there choices for how to view query results?
Yes. You can customize your results in three areas: Gene Attributes, Additional Information, and Format. Click to make your selections.
- Gene Attributes:
You can retrieve gene/marker data for any/all of the following gene attributes or identifiers by clicking the boxes beside them. Each selected item adds one or more columns to the report.
| When you select... | the resulting columns contain... |
|---|---|
| Nomenclature | symbol, name, marker type |
| Genome Location | chromosome, start coordinate, end coordinate, strand, genome build |
| Ensembl ID | Ensembl ID |
| Entrez Gene ID | Entrez Gene ID |
| VEGA ID | VEGA ID |

- Additional Information:
You can find data about objects associated with (or annotated to) genes. Select one of the following options per query. The default is not to return any of these data.
| When you select... | the resulting columns contain... |
|---|---|
| Gene Ontology (GO) | ID, term, code. NOT annotations do not appear in the results. |
| Mammalian Phenotype Ontology (MP) | ID, term. Normal annotations do not appear in the results. See How do I interpret Mammalian Phenotype (MP) results? |
| Alleles | MGI allele identifier and allele symbols. |
| Gene Expression | Anatomical structure (Theiler stage and tissue name), assay results (total number of times the structure was examined, number of times expression was detected in the structure, number of times expression was found to be absent in the structure). These numbers summarize results obtained from wild type and mutant specimens. |
| RefSNP ID | RefSNP identifiers. Results include RefSNPs within 2 KB of the gene/marker. |
| GenBank/RefSeq ID | GenBank (nucleotide) or RefSeq (nucleotide or protein) sequence identifiers. |
| UniProt ID | UniProt sequence identifiers. |
| Human Disease (OMIM) | ID, term. See How do I interpret Human Disease (OMIM) results? |

- Format:
You can view the output of your query results in your browser in two formats:
- Web (HTML)
- Tab-delimited text
The default is web format.
- For web format, the maximum number of results you can view at once is 10,000.
- For tab-delimited results, it is 200,000.
- If there are additional results, a count appears on the page (e.g., 10,000 out of 10,566).
If there are more than 300,000 results, the count is displayed as 300,000+. When the maximum is exceeded for a selected format, split your query into smaller sections in order to see everything.
Why can I select all the choices under Gene Attributes, but I'm limited to only one choice under Additional Information?
The purpose of the single Additional information choice is to limit results to a reasonable size. There is quite an increase in the amount of data returned when you select an Additional Information category. For example, if you enter symbols for 9 paired box genes...
Your input list is Pax1, Pax2, Pax3, ... Pax9.

| You select Nomenclature and ... | the MGI Batch Query returns... |
|---|---|
| Genome Location | 9 rows, one for each gene, Pax1 - Pax9 |
| UniProt ID | ~50 matching rows |
| Gene Ontology (GO) | ~192 matching rows |
| Mammalian Phenotype (MP) | ~556 matching rows |
| Gene Expression | ~1400 matching rows |
| GenBank/RefSNP ID | ~3000 matching rows |

How many results can I get at one time?
The number of results that can appear at once depends on the output format you select.
- For web format, the maximum is 10,000.
- For tab-delimited, the maximum is 200,000.
If the result total is greater than the maximum allowed, a line at the top of the MGI Batch Query Results report lists the number found. If your results are greater than 300,000, the display is 300,000+, and you'll have to split your query into smaller sections in order to see everything. See What can I do to speed up my query? for other suggestions on how to limit results.
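The limits described above can be summarized in a few lines of Python. This sketch simply restates the documented numbers (10,000 for web, 200,000 for tab-delimited, and the 300,000+ count display); the function name and the returned values are illustrative assumptions, not part of the tool.

```python
# Documented display limits for each output format.
DISPLAY_LIMITS = {"web": 10_000, "tab": 200_000}
COUNT_CAP = 300_000  # larger totals are reported as "300,000+"


def describe_results(total, output_format):
    """Return (rows shown, count label, whether the query should be split)."""
    limit = DISPLAY_LIMITS[output_format]
    shown = min(total, limit)
    count_label = "300,000+" if total > COUNT_CAP else f"{total:,}"
    return shown, count_label, total > limit


print(describe_results(10_566, "web"))
print(describe_results(350_000, "tab"))
```

When the last value is True, not every result is displayed, and splitting the query into smaller sections is the documented way to see everything.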
Can I edit my original options and requery?
Yes, you can.
- If you want to edit a list or change output options:
  - On the query results page, click Modify. The window that appears contains the original input/output data.
  - Edit the input list or change any Output options.
  - Click the Search Again button.
- If you want to upload a different file:
  - On the query results page, click Modify. The window that appears contains the original input/output data.
  - Click Start New Batch Query. An empty Input/Output window appears.
  - Click Upload file and make the appropriate selections. See How do I use this tool? for more uploading steps.
Notes:
- You can requery as many times as you like.
- Alternatively, you can click Start New Query to return to the original Batch Query form.
How do I interpret the MGI Batch Query Results report?
All results appear in the form of a table in either web (HTML) format, or in tab-delimited text depending on your Output selection.
- A summary of your query parameters appears at the top of the page under You searched for.... This includes the total number of IDs you entered (or uploaded), the input identifier type of those IDs (e.g., current symbol, synonym, MGI, old symbol, and so on), and the Output options you selected for inclusion (e.g., Nomenclature, UniProt ID, and so on.)
- If the number of results exceeds the display limit (10,000 lines for web output, 200,000 for tab-delimited), a line of text identifies the total found. If there are more than 300,000 results, the count is displayed as 300,000+.
- For each ID entered, at least one row is returned for that ID, its corresponding MGI gene/marker ID, plus columns for whichever attributes or additional information you selected (see Are there choices for how to view query results?).
- If a gene has more than one identifier associated to it, a row returns for each association (for example, there may be more than one Ensembl, Entrez Gene or VEGA ID; many GO or MP terms; lots of RefSNPs).
- Background row shading alternates by marker symbol. For example, if you were to enter Pax6 and Kit, all Pax6 associations would appear in one shading (e.g., light) and all Kit associations would appear in a contrast shading (e.g., dark).
- When you select Gene Ontology (GO), three columns are returned: ID (with an entry such as GO:0005525), term (with an entry such as GTP binding), and code (with an entry such as IEA). See the Guide to GO Evidence Codes at the Gene Ontology website for the definitions.
How do I interpret Mammalian Phenotype (MP) results?
The list of Mammalian Phenotype Ontology terms associated with a gene is a combination of all terms associated with all mutant alleles of that gene.
- Mammalian Phenotype (MP) terms appear by gene.
- Each term describes a mouse phenotype with some mutation in that gene.
- The term does not necessarily imply that mutations in that gene contribute to or cause the phenotype.
- Analyzed mice may have causative mutations in other genes.
- Wide phenotypic variation exists due to homozygotes vs. heterozygotes and different strain backgrounds.
See also How do I interpret the MGI Batch Query Results report? for information on the other fields.
For detailed information, use the Phenotypes, Alleles & Disease Models Query Form to find your gene of interest and Mammalian Phenotype terms associated with specific genotypes and strains.
How do I interpret Human Disease (OMIM) results?
- Human Disease terms are listed by gene, followed by an ID and Term entry.
- Each term listed indicates that a mutant allele of this gene is involved in a mouse genotype used as a disease model.
- The term does not necessarily imply that mutations in that gene contribute to or cause the disease.
- Analyzed mice may have causative mutations in other genes.
- Wide variation exists due to homozygotes vs. heterozygotes and different strain backgrounds.
See also How do I interpret the MGI Batch Query Results report? for information on the other fields.
For detailed information, use the Phenotypes, Alleles & Disease Models Query Form to find your gene of interest and view Human Disease terms as they are associated with specific allelic mutations and strains.
How do I interpret Gene Expression results?
If there is expression data from a reference in the Gene Expression Database, the anatomical structure examined (listed by Theiler stage and structure name) appears, followed by columns indicating:
- the total number of assay results for this gene/tissue
- the number of positive assay results (+)
- the number of negative assay results (−).
See also How do I interpret the MGI Batch Query Results report? for information on the other fields.
Use the Genes and Markers Query Form to find a gene of interest and view additional expression results (e.g., Literature Summary, Data Summary, Theiler Stages, Assay Types, cDNA source data, External Resources).
Why is a gene listed multiple times in the report?
The relationship between a gene and its attribute/additional information categories is frequently not one-to-one. For example, a gene may have several Ensembl, Entrez Gene, or VEGA IDs associated with it; multiple gene ontology (GO) and/or Mammalian Phenotype (MP) terms; and many refSNP IDs. Based on your output criteria, the MGI Batch Query report returns a line of text for every one of these associations. See Why can I select all the choices under Gene Attributes, but I'm limited to only one choice under Additional Information? for the amount of data that the Batch Query returns, depending upon criteria.
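If you save the report as tab-delimited text, a short script can show how many rows each gene contributes. The following Python sketch is an assumption-laden illustration: the column headers ("Input", "Symbol", "GO ID") and the sample rows are hypothetical placeholders, so adjust them to match the header line of your own file.

```python
import csv
import io
from collections import defaultdict


def rows_per_gene(tsv_text, symbol_column="Symbol"):
    """Group the rows of a tab-delimited report by gene symbol."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    grouped = defaultdict(list)
    for row in reader:
        grouped[row[symbol_column]].append(row)
    return dict(grouped)


# Made-up placeholder report: one gene with two annotations, one with one.
report = (
    "Input\tSymbol\tGO ID\n"
    "GENE_A\tGENE_A\tGO_TERM_1\n"
    "GENE_A\tGENE_A\tGO_TERM_2\n"
    "GENE_B\tGENE_B\tGO_TERM_1\n"
)
for symbol, rows in rows_per_gene(report).items():
    print(symbol, len(rows))
```

A gene with several associations appears once per association, which is why the row count can greatly exceed the number of genes entered.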
What do the acronyms in the Gene Ontology "code" column mean?
See Guide to GO Evidence Codes at the Gene Ontology website.
How do I save my query results in Excel?
- Copy the desired data (highlight and click Ctrl-C) on the MGI Batch Query results page.
- In Excel, click Edit and select Paste Special.
- In the Paste Special dialog box, select Unicode, and then click OK.
The data now appears separated into the same columns as on the MGI Batch Query report. You can add to it, subtract from it, rearrange or sort columns, and so on.
Note: Some gene/marker symbols are interpreted by Excel as dates (as, for example, with the Septin family). To get around this, instead of using the steps for Paste Special above, do the following:
- Copy the desired text on the summary page into a text editor (e.g., Notepad or TextPad) and save this as a text file (e.g., with a .txt extension).
- Import the file into Excel (use Data->Import External Data->Import data).
- In the Text Import Wizard dialog boxes:
- Step 1 of 3: Under Original data type, click Delimited and then Next.
- Step 2 of 3: Under Delimiters, click Tab and then Next.
- Step 3 of 3: Find the column containing the gene symbol and click to highlight it. Under Column data format, click Text (so that a gene name such as Sept5 is not rendered as a date, i.e., 5-Sep). Click Finish.
- Under Import Data, click the desired worksheet (existing or new) and then click OK.
The data now appears, arranged in columns, as on the MGI Batch Query report.
Note: If you are importing from Excel into the MGI Batch Query tool, save the Excel file as text and choose tab-delimited or comma-separated whenever you have the option to do so.
See Excel help documentation for additional information.
What can I do to speed up my query?
There are several options for speeding up a query.
- Break the ID or symbol list into pieces and run several queries.
- Eliminate Gene Attributes (clear the check box next to Nomenclature) or select fewer of them (for example, only Ensembl ID, only Entrez Gene ID, or only VEGA ID).
- Click None beneath Additional Information or select something other than RefSNP ID. If you want RefSNP IDs, using smaller input files will speed up the query.
See How many results can I get at one time? for the maximum display sizes.
My query returned no results or No associated gene in one of the report columns. Why?
- No results are returned if:
- You upload a data file and don't specify the column containing your IDs or symbols correctly.
- You upload a data file and don't select the correct file type (tab-delimited or comma separated).
- No associated gene appears:
- In the row entry under MGI Gene/Marker ID, when the search does not find any annotations to or associations for that ID.
- For some (or all) entries, if you do not present a tab-delimited or comma-separated list.
Hint: Save as tab delimited or comma separated and rerun the query.
- If you neglect to use quotation marks to enclose any gene synonym (or identifier) containing spaces (e.g., Dickie's small eye must appear as "Dickie's small eye" on a list).
Note: If you get an error message before your query completes, try using a smaller list of IDs or selecting fewer output categories.
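When a large report contains only a few No associated gene entries, a script can list the affected inputs so that you can check them for missing quotation marks or delimiter problems. In the following Python sketch, the MGI Gene/Marker ID column name comes from this document, while the "Input" header, the function name, and the sample rows are hypothetical placeholders.

```python
import csv
import io


def unmatched_inputs(tsv_text, input_column="Input",
                     id_column="MGI Gene/Marker ID"):
    """List the inputs whose ID column reads 'No associated gene'."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row[input_column]
        for row in reader
        if row[id_column].strip() == "No associated gene"
    ]


# Made-up placeholder report, not real MGI output.
report_text = (
    "Input\tMGI Gene/Marker ID\n"
    "GENE_A\tMGI_ID_1\n"
    "Dickie's\tNo associated gene\n"
    "small\tNo associated gene\n"
)
print(unmatched_inputs(report_text))
```

In this made-up report, the unmatched inputs look like fragments of a synonym that was entered without enclosing quotation marks.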
Are there examples?
- I want to know all the phenotype annotations for a list of three genes.
- From Gene Attributes, click any other data to appear in the output report (or leave the default, Nomenclature).
- From Additional Information, click Mammalian Phenotype (MP).
- Change Input Type to Current Symbols Only (or leave the default, Search all input types).
- Type Brca1, Brca2, and Bard1 into the ID/Symbols List box.
- Click Search.
- I want to add a fourth item to my input list of Brca1, Brca2, and Bard1 and I want to experiment with using the default versus picking an input type to see what differences, if any, are in the results.
- Follow the instructions for the first example and click Modify when the results appear.
- Add A (or a) to the list.
- Leave the default.
- Click Search again. Five results appear. The ID type is current symbol for four of the items and human synonym for the fifth item.
- Click Modify again.
- Don't edit the list but do change the Input Type to Current Symbols Only and click Search. Four results appear, all identified as current symbol.
- I have a text file of MGI Marker IDs and I want to know the corresponding GenBank/RefSeq IDs.
- In the Input column next to Type, leave the default (Search all input types).
- Under Source:, click the Upload File tab.
- From the menu that appears, beneath File Type, check that the delimiter and column defaults are accurate.
- Beneath ID/Symbols File, click Choose file. Check that the file name that appears (beside the button) is correct.
- In the Output box, under Additional Information, click GenBank/RefSeq ID.
- Click Search.
- I want a list of MGI alleles for Entrez Gene IDs located in the third column of an Excel spreadsheet.
- In the Input column next to Type, click the arrow and select Entrez Gene ID (or leave the default, Search all input types) on the pulldown list.
- In the Output column, beside Additional Information, click Alleles.
- In Excel:
- Click the top of the column to select it, then click Edit->Copy.
- In your browser, click inside the input box, and then click Edit->Paste.
- For Input Type, check that Entrez Gene ID is selected on the pulldown list (or leave the default, Search all input types).
- Click Search.
- I used the Genes and Markers Query Form and selected tab-delimited output for my results, and I now have a summary report with 67 matching items. How do I get a column of this data into the MGI Batch Query?
Note: The steps below do not work if you are using the Firefox browser. Check their website for a workaround or use a different browser for saving MGI Batch Query results in Excel.
- Note the column number containing the data of interest.
- Name and save the file. Under Save as type:, select Text (Tab delimited).
- On the MGI Batch Query, click the Upload File tab, and Choose File to browse and locate the file.
- Click the File Type (i.e., tab-delimited), enter the column number (e.g., 5), and, if desired, select the input identifier type (e.g., MGI Gene/Marker ID) or leave the default (Search all input types).
- Click the desired output options.
- Click Search.
- Click the Modify button and change either the Input or Output option(s) to requery.
Note: If your initial data are not in tab-delimited or comma-separated format, copy and paste the file into a spreadsheet, save it in one of those formats, and then use the MGI Batch Query to upload the desired column (be sure to identify the proper File Type and column).
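As an alternative to a spreadsheet, a script can pull a single column out of a tab-delimited or comma-separated file and produce a list that is ready to paste into the input box. The following Python sketch is an illustration only: the function name and the sample rows are made up, and the column number is counted from 1, as on the upload form.

```python
import csv
import io


def extract_column(text, column_number, delimiter="\t", has_header=True):
    """Return the non-empty values from one column (numbered from 1)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    if has_header:
        next(reader, None)  # skip the header line
    index = column_number - 1
    values = []
    for row in reader:
        if len(row) > index and row[index].strip():
            values.append(row[index].strip())
    return values


# Made-up placeholder summary report, not real MGI data.
summary = "Symbol\tID\nGENE_A\tID_1\nGENE_B\tID_2\nGENE_C\t\n"
print("\n".join(extract_column(summary, 2)))
```

The printed list has one identifier per line, which is one of the delimiters that the Batch Query accepts.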
Contributing Projects:
Mouse Genome Database (MGD), Gene Expression Database (GXD), Mouse Tumor Biology (MTB), Gene Ontology (GO), MouseCyc
Send questions and comments to User Support.
Last database update: 11/20/2009 (MGI_4.31)