Using the MGI Batch Query

This help document answers the following questions about the batch query tool:


What can I use the MGI Batch Query to find?

Use the MGI Batch Query to retrieve data about many MGI genes simultaneously. Currently, the tool retrieves:

Future versions may include additional options.

How do I use this tool?

Given a set of gene symbols or input identifiers (e.g. MGI accession, RefSNP, VEGA IDs) from a spreadsheet, you can:

You can also use the Batch Query to find duplications, ambiguities, and variations in data. You could check, for example, that an input list of MGI identifiers returns the same number of Ensembl, Entrez Gene, and VEGA IDs on different days, or that switching the input and output IDs returns identical data. See Are there examples? for additional ways you might use this query.
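
One way to run such a check outside the tool is to save the tab-delimited output from two runs and compare the IDs in a given column. The sketch below is only an illustration: the function names, column names, and IDs are placeholders, not actual Batch Query output.

```python
import csv
import io

def ids_in_column(tsv_text, column_name):
    """Collect the non-empty values found in one named column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[column_name] for row in reader if row.get(column_name)}

def compare_runs(earlier_tsv, later_tsv, column_name):
    """Report IDs that appear in only one of two saved result files."""
    earlier = ids_in_column(earlier_tsv, column_name)
    later = ids_in_column(later_tsv, column_name)
    return {"only_earlier": earlier - later, "only_later": later - earlier}

# Placeholder data standing in for two saved reports (hypothetical headers).
run_1 = "Input\tOutput ID\nID_A\tOUT_1\nID_B\tOUT_2\n"
run_2 = "Input\tOutput ID\nID_A\tOUT_1\nID_B\tOUT_3\n"
print(compare_runs(run_1, run_2, "Output ID"))
```

If both sets in the result are empty, the two runs returned the same IDs in that column.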

Note:

Be sure to remove any quotation marks or other non-alphanumeric characters from any list you enter or upload. The only valid delimiters are tab, comma/space, space, and new line.
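
If your list comes from another program, a short script can do this cleanup before you paste or upload. The following is a minimal sketch, assuming that only quotation marks need to be stripped and that the list is split on tab, comma, space, and new line. Characters such as the colon, underscore, period, and hyphen are kept because valid IDs shown in this document (for example MGI:96677 or NM_001122899) contain them. The function name is hypothetical.

```python
# Straight and curly quotation marks to strip from the input.
QUOTES = "\"'\u2018\u2019\u201c\u201d"
DELETE_QUOTES = str.maketrans("", "", QUOTES)

def clean_id_list(raw):
    """Strip quotation marks, then split on tab, comma, space, or new line."""
    unquoted = raw.translate(DELETE_QUOTES)
    # Turn every accepted delimiter into a space so split() can do the rest.
    for delimiter in ("\t", ",", "\r", "\n"):
        unquoted = unquoted.replace(delimiter, " ")
    return unquoted.split()

raw = '"MGI:96677", "Pax6"\t\'Trp53\'\nNM_001122899,'
print("\n".join(clean_id_list(raw)))
```

The result is one ID or symbol per line, ready to paste into the ID/Symbols List box.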

Typing or pasting IDs

When typing or pasting IDs into the input box (you can copy and paste a column from an Excel spreadsheet):

  1. Beneath Source, click the Enter Text tab.
  2. Type or paste your list into the ID/Symbols List box. The Batch Query tool accepts space, tab, and newline-separated lists and removes any trailing commas.
  3. Beside Type, either leave the default (Search all input types) or, if your list consists of a single input type and you want to constrain the search to it, click the arrow and select a type from the pulldown list. See When would I constrain my search to a single input type? for details.
  4. In the Output area, click to select Gene Attributes, Additional Information, and Format to appear in the output report. (See Are there choices for how to view query results? for more information.)
  5. Click Search.

Uploading a file

  1. If this is spreadsheet data, be sure to save the file in tab- or comma-delimited format.
  2. In the Input column of the query form, leave the default (Search all input types) in the Type box, or click the arrow and choose a type from the list, if you wish to constrain the query. See When would I constrain my search to a single input type? for more details.
  3. Beneath Source, click the Upload File tab. A menu of choices appears.
  4. Click Choose File and browse to locate your data. Once selected, the name of your file appears in the blank.
  5. Beneath File Type and ID/Symbols column, check that the defaults (tab-delimited and 1) are correct for your uploaded file.
  6. In the Output area, click to select Gene Attributes, Additional Information, and Format to appear in the output report. (See Are there choices for how to view query results? for more information.)
  7. Click Search.
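
If you prefer to prepare the upload file with a script rather than a spreadsheet program, the sketch below shows one possible approach: it converts comma-delimited text to tab-delimited text and lists the values in a chosen column so you can confirm the column number before uploading. The function names and sample rows are placeholders, not real MGI data.

```python
import csv
import io

def csv_to_tsv(csv_text):
    """Convert comma-delimited text to tab-delimited text."""
    rows = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()

def column_values(tsv_text, column):
    """Return the values in a 1-based column, skipping short or blank rows."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return [row[column - 1] for row in reader
            if len(row) >= column and row[column - 1].strip()]

# Placeholder rows: the IDs of interest sit in the third column.
sample = "row1,note,16590\nrow2,note,\n"
tsv = csv_to_tsv(sample)
print(column_values(tsv, 3))
```

Here the second row has no value in the third column, so only the first row's ID is listed.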

What are the accepted input identifier (ID) types?

For a given query, your list can be either mixed or all of the same type. You can also enter (mouse) gene symbols, synonyms, or orthologs.


The following are representative examples for each type listed on the pulldown menu:
Input identifier type              Representative example
MGI Gene/Marker ID                 MGI:96677
Current Symbols Only               Trp53, Pax6, D11Mit10
All Symbols/Synonyms/Orthologs     Pax6, Pax-6, Trp53, P53
  (includes both current and old symbols)
Entrez Gene ID                     16590
Ensembl ID                         ENSMUSG00000028530
VEGA ID                            OTTMUSG00000015949
UniGene ID                         247073
MiRBase ID                         MI0000248
GenBank/RefSeq ID*                 NM_001122899, AK033644, NP_666257
UniProt ID                         P48356, A2AKJ2
GO (Gene Ontology)                 GO:0019221
RefSNP ID                          rs3021544
Affy Probeset ID                   10379153

* GenBank IDs are for nucleotide sequences only. RefSeq IDs are for either nucleotide or protein sequences.

The Batch Query also recognizes the following ID types, which are not listed on the pulldown menu:
Input identifier type              Representative example
EC (Enzyme Commission)             2.7.10.1
Homologene                         20151
PDB (Protein Data Bank)            1HU8
Consensus CDS                      CCDS16941.1
Protein Ontology                   PR:000004803

Does the input list have to be in any particular form?

Yes.

What does "Search all input types" accomplish?

Because the Batch Query searches all input types by default, you do not have to identify what you enter or upload. The tool determines whether the entries are of one type or a combination of IDs and/or symbols. You may, however, select from the pulldown menu if you wish to constrain your query to a single type. See also When would I constrain my search to a single input type? below.
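
To get a feel for why some entries can only be resolved by searching several types, the sketch below guesses possible types from the shape of an identifier. It is not how the Batch Query itself works: the patterns are assumptions inferred only from the representative examples in this document, and the function name is hypothetical.

```python
import re

# Patterns inferred from the example IDs in this document (assumptions).
PATTERNS = [
    ("MGI Gene/Marker ID", re.compile(r"MGI:[0-9]+")),
    ("Ensembl ID", re.compile(r"ENSMUSG[0-9]+")),
    ("VEGA ID", re.compile(r"OTTMUSG[0-9]+")),
    ("GO (Gene Ontology)", re.compile(r"GO:[0-9]+")),
    ("RefSNP ID", re.compile(r"rs[0-9]+")),
    ("Protein Ontology", re.compile(r"PR:[0-9]+")),
]

# Types whose examples in this document are purely numeric.
NUMERIC_TYPES = ["Entrez Gene ID", "UniGene ID", "Affy Probeset ID", "Homologene"]

def guess_types(identifier):
    """Return every type whose example pattern the identifier fits."""
    if re.fullmatch(r"[0-9]+", identifier):
        return list(NUMERIC_TYPES)
    matches = [name for name, pattern in PATTERNS if pattern.fullmatch(identifier)]
    return matches or ["symbol or other type"]

for example in ("MGI:96677", "16590", "Pax6"):
    print(example, "->", guess_types(example))
```

A purely numeric entry such as 16590 fits the shape of several ID types, which is one situation where selecting a single input type can make a difference.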

When would I constrain my search to a single input type?

You may wish to select a single input type from the Type list when you want the Batch Query to return:

How many IDs can I enter at one time?

There is no limit to the number of identifiers that you can enter at one time, but browsers differ in how many results they can display, and very large files are subject to a time constraint.

Are there choices for how to view query results?

Yes. You can customize your results in three areas: Gene Attributes, Additional Information, and Format. Click to make your selections.

Why can I select all the choices under Gene Attributes, but I'm limited to only one choice under Additional Information?

The Additional Information section allows only one choice in order to keep results to a reasonable size, because selecting an Additional Information category greatly increases the amount of data returned. For example, if you enter symbols for 9 paired box genes (your input list is Pax1, Pax2, Pax3, ... Pax9), and you select:

Nomenclature and …                 the MGI Batch Query returns...
Genome Location                    9 rows, one for each gene, Pax1 - Pax9
UniProt ID                         30+ matching rows
Gene Ontology (GO)                 200+ matching rows
Mammalian Phenotype (MP)           650+ matching rows
Gene Expression                    1400+ matching rows
RefSNP ID                          ~2900 matching rows

Can I edit my original options and requery?

Yes, you can.

See Are there examples? for a sample of a query modification.

Notes:

How do I interpret results?

MGI Batch Query results

All Batch Query results appear as a table, in either web (HTML) format or tab-delimited text, depending on your Output selection.
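
Tab-delimited results are easy to summarize with a script. As a rough sketch, the code below counts how many result rows each input term produced, which can help you gauge how much a given Additional Information choice expands the report. The column header "Input" and the sample rows are assumptions for illustration; check the header row of your own file.

```python
import csv
import io
from collections import Counter

def rows_per_input(tsv_text, input_column="Input"):
    """Count the result rows returned for each value in the input column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row[input_column] for row in reader)

# Placeholder report: one input term can yield several rows.
report = (
    "Input\tSymbol\tTerm\n"
    "Pax1\tPax1\tterm A\n"
    "Pax1\tPax1\tterm B\n"
    "Pax2\tPax2\tterm C\n"
)
for term, count in rows_per_input(report).items():
    print(term, count)
```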

Mammalian Phenotype (MP) results

The resulting list of Mammalian Phenotype Ontology terms associated with a gene is a combination of all terms associated with all mutant alleles of that gene.
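
In other words, the gene-level list behaves like a union of the per-allele term lists. The sketch below illustrates the idea with placeholder allele names and term labels; it assumes that a term shared by several alleles is listed once, which you may want to confirm against your own results.

```python
# Placeholder allele names and term labels, for illustration only.
terms_by_allele = {
    "allele_1": {"term A", "term B"},
    "allele_2": {"term B", "term C"},
}

def terms_for_gene(terms_by_allele):
    """Combine the terms of every allele into one de-duplicated, sorted list."""
    return sorted(set().union(*terms_by_allele.values()))

print(terms_for_gene(terms_by_allele))
```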

See also MGI Batch Query results for information on the other fields.

For detailed information, use the Phenotypes, Alleles & Disease Models Query Form to find your gene of interest and Mammalian Phenotype terms associated with specific genotypes and strains.

Human Disease (DO) results

Human Disease terms appear by gene, followed by an ID and the Disease Ontology vocabulary term entry.

See also MGI Batch Query results for information on other fields.

For detailed information, use the Phenotypes, Alleles & Disease Models Query Form to find your gene of interest and view Human Disease terms as they are associated with specific allelic mutations and strains.

Gene Expression results

If there is expression data from a curated reference in the Gene Expression Database, the anatomical structure examined (listed by Theiler stage and structure name) appears, followed by columns indicating:

See also MGI Batch Query results for information on the other fields.

Use the Genes and Markers Query Form to find a gene of interest and view additional expression results (e.g., Literature Summary, Data Summary, Theiler Stages, Assay Types, cDNA source data, External Resources).

What do the acronyms in the Gene Ontology Code column mean?

See Guide to GO Evidence Codes at the Gene Ontology website.

My query returned no results or No associated gene in one of the report columns. Why?

Note: If you get an error message before your query completes, try using a smaller list of IDs or selecting fewer output categories.
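
If a long list fails, one approach is to split it into smaller batches and submit them one at a time. The sketch below shows the splitting step only. The function name is hypothetical and the batch size is an arbitrary value chosen for the example, not a documented limit of the tool.

```python
def split_into_batches(ids, batch_size):
    """Split a list of IDs into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [ids[start:start + batch_size]
            for start in range(0, len(ids), batch_size)]

ids = ["Pax1", "Pax2", "Pax3", "Pax4", "Pax5"]
for number, batch in enumerate(split_into_batches(ids, 2), start=1):
    print(f"Batch {number}: {' '.join(batch)}")
```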

How often are data updated?

MGI curators add new annotations from the literature every day. Sequence data are downloaded from the sequence databases weekly and undergo MGI curation. The MGI web site is updated with these and other new data once a week.

Are there examples?

I want to see all phenotype annotations for two genes.

I want to add another item to an input list and see if there are different results depending on whether I use the default (Search all input types) or select an input type.

I have a text file of MGI Marker IDs and I want corresponding GenBank/RefSeq IDs.

I want a list of MGI alleles for Entrez Gene IDs located in the third column of an Excel spreadsheet.

I used the Genes and Markers Query Form and selected Text File output for my results, and I now have a summary report with 67 matching items. How do I get a column of this data into the MGI Batch Query?

Note: The steps below may not work with some versions of the Firefox browser. Check the Firefox website for a workaround, or use a different browser for saving MGI Batch Query results in Excel. The Genes and Markers Query Form also provides the option to forward your results directly to the Batch Query.

Note: If your initial data are not in tab-delimited or comma-separated format, copy and paste the file into a spreadsheet, save it in one of those formats, and then use the MGI Batch Query to upload the desired column (be sure to identify the proper File Type and column).
