Using the MGI Batch Query

This help document answers the following questions about the batch query tool:


What can I use the MGI Batch Query to find?

Use the MGI Batch Query to retrieve data about many genes in MGI simultaneously. Currently, the tool retrieves gene/marker attributes (e.g., nomenclature; genome location; Ensembl, Entrez Gene, or VEGA IDs) and annotations (e.g., gene ontology or mammalian phenotype terms; gene expression tissue/assay data; MGI allele, GenBank/RefSeq, RefSNP, or UniProt IDs). Future versions may include additional options.

You can also use the batch query to find duplications, ambiguities, and variations in data. You could check, for example, that an input list of MGI identifiers returns the same number of Ensembl, Entrez Gene, and VEGA IDs on different days, or that switching the input and output IDs returns identical data. See Are there examples? for additional ways that you might use this query.
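
One way to run the day-to-day consistency check described above is to save the tab-delimited results of each run and compare them offline. The following is a minimal sketch, not part of the tool itself: the two-column layout (input ID, output ID, no header row), the function names, and the sample IDs are all assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict


def load_mapping(tsv_text):
    """Map each input ID to the set of output IDs returned for it.

    Assumes a hypothetical two-column, tab-delimited layout with no
    header row: input ID, then one output ID per row.
    """
    mapping = defaultdict(set)
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) >= 2 and row[0].strip():
            mapping[row[0].strip()].add(row[1].strip())
    return mapping


def diff_mappings(old, new):
    """Return {input_id: (IDs only in old, IDs only in new)} for changed entries."""
    changes = {}
    for key in sorted(set(old) | set(new)):
        removed = old.get(key, set()) - new.get(key, set())
        added = new.get(key, set()) - old.get(key, set())
        if removed or added:
            changes[key] = (removed, added)
    return changes


# Made-up IDs, for illustration only.
monday = "MGI:0000001\tENSMUSG00000000001\nMGI:0000002\tENSMUSG00000000002\n"
friday = "MGI:0000001\tENSMUSG00000000001\nMGI:0000002\tENSMUSG00000000003\n"
print(diff_mappings(load_mapping(monday), load_mapping(friday)))
```

An empty result means the two runs returned the same associations for every input ID.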

How do I use this tool?

The Batch Query tool lets you either type or copy and paste in a list of IDs or upload a file containing that list.

In either case, no special characters are allowed in the list, so be sure to remove any quotation marks or other non-alphanumeric characters. Valid delimiters are tab, comma/space, space, and new line.
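
If your list comes from another program, a small script can tidy it before you paste or upload it. The sketch below is one possible approach, not something the tool provides. It splits on the delimiters listed above and strips quotation marks. It assumes that characters belonging to an ID, such as the colon in an "MGI:" accession, should be kept. The function name is hypothetical.

```python
import re

# Straight and curly quotation marks that spreadsheets and word
# processors commonly add around values.
QUOTE_CHARS = "\"'\u201c\u201d\u2018\u2019"


def clean_id_list(raw_text):
    """Split raw text into IDs/symbols and strip quotation marks.

    Splits on tab, comma, space, and new line. Assumption: only
    quotation marks are removed; other characters are left alone so
    that IDs such as "MGI:..." keep their colon.
    """
    tokens = re.split(r"[\t\n\r ,]+", raw_text)
    cleaned = [token.strip(QUOTE_CHARS) for token in tokens]
    return [token for token in cleaned if token]


raw = '"Pax1", "Pax2"\tPax3\n\'Pax4\' Pax5,'
print("\n".join(clean_id_list(raw)))
```

Printing one entry per line gives a list that can be pasted directly into the ID/Symbols List box.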

The process for using the Batch Query Tool is:

  1. Enter IDs or symbols (type, paste, or upload them).
  2. Select the fields to appear in the output.
  3. Choose an output format.

When typing or pasting IDs into the input box (you can copy and paste a column from an Excel spreadsheet):

  1. Beneath Source, click the Enter Text tab.
  2. Type or paste your list into the ID/Symbols List box. The Batch Query tool accepts space, tab, and newline-separated lists and removes any trailing commas.
  3. Beside Type, either leave the default (Search all input types) or, when your list consists of a single input type and you want to restrict the search to it, click the arrow and select a type from the pulldown list. See When would I constrain my search to a single input type? for details.
  4. In the Output area, click to select Gene Attributes, Additional Information, and Format to appear in the output report. (See Are there choices for how to view query results? for more information.)
  5. Click Search.

If you are uploading a file:

  1. If this is spreadsheet data, be sure to save the file in tab- or comma-delimited format.
  2. In the Input column of the query form, leave the default (Search all input types) in the Type box, or click the arrow and choose a type from the list, if you wish to constrain the query. See When would I constrain my search to a single input type? for more details.
  3. Beneath Source, click the Upload File tab. A menu of choices appears.
  4. Click the Choose File button and browse to locate your data. Once selected, the name of your file appears beside the button.
  5. Beneath File Type and ID/Symbols column, check that the defaults (tab-delimited and 1) are correct for your uploaded file.
  6. In the Output area, click to select Gene Attributes, Additional Information, and Format to appear in the output report. (See Are there choices for how to view query results? for more information.)
  7. Click Search.
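
If the data does not come from a spreadsheet, you can produce a suitable upload file with a short script. The sketch below writes tab-delimited text with the ID or symbol in the first column, matching the form's defaults (tab-delimited and 1). The function name, file name, and sample rows are illustrative assumptions.

```python
import os
import tempfile


def write_upload_file(path, rows):
    """Write rows as tab-delimited text, one row per line, ID/symbol first."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for row in rows:
            fields = [str(field).strip() for field in row]
            # A tab or newline inside a field would shift the columns.
            if any("\t" in field or "\n" in field for field in fields):
                raise ValueError(f"field contains a tab or newline: {row!r}")
            handle.write("\t".join(fields) + "\n")


# Made-up rows: symbol in column 1, a free-text note in column 2.
rows = [("Pax1", "note a"), ("Pax2", "note b")]
path = os.path.join(tempfile.mkdtemp(), "batch_input.txt")
write_upload_file(path, rows)
```

Because the symbols are in the first column, the ID/Symbols column setting can stay at its default of 1.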

What are the accepted input identifier (ID) types?

For a given query, your list can be either mixed or all of the same type. You can also enter (mouse) gene symbols, synonyms, or orthologs.

Does the input list have to be in any particular form?

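Yes. The list may not contain special characters such as quotation marks, and entries must be separated by a valid delimiter (tab, comma, space, or new line). See How do I use this tool? for details.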

What does "Search all input types" accomplish?

Since the Batch Query's default option is to search all input types, you do not have to identify what you enter or upload into the Batch Query. The tool determines whether the entries are of one type or a combination of IDs and/or symbols. You may, however, select from the pulldown menu if you wish to constrain your query to a single type. See also When would I constrain my search to a single input type? below.

When would I constrain my search to a single input type?

You may wish to select a single input type from the Type list when you want the Batch Query to return:

How many IDs can I enter at one time?

There is no limit to the number of identifiers that you can enter at once, but there is a limit to how many results different browsers can display. In general, splitting a long list into sections speeds up the return time. See the table under Why can I select all the choices under Gene Attributes, but I'm limited to only one choice under Additional Information? for examples of the amount of data that you can expect to return, depending on your input/output choices.
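
Splitting a long list into sections is easy to script. The following is a minimal sketch. The function name and the batch size of 4 are arbitrary choices for illustration, not values recommended by the tool.

```python
def split_into_batches(ids, batch_size):
    """Yield consecutive slices of ids, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


symbols = [f"Pax{n}" for n in range(1, 10)]  # Pax1 .. Pax9
for number, batch in enumerate(split_into_batches(symbols, 4), start=1):
    print(f"batch {number}: {' '.join(batch)}")
```

Each printed batch can then be submitted as its own query.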

Are there choices for how to view query results?

Yes. You can customize your results in three areas: Gene Attributes, Additional Information, and Format. Click to make your selections.

Why can I select all the choices under Gene Attributes, but I'm limited to only one choice under Additional Information?

Restricting you to a single Additional Information choice keeps results to a reasonable size. The amount of data returned increases substantially when you select an Additional Information category. For example, if you enter symbols for 9 paired box genes...

Your input list is Pax1, Pax2, Pax3, ... Pax9

You select Nomenclature and ...    ...the MGI Batch Query returns...
Genome Location                    9 rows, one for each gene, Pax1 - Pax9
UniProt ID                         ~50 matching rows
Gene Ontology (GO)                 ~192 matching rows
Mammalian Phenotype (MP)           ~556 matching rows
Gene Expression                    ~1400 matching rows
GenBank/RefSNP ID                  ~3000 matching rows

How many results can I get at one time?

The number of results that can appear at once depends on the output format you select.

If the result total is greater than the maximum allowed, a line at the top of the MGI Batch Query Results report lists the number found. If there are more than 300,000 results, the count is displayed as 300,000+, and you'll have to split your query into smaller sections in order to see everything. See What can I do to speed up my query? for other suggestions on how to limit results.

Can I edit my original options and requery?

Yes, you can.

Notes:

How do I interpret the MGI Batch Query Results report?

All results appear in the form of a table, in either web (HTML) format or tab-delimited text, depending on your Output selection.
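
A tab-delimited report is convenient to process with a script. Because a gene can appear on several rows (one per association), a common first step is to group the rows by gene. The sketch below assumes the report's first line is a header row. The column names "Symbol" and "Term" are placeholders, not necessarily the names the tool uses, and the sample rows are made up.

```python
import csv
import io
from collections import defaultdict


def group_rows_by_gene(report_text, gene_column="Symbol"):
    """Collect report rows under the gene named in gene_column.

    Assumes a tab-delimited report whose first line is a header row;
    the column names used here are placeholders, not the tool's own.
    """
    grouped = defaultdict(list)
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    for row in reader:
        grouped[row[gene_column]].append(row)
    return dict(grouped)


# Made-up rows, for illustration only.
report = (
    "Symbol\tTerm\n"
    "Pax1\tterm A\n"
    "Pax1\tterm B\n"
    "Pax2\tterm C\n"
)
for symbol, rows in group_rows_by_gene(report).items():
    print(symbol, len(rows), [row["Term"] for row in rows])
```

Reading the file with the csv module also keeps every value as text, so gene symbols are not reinterpreted as dates.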

How do I interpret Mammalian Phenotype (MP) results?

The list of Mammalian Phenotype Ontology terms associated with a gene is a combination of all terms associated with all mutant alleles of that gene.

See also How do I interpret the MGI Batch Query Results report? for information on the other fields.

For detailed information, use the Phenotypes, Alleles & Disease Models Query Form to find your gene of interest and Mammalian Phenotype terms associated with specific genotypes and strains.

How do I interpret Human Disease (OMIM) results?

  • Human Disease terms are listed by gene, followed by an ID and Term entry.
  • See also How do I interpret the MGI Batch Query Results report? for information on the other fields.

    For detailed information, use the Phenotypes, Alleles & Disease Models Query Form to find your gene of interest and view Human Disease terms as they are associated with specific allelic mutations and strains.

    How do I interpret Gene Expression results?

    If there is expression data from a reference in the Gene Expression Database, the anatomical structure examined (listed by Theiler stage and structure name) appears, followed by columns indicating:

    See also How do I interpret the MGI Batch Query Results report? for information on the other fields.

    Use the Genes and Markers Query Form to find a gene of interest and view additional expression results (e.g., Literature Summary, Data Summary, Theiler Stages, Assay Types, cDNA source data, External Resources).

    Why is a gene listed multiple times in the report?

    The relationship between a gene and its attribute/additional information categories is frequently not one-to-one. For example, a gene may have several Ensembl, Entrez Gene, or VEGA IDs associated with it; multiple gene ontology (GO) and/or Mammalian Phenotype (MP) terms; and many RefSNP IDs. Based on your output criteria, the MGI Batch Query report returns one line for each of these associations. See Why can I select all the choices under Gene Attributes, but I'm limited to only one choice under Additional Information? for the amount of data that the Batch Query returns, depending upon criteria.

    What do the acronyms in the Gene Ontology "code" column mean?

    See Guide to GO Evidence Codes at the Gene Ontology website.

    How do I save my query results in Excel?

    1. Copy the desired data (highlight it and press Ctrl-C) on the MGI Batch Query results page.
    2. In Excel, click Edit and select Paste Special.
    3. In the Paste Special dialog box, select Unicode, and then click OK.
      The data now appears separated into the same columns as on the MGI Batch Query report. You can add to it, subtract from it, rearrange or sort columns, and so on.

    Note: Some gene/marker symbols are interpreted by Excel as dates (as, for example, with the Septin family). To get around this, instead of using the steps for Paste Special above, do the following:

    1. Copy the desired text on the summary page into a text editor (e.g., Notepad or TextPad) and save this as a text file (e.g., with a .txt extension).
    2. Import the file into Excel (use Data->Import External Data->Import data).
    3. In the Text Import Wizard dialog boxes:

    Note: If you are importing from Excel into the MGI Batch Query tool, save the Excel file as text and choose tab-delimited or comma-separated whenever you have the option to do so.

    See Excel help documentation for additional information.

    What can I do to speed up my query?

    There are several options for speeding up a query.

    1. Break the ID or symbol list into pieces and run several queries.
    2. Eliminate Gene Attributes (deselect Nomenclature) or select fewer of them (for example, only Ensembl ID, only Entrez Gene ID, or only VEGA ID).
    3. Click None beneath Additional Information or select something other than RefSNP ID. If you want RefSNP IDs, using smaller input files will speed up the query.

    See How many results can I get at one time? for the maximum display sizes.

    My query returned no results or No associated gene in one of the report columns. Why?

    Note: If you get an error message before your query completes, try using a smaller list of IDs or selecting fewer output categories.

    Are there examples?

    Contributing Projects:
    Mouse Genome Database (MGD), Gene Expression Database (GXD), Mouse Tumor Biology (MTB), Gene Ontology (GO), MouseCyc
    Last database update: 11/20/2009 (MGI_4.31)
    The Jackson Laboratory