Using the Mouse Sequence Query Form
This help document answers the following questions:

What can I use the Mouse Sequence Query Form to find?

Use this query form to find mouse sequences by entering some or all of the following:

The query retrieves the first 5000 sequences that match your parameters. See Is there a limit on the number of sequences that a query returns? for more details. Note: A query will only display the number indicated in Maximum number of items returned. The default is 100.

You can determine the sort order of the results. See How do the sorting and output format options work?.

How do I use this form to find sequences?

The result is either a Sequence Summary report listing all the sequences found or a text file containing these sequences in tab-delimited or FASTA format. See Using Sequence Summary Reports.

Is there a limit on the number of sequences that a query returns?

You can get back a maximum of 5000 sequences matching your parameters. The default is set to return a maximum of 100 sequences. You can change this number in the Sorting and output format section of the query form.
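
The interaction between the default of 100 and the ceiling of 5000 can be sketched as follows. This is only an illustration of the rule described above, not MGI code: the function names are hypothetical, and clamping a too-large request to 5000 is an assumption about how the form behaves.

```python
DEFAULT_MAX_ITEMS = 100   # form default, per the text above
HARD_LIMIT = 5000         # most sequences a query will return


def effective_max_items(requested=None):
    """Return how many sequences would be displayed for a requested maximum."""
    if requested is None:
        return DEFAULT_MAX_ITEMS
    if requested < 1:
        raise ValueError("maximum number of items must be at least 1")
    # Assumption: requests above the ceiling are clamped rather than rejected.
    return min(requested, HARD_LIMIT)


def display_summary(total_matches, requested=None):
    """Build an 'n of total' style message for a result set."""
    shown = min(total_matches, effective_max_items(requested))
    return f"{shown} of {total_matches} sequences displayed"


print(display_summary(171))        # uses the default of 100
print(display_summary(171, 200))   # a larger maximum shows all 171
```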

What values are acceptable in each of the query form fields?

Sequence attributes | Sequence source | Annotated genes & markers | Map position | GO Classifications | Protein Domains | Expression | Phenotype/Disease | Sorting & output format

Field Description
Sequence attributes
Sequence Type: Click one or more items from the selection list. The default is to return sequences from Any type on the list.
Sequence Provider: Click one or more items from the selection list. The default is to return sequences from Any provider on the list. See Valid Sequence Provider and Type Combinations for the allowed combinations.
Description From Sequence Provider: Enter text that the provider is likely to use in its description of the sequence. The system searches the MGI database for descriptions containing whatever you enter. Note: This search is restricted to sequences associated with a mouse gene or marker.
Sequence Length
  • Select an operator or use the default (greater than or equal to).
  • Type in the number of base pairs or amino acids. Use and for ranges (for example, if the sequence length is between 500 and 1000 base pairs, enter 500 and 1000).
  • Use equals (=) when you know the exact length of a particular sequence.
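
The Sequence Length rules above can be pictured as building a filter from an operator and the text typed in the box. The sketch below is hypothetical (the names are not part of MGI) and assumes that a range entered with and includes both endpoints.

```python
import operator
import re

# Operators offered for a single value; "between" is handled separately.
OPERATORS = {
    ">=": operator.ge,   # the form default
    "<=": operator.le,
    "=": operator.eq,
}


def length_filter(op, value):
    """Return a predicate that tests a sequence length against the entry."""
    if op == "between":
        low_text, high_text = re.split(r"\s+and\s+", value.strip(), flags=re.IGNORECASE)
        low, high = int(low_text), int(high_text)
        # Assumption: the range is inclusive at both ends.
        return lambda length: low <= length <= high
    bound = int(value)
    compare = OPERATORS[op]
    return lambda length: compare(length, bound)


in_range = length_filter("between", "500 and 1000")
print(in_range(750), in_range(1001))
```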
Sequence source
Strain (or species if other than laboratory mouse) / Tissue / Library: Information about the sequence from the source provider, but resolved to standard MGI vocabularies.

Browse the strain/species, tissue, or library list to locate desired items:

  1. Type or paste one or more strain/species, tissue, or library names in the box.
  2. Select an operator or use the default (begins).
  3. Click the NOT box to exclude a particular strain (or species if other than laboratory mouse), tissue, or library.

  Note: The standard strain (or species if other than laboratory mouse) list is only a partial list of what you may query on.
Clone collection: Use this field to restrict your search to sequences derived from a clone collection (a set of clone libraries drawn from one or more collections of libraries). For example, the IMAGE clone collection is the group of clones from all IMAGE clone libraries. The default is to not restrict the search to any particular clone collection.
  • To restrict the search, select one or more of the clone collections from the drop-down list. (For details such as the differences between the sets on this list, see MGI Clone Collections.)
  • Select ANY if you do not want to restrict the search by clone collection. If you select multiple clone collections, the search returns data from any of the selected collections.
  • To make multiple selections, see How do I select more than one option on a list?
Annotated genes and markers
Use this field to find sequences associated with different types of MGI markers.
Marker Symbol/Name
  • Pick an operator from the selection list (the default is contains).
  • Enter the gene symbol/name/synonym. Note: If the symbol/name or synonym contains a comma, put the entire term in quotation marks to ensure that the search is on the exact phrase (example: "cyclin B1, related sequence").
  • Use the default search (current symbols/names and synonyms) or make another choice from the selection list.
Type: The kind of marker, as defined in MGI.
  • Use the default (no particular type) or pick from the selection list.
  • Click the NOT check box to exclude types from the search.
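
The quoting rule for Marker Symbol/Name entries (put a term containing a comma in quotation marks) behaves much like quoted fields in comma-separated text. The following sketch uses Python's csv module to show the idea. It assumes that unquoted commas separate multiple terms, which the text above implies but does not state outright, and the function name is hypothetical.

```python
import csv


def split_terms(text):
    """Split an entry on commas, keeping quoted phrases whole."""
    reader = csv.reader([text], skipinitialspace=True)
    return [term.strip() for term in next(reader) if term.strip()]


# The quoted phrase stays one term even though it contains a comma.
print(split_terms('"cyclin B1, related sequence", Cd2'))
```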
Map position
Use these parameters to limit the scope of retrieved sequences to those associated with mapped genes and markers. There are several ways to use the map position parameters. If you use the chromosome parameter, you may also define a cytogenetic band, cM position, or genome coordinate. However, you cannot combine cytogenetic band, cM position, and genome coordinate with one another.
    Chromosome
  • Choose an operator (on/not on).
  • Select one or more items from the chromosome selection list.
    Cytogenetic Band
  • Specify all or a portion of a cytogenetic band offset. The default is equals (=), which limits the search to exact matches.
  • Select an alternative operator if you want to search using part of a cytogenetic band offset.
  • Click the NOT box to exclude whatever you type in the box.
    cM position
    1. Specify a chromosomal region BETWEEN two points on one chromosome selected from the Chromosome list by entering the cM values to begin and end the region, separated by and (for example, 50 and 75).
    2. Specify a region WITHIN an area surrounding a marker by entering the marker's current, approved symbol and the number of cM on either side of the marker. For this option, you do not need to select a chromosome.
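
The two cM options above describe the same kind of interval in different ways. As a rough sketch (the function names are hypothetical, and stopping the region at 0 cM is an assumption):

```python
def cm_region_between(start_cm, end_cm):
    """BETWEEN: a region bounded by two cM values on one chromosome."""
    low, high = sorted((start_cm, end_cm))
    return (low, high)


def cm_region_within(marker_cm, flank_cm):
    """WITHIN: a region extending flank_cm on either side of a marker."""
    # Assumption: the region cannot extend below 0 cM.
    return (max(0.0, marker_cm - flank_cm), marker_cm + flank_cm)


print(cm_region_between(50, 75))     # the "50 and 75" example
print(cm_region_within(40.0, 5.0))   # 5 cM on either side of a marker at 40 cM
```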
    Genome Coordinates
    1. Make a selection on the Chromosome list.
    2. Enter the genome assembly coordinate(s) for a mapped gene or marker.
    3. Use the default (Between) or click the down arrow to select the <= (less than or equal to) or >= (greater than or equal to) operator.
    4. Click the other down arrow and select either bp (base pairs) or Mbp (mega base pairs) as the unit of measurement. (The default is base pairs.)
    5. Enter the coordinate(s), being sure to select the proper unit (Mbp or bp):

       If you use... | Type or paste in the box... | Examples
       Between | Coordinates indicating the area that includes the marker of interest, separated by either AND or - . The maximum number of characters is 23. | 125618206 AND 125622026; 125618206 - 125622026; 125.618 AND 125.622; 125.618 - 125.622
       <= | A single coordinate less than or equal to that of a mapped gene or marker. | 125618205; 125.61
       >= | A single coordinate greater than or equal to that of a mapped gene or marker. | 125622026; 125.63
    Note: The build number appearing on the query form is that of the most current NCBI assembly sequence. See Assembling Genomic Sequences for complete details.
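
To make the Between entry rules concrete, here is a small sketch that parses the example entries into base-pair coordinates. The function name is hypothetical, and the way the form itself validates input is not documented here, so treat the checks as assumptions.

```python
import re

MAX_CHARS = 23  # character limit stated for the Between entry


def parse_between(text, unit="bp"):
    """Parse 'A AND B' or 'A - B' into a (start, end) pair in base pairs."""
    if len(text) > MAX_CHARS:
        raise ValueError("entry exceeds 23 characters")
    parts = re.split(r"\s+AND\s+|\s*-\s*", text.strip(), flags=re.IGNORECASE)
    if len(parts) != 2:
        raise ValueError("expected two coordinates separated by AND or -")
    scale = {"bp": 1, "Mbp": 1_000_000}[unit]
    start, end = (round(float(part) * scale) for part in parts)
    return (start, end)


print(parse_between("125618206 AND 125622026"))
print(parse_between("125.618 - 125.622", unit="Mbp"))
```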

    The Mouse Sequence Query Form returns all sequences associated with markers annotated to the map positions you enter. Thus, if a marker located at your map position has 10 sequences associated with it, the query returns 10 sequences rather than just 1 marker.

    Gene Ontology (GO) Classifications
    Use this field to find sequences associated with genes annotated with GO classification terms. See Using the Marker Query Form - Gene Ontology (GO) Classifications for details on entering GO terms.

    The Mouse Sequence Query Form returns all sequences associated with markers annotated to the GO terms you enter. Thus, if a marker annotated with your GO term has 10 sequences associated with it, the query returns 10 sequences rather than just 1 marker. As an example, try:

    Gene Ontology Term(s) contains prenylation
    Biological Process  checked

    This query returns over 100 sequences associated to MGI markers containing prenylation in the Gene Ontology (GO) classification field on the Marker/Gene detail page.
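
The point that a query returns sequences rather than markers can be illustrated with a simple one-to-many lookup. The identifiers and the mapping below are made up for illustration and do not reflect how MGI stores its data.

```python
def sequences_for_markers(marker_ids, sequences_by_marker):
    """Expand each matching marker into all of its associated sequences."""
    results = []
    for marker_id in marker_ids:
        results.extend(sequences_by_marker.get(marker_id, []))
    return results


# Hypothetical data: one marker matches the query and has 10 sequences.
sequences_by_marker = {"marker_A": [f"seq_{i}" for i in range(1, 11)]}
hits = sequences_for_markers(["marker_A"], sequences_by_marker)
print(len(hits))  # 10 sequences, not 1 marker
```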

    Protein Domains
    Use this field to find sequences associated with genes annotated with a specific InterPro accession ID for a protein family, domain, or functional site.

    The Mouse Sequence Query Form returns a list of all sequences associated with markers annotated to the protein domain(s) you enter. Thus, if a marker annotated to your protein domain has 10 sequences associated with it, the query returns 10 sequences rather than just 1 marker.

    • If you click view all the InterPro domains, you can scroll to select an InterPro ID (e.g., IPR000001) from the list that appears, then copy and paste it into the field.
    • If you click InterPro/EBI, a new window opens to the InterPro-EBI home page where you can continue your search.
    • The default is a "contains" search. This means that you'll get back any item from the InterPro database with an ID or description containing the characters you type in the box. You can change this to an = (equals), begins, ends, or like search by selecting one of these items from the drop-down menu.
    • You can enter more than one InterPro ID or description and use the logical operator AND between them. A search of term1 AND term2 returns markers annotated to both term1 and term2.
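
The combination of a match operator and AND between terms can be sketched as below. This is an illustration only: the names are hypothetical, matching is assumed to be case-insensitive, the like operator is omitted because its wildcard rules are not described here, and the annotation strings other than the ID IPR000001 are placeholders.

```python
import re

MATCHERS = {
    "contains": lambda text, term: term in text,     # the default
    "=": lambda text, term: text == term,
    "begins": lambda text, term: text.startswith(term),
    "ends": lambda text, term: text.endswith(term),
}


def marker_matches(annotations, query, op="contains"):
    """True if every AND-separated term matches at least one annotation."""
    terms = [term.strip().lower() for term in re.split(r"\s+AND\s+", query)]
    match = MATCHERS[op]
    return all(
        any(match(annotation.lower(), term) for annotation in annotations)
        for term in terms
    )


annotations = ["IPR000001 example domain A", "IPR000002 example domain B"]
print(marker_matches(annotations, "IPR000001 AND domain B"))
print(marker_matches(annotations, "IPR000001 AND IPR999999"))
```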
    Expression

    Use this field to find sequences associated to genes annotated with specific expression data.
    Developmental Stage(s): Use this field and its operators (in or not in) to select one or more Theiler stages (TS) to focus your search on a particular stage of embryonic development. A dictionary for TS 27 is not yet available. See Developmental Stage(s) for more details.
    Anatomical Structure(s): The anatomical structures are taken from the Anatomical Dictionary. Clicking on a stage brings up a list of anatomical terms for structures present at that stage of development. Enter a single item or multiple items, separated by commas. For each stage, the Anatomical Dictionary is organized as a hierarchy of structures. You can choose to include either substructures (children), superstructures (parents), both, or neither in your search. The default is to Include substructures. See Anatomical Structure(s) for more details.
    Assay Type(s): A selection list of assay types. Use this field and its operators (in and not in) to limit your search to assays of one or more selected types. The default assay type is ANY.

    The Mouse Sequence Query Form returns a list of sequences associated with markers annotated with expression data from your query. Thus, if the expression data you seek is annotated to a gene with 10 sequences associated with it, the query returns those 10 sequences rather than just 1 marker. As an example, try:

    Developmental Stage(s) in TS:26
    Anatomical structure(s)  contains brain
    Substructures checked
    Assay Type in RNase protection

    This query should return at least 100 sequences.
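
The substructure and superstructure options described for Anatomical Structure(s) amount to walking a hierarchy down, up, or both. The sketch below shows one way to picture that. The function name is hypothetical and the small hierarchy is illustrative, not an excerpt of the Anatomical Dictionary.

```python
def expand_structures(term, children, parents, include_sub=True, include_super=False):
    """Return the set of structures searched for a term and the chosen options."""
    selected = {term}
    if include_sub:
        # Walk down the hierarchy, collecting every descendant.
        stack = [term]
        while stack:
            for child in children.get(stack.pop(), []):
                if child not in selected:
                    selected.add(child)
                    stack.append(child)
    if include_super:
        # Walk up the hierarchy, collecting every ancestor.
        node = term
        while node in parents:
            node = parents[node]
            selected.add(node)
    return selected


# Illustrative hierarchy only.
children = {"brain": ["forebrain", "hindbrain"], "forebrain": ["telencephalon"]}
parents = {
    "telencephalon": "forebrain",
    "forebrain": "brain",
    "hindbrain": "brain",
    "brain": "central nervous system",
}
print(sorted(expand_structures("brain", children, parents)))
```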

    Phenotype/Disease
    Use this field to find sequences associated with genes annotated with specific phenotype classification terms. See Phenotype/Disease for details.
    Sorting and output format
    Use this field to limit your search results and to choose what items to view in your Sequence Summary.
    Maximum number of items returned: If you want to limit the number of sequences returned, type in a number (5000 or less). The default is 100; the intent is not to speed up your search by limiting the scope but rather to let you restrict (or expand) your results. If you use the default and get a message on the Sequence Summary Report that says, for example, 100 of 171 sequences displayed, you can examine only the first 100 results. If you want to see the additional 71 sequences, rerun the query using the same parameters, but type 171 (or a somewhat larger number) in this field.
    Show in results: Click the box next to select all to see results from every category listed next to Sequence Attributes, Sequence Source, and Properties of Annotated Markers in/on the Sequence Query Form. If you do not check select all, the report columns contain only the items checked. Click items to select/deselect.
    Sort by: Choose an order from the selection list: Type, Provider, Length, Strain/Species, Tissue, Library, Clone Collection, Marker Symbol, Map Position, Genome Coordinates.
    Output format: Choose from the list depending on how you wish to view the Sequence Summary report columns you've selected.
      Web (default): Web page with MGI toolbar and all the selected columns.
      Tab-delimited text: Tab-delimited text returned to your browser with all the selected fields.
      Tab-delimited text to FTP site: Tab-delimited text saved on our public FTP site available for 72 hours. Be sure to save the URL or print the page with the information for later access.
      FASTA format: Tab-delimited FASTA-formatted results returned to your browser. Note: Selected output fields and sorting options do not apply to FASTA output files.
      FASTA format to FTP site: Tab-delimited FASTA-formatted results saved on our public FTP site for 72 hours. Be sure to save the URL or print the page with the information for later access.
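
If you save FASTA output to a file, a short script can read it back into header and sequence pairs. The reader below handles the general FASTA layout (a line starting with > followed by sequence lines). The exact header layout of MGI output is not described here, so the sample text is made up.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


sample = ">example_seq_1 hypothetical description\nACGT\nTTGA\n>example_seq_2\nMKV\n"
for header, sequence in read_fasta(sample):
    print(header, len(sequence))
```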

    Output retrieval:

    If you select... | The 1st sort is by... | The 2nd sort is by... | The 3rd sort is by... | The 4th sort is by...
    Provider | RefSeq, GenBank/EMBL/DDBJ:Other, DFCI, DoTS, NIA Mouse Gene Index, SWISS-PROT, TrEMBL | RNA, DNA, polypeptide | Length (descending) |
    Type (default) | RNA, DNA, polypeptide | RefSeq, GenBank/EMBL/DDBJ:Other, DFCI, DoTS, NIA Mouse Gene Index, SWISS-PROT, TrEMBL | Length (descending) |
    Length | Length (descending) | | |
    Strain/Species, Tissue, Library | Alphabetical (ascending). Not Specified, Not Resolved appear at the end. | Provider: RefSeq, GenBank/EMBL/DDBJ:Other, DFCI/DoTS, NIA Mouse Gene Index, SWISS-PROT, TrEMBL | Type: RNA, DNA, polypeptide | Length (descending)
    Clone Collection | Riken (FANTOM), RPCI-23, RPCI-24, IMAGE, NIA, and so on. | | |
    Marker Symbol | Symbol (ascending) | Provider: RefSeq, GenBank/EMBL/DDBJ:Other, DFCI/DoTS, NIA Mouse Gene Index, SWISS-PROT, TrEMBL | Type: RNA, DNA, polypeptide | Length (descending)
    Map Position | Chromosome (ascending) | Numerical cM position (if any) (ascending) | Alphanumeric by cytogenetic band (ascending) | Length (descending)
    Genome Coordinates | Chromosome (ascending) | Coordinates (ascending) | Symbol/name |
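
The multi-level sorts in the table above can be reproduced on downloaded results with a composite sort key. The sketch below implements the default Type sort (type, then provider, then length descending). The record fields are hypothetical names, not actual column headings from the report.

```python
TYPE_ORDER = ["RNA", "DNA", "polypeptide"]
PROVIDER_ORDER = [
    "RefSeq",
    "GenBank/EMBL/DDBJ:Other",
    "DFCI",
    "DoTS",
    "NIA Mouse Gene Index",
    "SWISS-PROT",
    "TrEMBL",
]


def sort_by_type(sequences):
    """Sort by type order, then provider order, then length (descending)."""
    return sorted(
        sequences,
        key=lambda seq: (
            TYPE_ORDER.index(seq["type"]),
            PROVIDER_ORDER.index(seq["provider"]),
            -seq["length"],
        ),
    )


# Made-up records for illustration.
records = [
    {"id": "a", "type": "DNA", "provider": "RefSeq", "length": 900},
    {"id": "b", "type": "RNA", "provider": "DoTS", "length": 500},
    {"id": "c", "type": "RNA", "provider": "RefSeq", "length": 300},
    {"id": "d", "type": "RNA", "provider": "RefSeq", "length": 1200},
]
print([seq["id"] for seq in sort_by_type(records)])
```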

    I got a summary report with a "Link to Results on FTP Site" message. What do I do now?

    See Using Sequence Summary Reports.

    How do I interpret the summary results of my query?

    See Using Sequence Summary Reports.

    How do I interpret the detail results of my query?

    See Using Sequence Detail Reports.

    Are there any sample queries?

    The following examples identify the field values to set for the query. Default values for any other fields are assumed.

    1. Find GenBank RNA sequences of 2000 or fewer base pairs whose description contains interleukin.
      Sequence Type: RNA
      Sequence Provider: GenBank/EMBL/DDBJ
      Description From Sequence Provider: interleukin
      Sequence Length: <= 2000
      Maximum number of items returned: 500
    2. Find SWISS-PROT polypeptide sequences containing between 200 and 300 amino acids.
      Sequence Type: polypeptide
      Sequence Provider: SWISS-PROT
      Sequence Length: between 200 and 300
      Maximum number of items returned: 500
    3. Find any C57BL/6 sequences associated with markers on chromosome 13 from the RPCI-23 library.
      Strain: begins C57BL/6
      Library: begins RPCI-23
      Chromosome: on 13
      Maximum number of items returned: 500
    4. Find sequences associated with genes with symbols/names containing cd2 mapped to Chr 3 between 30 and 50 cM.
      Marker Symbol/Name: contains cd2 searching current symbols/names and synonyms
      Chromosome: on 3
      cM Position: between 30 and 50
      Sort by: Type
      Maximum number of items returned: 500
    5. Find sequences associated with genes with annotated GO terms containing prenylation.
      Gene Ontology Term(s): contains prenylation
      Biological Process: checked
      Maximum number of items returned: 500
    6. Find sequences associated with markers with expression results detected by in situ assays at TS 26 in structures containing brain.
      Developmental Stage(s): in TS 26
      Anatomical Structure(s): contains brain
      Substructures: checked
      Assay Type(s): in RNA in situ
      Maximum number of items returned: 500
    7. Find sequences associated with markers annotated with any coat color anomalies.
      Phenotype Classifications: coat color: anomalies
      Maximum number of items returned: 500
    8. Find sequences associated with genes that have allele annotations containing eye.
      Phenotype Text Search: eye
      Maximum number of items returned: 1000
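
To see how the field values in a sample query combine, the first example (GenBank RNA sequences up to 2000 base pairs with interleukin in the description) can be written as a filter over a list of records. This is only a sketch of the logic: the record fields are hypothetical, the description match is assumed to be case-insensitive, and the real query runs against the MGI database rather than an in-memory list.

```python
def sample_query_1(records, max_items=500):
    """Apply the field values of the first sample query to a list of records."""
    hits = [
        record for record in records
        if record["type"] == "RNA"
        and record["provider"].startswith("GenBank/EMBL/DDBJ")
        and "interleukin" in record["description"].lower()
        and record["length"] <= 2000
    ]
    # All conditions must hold; the result is then capped at max_items.
    return hits[:max_items]


# Made-up records for illustration.
records = [
    {"type": "RNA", "provider": "GenBank/EMBL/DDBJ", "description": "Interleukin example", "length": 1500},
    {"type": "RNA", "provider": "GenBank/EMBL/DDBJ", "description": "Interleukin example", "length": 2500},
    {"type": "DNA", "provider": "RefSeq", "description": "interleukin example", "length": 800},
]
print(len(sample_query_1(records)))
```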

    What should I do if I get a status message?

    Each status message is listed below with the problem it indicates and a recommendation.

    Message: 100 of nnn sequences displayed
    Problem: The query form default for Maximum number of items returned is 100. This message lets you know that your query was successful, but the resulting summary contains only 100 of your nnn "hits."
    Recommendation: If you want to examine all nnn of these sequences, rerun your query (using the same parameters), but change the display maximum (in the Sorting and output format section) so that it's equal to or greater than nnn.

    Message: Warning
      Your query matches more than 5000 sequences, the MGI designated limit.
      Below are 100 of these sequences.
      • To change the number of sequences displayed, go to the Mouse Sequence Query Form, change the value for the Maximum number of items returned in the Sorting and output format section, and rerun the search. You may need to reenter your original search parameters.
      • To refine your search to find fewer than 5000 sequences, go to the Mouse Sequence Query Form, select additional search constraints, and rerun the search. To read more about this, see Sequence Query Hints.
      • To retrieve more than 5000 sequences, please contact MGI User Support.
    Problem: Your search returned more sequences than we can process in a reasonable amount of time. For example, asking to see all mRNA from GenBank produces this status message.
    Recommendation: To find fewer than 5000 sequences, return to the query form and refine your search. See Sequence Query Hints for how to do this. If you need to see 5000+ sequences, contact MGI User Support and ask for a custom SQL query.

    Message: Link to Results on FTP site
      ftp://ftp.informatics.jax.org/wirequests/nnnnnn/
      Results will be available for 72 hours. Your results will be available until n/nn/200n at hh:mm pm EST.
      Print or save this page to save the URL to your results.
    Problem: Your query report is deleted from the FTP site after 72 hours.
    Recommendation: If you want to retrieve your results but don't wish to do so at this time, print or save this page to retain the URL.

    Message: Results Summary
      nnn of nnn sequences were successfully retrieved in FASTA format. The following sequences could not be retrieved.
      Please contact MGI User Support if you have questions about these missing sequences.
    Problem: The reason for the unreturned sequences cannot be determined at this point.
    Recommendation: Ask MGI User Support to research the issue and get back to you. Be sure to save this URL and to contact User Support within the 72 hours.

    Message: MGI Query Error
      Your query did not complete. The cause is likely to be too much system activity at the present time. Please try your query again later.
    Problem: Many users are querying from the forms right now, or some queries require many of the system resources, or your query alone returns a large number of sequences.
    Recommendation: Try to narrow your query so that it returns fewer results. See Sequence Query Hints for how to do this. If you still get this message, try your query again later.
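
Because results saved to the FTP site are deleted after 72 hours, it can help to note when a saved link stops working. The sketch below computes that time from the moment the results were created. The function names are hypothetical, and time zones are left out for simplicity.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(hours=72)  # retention period stated in the message


def results_expire_at(created_at):
    """Return the time at which saved results are removed."""
    return created_at + RETENTION


def still_available(created_at, now):
    """True while the saved results can still be retrieved."""
    return now < results_expire_at(created_at)


created = datetime(2008, 8, 7, 14, 30)
print(results_expire_at(created))  # 2008-08-10 14:30:00
```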

    Contributing Projects:
    Mouse Genome Database (MGD), Gene Expression Database (GXD), Mouse Tumor Biology (MTB), Gene Ontology (GO), MouseCyc
    Send questions and comments to User Support.
    last database update
    08/07/2008
    MGI_4.11
    The Jackson Laboratory