Interpreting the MouseBLAST Output Report
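This help document answers the following questions:
- What does the MouseBLAST output report contain?
- How do I interpret the results of my search?
- What can I do with a matching sequence?
- Are there any sample searches?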
What does the MouseBLAST output report contain?
The MouseBLAST output report contains:
- WU-BLAST output
- Links to MGI gene/marker records that the query and matching sequences are associated with. For these genes/markers, the following data appear:
- gene/marker symbol
- gene/marker name
- the chromosome that the gene/marker has been mapped to (if known)
- RepeatMasker output - details of detected repeats, including their location.
- Repeat masked sequence - the query sequence with masked regions replaced by N characters.
Alignment summary of matching sequences
The alignment summary lists:
- Matching sequences, in descending order of similarity. For each matching sequence, the following data appear:
- Sequence accession ID (click to view sequence details on provider's web site)
- Sequence description (truncated if too long)
- Score of the highest-scoring HSP (high-scoring segment pair)
- Probability value, P(N)
Note: Even the top sequence may not be biologically significant. Inspect the alignment to judge its significance (see How do I interpret the results of my search?).
- MGI genes and markers that the matching sequences are associated with:
- Gene/marker symbol
- Gene/marker name
- Chromosome gene/marker has been mapped to (if known).
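The probability and expectation values in the report are closely related. As a rough guide, assuming the standard Karlin-Altschul relationship P = 1 - e^(-E) (an assumption here; this page does not say exactly how WU-BLAST computes P(N)), P is nearly equal to E when E is small and approaches 1 as E grows. A minimal Python sketch of that conversion, with a hypothetical function name:

```python
import math


def p_from_e(e_value: float) -> float:
    """Return P = 1 - exp(-E) for an expectation value E.

    Assumes the Poisson relationship from Karlin-Altschul statistics;
    not confirmed here as MouseBLAST's exact P(N) computation.
    """
    if e_value < 0:
        raise ValueError("expectation value must be non-negative")
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for tiny E.
    return -math.expm1(-e_value)


for e in (1e-30, 0.01, 1.0, 10.0):
    print(f"E = {e:g} -> P = {p_from_e(e):.3g}")
```

For very small values the two numbers can be read almost interchangeably; the difference only matters for weak matches with E near or above 1.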
From the alignment summary you can also:
- Download one or more of the matching sequences in FASTA format.
- Forward one or more matching sequences to MouseBLAST.
To do either:
- Select the sequences by clicking the check boxes next to the sequence identifiers.
- Select an operation:
- download in FASTA format
- forward to MouseBLAST
- Click Go.
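Downloaded sequences arrive in FASTA format. If you want to work with them in a script, a minimal Python sketch for reading FASTA text into a dictionary follows; it assumes the identifier is the first word of each header line, and the function name and example records are hypothetical.

```python
import io
from typing import Dict, List, Optional, TextIO


def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}.

    The identifier is taken to be the first whitespace-delimited word
    after '>' (an assumption about the header layout). Sequence lines
    that appear before the first header are ignored.
    """
    records: Dict[str, str] = {}
    current: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                records[current] = "".join(chunks)
            words = line[1:].split()
            current = words[0] if words else ""
            chunks = []
        elif current is not None:
            chunks.append(line)
    if current is not None:
        records[current] = "".join(chunks)
    return records


# Hypothetical records standing in for a downloaded file.
example = io.StringIO(">seq1 first hit\nACGTAC\nGTAC\n>seq2 second hit\nTTGA\n")
print(read_fasta(example))
```

To read a saved file instead, pass an open file handle, for example `with open(path) as fh: read_fasta(fh)`.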
Pairwise Alignments
One or more pairwise alignments appear for each matching sequence, together with the following information:
- Sequence identifier with link to provider's web site
- Sequence description
- Length of matching sequence
- Data about MGI genes and markers that matching sequences are associated with:
- Gene/marker symbol
- Gene/marker name
- Chromosome gene/marker has been mapped to (if known).
- Functionality to:
- Download one or more of the matching sequences in FASTA format
- Forward one or more matching sequences to MouseBLAST.
To do either:
- Select the sequences by clicking the check boxes next to the sequence identifiers.
- Select an operation:
- download in FASTA format
- forward to MouseBLAST
- Click Go.
For each pairwise alignment of a matching sequence, the following appears (plus-strand alignments are listed before minus-strand alignments):
- Score
- Expectation value
- Probability
- Level of identity between two sequences expressed as:
- Number of identical base pairs or amino acids in sequence alignment
- Length of alignment
- Percent identity.
- Level of similarity between two sequences expressed as:
- Number of similar base pairs or amino acids in sequence alignment
- Length of alignment
- Percent similarity.
- The actual alignment.
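The identity and similarity percentages are simple ratios: the number of identical (or similar) positions divided by the alignment length, times 100. The Python sketch below computes both from two aligned strings, assuming '-' marks a gap and that gap columns count toward the alignment length. Which residues count as "similar" depends on the scoring matrix used in the search; the `similar` rule and the I/L/V grouping in the example are hypothetical stand-ins, not a real substitution matrix.

```python
from typing import Callable, Optional, Tuple


def _no_extra_similarity(a: str, b: str) -> bool:
    """Default rule: only identical residues count as similar."""
    return False


def identity_and_similarity(
    query: str,
    subject: str,
    similar: Optional[Callable[[str, str], bool]] = None,
) -> Tuple[int, int, int, float, float]:
    """Return (identical, similar, length, % identity, % similarity)
    for two equal-length aligned strings in which '-' marks a gap."""
    if len(query) != len(subject) or not query:
        raise ValueError("aligned strings must be non-empty and equal length")
    rule = similar or _no_extra_similarity
    length = len(query)  # gap columns count toward the alignment length
    identical = positives = 0
    for a, b in zip(query, subject):
        if a == "-" or b == "-":
            continue  # a gap is neither identical nor similar
        if a == b:
            identical += 1
            positives += 1
        elif rule(a, b):
            positives += 1
    return (identical, positives, length,
            100.0 * identical / length, 100.0 * positives / length)


print(identity_and_similarity("ACGT-ACGTA", "ACGTTACCTA"))
# Hypothetical rule treating I, L and V as similar to one another.
ilv = {"I", "L", "V"}
print(identity_and_similarity("MKIV", "MKLV", lambda a, b: a in ilv and b in ilv))
```

For nucleotide alignments, identity and similarity normally coincide, which is why the default rule adds nothing beyond exact matches.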
Pairwise alignments between the query and a matching sequence appear in blocks called high-scoring segment pairs (HSPs). HSPs are the basis on which BLAST detects whether two sequences are similar to one another. Gapped BLAST algorithms, such as WU-BLAST 2.0, attempt to extend HSPs so that they join, if possible. However, a matching sequence may still have multiple HSPs; for example, aligning a genomic sequence to a transcript typically yields one HSP for each exon represented in the transcript.
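To picture how multiple HSPs relate to the report, the following Python sketch groups hypothetical HSP records by matching sequence, ranks the sequences by their highest-scoring HSP, and lists plus-strand HSPs first, loosely mirroring the ordering described on this page. The `HSP` class, its fields, and the scores are illustrative assumptions, not MouseBLAST's data model, and ranking by top HSP score is a simplification.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class HSP:
    subject: str   # identifier of the matching sequence
    score: int
    q_start: int   # 1-based start in the query
    q_end: int     # 1-based end in the query
    strand: str    # "+" or "-"


def summarize(hsps: List[HSP]) -> List[Tuple[str, int, List[HSP]]]:
    """Group HSPs by matching sequence and return
    (subject, best score, HSPs) tuples, best-scoring subject first.
    Within a subject, plus-strand HSPs come first, then higher scores."""
    groups: Dict[str, List[HSP]] = defaultdict(list)
    for hsp in hsps:
        groups[hsp.subject].append(hsp)
    summary = [
        (subject, max(h.score for h in group),
         sorted(group, key=lambda h: (h.strand != "+", -h.score)))
        for subject, group in groups.items()
    ]
    summary.sort(key=lambda row: row[1], reverse=True)
    return summary


# Hypothetical hits: SEQ_A is split into three HSPs, e.g. one per exon.
hits = [
    HSP("SEQ_A", 410, 1, 210, "+"),
    HSP("SEQ_B", 500, 1, 260, "-"),
    HSP("SEQ_A", 350, 211, 390, "+"),
    HSP("SEQ_A", 120, 391, 450, "+"),
]
for subject, best, group in summarize(hits):
    print(subject, best, [(h.q_start, h.q_end) for h in group])
```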
RepeatMasker Output
The RepeatMasker output section (appearing only when you run RepeatMasker) contains a table summarizing repeats detected in the query sequence. The following is an example of the table:
   SW  perc  perc  perc  query     position in query      matching    repeat         position in repeat
score  div.  del.  ins.  sequence  begin   end  (left)    repeat      class/family   begin   end  (left)  ID
  225  20.0   0.0   0.0  M15131     1083  1127   (211)  + (TTTA)n     Simple_repeat      1    45     (0)
This table contains:
- the Smith-Waterman score of the alignment between the region of the query sequence and the repeat consensus sequence
- the percentage of substitutions in the matching region compared to the consensus
- the percentage of bases opposite a gap in the query sequence (deleted bp)
- the percentage of bases opposite a gap in the repeat consensus (inserted bp)
- the name of the query sequence
- the starting position of the match in the query sequence
- the ending position of the match in the query sequence
- the number of bases in the query sequence past the ending position of the match
- the direction of match, either + or -, with respect to the consensus sequence in the database
- the name of the matching repeat
- the class of the repeat, in this case a simple repeat
- the starting position of the match in the database sequence (using top-strand numbering)
- the ending position of the match in the database sequence
- the number of bases in (the complement of) the repeat consensus sequence prior to the beginning of the match (so 0 means that the match extended all the way to the end of the repeat consensus sequence).
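Each row of the table can be read field by field. The Python sketch below parses the example row into named fields. It is a minimal illustration that assumes whitespace-separated columns in the order shown, with an optional trailing ID; reverse-complement matches may list the repeat positions in a different order, which this sketch does not handle. The class and function names are hypothetical. Note that the query end plus the bases left gives the query length (1127 + 211 = 1338 in the example).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RepeatHit:
    sw_score: int
    perc_div: float
    perc_del: float
    perc_ins: float
    query: str
    q_begin: int
    q_end: int
    q_left: int             # bases in the query past q_end
    strand: str
    repeat: str
    repeat_class: str
    r_begin: int
    r_end: int
    r_left: int
    hit_id: Optional[str]   # absent in the example row


def _num(field: str) -> int:
    """Turn '(211)' or '211' into 211."""
    return int(field.strip("()"))


def parse_repeat_line(line: str) -> RepeatHit:
    """Parse one data row laid out like the plus-strand example."""
    f = line.split()
    if len(f) < 14:
        raise ValueError(f"expected at least 14 fields, got {len(f)}")
    return RepeatHit(
        int(f[0]), float(f[1]), float(f[2]), float(f[3]), f[4],
        _num(f[5]), _num(f[6]), _num(f[7]), f[8], f[9], f[10],
        _num(f[11]), _num(f[12]), _num(f[13]),
        f[14] if len(f) > 14 else None,
    )


hit = parse_repeat_line(
    "225 20.0 0.0 0.0 M15131 1083 1127 (211) + (TTTA)n Simple_repeat 1 45 (0)")
print(hit.repeat, hit.q_begin, hit.q_end, "query length:", hit.q_end + hit.q_left)
```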
Repeat Masked Sequence
The Repeat Masked Sequence section appears only when you run RepeatMasker and contains the masked sequence in FASTA format: any region of the query sequence that RepeatMasker identifies as a repeat is replaced with N characters.
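The effect of masking can be illustrated with a short Python sketch that replaces 1-based, inclusive repeat intervals with N and writes the result as FASTA. This only mimics the output; RepeatMasker performs the masking itself. The function names, the example sequence, and the 60-character default line width are hypothetical choices.

```python
from typing import Iterable, Tuple


def mask_repeats(seq: str, intervals: Iterable[Tuple[int, int]]) -> str:
    """Replace each 1-based, inclusive (begin, end) interval with 'N'."""
    chars = list(seq)
    for begin, end in intervals:
        if not 1 <= begin <= end <= len(seq):
            raise ValueError(f"interval {begin}-{end} is outside the sequence")
        chars[begin - 1:end] = "N" * (end - begin + 1)
    return "".join(chars)


def to_fasta(header: str, seq: str, width: int = 60) -> str:
    """Format a sequence as FASTA, wrapping lines at `width` characters."""
    lines = [">" + header]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines)


# Hypothetical query in which positions 4-15 were reported as a repeat.
masked = mask_repeats("ACGTTTATTTATTTAGC", [(4, 15)])
print(to_fasta("query_masked", masked, width=10))
```

The begin and end positions would come from the "position in query" columns of the RepeatMasker table.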
How do I interpret the results of my search?
Most people judge the significance of a sequence alignment by two criteria: the length of the alignment and its percent identity. The questions to ask are:
- How much of the query sequence aligns to the subject (i.e., database) sequence?
- How much of the subject (i.e., database) sequence aligns to the query sequence?
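Both questions can be answered from the HSP coordinates shown in the pairwise alignments. One way to do this, sketched below in Python, is to merge the aligned intervals (so that overlapping HSPs are counted once) and divide by the sequence length. Run it once with query coordinates and the query length, and once with subject coordinates and the subject length. The function name and example coordinates are hypothetical.

```python
from typing import Iterable, List, Tuple


def coverage(intervals: Iterable[Tuple[int, int]], seq_length: int) -> float:
    """Fraction of a sequence covered by aligned intervals.

    Intervals are 1-based and inclusive; (start, end) pairs with
    start > end (e.g. minus-strand coordinates) are normalized, and
    overlapping or adjacent intervals are merged so bases count once.
    """
    merged: List[List[int]] = []
    for begin, end in sorted((min(a, b), max(a, b)) for a, b in intervals):
        if merged and begin <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([begin, end])
    covered = sum(end - begin + 1 for begin, end in merged)
    return covered / seq_length


# Hypothetical query coordinates of three HSPs in a 500-base query.
print(f"{coverage([(1, 210), (200, 390), (420, 450)], 500):.1%}")
```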
To check whether two sequences are each other's closest match, you can do a reciprocal (bidirectional) best hit comparison: the two sequences in question should be the best matches to one another within your database of sequences. For example, assume that you have two mouse GenBank sequences, Sequence A and Sequence B, and you hypothesize that Sequence A is most similar to Sequence B. To test this hypothesis, first search Sequence A against all GenBank mouse sequences, and then search Sequence B against all GenBank mouse sequences. If Sequence B is the best match for Sequence A in the first search, and Sequence A is the best match for Sequence B in the second search, this suggests that the two sequences are more similar to each other than to any other GenBank mouse sequence.
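The reciprocal best hit test reduces to two lookups. The Python sketch below assumes you have recorded the scores from each search in a dictionary; it ignores a sequence's hit to itself (an assumption, since a query that is in the database will usually match itself best) and does not resolve ties. All names and scores are hypothetical.

```python
from typing import Dict, Optional

# search[q] maps each database hit to its score when q is the query.
Scores = Dict[str, Dict[str, float]]


def best_hit(search: Scores, query: str) -> Optional[str]:
    """Highest-scoring hit for `query`, ignoring a self-hit."""
    hits = {s: v for s, v in search.get(query, {}).items() if s != query}
    return max(hits, key=hits.get) if hits else None


def reciprocal_best_hits(a: str, b: str, search: Scores) -> bool:
    """True if b is a's best hit and a is b's best hit."""
    return best_hit(search, a) == b and best_hit(search, b) == a


# Hypothetical scores from searching Sequence A and Sequence B.
search: Scores = {
    "SeqA": {"SeqA": 900, "SeqB": 640, "SeqC": 310},
    "SeqB": {"SeqB": 880, "SeqA": 640, "SeqD": 500},
}
print(reciprocal_best_hits("SeqA", "SeqB", search))
```

If SeqD outscored SeqA in the second search, the test would fail even though SeqB is SeqA's best hit, which is exactly the one-directional case the comparison is meant to catch.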
What can I do with a matching sequence?
You can:
- Download it in FASTA format.
- Forward it to MouseBLAST for subsequent analysis.
To do either:
- Click the check boxes to the left of the sequences you wish to download or forward.
- Select the action from the pull-down menu (download in FASTA format or forward to MouseBLAST).
- Click Go.
Are there any sample searches?
See Using MouseBLAST - Sample Queries.