MGI 6.24
Interpreting the MouseBLAST Output Report

This help document answers the following questions:

What does the MouseBLAST output report contain?
How do I interpret the results of my search?
What can I do with a matching sequence?
Are there any sample searches?
Using MouseBLAST - Overview

What does the MouseBLAST output report contain?

The MouseBLAST output report contains:

WU-BLAST output
- Alignment summary of matching sequences
- Pairwise alignments with scores, percent identity, alignment length, and P-value
Links to MGI gene/marker/allele records that the query and matching sequences are associated with. For these records, the following data appear:
- gene/marker/allele symbol
- gene/marker/allele name
- the chromosome that the gene/marker has been mapped to (if known)
RepeatMasker output - details of detected repeats, including their location.
Repeat masked sequence - masked regions are replaced with N characters.

Alignment summary of matching sequences

The alignment summary provides a listing of the:

Matching sequences in descending levels of similarity. For each matching sequence, the following data appear:
- Sequence accession ID (click to view sequence details on provider's web site)
- Sequence description (truncated if too long)
- Score of the highest scoring HSP alignment
- Probability, P(N), score
Note: Even the top sequence may not be biologically significant. Inspect the alignment to judge its significance (see How do I interpret the results of my search?).
MGI genes and markers that the matching sequences are associated with:
- Gene/marker symbol
- Gene/marker name
- Chromosome gene/marker has been mapped to (if known).

From the alignment summary you can also:

Download one or more of the matching sequences in FASTA format.
Forward one or more matching sequences to MouseBLAST.

To do either:

Select the sequences by clicking the check boxes next to the sequence identifiers.
Select an operation:
- download in FASTA format
- forward to MouseBLAST
Click Go.

Pairwise Alignments

One or more pairwise alignments appear for each matching sequence, along with the following information:

Sequence identifier with link to provider's web site
Sequence description
Length of matching sequence
Data about MGI genes and markers that matching sequences are associated with:

Gene/marker symbol
Gene/marker name
Chromosome gene/marker has been mapped to (if known).

Functionality to:

Download one or more of the matching sequences in FASTA format
Forward one or more matching sequences to MouseBLAST.
To do either:

Select the sequences by clicking the check boxes next to the sequence identifiers.
Select an operation:

download in FASTA format
forward to MouseBLAST.

Click Go.

For each pairwise alignment for a matching sequence, the following appears. (Note: Plus strand alignments appear before minus strand alignments.)

Score
Expectation value
Probability
Level of identity between two sequences expressed as:

Number of identical base pairs or amino acids in sequence alignment
Length of alignment
Percent identity.

Level of similarity between two sequences expressed as:

Number of similar base pairs or amino acids in sequence alignment
Length of alignment
Percent similarity.

The actual alignment.
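The identity and similarity figures above are simple ratios of a count to the alignment length. The following is a minimal sketch of that arithmetic, not MouseBLAST code; the function name `percent` and the example counts are hypothetical.

```python
def percent(count: int, alignment_length: int) -> float:
    """Return count as a percentage of the alignment length."""
    if alignment_length <= 0:
        raise ValueError("alignment length must be positive")
    return 100.0 * count / alignment_length


# Hypothetical HSP: 45 identical and 47 similar positions in a 50-position alignment.
identity = percent(45, 50)
similarity = percent(47, 50)
print(f"Identity: 45/50 ({identity:.0f}%), Similarity: 47/50 ({similarity:.0f}%)")
```

The number of similar positions is never smaller than the number of identical positions, so percent similarity is always at least as high as percent identity.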

Pairwise alignments between the query and matching sequence appear in blocks called High-scoring Segment Pairs, or HSPs. HSPs are the basis on which BLAST determines whether two sequences are similar to one another. Gapped BLAST algorithms, such as WU-BLAST 2.0, attempt to extend all HSPs so that they join, if possible. However, a matching sequence sometimes has multiple HSPs; for example, aligning a genomic sequence to a transcript results in an HSP for each exon represented in the transcript.

RepeatMasker Output

The RepeatMasker output section (appearing only when you run RepeatMasker) contains a table summarizing repeats detected in the query sequence. The following is an example of the table:
 SW     perc perc perc  query     position in query    matching repeat        position in  repeat
 score  div. del. ins.  sequence  begin  end (left)    repeat   class/family  begin  end (left)  ID
 
  225   20.0  0.0  0.0  M15131    1083   1127 (211) +  (TTTA)n  Simple_repeat    1   45    (0)      
This table contains:

the Smith-Waterman score of the alignment between the region of the query sequence and the repeat consensus sequence

the percentage of substitutions in the matching region compared to the consensus

the percentage of bases opposite a gap in the query sequence (deleted bp)
the percentage of bases opposite a gap in the repeat consensus (inserted bp)

the name of query sequence
the starting position of the match in the query sequence
the ending position of the match in the query sequence
the number of bases in the query sequence past the ending position of the match
the direction of match, either + or -, with respect to the consensus sequence in the database
the name of the matching interspersed repeat
the class of the repeat, in this case a simple repeat
the starting position of the match in the database sequence (using top-strand numbering)
the ending position of the match in the database sequence
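the number of bases in the repeat consensus sequence past the ending position of the match (so 0 means that the match extended all the way to the end of the repeat consensus sequence). For a match to the complement of the consensus, this number instead counts the bases prior to the beginning of the match.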
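If you save the table as text, the columns described above can be read by splitting each row on whitespace. The following is a minimal sketch under that assumption; the `RepeatHit` type and `parse_repeat_row` function are hypothetical names, not part of RepeatMasker or MouseBLAST. It reads only the first eleven columns and leaves the repeat position columns alone.

```python
from typing import NamedTuple


class RepeatHit(NamedTuple):
    sw_score: int
    perc_div: float
    perc_del: float
    perc_ins: float
    query_name: str
    query_begin: int
    query_end: int
    query_left: int
    strand: str
    repeat_name: str
    repeat_class: str


def parse_repeat_row(line: str) -> RepeatHit:
    """Parse the first eleven whitespace-separated columns of one table row."""
    fields = line.split()
    if len(fields) < 11:
        raise ValueError(f"expected at least 11 columns, got {len(fields)}")
    return RepeatHit(
        sw_score=int(fields[0]),
        perc_div=float(fields[1]),
        perc_del=float(fields[2]),
        perc_ins=float(fields[3]),
        query_name=fields[4],
        query_begin=int(fields[5]),
        query_end=int(fields[6]),
        query_left=int(fields[7].strip("()")),  # "(211)" -> 211
        strand=fields[8],
        repeat_name=fields[9],
        repeat_class=fields[10],
    )


# The example row from the table above.
row = "  225   20.0  0.0  0.0  M15131    1083   1127 (211) +  (TTTA)n  Simple_repeat    1   45    (0)"
hit = parse_repeat_row(row)
print(hit.repeat_name, hit.query_begin, hit.query_end, hit.query_end - hit.query_begin + 1)
```

For the example row, the match spans positions 1083 through 1127 of the query, which is 45 bases, with 211 bases of the query remaining after the match.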

Repeat Masked Sequence

The Repeat Masked Sequence section appears only when you run RepeatMasker and contains the masked sequence in FASTA format. Any regions of the query sequence in which repeats were detected are replaced with N characters.
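The effect of masking can be illustrated with a short sketch. This is not RepeatMasker's implementation; the function `mask_regions` and the example sequence are hypothetical, and the regions are assumed to use 1-based, inclusive coordinates like the begin and end positions in the RepeatMasker table.

```python
def mask_regions(sequence: str, regions: list[tuple[int, int]]) -> str:
    """Replace each 1-based, inclusive (begin, end) region with N characters."""
    masked = list(sequence)
    for begin, end in regions:
        if not 1 <= begin <= end <= len(sequence):
            raise ValueError(f"region ({begin}, {end}) is outside the sequence")
        # Convert to a 0-based, half-open slice and overwrite it with Ns.
        masked[begin - 1:end] = "N" * (end - begin + 1)
    return "".join(masked)


# Hypothetical query with a (TTTA)n simple repeat at positions 4 through 15.
print(mask_regions("ACGTTTATTTATTTAGGC", [(4, 15)]))
```

The masked sequence keeps its original length, so positions reported for the masked sequence still correspond to positions in the original query.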

How do I interpret the results of my search?

Most people judge the significance of a sequence alignment by two criteria: the length of the alignment and the percent identity. The questions to ask are:

How much of the query sequence aligns to the subject (i.e., database) sequence?
How much of the subject (i.e., database) sequence aligns to the query sequence?
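Both questions can be answered by comparing the aligned span with the full length of each sequence. The following is a minimal sketch of that calculation; the function name `coverage`, the coordinates, and the sequence lengths are hypothetical, and coordinates are assumed to be 1-based and inclusive.

```python
def coverage(aligned_begin: int, aligned_end: int, sequence_length: int) -> float:
    """Percent of a sequence spanned by an alignment (1-based, inclusive)."""
    if sequence_length <= 0:
        raise ValueError("sequence length must be positive")
    # Minus-strand alignments may list the larger coordinate first.
    begin, end = sorted((aligned_begin, aligned_end))
    return 100.0 * (end - begin + 1) / sequence_length


# Hypothetical alignment: query positions 101-400 of 500, subject positions 1-300 of 1200.
query_coverage = coverage(101, 400, 500)
subject_coverage = coverage(1, 300, 1200)
print(f"query: {query_coverage:.0f}%, subject: {subject_coverage:.0f}%")
```

In this example the alignment spans most of the query but only a quarter of the subject, which is the kind of asymmetry the two questions are meant to reveal.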

To be sure that two sequences are similar to one another, you should do a bidirectional best hit comparison. In this comparison, the two sequences in question should be the best matches to one another in your database of sequences.

As an example, assume that you have two GenBank mouse sequences, Sequence A and Sequence B. You hypothesize that Sequence A is most similar to Sequence B. To support this hypothesis, you must first search Sequence A against all GenBank mouse sequences, and then search Sequence B against all GenBank mouse sequences. If Sequence A best matches Sequence B in the first search, and Sequence B best matches Sequence A in the second search, the results suggest that these sequences are most similar to one another among all GenBank mouse sequences.
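The comparison described above can be sketched as follows. This is an illustration under stated assumptions, not a MouseBLAST feature: the search results are assumed to have been collected into a dictionary that maps each query to the scores of its matching sequences, with self-matches excluded. The function names, sequence names, and scores are hypothetical.

```python
def best_hit(hits: dict[str, float]) -> str:
    """Return the matching sequence with the highest score."""
    return max(hits, key=hits.get)


def is_bidirectional_best_hit(a: str, b: str, results: dict[str, dict[str, float]]) -> bool:
    """True if b is the best match for a and a is the best match for b."""
    return best_hit(results[a]) == b and best_hit(results[b]) == a


# Hypothetical scores from searching each sequence against the same database.
results = {
    "SeqA": {"SeqB": 950.0, "SeqC": 400.0},
    "SeqB": {"SeqA": 950.0, "SeqC": 420.0},
    "SeqC": {"SeqB": 420.0, "SeqA": 400.0},
}
print(is_bidirectional_best_hit("SeqA", "SeqB", results))
print(is_bidirectional_best_hit("SeqB", "SeqC", results))
```

In this data, SeqC's best match is SeqB, but SeqB's best match is SeqA, so SeqB and SeqC are not bidirectional best hits.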

What can I do with a matching sequence?

You can:

Download it in FASTA format.
Forward it to MouseBLAST for subsequent analysis.

To do either:

Click the check boxes to the left of the sequences you wish to download or forward.
Select the desired action from the pull-down menu (download in FASTA format or forward to MouseBLAST).
Click Go.

Are there any sample searches?

See Using MouseBLAST - Sample Queries.
