This help document answers the following questions:
The alignment summary provides a listing of the:
From the alignment summary you can also:
For each matching sequence, one or more pairwise alignments appear. For each matching sequence, the following information is available:
- Sequence identifier with link to provider's web site
- Sequence description
- Length of matching sequence
- Data about MGI genes and markers that matching sequences are associated with:
  - Gene/marker symbol
  - Gene/marker name
  - Chromosome gene/marker has been mapped to (if known).
- Functionality to:
  - Download one or more of the matching sequences in FASTA format
  - Forward one or more matching sequences to MouseBLAST.
To do either:
- Select the sequences by checking the check boxes by the sequence identifiers
- Select an operation:
  - download in FASTA format
  - forward to MouseBLAST.
- Click Go.
For each pairwise alignment for a matching sequence, the following appears. (Note: Plus strand alignments appear before minus strand alignments.)
- Expectation value
- Level of identity between two sequences expressed as:
  - Number of identical base pairs or amino acids in sequence alignment
  - Length of alignment
  - Percent identity.
- Level of similarity between two sequences expressed as:
  - Number of similar base pairs or amino acids in sequence alignment
  - Length of alignment
  - Percent similarity.
- The actual alignment.
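The identity figures listed above can be reproduced from an alignment by hand. The following is a minimal sketch, not the code BLAST itself uses: the function name `percent_identity` is hypothetical, gaps are assumed to be written as `-`, and the alignment length is assumed to count every column, including gap columns.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> tuple[int, int, float]:
    """Return (identical positions, alignment length, percent identity).

    Both arguments are rows of one pairwise alignment, so they must have
    the same length. Gap columns ("-") count toward the alignment length
    but never count as identical (an assumption of this sketch).
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    length = len(aligned_query)
    if length == 0:
        raise ValueError("alignment is empty")
    identical = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"
    )
    return identical, length, 100.0 * identical / length


# Made-up alignment, for illustration only.
identical, length, pct = percent_identity("ACGT-ACGTA", "ACGTTACCTA")
print(f"Identities = {identical}/{length} ({pct:.0f}%)")
```

Percent similarity follows the same pattern, except that a position also counts when the two residues are similar under the scoring matrix in use.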
Pairwise alignments between the query and matching sequence appear in blocks called High-scoring Segment Pairs, or HSPs. HSPs are how BLAST detects whether two sequences are similar to one another. Gapped BLAST algorithms, such as WU-BLAST 2.0, attempt to extend all HSPs so that they join, if possible. However, a matching sequence sometimes has multiple HSPs; for example, aligning a genomic sequence to a transcript produces an HSP for each exon represented in the transcript.
The RepeatMasker output section (appearing only when you run RepeatMasker) contains a table summarizing repeats detected in the query sequence. The following is an example of the table:

   SW  perc perc perc  query     position in query    matching  repeat         position in repeat
score  div. del. ins.  sequence  begin end  (left)    repeat    class/family   begin end   (left)  ID
  225  20.0  0.0  0.0  M15131     1083 1127  (211)  + (TTTA)n   Simple_repeat      1   45     (0)

This table contains:
- the Smith-Waterman score of the alignment between the region of the query sequence and the repeat consensus sequence
- the percentage of substitutions in the matching region compared to the consensus
- the percentage of bases opposite a gap in the query sequence (deleted bp)
- the percentage of bases opposite a gap in the repeat consensus (inserted bp)
- the name of query sequence
- the starting position of the match in the query sequence
- the ending position of the match in the query sequence
- the number of bases in the query sequence past the ending position of the match
- the direction of match, either + or -, with respect to the consensus sequence in the database
- the name of the matching interspersed repeat
- the class of the repeat, in this case a simple repeat
- the starting position of the match in the database sequence (using top-strand numbering)
- the ending position of the match in the database sequence
- the number of bases in (the complement of) the repeat consensus sequence prior to the beginning of the match (so 0 means that the match extended all the way to the end of the repeat consensus sequence).
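The column descriptions above map directly onto the fields of one table row. The following is a minimal sketch of reading such a row, assuming whitespace-separated fields in exactly the order of the example row; the `RepeatHit` class and `parse_repeat_row` function are hypothetical names, and rows laid out differently (for example, with an extra ID field or a different column order for matches on the other strand) are not handled.

```python
from dataclasses import dataclass


@dataclass
class RepeatHit:
    sw_score: int        # Smith-Waterman score
    perc_div: float      # percent substitutions
    perc_del: float      # percent deleted bases
    perc_ins: float      # percent inserted bases
    query_name: str
    query_begin: int
    query_end: int
    query_left: int      # bases in the query past the end of the match
    strand: str          # direction of the match
    repeat_name: str
    repeat_class: str
    repeat_begin: int
    repeat_end: int
    repeat_left: int


def _unparen(token: str) -> int:
    """Convert a parenthesized count such as '(211)' to an integer."""
    return int(token.strip("()"))


def parse_repeat_row(line: str) -> RepeatHit:
    """Parse one data row in the column order of the example table."""
    f = line.split()
    if len(f) < 14:
        raise ValueError(f"expected at least 14 fields, got {len(f)}")
    return RepeatHit(
        sw_score=int(f[0]),
        perc_div=float(f[1]),
        perc_del=float(f[2]),
        perc_ins=float(f[3]),
        query_name=f[4],
        query_begin=int(f[5]),
        query_end=int(f[6]),
        query_left=_unparen(f[7]),
        strand=f[8],
        repeat_name=f[9],
        repeat_class=f[10],
        repeat_begin=int(f[11]),
        repeat_end=int(f[12]),
        repeat_left=_unparen(f[13]),
    )


# The example row from the table above.
hit = parse_repeat_row(
    "225 20.0 0.0 0.0 M15131 1083 1127 (211) + (TTTA)n Simple_repeat 1 45 (0)"
)
print(hit.repeat_name, hit.query_begin, hit.query_end)
```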
The Repeat Masked Sequence section appears only when you run RepeatMasker and contains the masked sequence in FASTA format. Every region of the query sequence in which a repeat was detected is replaced with N characters.
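The effect of masking can be illustrated with a short sketch. This is not RepeatMasker's own code: the function `mask_regions` is a hypothetical helper, and it assumes that repeat positions are given as 1-based, inclusive (begin, end) pairs, like the query begin and end positions in the table above.

```python
def mask_regions(sequence: str, regions: list[tuple[int, int]]) -> str:
    """Replace each 1-based, inclusive (begin, end) region with N characters."""
    masked = list(sequence)
    for begin, end in regions:
        if begin < 1 or end > len(sequence) or begin > end:
            raise ValueError(f"region ({begin}, {end}) is outside the sequence")
        # Convert to a 0-based, half-open slice and overwrite it with Ns.
        masked[begin - 1:end] = "N" * (end - begin + 1)
    return "".join(masked)


# Made-up sequence and region, for illustration only.
print(mask_regions("ACGTACGTAC", [(3, 6)]))
```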
To be sure that two sequences are similar to one another, you should do a bidirectional best hit comparison. In this comparison, the two sequences in question should be the best matches to one another in your database of sequences.
As an example, assume that you have two GenBank mouse sequences, Sequence A and Sequence B. You hypothesize that Sequence A is most similar to Sequence B. To support this hypothesis, you must first search Sequence A against all GenBank mouse sequences, and then search Sequence B against all GenBank mouse sequences. If Sequence A best matches Sequence B in the first search, and Sequence B best matches Sequence A in the second, it suggests that these sequences are most similar to one another among all GenBank mouse sequences.
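The comparison described above can be sketched in a few lines once the results of both searches are in hand. The following is an illustration under stated assumptions, not part of MouseBLAST: the function names are hypothetical, each search result is assumed to be a mapping from matching sequence identifier to alignment score (higher is better), and a sequence's match to itself is ignored.

```python
from typing import Dict, Optional

Hits = Dict[str, float]  # matching sequence identifier -> score (assumed: higher is better)


def best_hit(query_id: str, hits: Hits) -> Optional[str]:
    """Return the highest-scoring match other than the query itself."""
    candidates = {sid: score for sid, score in hits.items() if sid != query_id}
    if not candidates:
        return None
    # On a tie, max() returns the first identifier encountered.
    return max(candidates, key=candidates.get)


def is_bidirectional_best_hit(a: str, b: str, searches: Dict[str, Hits]) -> bool:
    """True if b is a's best match and a is b's best match."""
    return (
        best_hit(a, searches.get(a, {})) == b
        and best_hit(b, searches.get(b, {})) == a
    )


# Made-up scores for three sequences, for illustration only.
searches = {
    "A": {"A": 500.0, "B": 420.0, "C": 100.0},
    "B": {"B": 480.0, "A": 420.0, "C": 90.0},
    "C": {"C": 300.0, "A": 100.0, "B": 90.0},
}
print(is_bidirectional_best_hit("A", "B", searches))
print(is_bidirectional_best_hit("A", "C", searches))
```

In this made-up data, A and B are each other's best match, while C's best match is A but A's best match is B, so A and C do not qualify.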