Using MouseBLAST - Masking Programs
This help document answers the following questions:

Should I use a repeat masking program?

Repeat masking filters are especially useful when searching with genomic sequence. You can choose from among the following programs:

Program       What it masks
RepeatMasker  Low-complexity and interspersed repeats in nucleotide sequences, using species-specific repeat libraries [1]
DUST          Simple repeats in nucleotide sequences, when searching nucleotide sequence databases with BLASTN [2]
SEG           Low-complexity regions in protein sequences [3]
XNU           Short tandem repeats in protein sequences [4]

How does RepeatMasker work?

RepeatMasker filters out repetitive sequences using a set of scoring matrices specific to rodent, primate, other mammals, Drosophila, or Arabidopsis thaliana repeats. You can run RepeatMasker with one of these scoring matrices to mask repetitive sequences in your query sequence before running WU-BLAST. If RepeatMasker finds repetitive regions in the sequence, it replaces those nucleotide symbols with N.
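The masking step itself (replacing nucleotides in repeat regions with N) can be illustrated with a short sketch. This is not RepeatMasker's code: the function name `mask_intervals`, the example query, and the repeat coordinates are all hypothetical, and the sketch assumes the repeat locations are already known as 1-based, inclusive (start, end) pairs.

```python
def mask_intervals(sequence, intervals, mask_char="N"):
    """Return a copy of `sequence` with every 1-based, inclusive
    (start, end) interval replaced by `mask_char`."""
    masked = list(sequence)
    for start, end in intervals:
        if start < 1 or end > len(sequence) or start > end:
            raise ValueError(f"invalid interval: ({start}, {end})")
        # Convert 1-based inclusive coordinates to Python's 0-based range.
        for i in range(start - 1, end):
            masked[i] = mask_char
    return "".join(masked)


# Hypothetical query: a run of ten T's flanked by non-repetitive sequence.
query = "ACGTACGT" + "T" * 10 + "ACGTACGT"
# Hypothetical repeat coordinates, as a repeat finder might report them.
repeats = [(9, 18)]
masked_query = mask_intervals(query, repeats)
print(masked_query)
```

The masked sequence keeps its original length, so alignment coordinates reported by a later search still refer to positions in the unmasked query.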

See RepeatMasker Output for a table describing the location and identity of repeats that RepeatMasker finds.

For more information and documentation on RepeatMasker, see http://www.repeatmasker.org/.

What about low-complexity or short-periodicity internal repeats?

If a query sequence contains low-complexity or short-periodicity internal repeats, the search results may contain spurious or biologically insignificant alignments. DUST, SEG, and XNU filter out these regions of unusual composition.
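To give a feel for what "unusual composition" means, the sketch below masks windows whose Shannon entropy falls under a cutoff. This is a deliberately simplified stand-in, not the algorithm used by DUST, SEG, or XNU; the function names, the window size of 12, and the threshold of 1.0 bit are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy, in bits per symbol, of the characters in `window`."""
    counts = Counter(window)
    total = len(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mask_low_complexity(sequence, window=12, threshold=1.0, mask_char="N"):
    """Mask every window whose entropy is below `threshold`.

    Illustrative only: a sliding-window entropy filter, not DUST/SEG/XNU.
    """
    masked = list(sequence)
    for start in range(0, len(sequence) - window + 1):
        if shannon_entropy(sequence[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)


# Hypothetical query: a poly-A run between two mixed-composition flanks.
seq = "ACGTGCATCGAT" + "A" * 20 + "TGCATCGATGCA"
masked = mask_low_complexity(seq)
print(masked)
```

Because windows that overlap the edge of the poly-A run can also fall below the cutoff, a few flanking bases may be masked along with the run itself. For a protein query, `mask_char` could be set to "X" instead of "N".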

References

  1. Smit, AFA and Green, P. RepeatMasker (1999), unpublished.
  2. Tatusov, RL and Lipman, DJ. DUST, unpublished.
  3. Wootton, JC and Federhen, S. Statistics of local complexity in amino acid sequences and sequence databases. Computers & Chemistry (1993), 17, 149-163.
  4. Claverie, J-M and States, DJ. Information enhancement methods in large scale sequence analysis. Computers & Chemistry (1993), 17, 191-201.
