Using MouseBLAST - Sequence Formats

This help document answers the following questions about sequence formats:

Must I use a particular format when entering a sequence on the MGI MouseBLAST query form?

You may enter your sequence in FASTA, GCG, or free-text format. MouseBLAST converts the data into FASTA format for input to WU-BLAST.

What is FASTA format?

FASTA format, also called "Pearson format" after its author, Will Pearson, is the format that the WU-BLAST and NCBI BLAST programs use. It is called FASTA format because it was first used by the FASTA alignment program.

What does a sequence in FASTA format look like?

A sequence in FASTA format starts with a one-line description that begins with a ">" character. The description line usually has a sequence identifier immediately following the ">" character, and then the sequence description. The sequence itself follows on the lines after the description line.

For example:

>gi|309398 interleukin 1-beta (AAA39276.1)
MATVPELNCEMPPFDSDENDLFFEVDGPQKMKGCFQTFDLGCPDESIQLQISQQHINKSFRQAVSLIVAV
EKLWQLPVSFPWTFQDEDMSTFFSFIFEEEPILCDSWDDDDNLLVCDVPIRQLHYRLRDEQQKSLVLSDP
YELKALHLNGQNINQQVIFSMSFVQGEPSNDKIPVALGLKGKNLYLSCVMKDGTPTLQLESVDPKQYPKK
KMEKRFVFNKIEVKSKVEFESAEFPNWYISTSQAEHKPVFLGNNSGQDIIDFTMESVSS

Another example (a partial gene trap sequence):

>Aqr cDNA sequence
CCGGGACTTCTCAGGTTCCGGGAGGAACCGTGGGAGCCCAGCGTGTCTGTGCCGGGCGCT
CTGGCACCTCGGGGCTGGGTCGCCGCCGCTGCTTGAAAGCTTTCGAGAGTCGCCGCTACG
AGCCTCTGGTCAGCTTCAGTGGCGATCGCTGCGATGGCGGCTCCTGCGCAGCCCAAGAAA
ATCGTGGCCCCCACGGTGTCCCAGATCAACGCGGAGTTCGTCACTCAGCTAGCATGTAAA
TACTGGGCTCCTCATATCAAGAAGAAATCACCGTTTGATATAAAAGTAATTGAAGAAATA
TATGAAAAAGAGATCGTCAAATCACGGTTTGCCATCAGGAAAATAATGCTGCTGGAATTT
...
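
The layout described above can be read with a few lines of code. The following Python sketch is illustrative only and is not how MouseBLAST itself reads input; the function names parse_fasta and _make_record are hypothetical. It assumes the identifier is the first whitespace-delimited token after the ">" character and that everything else on that line is the description.

```python
def parse_fasta(text):
    """Return a list of (identifier, description, sequence) tuples."""
    records = []
    header = None   # description line of the record being read, without ">"
    chunks = []     # sequence lines collected for the current record
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if header is not None:
                records.append(_make_record(header, chunks))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append(_make_record(header, chunks))
    return records


def _make_record(header, chunks):
    # Assumption: identifier = first token after ">", description = the rest.
    parts = header.split(None, 1)
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description, "".join(chunks)


if __name__ == "__main__":
    example = ">Aqr cDNA sequence\nCCGGGACTTC\nTCAGGTTCCG\n"
    for identifier, description, sequence in parse_fasta(example):
        print(identifier, "|", description, "|", len(sequence), "residues")
```

Sequence lines are simply joined, so the line length used in the input does not matter to this sketch.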

What is Genetics Computer Group (GCG) format?

Accelrys GCG, formerly known as the GCG Wisconsin Package, is a set of sequence analysis programs that use the GCG sequence format. This format was developed by the Genetics Computer Group.

What does a sequence in GCG format look like?

GCG format starts with a description, followed by a ".." separator and then the sequence, as shown in the following example.

!!NA_SEQUENCE 1.0
gi|198293|gb|M15131.1|MUSIL1BA Mouse interleukin 1-beta (IL-1-beta) mRNA,
gi-198293.seq  Length: 1339  November 8, 1999 13:29  Type: N  Check: 1315  ..
       1  TGCAGGGTTC GAGGCCTAAT AGGCTCATCT GGGATCCTCT CCAGCCAAGC 
      51  TTCCTTGTGC AAGTGTCTGA AGCAGCTATG GCAACTGTTC CTGAACTCAA 
     101  CTGTGAAATG CCACCTTTTG ACAGTGATGA GAATGACCTG TTCTTTGAAG 
     151  TTGACGGACC CCAAAAGATG AAGGGCTGCT TCCAAACCTT TGACCTGGGC 
     201  TGTCCAGATG AGAGCATCCA GCTTCAAATC TCACAGCAGC ACATCAACAA 
     251  GAGCTTCAGG CAGGCAGTAT CACTCATTGT GGCTGTGGAG AAGCTGTGGC 
     301  AGCTACCTGT GTCTTTCCCG TGGACCTTCC AGGATGAGGA CATGAGCACC 
     351  TTCTTTTCCT TCATCTTTGA AGAAGAGCCC ATCCTCTGTG ACTCATGGGA 
     401  TGATGATGAT AACCTGCTGG TGTGTGACGT TCCCATTAGA CAGCTGCACT 
     451  ACAGGCTCCG AGATGAACAA CAAAAAAGCC TCGTGCTGTC GGACCCATAT 
     501  GAGCTGAAAG CTCTCCACCT CAATGGACAG AATATCAACC AACAAGTGAT 
     551  ATTCTCCATG AGCTTTGTAC AAGGAGAACC AAGCAACGAC AAAATACCTG 
     601  TGGCCTTGGG CCTCAAAGGA AAGAATCTAT ACCTGTCCTG TGTAATGAAA 
     651  GACGGCACAC CCACCCTGCA GCTGGAGAGT GTGGATCCCA AGCAATACCC 
     701  AAAGAAGAAG ATGGAAAAGC GGTTTGTCTT CAACAAGATA GAAGTCAAGA 
     751  GCAAAGTGGA GTTTGAGTCT GCAGAGTTCC CCAACTGGTA CATCAGCACC 
     801  TCACAAGCAG AGCACAAGCC TGTCTTCCTG GGAAACAACA GTGGTCAGGA 
     851  CATAATTGAC TTCACCATGG AATCTGTGTC TTCCTAAAGT ATGGGCTGGA 
     901  CTGTTTCTAA TGCCTTCCCC AGGGCATGTG AAGGAGCTCC CTTGTCATGA 
     951  ATGAGCAGAC AGCTCAATCT CTAGGACACT CCTTAGTCCT CGGCCAAGAC 
    1001  AGGTCGCTCA GGGTCACAAG AAACCATGGC ACATTCTGTT CAAAGAGAGC 
    1051  CTGTGTTTCC TCCTTGCCTC TGATGGGCAA CCACTTACCT ATTTATTTAT 
    1101  GTATTTATTG ATTGGTTGAT CTATTTAAGT TGATTCAAGG GGACATTAGG 
    1151  CAGCACTCTC TAGAACAGAA CCTAGCTGTC AACGTGTGGG GGATGAATTG 
    1201  GTCATAGCCT TGCACTTGAG GTCTTTCATT GAAGCTGAGA ATAAATAGGT 
    1251  TCCTATAATA TGGATGAGAA TTTTTATGAA TGAAGCATTA GCACATTGCT 
    1301  TTGATGAGTA TGAAATAAAT TTCATTAAAC AAACAAACA
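
To show how the ".." separator divides the description from the sequence, here is a minimal Python sketch. It is not the conversion that MouseBLAST performs; the function name parse_gcg is hypothetical. It assumes the first ".." in the text is the separator (so a description that itself contains ".." would be split in the wrong place) and that the sequence consists of letters only, so position numbers, blanks, and any other symbols are discarded.

```python
import re


def parse_gcg(text):
    """Split GCG-style text at the first '..' and return (description, sequence)."""
    head, separator, body = text.partition("..")
    if not separator:
        raise ValueError("no '..' separator found")
    # Collect the description lines, skipping blank lines and any line that
    # starts with "!!" (such as the "!!NA_SEQUENCE 1.0" line in the example).
    description_lines = []
    for line in head.splitlines():
        line = line.strip()
        if line and not line.startswith("!!"):
            description_lines.append(line)
    # Keep letters only: this drops the position numbers and the spacing.
    sequence = re.sub(r"[^A-Za-z]", "", body)
    return " ".join(description_lines), sequence


if __name__ == "__main__":
    example = (
        "!!NA_SEQUENCE 1.0\n"
        "Example description line,\n"
        "example.seq  Length: 20  Type: N  ..\n"
        "       1  TGCAGGGTTC GAGGCCTAAT\n"
    )
    description, sequence = parse_gcg(example)
    # Emit the result as a FASTA record with a placeholder identifier.
    print(">query " + description)
    print(sequence)
```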

What is free-text format?

Free-text format is a plain string of characters representing the sequence, with no description or annotation.

What does a sequence in free-text format look like?

The following is an example of a sequence in free-text format.

MATVPELNCEMPPFDSDENDLFFE
VDGPQKMKGCFQTFDLGCPDESIQLQISQQHINKSFRQAVSLIVAV
EKLWQLPVSFPWTFQDEDMSTFFSFIFEEEP
ILCDSWDDDDNLLVCDVPIRQLHYRLRDEQQKSLVLSDP
           YELKALHLNGQNINQQVIFSMSFV
QGEPSNDKIPVALGLKGKNLYLSCVMKDGTPTLQLESVDPKQYPKK
KMEKRFVFNKIEVKSKVEFESAEFPNWYISTSQAEHKPVFLGNNSGQDIIDFTMESVSS
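
Because free text carries no description line, turning it into FASTA amounts to removing the whitespace and adding a header. The Python sketch below shows one way this could be done; it is not MouseBLAST's own conversion, which this document does not describe. The function name free_text_to_fasta, the default identifier "query", and the 60-character line width are all assumptions made for illustration.

```python
def free_text_to_fasta(text, identifier="query", width=60):
    """Wrap whitespace-separated sequence text in a minimal FASTA record."""
    # Remove every space, tab, and line break, as in the irregular example above.
    sequence = "".join(text.split())
    if not sequence:
        raise ValueError("empty sequence")
    # Re-wrap the sequence into lines of at most `width` characters.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + identifier + "\n" + "\n".join(lines)


if __name__ == "__main__":
    raw = """MATVPELNCEMPPFDSDENDLFFE
VDGPQKMKGCFQTFDLGCPDESIQLQISQQHINKSFRQAVSLIVAV
           YELKALHLNGQNINQQVIFSMSFV
"""
    print(free_text_to_fasta(raw))
```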
