This help document describes the sequence formats that MouseBLAST accepts.
You may enter your sequence in FASTA, GCG, or free-text format. MouseBLAST converts the data into FASTA format for input to WU-BLAST.
FASTA format, also called "Pearson format" after its author William Pearson, is the format that the WU-BLAST and NCBI BLAST programs use. It is called FASTA format because it was first used by the FASTA alignment program.
A sequence in FASTA format starts with a one-line description that begins with a ">" character. The description line usually has a sequence identifier immediately following the ">" character, and then the sequence description.
For example:
>gi|309398 interleukin 1-beta (AAA39276.1)
MATVPELNCEMPPFDSDENDLFFEVDGPQKMKGCFQTFDLGCPDESIQLQISQQHINKSFRQAVSLIVAV
EKLWQLPVSFPWTFQDEDMSTFFSFIFEEEPILCDSWDDDDNLLVCDVPIRQLHYRLRDEQQKSLVLSDP
YELKALHLNGQNINQQVIFSMSFVQGEPSNDKIPVALGLKGKNLYLSCVMKDGTPTLQLESVDPKQYPKK
KMEKRFVFNKIEVKSKVEFESAEFPNWYISTSQAEHKPVFLGNNSGQDIIDFTMESVSS
Another example (a partial gene trap sequence):
>Aqr cDNA sequence
CCGGGACTTCTCAGGTTCCGGGAGGAACCGTGGGAGCCCAGCGTGTCTGTGCCGGGCGCT
CTGGCACCTCGGGGCTGGGTCGCCGCCGCTGCTTGAAAGCTTTCGAGAGTCGCCGCTACG
AGCCTCTGGTCAGCTTCAGTGGCGATCGCTGCGATGGCGGCTCCTGCGCAGCCCAAGAAA
ATCGTGGCCCCCACGGTGTCCCAGATCAACGCGGAGTTCGTCACTCAGCTAGCATGTAAA
TACTGGGCTCCTCATATCAAGAAGAAATCACCGTTTGATATAAAAGTAATTGAAGAAATA
TATGAAAAAGAGATCGTCAAATCACGGTTTGCCATCAGGAAAATAATGCTGCTGGAATTT
...
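The FASTA layout described above (a ">" description line, then the sequence) can be read with a few lines of code. The following is a minimal Python sketch, not MouseBLAST's own parser; the function name parse_fasta is hypothetical, and it assumes the identifier is the first whitespace-delimited token after ">".

```python
def parse_fasta(text):
    """Return a list of (identifier, description, sequence) tuples.

    Hypothetical helper for illustration only. Assumes the identifier
    is the first whitespace-delimited token after the ">" character.
    """
    records = []
    ident = desc = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if ident is not None:
                records.append((ident, desc, "".join(chunks)))
            parts = line[1:].strip().split(None, 1)
            ident = parts[0] if parts else ""
            desc = parts[1] if len(parts) > 1 else ""
            chunks = []
        elif ident is not None:
            # Sequence lines are concatenated without the line breaks.
            chunks.append(line)
    if ident is not None:
        records.append((ident, desc, "".join(chunks)))
    return records


records = parse_fasta(">Aqr cDNA sequence\nCCGGGACTTC\nTCAGGTTCCG\n")
```

In this sketch, any text that appears before the first ">" line is ignored rather than treated as an error.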
GCG format starts with a description, followed by a ".." separator and then the sequence, as shown in the following example.
!!NA_SEQUENCE 1.0
gi|198293|gb|M15131.1|MUSIL1BA Mouse interleukin 1-beta (IL-1-beta) mRNA,
gi-198293.seq Length: 1339 November 8, 1999 13:29 Type: N Check: 1315 ..
1 TGCAGGGTTC GAGGCCTAAT AGGCTCATCT GGGATCCTCT CCAGCCAAGC
51 TTCCTTGTGC AAGTGTCTGA AGCAGCTATG GCAACTGTTC CTGAACTCAA
101 CTGTGAAATG CCACCTTTTG ACAGTGATGA GAATGACCTG TTCTTTGAAG
151 TTGACGGACC CCAAAAGATG AAGGGCTGCT TCCAAACCTT TGACCTGGGC
201 TGTCCAGATG AGAGCATCCA GCTTCAAATC TCACAGCAGC ACATCAACAA
251 GAGCTTCAGG CAGGCAGTAT CACTCATTGT GGCTGTGGAG AAGCTGTGGC
301 AGCTACCTGT GTCTTTCCCG TGGACCTTCC AGGATGAGGA CATGAGCACC
351 TTCTTTTCCT TCATCTTTGA AGAAGAGCCC ATCCTCTGTG ACTCATGGGA
401 TGATGATGAT AACCTGCTGG TGTGTGACGT TCCCATTAGA CAGCTGCACT
451 ACAGGCTCCG AGATGAACAA CAAAAAAGCC TCGTGCTGTC GGACCCATAT
501 GAGCTGAAAG CTCTCCACCT CAATGGACAG AATATCAACC AACAAGTGAT
551 ATTCTCCATG AGCTTTGTAC AAGGAGAACC AAGCAACGAC AAAATACCTG
601 TGGCCTTGGG CCTCAAAGGA AAGAATCTAT ACCTGTCCTG TGTAATGAAA
651 GACGGCACAC CCACCCTGCA GCTGGAGAGT GTGGATCCCA AGCAATACCC
701 AAAGAAGAAG ATGGAAAAGC GGTTTGTCTT CAACAAGATA GAAGTCAAGA
751 GCAAAGTGGA GTTTGAGTCT GCAGAGTTCC CCAACTGGTA CATCAGCACC
801 TCACAAGCAG AGCACAAGCC TGTCTTCCTG GGAAACAACA GTGGTCAGGA
851 CATAATTGAC TTCACCATGG AATCTGTGTC TTCCTAAAGT ATGGGCTGGA
901 CTGTTTCTAA TGCCTTCCCC AGGGCATGTG AAGGAGCTCC CTTGTCATGA
951 ATGAGCAGAC AGCTCAATCT CTAGGACACT CCTTAGTCCT CGGCCAAGAC
1001 AGGTCGCTCA GGGTCACAAG AAACCATGGC ACATTCTGTT CAAAGAGAGC
1051 CTGTGTTTCC TCCTTGCCTC TGATGGGCAA CCACTTACCT ATTTATTTAT
1101 GTATTTATTG ATTGGTTGAT CTATTTAAGT TGATTCAAGG GGACATTAGG
1151 CAGCACTCTC TAGAACAGAA CCTAGCTGTC AACGTGTGGG GGATGAATTG
1201 GTCATAGCCT TGCACTTGAG GTCTTTCATT GAAGCTGAGA ATAAATAGGT
1251 TCCTATAATA TGGATGAGAA TTTTTATGAA TGAAGCATTA GCACATTGCT
1301 TTGATGAGTA TGAAATAAAT TTCATTAAAC AAACAAACA
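Because the sequence in a GCG file follows the ".." separator and is interleaved with position numbers and spaces, recovering the bare sequence is mostly a matter of discarding everything that is not a letter. The following Python sketch illustrates one way to do this; it is not the conversion that MouseBLAST actually performs. The function name gcg_to_sequence is hypothetical, and the sketch assumes that the first ".." in the input is the separator and that the sequence uses letters only.

```python
def gcg_to_sequence(text):
    """Split GCG-formatted text into (description, sequence).

    Hypothetical helper for illustration only. Assumes the first ".."
    in the text is the separator and that residues are letters, so
    position numbers and whitespace can simply be dropped.
    """
    head, sep, body = text.partition("..")
    if not sep:
        raise ValueError("no '..' separator found; input is not GCG format")
    sequence = "".join(ch for ch in body if ch.isalpha())
    return head.strip(), sequence


gcg_text = (
    "!!NA_SEQUENCE 1.0\n"
    "sample.seq Length: 20 Type: N Check: 0 ..\n"
    " 1 TGCAGGGTTC GAGGCCTAAT\n"
)
description, sequence = gcg_to_sequence(gcg_text)
```

The sample input here is made up for the example; its Length and Check values are placeholders and are not verified by the sketch.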
Free-text format provides a string of characters representing the sequence, without description or annotation.
The following is an example of a sequence in free-text format.
MATVPELNCEMPPFDSDENDLFFE
VDGPQKMKGCFQTFDLGCPDESIQLQISQQHINKSFRQAVSLIVAV
EKLWQLPVSFPWTFQDEDMSTFFSFIFEEEP
ILCDSWDDDDNLLVCDVPIRQLHYRLRDEQQKSLVLSDP
YELKALHLNGQNINQQVIFSMSFV
QGEPSNDKIPVALGLKGKNLYLSCVMKDGTPTLQLESVDPKQYPKK
KMEKRFVFNKIEVKSKVEFESAEFPNWYISTSQAEHKPVFLGNNSGQDIIDFTMESVSS
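Since free-text input has no description line, converting it to FASTA format amounts to removing the line breaks and adding a ">" line. The Python sketch below shows the idea under stated assumptions: the function name free_text_to_fasta, the placeholder identifier "query", and the 60-column line width are all choices made for this example, not documented MouseBLAST behavior.

```python
def free_text_to_fasta(text, identifier="query", width=60):
    """Wrap a free-text sequence as a single FASTA record.

    Hypothetical helper for illustration only. The default identifier
    and line width are assumptions made for this example.
    """
    # Remove all whitespace, including the original line breaks.
    sequence = "".join(text.split())
    # Re-wrap the sequence at a fixed width.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + identifier] + lines) + "\n"


fasta = free_text_to_fasta("MATVPELNCEMPPFDSDENDLFFE\nVDGPQKMKGC\n", width=20)
```

Uneven line lengths in the input, as in the example above, do not matter in this sketch because the sequence is joined before it is wrapped again.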