This directory contains the JGI v4.1 assembly of the Xenopus tropicalis
genome (xenTro2, Aug. 2005) from the DOE Joint Genome Institute (JGI).

Files included in this directory:

xenTro2.2bit - contains the complete X. tropicalis/xenTro2 genome sequence
    in the 2bit file format.  Repeats from RepeatMasker and Tandem Repeats
    Finder (with period of 12 or less) are shown in lower case; non-repeating
    sequence is shown in upper case.  The utility program, twoBitToFa (available
    from the kent src tree), can be used to extract .fa file(s) from
    this file.  A pre-compiled version of the command line tool can be
    found at:
    See also:

xenTro2.fa.gz - scaffold FASTA, with repetitive sequences identified by
    RepeatMasker and TRF (maxPeriod=12) masked to lower case.

xenTro2.hardmasked.fa.gz - scaffold FASTA, with repetitive sequences 
    masked to N.
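
    As a rough illustration of how the soft-masked and hard-masked files
    relate, the sketch below turns lower-case bases into N.  It assumes the
    hard-masked file differs from the soft-masked one only in this respect;
    the function name hard_mask is hypothetical and is not part of any UCSC
    tool.

```python
def hard_mask(fasta_text: str) -> str:
    """Replace lower-case (soft-masked) bases with N; header lines are kept as-is."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(line)  # FASTA header: do not alter sequence names
        else:
            out.append("".join("N" if c.islower() else c for c in line))
    return "\n".join(out)


# Tiny made-up record, for illustration only.
example = ">scaffold_1\nACGTacgtNNacGT\n"
print(hard_mask(example))
```

    For the real file, the text would be streamed line by line (for example
    via gzip.open(..., "rt")) rather than held in memory at once.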

xenTro2.rmsk.out.gz - RepeatMasker output combined from two runs: one run 
    with the default -species "xenopus tropicalis" libraries for version 
    open-3-1-5 (March 20, 2006, lib release 20060315), and another run with
    as the library (-lib instead of -species).  

xenTro2.trf.bed.gz - TRF (Tandem Repeats Finder) output, translated to 
    UCSC's BED format.  The simple repeats in this file have not been 
    filtered to retain only maxPeriod=12 -- a maxPeriod=12 subset was 
    extracted from this set to use in our masking.
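
    A maxPeriod=12 subset like the one used for masking could be sketched
    as follows.  This is not the script UCSC used.  It ASSUMES the repeat
    period is the fifth tab-separated field (0-based index 4), which should
    be checked against the file before relying on it; filter_by_period and
    PERIOD_COLUMN are hypothetical names.

```python
PERIOD_COLUMN = 4  # ASSUMPTION: 0-based index of the TRF period field


def filter_by_period(lines, max_period=12, period_column=PERIOD_COLUMN):
    """Yield only the lines whose period field is <= max_period."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        try:
            period = int(fields[period_column])
        except (IndexError, ValueError):
            continue  # skip header, comment, or short lines
        if period <= max_period:
            yield line


# Made-up rows for illustration; real rows carry more columns.
sample = [
    "scaffold_1\t100\t130\ttrf\t3\t10.0\n",
    "scaffold_1\t500\t620\ttrf\t40\t3.0\n",
    "scaffold_2\t20\t44\ttrf\t12\t2.0\n",
]
kept = list(filter_by_period(sample))
print(len(kept))
```

    With the real file, the lines could come from
    gzip.open("xenTro2.trf.bed.gz", "rt") instead of the sample list.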

md5sum.txt - MD5 checksums of these files, for verifying correct transmission.
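
    A downloaded file can be checked against md5sum.txt along these lines.
    The sketch assumes md5sum.txt uses the usual md5sum layout (checksum,
    whitespace, file name); the helper names md5_of_file and parse_md5sum
    are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5sum(text):
    """Map file name -> expected checksum (ASSUMES 'checksum  name' lines)."""
    expected = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            checksum, name = parts
            # md5sum marks binary mode with a leading '*' on the name
            expected[name.strip().lstrip("*")] = checksum.lower()
    return expected
```

    Typical use: read md5sum.txt, call parse_md5sum on its contents, and
    compare expected["xenTro2.2bit"] with md5_of_file("xenTro2.2bit").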

upstream1000.fa.gz - Sequences 1000 bases upstream of annotated
    transcription starts for MGC Genes with annotated 5' UTRs.  
    This file is updated weekly so it could be slightly out
    of sync with the MGC Gene data, which is updated daily for most
    assemblies.

upstream2000.fa.gz - Same as upstream1000, but 2000 bases.

upstream5000.fa.gz - Same as upstream1000, but 5000 bases.

xenTro2.chrom.sizes - Two-column tab-separated text file containing assembly
    sequence names and sizes.
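
    Because the file is plain two-column text, it is easy to load.  A
    minimal sketch, assuming one name/size pair per line; read_chrom_sizes
    is a hypothetical helper and the sample rows are made up.

```python
def read_chrom_sizes(lines):
    """Parse 'name<TAB>size' lines into a dict of name -> size in bases."""
    sizes = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        name, size = line.split("\t")[:2]
        sizes[name] = int(size)
    return sizes


# Made-up rows for illustration; they are not real xenTro2 values.
sample = ["scaffold_1\t1000\n", "scaffold_2\t250\n"]
sizes = read_chrom_sizes(sample)
print(sum(sizes.values()))
```

    With the real file, pass an open file handle:
    read_chrom_sizes(open("xenTro2.chrom.sizes")).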

If you plan to download a large file or multiple files from this 
directory, we recommend you use ftp rather than downloading the files 
via our website. To do so, ftp to, then go to 
the directory goldenPath/xenTro2/bigZips. To download multiple files, 
use the "mget" command:

    mget <filename1> <filename2> ...
    - or -
    mget -a (to download all the files in the directory)

Preliminary drafts of the X. tropicalis sequence are made freely available
before scientific publication by the JGI and the X. tropicalis Genome 
Consortium, with the following understanding:

1. The data may be freely downloaded, used in analyses, and repackaged 
   in databases. 
2. Users are free to use the data in scientific papers analyzing 
   particular genes and regions if the provider of this data 
   (DOE Joint Genome Institute) is properly acknowledged. 
3. Additional shotgun sequencing is ongoing, and future assembly 
   releases will be made in a timely fashion. We expect to publish an 
   initial analysis of a high quality draft X. tropicalis genome sequence 
   in 2005 (with submission targeted for the spring of 2005) which will 
   include descriptions of the large scale organization of the frog 
   genome as well as genome-scale comparisons of the frog sequence and 
   gene set with those of other animals. Others who would like to 
   coordinate other genome-wide analysis with this work should contact
   Paul Richardson, JGI. We welcome a coordinated 
   approach to describing this community resource. 
4. Any redistribution of the data should carry this notice. 

