Burge-1997 1.0 (tar/gz)
Release notes:
The files in this directory contain the sequence sets described in the paper: Burge, C. & Karlin, S. (1997) Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78-94.

The file train-LIB.genbank contains the set of 380 genes designated with a script L in the paper, in GenBank flatfile format. This set contains 238 multi-exon genes and 142 single-exon genes. The multi-exon genes in this set contain a total of 1492 exons and 1254 introns.

The file train-coding-cDNA.genbank contains the set of 1619 cDNA sequences which, together with the 380 genes described above, form the set designated with a script C in the paper. This file is also in GenBank flatfile format. Note that these sequences have been edited slightly from their original GenBank entries by trimming away the 5' and 3' UTR portions, leaving only the coding portion of the cDNA (ATG -> stop codon).

(The above comments were copied and slightly edited from the original README file by C. Burge.)
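The counts quoted above can be sanity-checked after unpacking the archive. The following is a minimal sketch, not part of the original distribution: it assumes only that each GenBank flatfile record begins with a LOCUS line, and the helper names count_genbank_records and expected_introns are hypothetical. The intron check relies on the fact that a gene with n exons has n - 1 introns, so the introns summed over the multi-exon genes should equal total exons minus the number of those genes.

```python
def count_genbank_records(lines):
    """Count GenBank flatfile records in an iterable of text lines.

    Assumption: every record starts with a line beginning with 'LOCUS'.
    """
    return sum(1 for line in lines if line.startswith("LOCUS"))


def expected_introns(total_exons, multi_exon_genes):
    """A gene with n exons has n - 1 introns, so the total number of
    introns is total exons minus the number of genes."""
    return total_exons - multi_exon_genes


# Consistency check on the figures quoted in the release notes.
print(expected_introns(1492, 238))

# Usage against the unpacked files (paths assume the current directory):
# with open("train-LIB.genbank") as handle:
#     print(count_genbank_records(handle))
```

If the files match the description, the record counts should come out as 380 for train-LIB.genbank and 1619 for train-coding-cDNA.genbank.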
Changelog: