Release notes:
The files in this directory contain the sequence sets
described in the paper:
Burge, C. & Karlin, S. (1997) Prediction of complete gene
structures in human genomic DNA. J. Mol. Biol. 268, 78-94.
The file train-LIB.genbank contains, in GenBank flatfile
format, the set of 380 genes designated by a script L in the
paper. This set contains 238 multi-exon genes and 142
single-exon genes. The multi-exon genes in this set contain
a total of 1492 exons and 1254 introns.
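The exon and intron totals are consistent with each other: a multi-exon gene with k exons has k - 1 introns, so the intron total equals 1492 - 238 = 1254. A minimal Python sketch for recomputing such counts from a GenBank flatfile is given here. It assumes (this is not stated in the release notes) that each record carries a single CDS feature whose location lists the coding exons as base ranges, e.g. join(11..40,101..160). The function names are hypothetical, and counts derived this way are not guaranteed to reproduce the published totals.

```python
import re

# Matches one base range in a GenBank location, e.g. "11..40" or "<1..>206".
RANGE = re.compile(r"<?\d+\.\.>?\d+")


def cds_locations(genbank_text):
    """Yield the location string of the first CDS feature in each record.

    Assumption: one CDS per record, feature key in columns 6-20 and
    location starting in column 22 (standard flatfile layout).
    """
    for record in genbank_text.split("\n//"):
        lines = record.splitlines()
        for i, line in enumerate(lines):
            if line[5:21].strip() == "CDS":
                loc = line[21:].strip()
                # Long locations wrap onto lines indented by 21 spaces;
                # stop at the first qualifier line (which starts with "/").
                j = i + 1
                while (j < len(lines) and lines[j].startswith(" " * 21)
                       and not lines[j].strip().startswith("/")):
                    loc += lines[j].strip()
                    j += 1
                yield loc
                break


def exon_intron_counts(genbank_text):
    """Return (genes, multi_exon_genes, exons, introns); exons and introns
    are summed over multi-exon genes only, as in the release notes."""
    genes = multi = exons = introns = 0
    for loc in cds_locations(genbank_text):
        n = len(RANGE.findall(loc))
        genes += 1
        if n > 1:
            multi += 1
            exons += n
            introns += n - 1  # a gene with n exons has n - 1 introns
    return genes, multi, exons, introns


def feature_line(key, location):
    """Format a feature line: key in columns 6-20, location from column 22."""
    return " " * 5 + key.ljust(16) + location


# Hypothetical two-record example: one 3-exon gene, one single-exon gene.
TOY_RECORDS = "\n".join([
    "LOCUS       TOY1",
    "FEATURES             Location/Qualifiers",
    feature_line("CDS", "join(11..40,101..160,"),
    " " * 21 + "201..230)",
    " " * 21 + '/gene="toy1"',
    "//",
    "LOCUS       TOY2",
    "FEATURES             Location/Qualifiers",
    feature_line("CDS", "1..90"),
    "//",
])

print(exon_intron_counts(TOY_RECORDS))  # (2, 1, 3, 2)
```

Only CDS ranges are counted, so any untranslated exons annotated elsewhere in a record would not be included.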
The file train-coding-cDNA.genbank contains the set of 1619
cDNA sequences which, together with the 380 genes described
above, form the set designated by a script C in the paper.
This file is also in GenBank flatfile format. Note that
these sequences have been edited slightly from their original
GenBank records: the 5' and 3' UTR portions were trimmed away,
leaving only the coding portion of each cDNA (ATG -> stop codon).
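Because the cDNA records were trimmed to their coding portion, each trimmed sequence would be expected to start with ATG, end with a stop codon, and have a length divisible by three. The Python sketch here checks exactly that. It assumes the standard genetic code (stop codons TAA, TAG, TGA) and reads the bases from the ORIGIN section of a record. The helper names are hypothetical and not part of this distribution, and individual entries are not guaranteed to pass the check.

```python
import re

# Standard genetic code stop codons (assumption: no alternative codes).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def origin_sequence(record):
    """Return the bases listed after the ORIGIN line of one GenBank record,
    uppercased, with position numbers, spaces and the "//" terminator dropped."""
    parts = re.split(r"^ORIGIN.*$", record, maxsplit=1, flags=re.MULTILINE)
    if len(parts) < 2:
        return ""
    return "".join(ch for ch in parts[1] if ch.isalpha()).upper()


def looks_like_trimmed_cds(seq):
    """True if seq runs ATG -> stop codon in one frame with no internal stop."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return (codons[0] == "ATG"
            and codons[-1] in STOP_CODONS
            and not any(c in STOP_CODONS for c in codons[:-1]))


# Hypothetical single-record example in flatfile layout.
record = "\n".join([
    "LOCUS       TOYCDNA",
    "ORIGIN",
    "        1 atgaaaccca aataa",
    "//",
])
seq = origin_sequence(record)
print(seq, looks_like_trimmed_cds(seq))  # ATGAAACCCAAATAA True
```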
(The above comments were copied, with slight edits, from the
original README file by C. Burge.)
Changelog: